Voozh

Most of the worries I had before going local turned out to be smaller than expected. The quality of local LLMs hold up better than reviews led me to believe, and performance on 8GB of VRAM has been workable for the kind of stuff I use it for (which is not anything code-related). There's one thing that does feel different though, and that's persistent context. Every session starts fresh and whatever you told the model last time is gone.

I actually don't mind that as much as I thought I would. There's actually something nice about an AI that doesn't track every detail about you across months of conversations. For one-off questions, the blank slate is fine and even preferable because nothing can "muddy" the response. But for ongoing projects with their own shape and constraints, restating the same context every session gets annoying pretty quickly. I'd been working around it with a manual context journal in LM Studio, which holds up but does require maintenance.

Then I added AnythingLLM to my stack, and the re-explaining mostly stopped…

Want to stay in the loop with the latest in AI? The XDA AI Insider newsletter drops weekly with deep dives, tool recommendations, and hands-on coverage you won't find anywhere else on the site. Subscribe by modifying your newsletter preferences!

AnythingLLM is more than just a barebones chat interface

The frontend my local stack was missing

AnythingLLM is a free and open source app from Mintplex Labs. The simplest way to describe it is a frontend for your model runner that handles everything the runner doesn't bother with. The model still runs where you want it to (in my case, either LM Studio, Jan AI, or llama.cpp), and then AnythingLLM just wraps a fuller chat interface around it.

Why use AnythingLLM if you're either going to use cloud AI or already have a local setup anyway? For me, it's having one interface for everything, local or cloud (it connects to cloud APIs too). The workspace management and persistent memory features work regardless of which model is doing the actual thinking. For local stacks especially, this is where it pays off since the runner alone ships with next to none of the stuff AnythingLLM offers.

The feature that actually got me to install AnythingLLM is its memory system and the fact that it works automatically. Every few hours, a background process runs through your recent chats and pulls out useful facts about you or your work, saving them as memories that get re-injected into future conversations. Manual memory entries are also possible if you want to be more precise. And there are two scopes: Workspace memories stay tied to a single project, while Global memories apply across every workspace. The only requirement, as per AnythingLLM's documentation, is that you need to have at least five chats going with enough information it can save.

Compared to my LM Studio journal setup, this is much closer to what cloud AI does. I don't have to curate and keep the journal updated myself, which is a text document, and can just let the system keep itself updated. My journal still has its place for hard rules, but for the long-term context of who I am and what I'm working on, AnythingLLM's memory is a much better home for it.

👁 qwen in lm studio on desktop pc

I ran Gemma 4 and Qwen 3.5 for the same local tasks, and one pulled miles ahead

Pitting them against each other to find the best one for my workflow

By Nolen Jonker

Pointing AnythingLLM at my local AI

It's just one settings tab

I got started by connecting it to LM Studio since that's still my favorite one to spin up because of how quick and easy it is, but the basic setup applies to any supported runner. AnythingLLM has support for plenty of other local runners too, plus the option to connect to cloud APIs like Anthropic or OpenAI if you want to mix in cloud models. But I kept it local.

The setup itself lives under Settings > AI Providers > LLM, then pick your runner or provider from the dropdown. If you're using LM Studio like I am, AnythingLLM auto-populates the base URL with http://localhost:1234/v1, which is the default endpoint LM Studio's server runs on. The Selected Model dropdown then lists whatever you have loaded in LM Studio, so just pick the one you want.

A few caveats though. Your runner actually has to be running with your model loaded for any of this to work, since AnythingLLM is just the frontend talking to its API. The Model Context Window can stay on Automatically Managed. As for what carries over from LM Studio - anything you've set at the model load level (context length, GPU offload) still applies. But the contents of any LM Studio preset, like system prompt and sampling parameters, get bypassed, and you'll need to configure those on the AnythingLLM side instead.

👁 gemma 4 on llama.cpp on desktop pc, clock and lamp in view

I finally found an open-source local LLM that actually competes with cloud AI

Open-source is catching up

By Nolen Jonker

Memory in real use

What I tested and kept

The memory feature lives inside any workspace, not in the global app settings. You'll find it by hitting the slider icon at the top of any chat window, then picking Memories from the dropdown. The sidebar slides in with two toggles at the top and both will be off by default. I'd recommend keeping both on, since Personalization is what actually injects memories into your chats while Automatic Memories handles the extraction in the background.

Deals

Save on AI tools and subscriptions — deals on software

Explore discounts on AI software, model runners, and subscriptions to lower the cost of local LLM workflows. Check deals on developer tools, cloud credits, plugins, and companion accessories to save while building a reliable AI setup.

Deals Explore Software, AI & Subscriptions Deals

Memories are single-sentence facts that get appended to the system prompt at chat time. I added a handful manually to test it at first, both workspace and global, and the test itself was simple. In a fresh thread I asked the model a question that required knowing some of those memories without restating them, and the response referenced all of them accurately. The model's reasoning trace even quoted the "Things I Remember About You" section by name.

Workspace memories handle the obvious project-specific stuff. But where I've gotten more value is globally, where I've been using memories to make my local LLM feel a bit more like cloud AI. Paste an older chatbot conversation into your local model, and the model can describe your communication style well enough that you can save the description as a global memory. The local model then starts responding more in line with how the cloud one does. In fact, if you're switching over from something like ChatGPT or Claude, you can even drop in your full personalization or memory log.

👁 claude free on desktop

I downgraded to free Claude for a week, and it changed how I feel about paying $20 a month

The usage cap wasn't the biggest issue like I expected it to be

By Nolen Jonker

Memory alone justified the install

AnythingLLM solves a real problem for local LLM users, and the memory feature is the closest thing to cloud-AI memory I've used locally. There's a lot more to this tool that I've barely touched, but the memory side alone makes it worth keeping in my stack.

AnythingLLM

See at AnythingLLM

URL: https://www.xda-developers.com/added-open-source-tool-to-local-ai-stack-local-llm-finally-has-persistent-memory/

⇱ I added this open-source tool to my local AI stack, and my local LLM finally has persistent memory

AnythingLLM is more than just a barebones chat interface

The frontend my local stack was missing

I ran Gemma 4 and Qwen 3.5 for the same local tasks, and one pulled miles ahead

Pointing AnythingLLM at my local AI

It's just one settings tab

I finally found an open-source local LLM that actually competes with cloud AI

Memory in real use

What I tested and kept

Save on AI tools and subscriptions — deals on software

I downgraded to free Claude for a week, and it changed how I feel about paying $20 a month

Memory alone justified the install

AnythingLLM