I've been running local LLMs for several months now, and I'll be honest: they're more capable than I expected, but also more hands-on than I wanted. There is a performance drop compared to cloud AI, but it's surprisingly manageable. The bigger adjustment is the interface and infrastructure. Local models don't ship with a polished chat app; you run them through a runner like LM Studio or llama.cpp, and even the ones with GUIs are a different experience from something like Claude or Gemini. The controls are more exposed and can be a little overwhelming, yet the runners also lack a lot of workspace features.
Persistent context is probably the biggest gap. For example, Claude has Projects and Gemini has Gems, which load your background info and instructions into every chat automatically so you never have to re-explain yourself. Local runners don't have a direct equivalent: every session starts blank, so you have to re-brief the model from scratch every time. But I've found a workaround: context journals.
What is a "journal" in a local LLM?
No, it's not my personal diary
A journal in this context has nothing to do with writing down your thoughts. It's a reference document you build and give to the model so it stops starting from scratch every time you open a chat. Think of it as the answer to the question your model would ask if it could: who am I talking to, what do they need, how should I respond?
It can hold whatever the model keeps getting wrong or keeps needing explained. Name, occupation, how you want responses formatted, corrections you've had to make more than once, tools you use, things you explicitly don't want. Basically, if you've had to tell the model the same thing three times already, it belongs in that journal.
The model doesn't "remember" any of it the way cloud AI tools do; it just has access to the information at the start of each session, either through a system prompt or a document upload, and acts accordingly.
How I get persistent context into my runner
There are two simple ways
LM Studio is my runner of choice, so that's what I'm working with here - but the core idea translates to other GUI-based runners like Ollama, even if the steps look a little different.
The most immediate method is presets. In LM Studio, a preset bundles your system prompt and parameters into a named config you can load into any chat instantly. The system prompt is the first thing the model reads before you say anything, so it sets the tone, rules, and context for the whole session. You can also bind a preset to a specific model so it loads automatically every time that model opens.
I recommend keeping the system prompt lean, though, because it eats into your context window. This is the place for preferences and hard rules, not paragraphs of background: how you want responses formatted, your occupation, your tools, and so on. Once it gets too long, it starts competing with your actual conversation for context space, and recall can degrade noticeably.
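To get a feel for how much of the window a system prompt takes up, here is a minimal sketch. It assumes a rough "four characters per token" rule of thumb, which is only an approximation (real tokenizers vary by model), and the 8192-token window is a hypothetical value, not a setting taken from LM Studio.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate. Assumption: ~4 characters per token."""
    if not text:
        return 0
    return max(1, round(len(text) / chars_per_token))


def prompt_budget(system_prompt: str, context_window: int) -> dict:
    """Report how much of the context window the system prompt is estimated to use."""
    used = estimate_tokens(system_prompt)
    return {
        "estimated_tokens": used,
        "share_of_context": used / context_window,
        "remaining_tokens": context_window - used,
    }


if __name__ == "__main__":
    # Hypothetical short prompt and a hypothetical 8192-token window.
    prompt = "You are a helpful assistant. Respond in plain prose. Be direct."
    print(prompt_budget(prompt, context_window=8192))
```

If the share creeps past a few percent, that is usually my cue to move background material out of the prompt and into the journal document.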
The second and more useful method is document uploads, which is where longer and richer journals should live. LM Studio's built-in document support lets you attach a file to a chat, and with the rag-v1 plugin enabled it chunks and embeds the document and retrieves relevant sections as you go, so you're not dumping the whole thing into the prompt at once. However, if the document's total token count fits in the context window, the whole thing just gets added to the prompt. This document is where detailed background goes: project history, past decisions, domain context, anything the system prompt doesn't have room for. This is where you actually build the journal so your model stops making the same mistakes.
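The "inject the whole file if it fits, otherwise retrieve the relevant parts" behavior can be sketched roughly like this. To be clear, this is not how rag-v1 is implemented: I've swapped embeddings for a simple keyword-overlap score so the example stays self-contained, and the function names, the paragraph-based chunking, and the token estimate are all my own assumptions.

```python
import re


def estimate_tokens(text: str) -> int:
    """Rough estimate. Assumption: ~4 characters per token."""
    return len(text) // 4


def split_chunks(doc: str) -> list[str]:
    """Split on blank lines; a stand-in for a real chunker."""
    return [p.strip() for p in re.split(r"\n\s*\n", doc) if p.strip()]


def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def select_context(doc: str, query: str, token_budget: int, top_k: int = 2) -> list[str]:
    """Return the whole document if it fits the budget, else the best-matching chunks."""
    if estimate_tokens(doc) <= token_budget:
        return [doc]
    query_words = words(query)
    # Keyword overlap is a crude substitute for embedding similarity.
    ranked = sorted(
        split_chunks(doc),
        key=lambda chunk: len(query_words & words(chunk)),
        reverse=True,
    )
    return ranked[:top_k]
```

The practical takeaway is the same either way: a short journal gets read in full, while a long one only contributes the sections that look relevant to what you just asked, so clear section headings and keywords matter.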
The downside to document uploads is that they're per-session, so the file has to be attached again in each new chat. If you want to scale beyond a single file - say your journal lives across multiple documents in a folder - the Big RAG plugin can index entire directories and handle that at scale. For one or two focused files, though, the built-in upload is all you need. The two methods work best together anyway: preset system prompt for your always-on preferences, document upload for the deeper context.
Building the context journal
Correcting and informing the model
For me, the system prompt has one rule: if I have to scroll it, it's too long. LM Studio's own guidance is to treat the system prompt as purposeful and task-matched rather than exhaustive - think of it as behavioral rules and not a biography. Here's an example of one of my system prompts for Qwen 3.5 9B:
You are a helpful assistant. I am a hobbyist designer and tech enthusiast on Windows. Respond in plain prose, no bullet points unless I ask. Be direct and skip the preamble. Flag uncertainty clearly rather than hedging throughout. If I ask about health or finances, answer directly and note once if professional advice is warranted - don't repeat the disclaimer.
The text document is where the actual journal lives, though, and it can get as long as you need it to - what matters more is what you add to it. I recommend writing it in Markdown so you can structure it properly with headers and, if you have multiple docs, keeping them all in one local folder you can find easily. A division like Background, Current Projects, Corrections, and Domain Context covers most of it.
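If you'd rather not start from a blank file, a starter skeleton with those four sections can be generated with a few lines of Python. This is only a sketch: the title, the file name in the usage note, and the helper name are placeholders I made up, not anything LM Studio expects.

```python
SECTIONS = ["Background", "Current Projects", "Corrections", "Domain Context"]


def journal_skeleton(title: str = "Context Journal") -> str:
    """Build an empty Markdown journal with one header per section."""
    parts = [f"# {title}"] + [f"## {name}" for name in SECTIONS]
    return "\n\n".join(parts) + "\n"


if __name__ == "__main__":
    print(journal_skeleton())
```

Saving the output as something like journal.md (a hypothetical name) in your journals folder gives you a file you can fill in section by section and attach to a chat.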
Corrections is probably the most valuable section over time because, unlike the other sections, it compounds. Every time the model gives you a bad answer because it didn't know something, or confidently gets something wrong about your situation, or defaults to a format you hate, that goes in here. "I am not a developer, do not default to technical explanations." "I have already decided on X, do not keep suggesting alternatives." "When I ask about Y, assume Z." The model doesn't learn between sessions, but the journal can, so long as you keep it updated.
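Keeping that section updated is easier if adding an entry takes one step. Below is a minimal sketch that appends a dated bullet under a "## Corrections" header in the journal text. The header name, the bullet format, and the function name are my own conventions for illustration; adapt them to however your journal is laid out.

```python
from datetime import date
from typing import Optional

HEADER = "## Corrections"


def add_correction(journal: str, correction: str, today: Optional[date] = None) -> str:
    """Return the journal text with a dated correction added to the Corrections section."""
    entry = f"- {(today or date.today()).isoformat()}: {correction}"
    lines = journal.splitlines()
    if HEADER not in lines:
        # No Corrections section yet: create one at the end of the file.
        return journal.rstrip("\n") + f"\n\n{HEADER}\n\n{entry}\n"

    start = lines.index(HEADER)
    # The section ends at the next second-level header, or at the end of the file.
    end = next(
        (i for i in range(start + 1, len(lines)) if lines[i].startswith("## ")),
        len(lines),
    )
    # Skip back over trailing blank lines so entries stay grouped together.
    insert_at = end
    while insert_at > start + 1 and not lines[insert_at - 1].strip():
        insert_at -= 1

    new_lines = lines[:insert_at]
    if insert_at == start + 1:
        new_lines.append("")  # blank line between the header and the first entry
    new_lines.append(entry)
    if end < len(lines):
        new_lines.append("")  # blank line before the next section
    new_lines += lines[end:]
    return "\n".join(new_lines) + "\n"
```

Read the journal file, pass its text through add_correction, and write the result back; the dates make it easy to prune stale rules later.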
There's a version of this that goes further, and it's closer to what a journal actually is. LM Studio has a conversation export function - you can save entire sessions as text files. If a session went well or produced something you'd want the model to remember the shape of, export it and add a short note at the top: what the session was, what worked, what didn't, and so on. A bad session is just as useful - "on this date I asked about X, it went sideways because the model assumed Y, and I corrected by telling it Z".
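The note at the top of an exported session can follow a fixed shape so the entries stay comparable over time. Here is a small sketch that prepends one to a transcript. It treats the export as opaque text and makes no assumptions about LM Studio's export format; the field labels and the function name are simply ones I picked for illustration.

```python
def annotate_session(
    transcript: str,
    what: str,
    worked: str,
    went_wrong: str,
    session_date: str,
) -> str:
    """Prepend a short, consistently shaped note to an exported chat transcript."""
    header = [
        f"Session note ({session_date})",
        f"What it was: {what}",
        f"What worked: {worked}",
        f"What went wrong: {went_wrong}",
        "-" * 40,
    ]
    return "\n".join(header) + "\n\n" + transcript.lstrip("\n")
```

For example, a bad session might get went_wrong="the model assumed Y; corrected by telling it Z", which is exactly the kind of line worth copying into the Corrections section afterwards.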
After some time, you end up building a record of how the model behaves with you specifically, what prompting approaches land, and where it tends to go wrong. I started feeding this back into my journals as dedicated log sections, which means my models aren't just reading rules but also evidence.
To demonstrate how effective document uploads are, I added a deliberately random instruction to a document, "speak like a retired Victorian-era sea captain who now works as a part-time astrologer", and the model completely obeyed with an equally unhinged response.
Stopping reexplaining myself
None of this is true memory like with cloud AI tools - it's literally just a document with some text. But what you write in it can make all the difference. It can steer the conversation in the exact direction you need, instruct the model to take on a specific tone or avoid specific thought processes, whatever you need it to do to avoid the same mistakes.
