
I use AI tools almost every day now, and after cycling through way too many of them, I’ve finally settled on a handful of go-tos: a couple of cloud AI tools when I need horsepower, and my local LLM for anything more private. But I’ve never actually sat down and tested them on the same prompts; I didn’t see a need to, since I reach for each one for different tasks.

Thing is, I know local LLMs can handle more than just basic questions - I’ve used them for work tasks, research, and brainstorming, so I know they’re capable. The question was never really “cloud vs local”, but more so “is the cloud’s advantage actually worth it, or am I just defaulting to it out of habit”.

To actually answer that question, I put my local LLM head-to-head with my cloud AI. I ran the same prompts through both of them to see if going fully local was even an option for me. Here’s what happened…

What I’m actually working with

My models and my prompts

I’m in the process of studying UX design, and even though I’m enrolled in a course, it still involves a lot of self-teaching. Of course, AI is my first stop for that these days, so it made sense to do this experiment with something design-related instead of just throwing generic prompts at it. I wanted to go with a topic I’m somewhat familiar with but know the least about, which is the research part of UX.

My goal was to get each model to build me a self-study curriculum. Something structured enough to actually follow, but practical enough for someone who’s mostly learning solo online. I wanted industry-standard research methods covered, ways to practice them independently, and enough depth to actually work on real projects, not just theory.

For cloud AI, I’m sticking with my top AI tool right now, Claude with Sonnet 4.6. For local, I’m going with my tried-and-true gpt-oss-20b model. I honestly didn’t change my system prompt or any of my other settings in either one, because that’s how I actually use them, and a sanitized lab setup would have made the results less useful to me. Not to mention, it would have wiped the setup I already had.

Here’s the initial prompt:

Create a structured self-study course for UX research. Follow this structure exactly:
Goal: Build practical UX research knowledge from scratch, with enough depth to apply research methods to real projects independently.
Learner profile: Has design background, enrolled in UX design course, no formal UX training, studying around 2 hours per day alongside full-time job.
Duration: 3-4 weeks.
Format: Day-by-day schedule. Each day should have a clear topic, a task or exercise, and at least one free resource (article, video, tool, etc.). Keep daily tasks doable in 1 hour.
Week 1 - Foundations: What UX research is, where it fits in the design process, qualitative vs quantitative methods, and the difference between UX research and general design thinking. No methods yet, just focus on mindset and vocabulary.
Week 2 - Core methods part 1: User interviews and usability testing. For each method cover: what it is, when it's used, how to practice it solo without a team or real client, and what the output looks like.
Week 3 - Core methods part 2: Surveys, competitive analysis, and card sorting. Same format as week 2. Include at least one exercise per method that can be done independently.
Week 4 - Synthesis and application: How to analyze and present research findings, how to document the process portfolio-ready, and how to tie all methods together into a research plan. End the week with a mini capstone: a simple research plan the learner can actually execute.
Tone: Practical over academic. Industry-standard methods only. No filler.

And for my local LLM specifically, I’m adding this line: “For all recommended resources, only include ones you are confident actually exist. If unsure, describe the type of resource instead” because local LLMs can be more prone to hallucinating sources. I also have a Brave Search MCP plugin hooked up to my local LLM, so it has web access, which should help, but we'll see if it actually uses it properly.
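As a rough sketch, that per-model tweak could be kept in one place so the base prompt stays identical for both models. The function and constant names below are hypothetical and not part of any tool mentioned here; only the guard sentence itself comes from the text above.

```python
# Hypothetical helper: the names are illustrative, not from any real tool's API.
# The guard sentence is the one quoted in the article.
SOURCE_GUARD = (
    "For all recommended resources, only include ones you are confident "
    "actually exist. If unsure, describe the type of resource instead"
)


def build_prompt(base_prompt: str, is_local: bool) -> str:
    """Return the prompt to send, adding the source guard for the local model only."""
    if is_local:
        # Append the guard on its own line so the rest of the prompt is unchanged.
        return f"{base_prompt.rstrip()}\n{SOURCE_GUARD}"
    return base_prompt
```

Keeping the guard separate like this means any difference in output comes from the model or that one extra line, not from two prompts drifting apart.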

I knew going into this that cloud would likely outperform local. But the point was to see if local could actually keep up at all, and if it’s worth ditching cloud for local at some point beyond just using it for more private topics.

Claude handled it without a hitch

It generated an interactive course for me, which was more than I expected

Claude didn’t just answer my prompt; it built an interactive module (without me asking for one) - I was actually able to download the course as an HTML file and use it in my browser. It was tabbed by week, each day collapsible, with a goal or summary for every week. It clearly parsed the learner profile too. The structure wasn’t even the most impressive part…

Every single day had three layers: a topic explanation, a concrete task, and named resources with links. And I mean specific - Day 9 doesn’t say “practice writing an interview guide”, it tells me to write a 2-minute intro script, add 3 warm-up questions, and write 4 follow-up probes. The resources also pointed to real places: Nielsen Norman Group, Interaction Design Foundation, Maze, etc. Week 1 was deliberately methods-free, which shows it understood the pedagogical logic of the prompt, not just the structure. And Week 4 ends with a two-day capstone I could actually run.

Before moving on to my local LLM, I ran a few follow-up prompts - and I kept them deliberately neutral and broad so I could reuse the same ones on my local LLM without adjusting for Claude’s specific outputs. The goal was to test the same follow-up tasks across both models, not just the initial prompt. Here are a couple of my follow-up prompts:

Expand Week 1 of the course you just created into a full day-by-day schedule. Each day should have a topic, a concrete 1-hour task, and one free resource. Keep it beginner-friendly.

Review the course you created. Identify any tasks or exercises that would be difficult or impossible to complete solo, without a team, real users, or a client. For each one, suggest a practical workaround a self-learner could actually do.

Based on this course, what are 3-5 portfolio pieces a self-learner could realistically produce by the end? For each one describe what it is, which week it comes from, and what skills it demonstrates.

My local LLM eventually pulled through

It performed as expected at first, but pivoting my approach gave me surprising results

I kind of expected my local LLM to hit its context window limit, and that’s exactly what happened, especially with Brave Search MCP enabled. It only made it to Week 3, and the overall results were very lackluster. So I had to pivot and cut my prompt down to just Week 1 of the course at first.
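For a sense of why this happens: the prompt, any tool results (such as search output), and the reply all share the same context window. A minimal sketch of a pre-flight check is below. It assumes the rough rule of thumb of about 4 characters per token for English text; real tokenizers differ, and the limit and reply budget passed in are hypothetical numbers, not the settings used in this test.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate (assumption: ~4 characters per token)."""
    return int(len(text) / chars_per_token) + 1


def fits_in_context(prompt: str, context_limit: int, reserved_for_reply: int) -> bool:
    """Check whether the prompt plus a reply budget fits in the context window.

    context_limit and reserved_for_reply are whatever the local setup uses;
    tool output would need to be counted as part of the prompt as well.
    """
    return estimate_tokens(prompt) + reserved_for_reply <= context_limit
```

If a check like this fails for a long, multi-week prompt, that is a hint to trim the request before sending it rather than after the output gets cut off.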

This second attempt gave me much better results: my local model handled the trimmed prompt far better than the full one. It completed all 7 days without cutting off, stuck to the format (topic, task, resources), didn’t repeat itself, and even added a notes section at the bottom unprompted. Structurally, it did the job.

The quality gap showed up in the details. The resource citations were vague, but that was expected, as I had turned off Brave Search and Fetch at this point. The tasks were shallower overall, and it didn’t fully respect the prompt’s constraints either: it snuck a full user interview into Day 4, when this was supposed to be a methods-free week.

Then I added the follow-up prompt, and that ended up being the trick I needed. Asking it to expand Week 1 gave me more detailed results, with instructions that were more practical and actionable, including numbered steps. Turns out the trick with my local LLM wasn’t a better prompt, but a smaller one.

It wasn’t really a fair fight

Claude pulled ahead, hands down, as expected. But the question was never “which one is best”, it was whether I could actually fully switch to a local LLM for all my daily tasks, including studying. And it turns out, I could, with a caveat - the context window will always be a limiting factor, so I will have to break down long instructions into small, focused chunks.
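Breaking a long prompt into focused chunks can be done by hand, but here is a minimal sketch of the idea in Python. It assumes the prompt is laid out like the one above, with shared lines (Goal, Learner profile, Format, Tone) and one line per week starting with "Week ". The function name is hypothetical, and the shared lines would likely still need a manual edit per chunk (the "Duration" line, for example).

```python
def split_by_week(prompt: str) -> list[str]:
    """Turn one multi-week prompt into one smaller prompt per week.

    Assumption: each week's instructions sit on a single line that starts
    with "Week "; every other non-empty line is shared context.
    """
    shared, weeks = [], []
    for line in prompt.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("Week "):
            weeks.append(line)
        else:
            shared.append(line)
    # Each chunk repeats the shared context, followed by exactly one week.
    return ["\n".join(shared + [week]) for week in weeks]
```

Each returned chunk can then be sent as its own conversation, so no single request has to carry the whole course.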

This could be a benefit or a downside depending on how you work. If you’re someone who needs the full picture and wants an AI that gets you, it’s going to be a cloud LLM. If you like to break your work into small chunks and are fine with navigating some limits, then local could absolutely work (likely with a 12B or higher model).

URL: https://www.xda-developers.com/ran-same-prompts-through-claude-and-local-llm-unexpected-results/