Claude is typically considered the best agentic coding tool on the market right now. There are plenty of developers I know who use it daily, and Anthropic has spent the last year and a half giving plenty of reasons for that confidence. Opus 4.7 just launched last month, Claude Code continues to be one of the go-to agentic tools on the market (despite the proliferation of alternatives), and the likes of Sonnet and even Haiku are still valid models to use for different tasks.

However, the access conditions around Claude have moved in a single direction over the last year, and not the direction subscribers want. Weekly limits were announced on top of the existing five-hour windows in July 2025. The consumer terms changed in the autumn so that, for Free, Pro, and Max users, conversations are used for training by default, with retention bumping from thirty days to five years for anyone who doesn't opt out, all behind a default-on UI toggle most people never saw before the October 8 deadline. Peak-hour quotas were silently tightened, Claude Code's prompt-cache TTL was seemingly cut, and third-party harnesses were locked out of subscription billing. Each change has a defensible reason behind it, but together, they describe a relationship that honestly feels hostile towards developers.

There's one thing all of this has proven to me: the terms of your relationship with a cloud model can shift under you at any time, and the part of an AI stack you actually own is the part you can rely on. Claude isn't bad, nor am I saying everyone should go out and buy an RTX 4090 to start running a 27 billion parameter model locally. The cloud relationship itself is the liability, though, and local models have become good enough that they're now considered to be meaningful alternatives for a lot of everyday work.

Claude users have been experiencing a ton of changes

And they're all going in one direction

The way Anthropic has approached this hasn't been the usual "boil the frog" strategy we often see from SaaS products. The company has been moving fast while announcing many of its changes openly, though some of them triggered weeks of "is Claude getting nerfed?" coverage before Anthropic acknowledged the cause. Claude Code is where you see it most clearly: Anthropic changed the default reasoning effort from high to medium on March 4, introduced an idle-session thinking-clearing bug on March 26, reverted the reasoning-effort change on April 7, added a verbosity-reduction system prompt on April 16 that hurt coding quality, and reverted that on April 20. None of it was a model swap, but a stack of server-side settings users couldn't see and couldn't opt out of. Dario Amodei has said Anthropic planned for ten times growth and got eighty times, which goes some way toward explaining the squeeze.

Outside the harness itself, the squeeze took a different shape. In March 2026, Anthropic adjusted five-hour session limits during weekday peak hours so users burned through them faster, with weekly limits unchanged. The May 6 SpaceX-capacity bump removed the peak-hour reduction. Service conditions changed, then they changed back, and the change-back only arrived when capacity did. Inside Claude Code, Anthropic reduced the prompt-cache TTL from one hour to five minutes without an announcement, and the difference only became visible after users plotted their own session logs by date. Anthropic answered by saying five minutes was always the default and the one-hour cache was the experiment, which is technically defensible and rhetorically convenient at the same time. It seems this may have been reverted for some users, but that's the worst part: the most consistent communication from Anthropic comes via its employees on X, not through official announcements.

That's not all, either. On January 9, 2026, Anthropic silently deployed server-side checks that blocked Free, Pro, and Max OAuth tokens from working in any third-party harness, and OpenCode, Cursor, Cline, RooCode, alongside others others broke overnight with no announcement. xAI engineers were blocked from Claude through Cursor under Section D.4 of the Commercial Terms. Anthropic formalized the policy in a ToS update around February 17th, with a new "Authentication and credential use" section. The detection layer Anthropic built to enforce it produced the HERMES.md billing incident two months later: Claude Code pulls git status into its system prompt, and the harness-detection layer pattern-matched the contents, so a file called HERMES.md anywhere in your history was enough to flip you onto pay-as-you-go pricing. One Max user posted an overage of $200.98 on top of their $200 plan with 86% of the subscription still untouched, and engineer Thariq Shihipar acknowledged "a bug with the third-party harness detection and how we pull git status into the system prompt."

Finally, a recently announced change will see claude -p and Agent SDK usage billed as programmatic usage from June 15th. Users of Claude Pro, Max x5, and Max x20 can claim $20, $100, and $200 respectively of credit for these workflows a month, but this is yet another example of Anthropic clawing back some of its control over what end users can do with Claude Code.

What makes all of this harder to understand is how the company had generated a lot of community goodwill up until this point. In July 2025, Anthropic landed a $200M Pentagon contract that made Claude the first frontier model approved on classified networks. When the DoD then demanded "all lawful purposes" access, Anthropic refused to drop its prohibitions on mass domestic surveillance and fully autonomous weapons systems. On February 27, Defense Secretary Pete Hegseth designated the company a "supply chain risk" and federal agencies were directed to stop using Anthropic's AI. It's a principled stand, and it earned Anthropic a lot of positive coverage in relation to ethics. But only five days later, on March 4, the same company quietly dropped Claude Code's default reasoning effort from high to medium and didn't tell anyone for weeks.

To be fair, Anthropic reversed parts of this in May after striking a deal with SpaceX, signing for all compute at Colossus I, more than 300 megawatts and over 220,000 NVIDIA GPUs, and tying it directly to higher Claude limits. A May 13 announcement raised weekly Claude Code limits by 50% until July 13, apparently amid competitive pressure from OpenAI's Codex push. It doesn't change the overall direction, though. Your access depends on Anthropic's compute, Anthropic's commercial pressure from OpenAI and xAI, and Anthropic's view of fairness across the user base, none of which are yours to influence.

Local models have been growing up this whole time

They're no longer universally worse than the cloud

During the same period of time that Anthropic seemed to be tightening the squeeze on its users, the capabilities of open-weights models grew substantially. Gemma 4 landed on April 2, 2026 with a properly permissive Apache 2.0 license, four sizes (E2B, E4B, 31B, and 26B A4B), 256K context, and the 26B A4B variant running on consumer hardware with genuinely impressive speeds. Alibaba's Qwen 3.6-27B arrived the same month as a dense workhorse that fits on a single consumer 24 GB GPU with quantization. Qwen 3.6-35B-A3B, the sparse MoE sibling with only 3B active parameters per token, is fantastic as well for the size class it sits in.

For example, in one controlled reverse-engineering test, I ran Qwen 3.6-27B on a 7900 XTX and it pulled around 37 tokens per second of generation at Q4_K_M with roughly 90,000 tokens of usable context. With two planted vulnerabilities in the target, it found both, surfaced two additional ones, and outperformed GPT-5.4 on that specific run, which missed a timing oracle and hallucinated values. I'm not saying it's better in general than GPT-5.4, but a result like that was unthinkable just a year ago.

Zyphra's ZAYA1-8B, meanwhile, is the most interesting local model that I've seen yet. It's a Mixture of Experts model with 8 billion total parameters and 700 million active, licensed as Apache 2.0, and trained entirely on AMD MI300X with no Nvidia in the loop. Zyphra's published comparisons against GPT-5-High lean heavily on Markovian RSA and test-time compute rather than a normal single-pass local run, so I wouldn't read ZAYA1-8B as proof that tiny models beat frontier systems. I'd read it as proof that architecture, active-parameter efficiency, and test-time compute are making local-capable models more interesting faster than most people expected.

Deals

Score deals on PCs, GPUs, and work-setup gear

Explore discounts on workstation hardware, GPUs, storage, and networking to build or upgrade a local-model workflow. Find savings on desktops, laptops, monitors, SSDs, cooling, and accessories to boost performance without breaking the bank.

It's not all perfect for local models, and that's why the cloud still has its place. The hardware to run anything in the 27B-and-up range costs a fair bit, and both VRAM and system RAM are constant constraints made even harder to reconcile given the increasing costs of both over the past year. There's also more setup involved than just using a cloud model, planning over long horizons is weaker, and long-context reliability can be rough. Local isn't beating top-end Claude on hard reasoning, long multi-step planning, or broad world knowledge either. Paired with search capabilities and other ways to ingest live data from outside its knowledge base, though, local models are a real option to reach for rather than a hobbyist novelty like they once were.

Vendors can pull the plug at any time

But your local model is your local model

Anthropic can relax limits, and the May reversal proved it. But they can also tighten them again the moment Codex's pricing settles and the SpaceX capacity gets absorbed, and the July 13 expiration on the weekly bump tells you everything about how durable the current state is. The point isn't that any one of those decisions are wrong on their own terms, but that all of them are theirs to make, and yours to react to. Model quality, pricing, reasoning effort, cache behavior, training defaults, quotas, and availability are all controlled server-side, and the past year has shown how many of those dials a cloud vendor can turn in either direction.

A local model you've downloaded can't be retired, or throttled during peak hours, or be opt-in'd into a training pipeline. It also can't have its reasoning effort silently dropped from high to medium because the compute budget isn't there. Once you have the weights and are hosting it yourself, the access conditions are fixed for as long as you keep the model running. Anthropic's behavior over the past year and a half, including the parts they've been transparent about and the parts they've reversed, makes that contrast hard to miss.

Claude is still going to be the best tool to use for this kind of thing for a long time to come, and if you depend on Opus 4.7, I'm not saying you should cancel your subscription and switch to a local model instead. What I will say is that, more than ever, it's a service first, with limits, cache policy, reasoning settings, retention rules, and availability all sitting on a dial that Anthropic can turn. Meanwhile, local models can handle many of the day-to-day tasks that were previously only possible for the cloud to process. The gap has been continuously closing, and so long as more of these advancements keep coming, it's unlikely that's going to change.