
Over the past few months, I've been trying to replace as many of the browser extensions I use daily as possible with a local LLM. To my surprise, I've managed to replace quite a few of them while also adding functionality that those extensions never offered in the first place.

The first extension I got rid of was Grammarly. If you have ever used this extension, you know how annoying the constant pop-ups and distractions can be. Once I set up a local LLM, Grammarly became one of the easiest tools to replace. I also replaced several video summarizers, browser assistants, and research tools.

Setting my local LLM up for browser use

Turning the browser into an AI client

There are two main ways to use a local LLM in your browser. The first is to connect a browser extension to a local server running on your machine. The second is to use extensions that run models directly in the browser.

I went with the first option and connected my browser to a local Ollama instance. This is the more flexible approach because it allows you to run larger and more capable models. In this setup, Ollama acts as the backend server that handles all the heavy lifting, while the browser extension serves as the user interface.
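To make that split concrete, here is a minimal sketch of the kind of request a client sends to the Ollama backend, using a grammar check as the task. It assumes Ollama is listening on its default port (11434) and that a model tagged `qwen3` has been pulled; the function names and the prompt wording are illustrative, not taken from any particular extension.

```python
import json
import urllib.request

# Default address of a local Ollama server; adjust if yours differs.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(prompt: str, model: str = "qwen3") -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


def proofread(text: str, model: str = "qwen3", timeout: float = 120.0) -> str:
    """Ask the local model to fix grammar and spelling (needs Ollama running)."""
    prompt = (
        "Correct the grammar and spelling of the following text. "
        "Return only the corrected text.\n\n" + text
    )
    request = build_generate_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # With "stream": False, the reply is one JSON object whose
        # "response" field holds the generated text.
        return json.loads(response.read())["response"]
```

Calling `proofread("Their going to the store tomorow.")` with the server running returns the model's corrected version. Nothing leaves the machine, since the only network hop is to localhost.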

Start by installing Ollama and downloading a model. I used Qwen 3, but Llama 3.2 and Gemma 3 also work well. As a general rule, stick to models with fewer than 14 billion parameters if your system has less than 16 GB of RAM. If you have a GPU with more memory to spare, you can comfortably run larger models.
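The sizing rule above follows from simple arithmetic: the weights take roughly the parameter count times the bits stored per weight. The sketch below works that out. The 4-bit default and the 1.3 overhead factor (a stand-in for context cache and runtime buffers) are assumptions chosen for illustration, not measured values.

```python
def estimate_weights_gb(params_billion: float, bits_per_weight: float = 4.0) -> float:
    """Approximate size of the model weights alone, in gigabytes (10^9 bytes)."""
    # params_billion * 1e9 weights * (bits / 8) bytes each, divided by 1e9 bytes per GB
    return params_billion * bits_per_weight / 8


def fits_in_memory(
    params_billion: float,
    available_gb: float,
    bits_per_weight: float = 4.0,
    overhead: float = 1.3,  # assumed headroom factor, not a measured figure
) -> bool:
    """Rough check: do the weights plus assumed overhead fit in the given memory?"""
    return estimate_weights_gb(params_billion, bits_per_weight) * overhead <= available_gb
```

Under these assumptions, a 14-billion-parameter model at 4 bits per weight needs about 7 GB for the weights, which leaves room on a 16 GB machine. The same model at 16 bits per weight needs about 28 GB and does not fit.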

Once the model is running, you'll need to allow browser access, which comes down to cross-origin resource sharing (CORS). By default, Ollama only accepts requests from local origins, so a request coming from a browser extension's own origin is rejected. To allow extensions to connect to your Ollama instance, set the OLLAMA_ORIGINS environment variable to the origins you want to permit.
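As a rough illustration, the snippet below builds an environment with OLLAMA_ORIGINS set to allow Chromium and Firefox extension origins. The helper name `ollama_env` is hypothetical, and the wildcard origins are a permissive choice; listing your extension's exact origin is tighter.

```python
import os
from typing import Iterable, Mapping, Optional

# Wildcard origins for Chromium-based and Firefox extensions (permissive on purpose).
EXTENSION_ORIGINS = ["chrome-extension://*", "moz-extension://*"]


def ollama_env(
    origins: Iterable[str], base_env: Optional[Mapping[str, str]] = None
) -> dict:
    """Return a copy of the environment with OLLAMA_ORIGINS set.

    OLLAMA_ORIGINS takes a comma-separated list of allowed origins.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["OLLAMA_ORIGINS"] = ",".join(origins)
    return env


# To start the server with these origins allowed (requires the `ollama` binary):
#   import subprocess
#   subprocess.run(["ollama", "serve"], env=ollama_env(EXTENSION_ORIGINS))
```

This only helps when you launch the server yourself. If Ollama runs as a desktop app or system service, the variable has to be set wherever that service reads its environment, and the service restarted.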

The second approach is to use standalone browser extensions that run models entirely within the browser. These tools rely on technologies such as WebGPU and WebAssembly to execute models inside the browser sandbox without requiring any external software.

With these extensions, setup is usually as simple as selecting a model and waiting for it to download. The model weights are stored locally inside the browser, allowing them to work offline after the initial installation. NativeMind, for example, can run smaller models directly in the browser, making it easy to experiment without first setting up Ollama. Modern Chromium-based browsers are also beginning to expose built-in AI capabilities through APIs such as the Chrome Prompt API.

The extensions to replace

The great extension cleanup

Because I went with the local server route, the next step was to install a compatible browser extension and point it to my local Ollama endpoint. From here, you have several options depending on how you want to use the model.

Most AI browser extensions can be replaced with tools like Page Assist, which provides a sidebar interface for chatting with local models directly inside your browser. It includes support for webpage analysis, document understanding, and RAG workflows. There are also alternatives, such as Open WebUI, a self-hosted web interface rather than an extension, that offer a more polished experience and additional features.
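Webpage analysis of this kind boils down to reducing a page to its readable text and handing that to the model along with a question. The sketch below shows one simple way to do that with Python's standard library. It is not how Page Assist or Open WebUI work internally, and the class name, prompt wording, and 8,000-character cap are assumptions.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text content, skipping script, style, and noscript blocks."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text found outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def page_to_prompt(html: str, question: str, max_chars: int = 8000) -> str:
    """Turn raw HTML plus a question into a single prompt for a local model."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    # Truncate so the page text stays within a small model's context window.
    text = " ".join(parser.chunks)[:max_chars]
    return (
        "Answer using only the page content below.\n\n"
        f"Page:\n{text}\n\nQuestion: {question}"
    )
```

The returned string can be sent to the local model as an ordinary prompt. A full RAG workflow would split the text into chunks and retrieve only the relevant ones, while this version simply truncates.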

If you're looking for something more capable, Nanobrowser takes things a step further by adding agentic capabilities. Instead of simply answering questions about a webpage, it can perform actions across websites, automate repetitive tasks, and interact with web applications on your behalf.

I also wanted to see how far I could push this idea beyond browser extensions. As an experiment, I attempted to build a simple price tracker using Python, scheduled jobs, and a local model. A script periodically checked product pages, extracted pricing information, stored historical data, and sent notifications when certain conditions were met, such as a product dropping below a target price, being sold by a specific retailer, or reaching a discount threshold that I had defined in advance.
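The core of a tracker like that fits in a few functions. The sketch below is a simplified reconstruction, not my exact script: a regular expression stands in for the extraction step, page fetching and scheduling (for example, a cron job) are left out, and the names `Rule`, `parse_price`, `should_notify`, and `record_price` are made up for illustration.

```python
import re
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    """Notification conditions; any field left as None is ignored."""
    target_price: Optional[float] = None
    retailer: Optional[str] = None
    min_discount_pct: Optional[float] = None


def parse_price(text: str) -> Optional[float]:
    """Pull the first price-like number out of text such as '$1,299.99'."""
    match = re.search(r"(\d{1,3}(?:,\d{3})+|\d+)(\.\d{1,2})?", text)
    if not match:
        return None
    return float(match.group(0).replace(",", ""))


def should_notify(price: float, list_price: float, seller: str, rule: Rule) -> bool:
    """Return True if any configured condition in the rule is met."""
    if rule.target_price is not None and price < rule.target_price:
        return True
    if rule.retailer is not None and seller.lower() == rule.retailer.lower():
        return True
    if rule.min_discount_pct is not None and list_price > 0:
        discount = (list_price - price) / list_price * 100
        return discount >= rule.min_discount_pct
    return False


def record_price(conn: sqlite3.Connection, product: str, price: float) -> None:
    """Append one observation to the price history table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS prices ("
        "product TEXT, price REAL, checked_at TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    conn.execute("INSERT INTO prices (product, price) VALUES (?, ?)", (product, price))
    conn.commit()
```

A scheduled job would fetch each product page, call `parse_price` on the relevant text, store the result with `record_price`, and send a notification whenever `should_notify` returns True. Swapping the regular expression for a call to the local model is useful when page layouts vary too much for a fixed pattern.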

Though I couldn't replace every extension directly

Not every hill is worth climbing

One thing I realized during this experiment is that replacing an extension does not have to mean building a one-to-one copy of it. For many browser extensions, the key features can be recreated using a local LLM, browser automation, scripts, and workflow tools such as n8n. If you get this right, the result can be more capable than the original because you are no longer limited to the feature set that an extension developer decided to ship.

There are still situations where dedicated software makes more sense. Browsers already include built-in features for many common tasks, such as password management.

You can always self-host your extension

If your goal is simply to have a more private alternative, you can always self-host mature extensions instead of building them from scratch. For example, Bitwarden is a popular option for password management, while Joplin can serve as a self-hosted note-taking solution. There are many other open-source tools that can replace common browser extensions without requiring you to reinvent functionality that already exists.

Replacing every extension with an LLM isn't very practical, either. If you're running a local model on a machine that isn't fairly powerful, it adds load that a lightweight extension would not.

URL: https://www.xda-developers.com/replaced-entire-browser-extension-stack-local-llm-not-going-back/

I replaced my entire browser extension stack with one local LLM, and I'm not going back