Model Context Protocol (MCP) has become an increasingly important piece of technology in the large language model (LLM) space, and with good reason. MCP essentially acts as a translator between a language model and the digital world, abstracting data from a particular service and presenting it in a way that a language model can access as a "tool". What this means is that, instead of you manually pasting snippets from your browser, music player, or notes app, a model can choose to collect that information itself when required. There's one MCP server that I've been using the most, though, and that's the one for SearXNG.
SearXNG is a self-hosted metasearch engine, meaning that it combines the results of multiple search engines and presents them to you as one ordered list. You control which search engines it uses and what data it looks for, and you can even install plugins that filter or modify the collected URLs, for example to route them through a privacy-preserving proxy. Better still, there's a fantastic MCP server that can present SearXNG as a search tool to your local LLM, giving it one of the most powerful tools normally only available to cloud-based alternatives: search.
Why is search so important for a local LLM?
It bypasses a few problems
Search is one of the most important tools for any language model these days, but particularly so for a local language model. Every trained model has a "knowledge cut-off," meaning that the corpus of data used to build its knowledge only goes up to a certain point. If an album from your favorite artist just came out, and you want to ask about it, a model without search capabilities will not even know that it exists and will either give a bogus answer or say that it has no knowledge of it. Even the biggest and most powerful language models suffer from these cut-off dates, with GPT-5's cut-off being in October 2024 and Gemini 2.5 Pro's cut-off being in January 2025. That's why both of these models can search the web to fill those knowledge gaps, even if you sometimes have to tell them to do so.
A local language model, by default, doesn't have that capability. On top of that, because local models are comparatively small, they often lack a lot of information that would normally be in the training data of significantly larger models, so even asking about an album from your favorite artist that was released a few years ago could still result in completely incorrect information. This problem becomes even more apparent when using a language model in a rapidly changing or growing space, such as software development. Best practices in the language of your choice could have evolved significantly since the training data of your chosen model was compiled, and the model could provide incorrect or out-of-date guidance as a result.
That makes search one of the most powerful tools you can give your local language model. Not only does it allow you to find information that's current, but it can also fill in the gaps in what the language model knows. The one thing you need to be aware of is context length: essentially, the search tool supplies the chosen page as text for the LLM to parse and pull information out of. This uses up a lot of tokens, and a single search request can burn a couple of thousand tokens in order to collect all of the information required to answer your query. The two queries above used up 6,825 tokens, which is more than the default context length of 4,096 tokens that LM Studio chooses. You can increase the context window yourself, but it's something that you'll need to keep in mind.
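To make the context-length arithmetic concrete, here is a minimal sketch that checks whether a conversation still fits in a given context window. The helper names, the 512-token reserve for the model's reply, and the four-characters-per-token rule of thumb are all assumptions for illustration; real token counts depend on the model's tokenizer.

```python
# Rough context-budget check. The ratio below is a common rule of thumb,
# not a measurement of any particular model's tokenizer (assumption).
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Very rough token estimate based on character count."""
    return max(1, len(text) // CHARS_PER_TOKEN)


def fits_in_context(used_tokens: int, context_length: int,
                    reserve_for_reply: int = 512) -> bool:
    """True if the tokens used so far, plus room for a reply, fit the window."""
    return used_tokens + reserve_for_reply <= context_length


def suggest_context_length(used_tokens: int, reserve_for_reply: int = 512,
                           start: int = 4096) -> int:
    """Double the window size until the conversation fits."""
    needed = used_tokens + reserve_for_reply
    size = start
    while size < needed:
        size *= 2
    return size


used = 6825  # tokens consumed by the two search-backed queries in the article
print(fits_in_context(used, 4096))   # False: the default window is too small
print(suggest_context_length(used))  # 8192
```

In this sketch, the 6,825 tokens from the example overflow the 4,096-token default, and doubling the window once is enough to hold them with some room left for the reply.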
It's important to stress that search capabilities are not a foolproof way to prevent hallucinations or incorrect information being provided. You should still always sanity check the responses that you get, but it's a quick and easy way to massively improve the quality of your responses.
Setting up SearXNG MCP
It's really quick and easy
First and foremost, you'll need a SearXNG server, which means either finding a public instance that will allow you to use it with the MCP server, or hosting your own. It's not an intensive application at all, so you could host it using Docker on the same PC that you're running your local LLM on. I'm running it in a Proxmox LXC. However you host it, you'll need to make sure that SearXNG's JSON output format is enabled. This is set in SearXNG's settings.yml file: add "json" to the formats list under the search section, then restart the container. You can sidestep all of that by just using a public instance of SearXNG instead, though this undermines some of the privacy benefits of self-hosting and also requires finding an instance that has JSON enabled.
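Before wiring the instance into your LLM client, it can help to confirm that JSON output is actually enabled. The following is a minimal sketch, assuming the instance is reachable at the address used in this article; the helper names are made up for illustration. It relies on SearXNG's /search endpoint with the format=json query parameter, and treats any HTTP error or non-JSON reply as "not enabled" (instances without JSON enabled typically reject the request).

```python
import json
import urllib.error
import urllib.parse
import urllib.request


def build_search_url(base_url: str, query: str) -> str:
    """Build a SearXNG search URL that requests JSON output."""
    params = urllib.parse.urlencode({"q": query, "format": "json"})
    return f"{base_url.rstrip('/')}/search?{params}"


def json_enabled(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if the instance answers a test query with JSON results."""
    url = build_search_url(base_url, "test")
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            data = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Covers HTTP errors, connection problems, timeouts, and non-JSON replies.
        return False
    return isinstance(data, dict) and "results" in data


# The address below is the one from this article; replace it with your own.
print(build_search_url("http://192.168.2.109:8888", "test"))
```

Calling json_enabled("http://192.168.2.109:8888") from the machine that runs your LLM client also confirms that the instance is reachable from there, which is what the MCP server will need.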
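Once you have it set up and ready to go, you'll need to set up the tool in LM Studio, Ollama, or whatever provider you use to run your language models. In LM Studio, you can add the following to mcp.json, making sure that the JSON stays valid (matched braces and commas; the indentation is only there for readability) and swapping the SEARXNG_URL value for the address of your own instance: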
{
  "mcpServers": {
    "searxng": {
      "command": "npx",
      "args": [
        "-y",
        "mcp-searxng"
      ],
      "env": {
        "SEARXNG_URL": "http://192.168.2.109:8888"
      }
    }
  }
}
That's literally all you need! Once you've got it added, you can enable the SearXNG MCP server in the sidebar of LM Studio, or enable it on a per-conversation basis. From there, language models that support tool calling can invoke it to find up-to-date information pulled from the web, bypassing the knowledge cut-off and grounding their answers in current sources. If you suspect that something went wrong, you can also click the tool call in the conversation to see exactly where the information was pulled from and what was pulled.
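If you already have other servers in mcp.json and would rather not edit it by hand, the entry can also be merged in programmatically. This is only a sketch: the add_searxng_server helper is hypothetical, and reading and writing the actual file is left out because its location depends on your setup. It mirrors the configuration shown earlier and leaves any existing servers untouched.

```python
import json


def add_searxng_server(config: dict, searxng_url: str) -> dict:
    """Return a copy of an mcp.json-style config with a 'searxng' entry added."""
    servers = dict(config.get("mcpServers", {}))  # copy so the input is not mutated
    servers["searxng"] = {
        "command": "npx",
        "args": ["-y", "mcp-searxng"],
        "env": {"SEARXNG_URL": searxng_url},
    }
    return {**config, "mcpServers": servers}


# Start from an empty config; in practice you would load your existing mcp.json.
existing = {"mcpServers": {}}
updated = add_searxng_server(existing, "http://192.168.2.109:8888")

# json.dumps always produces valid JSON, so brace and comma mistakes are avoided.
print(json.dumps(updated, indent=2))
```

Writing the result back with json.dumps means the file is always syntactically valid, which removes the most common source of errors when hand-editing it.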
SearXNG is incredibly lightweight, to the point that it's honestly a waste not to pair it with your local LLM. It unlocks a lot more possibilities, especially when paired with other tools, and for me, the privacy benefits alone make it better than any cloud-based provider.
