Summary

  • Specialized LLMs like StarCoder2 offer efficiency and performance for specific tasks without the bulk of general tools.
  • Smaller models, like Vicuna-7B, are becoming more popular as they are easier to deploy and consume fewer resources.
  • The future of AI leans towards precise, specialized LLMs, like those focused on coding.

Large Language Models (LLMs) are powerful tools, and ChatGPT, Microsoft Copilot, and Google Gemini consistently manage to blow me away. Their capabilities are extensive, but they're not without their faults. Hallucinations are a big problem with LLMs like these, though companies are aware of them and try to stamp them out wherever possible. However, I don't think these models are the future of LLMs. I think the future of AI is smaller, specialized models, rather than general purpose tools like these.

Specialized LLMs have fewer hardware requirements

Smaller models with fewer parameters

Imagine that you're a business, and you want to deploy an internal LLM that can help your developers with coding. You could pay for the full breadth of GPT-4 Turbo, with costs incurred for every transaction, or you could deploy StarCoder2, the LLM from Nvidia, Hugging Face, and ServiceNow. It's significantly smaller at just 15 billion parameters, it's free to use (aside from the cost of running it locally), and it performs very well on coding tasks.

Taking things a step further, there are other LLMs specialized just for coding that you can use, too. They may not be capable of everything that GPT-4 can do just yet, but work in this area is advancing continuously, and with these models being so small, there's a lot of good that can be achieved with them. When it comes to the extra small models with 7 billion parameters (or even fewer), there are even more options.

As an example, while not quite a specialized model, Vicuna-7B is a model that you can actually run on an Android smartphone if it has enough RAM. Smaller models are more portable, and if they're focused on a single subject, they can still be trained to be better at it than bigger, more versatile LLMs like ChatGPT, Microsoft Copilot, or Google's Gemini.
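To put "enough RAM" into rough numbers, here's a back-of-the-envelope sketch in Python. It only counts the memory needed to hold a model's weights (parameter count times bytes per parameter) and ignores everything else a running model needs, so treat the results as a lower bound. The precision levels (16-bit, 8-bit, 4-bit) are my own assumptions for illustration, and the function and variable names are made up for this example.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for the weights alone, in gigabytes (10^9 bytes).

    Ignores activations, the context cache, and runtime overhead,
    so real usage will be higher than this figure.
    """
    return num_params * bits_per_param / 8 / 1e9


# Parameter counts as quoted in the article.
MODELS = {"Vicuna-7B": 7e9, "StarCoder2 (15B)": 15e9}

# Assumed storage precisions; lower-bit options rely on quantization.
PRECISIONS = {"16-bit": 16, "8-bit": 8, "4-bit": 4}


def memory_table() -> dict:
    """Return {model: {precision: gigabytes}} for every combination."""
    return {
        name: {label: weight_memory_gb(params, bits)
               for label, bits in PRECISIONS.items()}
        for name, params in MODELS.items()
    }


if __name__ == "__main__":
    for name, row in memory_table().items():
        cells = ", ".join(f"{label}: {gb:.1f} GB" for label, gb in row.items())
        print(f"{name}: {cells}")
```

By this estimate, a 7 billion parameter model needs about 14GB for its weights at 16-bit precision but only around 3.5GB at 4-bit, which is why a quantized 7B model is plausible on a phone while a much larger model is not.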

Less expensive to train than a larger model

Easier for companies to build their own

The other benefit of smaller models is that the requirements and costs are far lower for companies looking to build their own language model. With a smaller, hyper-focused dataset covering a handful of topics, there's a significantly lower barrier to entry. To take things a step further, Retrieval-Augmented Generation (RAG), as used by tools like Nvidia's Chat with RTX, allows for the deployment of a smaller language model that doesn't even need to be trained on any particular data. Instead, it can pull answers from documentation and even tell the user exactly which document it found the answer in, so the user can verify that the answer is correct.
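To make the retrieval half of RAG a little more concrete, here's a minimal Python sketch. It is not how Chat with RTX works internally; real systems typically use learned embeddings and a vector index, whereas this toy version just compares word counts. The documents, file names, and function names are all invented for illustration, and the final step of handing the prompt to a language model is left out.

```python
import math
import re
from collections import Counter

# Hypothetical document store: file name -> text (invented for this example).
DOCS = {
    "vpn_setup.txt": "To connect to the office VPN, install the client "
                     "and sign in with your staff account.",
    "printer_guide.txt": "The office printer is on the second floor. "
                         "Add it from the printers settings page.",
    "expenses_policy.txt": "Submit expenses within thirty days and attach "
                           "a receipt for every claim.",
}


def tokenize(text: str) -> list:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(question: str, docs: dict, k: int = 1) -> list:
    """Return the k best-matching (file name, score) pairs."""
    q = Counter(tokenize(question))
    scored = sorted(
        ((cosine(q, Counter(tokenize(text))), name)
         for name, text in docs.items()),
        reverse=True,
    )
    return [(name, score) for score, name in scored[:k]]


def build_prompt(question: str, docs: dict, k: int = 1) -> tuple:
    """Build a prompt containing the retrieved text, plus the source names."""
    hits = retrieve(question, docs, k)
    context = "\n".join(f"[{name}] {docs[name]}" for name, _ in hits)
    prompt = ("Answer using only the sources below and cite the file name.\n"
              f"{context}\nQuestion: {question}")
    return prompt, [name for name, _ in hits]


if __name__ == "__main__":
    prompt, sources = build_prompt("How do I connect to the VPN?", DOCS)
    print(prompt)
    print("Sources:", sources)
```

The point of the design is that the knowledge lives in the documents rather than in the model's parameters, so the model can stay small, and the list of source file names is what lets the user check the answer for themselves.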

As such, while the likes of ChatGPT and others have their place, it's unlikely that those models are the future of where AI will really take us. They're general purpose, but if we're looking to use LLMs as tools, then those tools need to be experts at the things they're trained to do. GPT-4 will never be an expert in everything, but a language model built for coding may well be an expert in coding. On top of that, for a lot of tasks you don't need something as powerful as GPT-4 anyway, and it's cheaper and less resource-intensive to use something far simpler.

As an example, imagine an LLM that was used to manage a smart home. Why does that language model need to have parameters filled with information about programming? Something like that, if it were deployed in someone's home, could well be trained on a much smaller dataset, so that its parameters hold only what is actually relevant. It can become a master of smart home management without wasting precious resources on building internal networks for topics that aren't relevant.

The future of AI is specialized

General purpose LLMs still have their place

All in all, general purpose LLMs will have their place, but the future of hard-hitting AI is truly in the smaller, specialized space. We already have smaller language models like Vicuna-7B capable of running on devices that fit in our pockets. A 7 billion parameter model is capable of a lot when specialized for one particular use, and that's exactly where I believe the industry is headed. StarCoder2 is an example of that, and with RAG starting to take off too, I suspect we'll see fewer heavy models, and a lot more small but precise models instead.

If you want to try out some of those smaller LLMs, you can, using tools like LM Studio and a powerful GPU. They're not that difficult to run so long as you have a lot of VRAM, and there are plenty of specialized models for all kinds of uses that you can give a try. There's something for everyone, and once you've tried them out, you'll understand why the future of AI is going to be these models that anyone can run, anywhere, at any time.
