
URL: https://zapier.com/blog/open-source-ai/


What is open source AI?

By Harry Guinness · September 3, 2024

Despite the name, OpenAI doesn't make open AI models: its various models are all proprietary, or closed source. And some other popular models? No matter how often they're described as open source, they aren't, though they are open, unlike OpenAI's. Welcome to the strange new world of AI definitions.

Broadly speaking, there are three major categories of AI models: 

  • Proprietary

  • Open source

  • Open

These categories apply to both large language models (LLMs) and image generators. Things are still shaking out, and the Open Source Initiative (OSI) is currently developing a strict definition for what's required for an AI model to truly be considered open source, but let's look at how it all stands now.


What is open source?

Before looking at open source AI models, let's step back and consider what open source actually means. It isn't some random buzzword: the Open Source Initiative (OSI) maintains a formal definition that fully describes the underlying philosophy and requirements. I could reproduce it here because, of course, it's released under a license that permits that, but here's the gist.

Open source doesn't just mean that you can freely download something or access the source code. It must be available for anyone to use and modify in any way they like and for any purpose. An open source license specifically cannot restrict any "field of endeavor," which is where a lot of open AI models fall short. 

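The OSI maintains a list of approved licenses. Widely used ones include the MIT License and the Apache License 2.0, both of which show up in the model table later in this article.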

What is a proprietary AI model?

Proprietary AI models are some of the most popular and powerful models available. They're developed and operated by private companies, and the source code, training strategies, model weights, and even details like the number of parameters they have are all typically kept secret. The only way to access a proprietary model is through some kind of official service like a chatbot, an API, or a tool built using the API.

Take OpenAI's GPT-4o. We have no idea what data it was trained on or how many parameters it has. The only way to access it is through ChatGPT, OpenAI's API, or an app that uses GPT-4o.

And, of course, OpenAI charges for access to GPT-4o. If you want to use it (and it is one of the best AI models available), you can pay $20/month for ChatGPT Plus, or pay to use the API, either by subscribing to another service or building something yourself. You can't just download GPT-4o and run it on your own server.

The same is true for all other proprietary AI models, including:

  • GPT-4o mini and DALL·E 3 from OpenAI

  • Claude from Anthropic

  • Gemini and Imagen 3 from Google

  • Command R and R+ from Cohere

What is open source AI?

Open source AI models are AI models released under an open source license, but that isn't necessarily as simple as it sounds. Researchers have found that many models billed as open source don't actually meet that bar. The practice is called "open-washing," and it seriously complicates things, including for people who write about AI models.

This chart shows how "open" a number of AI models are. You'll see Llama 3.1 somewhere near the middle and ChatGPT way down at the bottom. (Source: Opening up ChatGPT)

Right now, the Open Source Initiative is working on a dedicated definition of open source AI because existing licenses don't really cover the full technicalities of the current generation of AI models. To really fulfill the requirements and philosophy of open source software, not only would a model's source code need to be freely available, but so would its training data and a description of how it works. The software parts would need to be shared under open source licenses, while things like training data and descriptions of how it works would need to be shared under Creative Commons licenses or similarly open ones.
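To make that requirement concrete, here is a minimal Python sketch of a checklist that reports which parts of a model release fall short. It is not the OSI's method: the `Release` class, the `missing_pieces` function, and the two license sets are hypothetical names made up for illustration, and the sets are small samples rather than complete lists of acceptable licenses.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumption: small illustrative samples, not complete or official lists.
OPEN_SOFTWARE_LICENSES = {"MIT", "Apache-2.0"}
OPEN_CONTENT_LICENSES = {"CC-BY-4.0", "CC0-1.0"}


@dataclass
class Release:
    """Hypothetical record of what a developer published and under what terms."""
    name: str
    code_license: Optional[str]           # None means the code was not released
    training_data_license: Optional[str]  # None means the data was not released
    docs_license: Optional[str]           # None means no description was released


def missing_pieces(release: Release) -> List[str]:
    """List the components that are not shared under an open license."""
    gaps = []
    if release.code_license not in OPEN_SOFTWARE_LICENSES:
        gaps.append("source code")
    if release.training_data_license not in OPEN_CONTENT_LICENSES:
        gaps.append("training data")
    if release.docs_license not in OPEN_CONTENT_LICENSES:
        gaps.append("documentation")
    return gaps


# A made-up release that shares code but nothing else.
code_only = Release("hypothetical-model", "Apache-2.0", None, None)
print(missing_pieces(code_only))
```

A release that shares only its code under an open license would still come back with gaps for training data and documentation, which is exactly the situation the paragraph above describes.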

Also, it's hard to overstate how permissive open source licenses are. The strictest licenses essentially require you to make anything you build with it open source as well—and give attribution to the original developers. That's it. If you want to build a multi-billion dollar company off open source software or create a crime chatbot that tells people how to get away with heists, you're absolutely free to do so. The police might have some issues with the latter project, but you wouldn't be breaking any software licenses. 

You can read the draft definition on the OSI website. As I write this, it's on version 0.0.9.

What is an open AI model?

Open models fill the gap between closed proprietary AI models and the platonic ideal of truly open source AI models. (Until the OSI releases their definition, the closest model I could find to that ideal is OLMo, from the Allen Institute for AI.)

In simple terms, open AI models are freely available in some capacity. Typically, you can download them from Hugging Face and other model platforms and run them on your own devices after agreeing to whatever license terms are offered. You can generally re-train them with your own data to create your own model and build your own chatbots and apps on top of them. In most cases, you can dig deep into things like the model weights and system architecture to understand how they work (as best as anyone can).

Open licenses can still be permissive, but they have some additional limits that an open source license wouldn't. For example, Llama 3's license allows commercial use, but only for products with up to 700 million monthly active users; beyond that, you need a separate license from Meta. You or I could build something with it, but Apple and Google can't. Similarly, Gemma's license, among other things, bans "facilitating or encouraging users to commit any type of crimes." Understandably, Google doesn't want to see unsavory chatbots "powered by Google Gemma" plastered all over the news.

These restrictions, while understandable, are at odds with the open source philosophy, so you can see why things can get contentious. Various researchers are working on ways to grade models based on how open they are to make things a lot clearer. If any of these becomes mainstream, you can be sure we'll let you know.
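As a rough illustration of how the three categories relate, the sketch below sorts a model into "proprietary," "open," or "open source" from three yes/no facts about its terms. This is a deliberate simplification, not an official test: the `LicenseTerms` class and the `categorize` function are hypothetical, and a real open source assessment would also look at modification and redistribution rights and at what was released alongside the weights.

```python
from dataclasses import dataclass


@dataclass
class LicenseTerms:
    """Hypothetical summary of a model's terms (simplified for illustration)."""
    downloadable: bool           # can you get the model and run it yourself?
    restricts_use_cases: bool    # e.g. a list of banned purposes
    restricts_who_can_use: bool  # e.g. a cap on a product's user numbers


def categorize(terms: LicenseTerms) -> str:
    """Map simplified license facts onto the article's three categories."""
    if not terms.downloadable:
        return "proprietary"
    if terms.restricts_use_cases or terms.restricts_who_can_use:
        # Any "field of endeavor" or user restriction rules out open source.
        return "open"
    return "open source"


# An API-only model, a downloadable model with a user cap, and an unrestricted one.
print(categorize(LicenseTerms(False, True, True)))
print(categorize(LicenseTerms(True, False, True)))
print(categorize(LicenseTerms(True, False, False)))
```

The middle branch is where the contentious cases land: a single usage or user restriction is enough to move a model from "open source" to "open."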

The best open and open source AI models

Here's a list of all the open and open source models worth knowing about right now. Where these fall on the scale of open source to open is up for debate until we have a better definition.

| AI model | Developer | Model type | License | Parameters | Notes |
|---|---|---|---|---|---|
| Llama 3.1 | Meta | LLM | Custom | 8B, 70B, 405B | Restricted uses and user numbers |
| Gemma 2 | Google | LLM | Custom | 2B, 9B, 27B | Restricted users |
| Phi-3 | Microsoft | LLM | MIT | 3.8B, 7B, 14B | |
| Mixtral 8x7B | Mistral | LLM | Apache 2.0 | 8x7B | |
| Mistral 7B | Mistral | LLM | Apache 2.0 | 7B | |
| DBRX | Databricks/Mosaic | LLM | Custom | 36B equivalent | Mixture of Experts, so parameter count is complicated |
| OLMo | Allen Institute for AI | LLM | Apache 2.0 | 7B | Most open source AI model I could find |
| FLUX.1 [dev] | Black Forest Labs | Image generator | Custom | N/A | Non-commercial use |
| FLUX.1 [schnell] | Black Forest Labs | Image generator | Apache 2.0 | N/A | |
| Stable Diffusion 3 | Stability AI | Image generator | Custom | N/A | Prior versions of Stable Diffusion, including 1.5, 2.1, and SDXL, are available under open licenses |
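If you want to slice the table yourself, a short Python sketch like the one below can help. The rows copy the developer, model type, license, and parameter columns from the table; the function names are hypothetical. Treating MIT and Apache 2.0 as the "standard" licenses here reflects that both are OSI-approved, while a "Custom" license always means reading the actual terms.

```python
from collections import Counter

# Rows copied from the table: (developer, model type, license, parameters).
ROWS = [
    ("Meta", "LLM", "Custom", "8B, 70B, 405B"),
    ("Google", "LLM", "Custom", "2B, 9B, 27B"),
    ("Microsoft", "LLM", "MIT", "3.8B, 7B, 14B"),
    ("Mistral", "LLM", "Apache 2.0", "8x7B"),
    ("Mistral", "LLM", "Apache 2.0", "7B"),
    ("Databricks/Mosaic", "LLM", "Custom", "36B equivalent"),
    ("Allen Institute for AI", "LLM", "Apache 2.0", "7B"),
    ("Black Forest Labs", "Image generator", "Custom", "N/A"),
    ("Black Forest Labs", "Image generator", "Apache 2.0", "N/A"),
    ("Stability AI", "Image generator", "Custom", "N/A"),
]

# Both of these licenses are on the OSI's approved list.
OSI_APPROVED = {"MIT", "Apache 2.0"}


def standard_licensed(rows):
    """Keep only the rows released under an OSI-approved license."""
    return [row for row in rows if row[2] in OSI_APPROVED]


def license_counts(rows):
    """Count how many rows use each license."""
    return Counter(row[2] for row in rows)


for developer, model_type, license_name, params in standard_licensed(ROWS):
    print(f"{developer}: {model_type}, {license_name}, {params}")
print(license_counts(ROWS))
```

An OSI-approved license on the weights is a good sign, but as discussed above, it doesn't by itself settle whether a model counts as fully open source.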

Should you use an open or open source AI model?

While there aren't as many top-tier open source AI models as I'd like, the best open models are incredibly competitive with proprietary alternatives. For example, Llama 3.1 405B and the strongest open image generators give GPT-4o and DALL·E 3 a serious run for their money. If you have the technical chops to employ an open model, you can get much the same performance at a fraction of the cost and with a lot more freedom.

Harry Guinness

Harry Guinness has been covering AI and security (separately and together) for well over a decade. He's a writer and photographer from Dublin, Ireland whose writing has appeared in The New York Times, Wired, Popular Science, and Inc. His photos have been published on hundreds of sites, mostly without his permission.
