URL: https://thenewstack.io/how-should-we-define-open-ai/

How Should We Define ‘Open’ AI?

The term "open" has no agreed-upon definition in the context of AI, and is applied to widely divergent offerings with little reference to a stable meaning.
Mar 31st, 2024 6:00am by David Cassel

“Our vision is a world where all benefit from the unfettered exchange of information,” explains the official website for the National Information Standards Organization.

Accredited by the American National Standards Institute (ANSI), it’s the nonprofit standards group for library/publishing/bibliographic applications for both “traditional” and new technologies.

But there’s one very important standard that’s still developing in the world: What exactly constitutes an open artificial intelligence, or “open AI”? It’s a question Archive.org’s deputy director Thomas Padilla tackled in his keynote at this year’s NISO Plus conference.

“It’s a talk about the AI,” Padilla promised his audience.

“But like most talks about technology, it ends up being a talk about us and about what we want to achieve.”

Is Llama 2 Open Source?

While the Open Source Initiative is still working on its official definition of “open source AI” through a consultative community process, some important distinctions are already clear.

Identifying five characteristics that open AI should have, Padilla said he believes “open” AI should be reusable, with users weighing trade-offs between performance and “broader reusability and the potential for others in our community to build upon the bottom model.”

But how does that play out in the real world?

Meta’s Llama 2 declares it’s open source, and free for research and commercial use — but Padilla reviewed the license terms at its GitHub repository, “putting my librarian hat on.”

Under “Additional Commercial Terms,” it warns that if a licensee has more than 700 million active monthly users, “You must request a license from Meta, which Meta may grant to you in its sole discretion.”

And otherwise, “You are not authorized to exercise any of the rights under this Agreement unless or until Meta otherwise expressly grants you such rights.”

Padilla called that “some shade against the other big tech companies, for the most part,” although RedMonk co-founder Steve O’Grady is fairly clear that this is not pure open source:

useful clarification, which makes the targets clear. you can use it unless you work at, say, google. imagine if linux was open source, unless you worked at facebook.

and that’s just one of multiple field of use restrictions, any one of which means it’s not open source. https://t.co/x2CYrKLifF

— steve o’grady (@sogrady) July 19, 2023

But Padilla thinks there’s another clause even more deserving of a “Llama side-eye.”

The license says users “will not use the Llama Materials or any output or results... to improve any other large language model (excluding Llama 2 or derivative works thereof).”

Padilla feels that very much is “… not in the spirit of open source. It feels to me like it sort of — it explodes the whole thing. How is that open source? It’s not!”

But Padilla adds that “other models have come onto the market that do show a degree of promise, I think, for having a stronger alignment with the spirit of open source …” — citing specifically the open language model OLMo.

OLMo describes itself as a “truly open LLM and framework,” and emphasizes that “all code, weights, and intermediate checkpoints are released under the Apache 2.0 License.”

Mysterious Answers

Meta’s Llama isn’t the only game in town.

ChatGPT’s developers even named their company OpenAI, and here, Padilla tells his audience, “I also believe OpenAI is open.”

But there’s still room for criticism. Transparency is part of Padilla’s definition of openness. It plays a role in the “integrity of knowledge,” defined as making sure authors and “knowledge producers” are credited.

Instead, today content producers are being “ghosted,” as Padilla sees it — with crediting either not existing “… or it’s vaguely referenced to. I think that’s bad.”

With much of today’s generative AI, it’s more like mysterious answers summoned from a magical seance. “You get responses like, ‘I don’t have access to how I provided you this answer.’”

Responses like that would never fly for a peer-reviewed paper, Padilla points out:

I asked Chatgpt how much of its data comes from Wikipedia: I don’t have access to my training data, but I was trained on a diverse range of data from the internet, including sources such as books, websites, and other texts to develop a wide understanding of language.

— James (Jim) McTague (@Mctaguej) December 8, 2023

There’s another issue. In February, Fast Company spoke to the head of the OLMo project, AI2 senior director of research Hanna Hajishirzi, about Meta’s models. Their conclusion? The models were very valuable, but the data behind them still wasn’t available. “We don’t understand the connections starting from the data all the way to capabilities,” the magazine noted.

And even beyond that, “The details of the training code is not available. A lot of things are still hidden.”

So another aspect of transparency is good documentation, and here Padilla sees promise in Hugging Face’s model cards.

When users upload a model to Hugging Face’s tools, they’re prompted to specify the model’s parameters and which datasets were used (according to Hugging Face’s definition of model cards) — as well as the model’s intended use, and its “potential limitations, including biases and ethical considerations.”
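As a rough illustration of the kind of information described above, here is a minimal Python sketch of a checklist for a draft model card. It is not Hugging Face’s API or file format. The `ModelCardDraft` class, its field names, the `missing_sections` helper and the example values are all hypothetical, chosen only to mirror the items the article lists (parameters, datasets, intended use, limitations).

```python
from dataclasses import dataclass, field, fields


@dataclass
class ModelCardDraft:
    """Hypothetical record of the items the article says a model card covers."""
    model_name: str
    parameters: str = ""                                # e.g. a parameter count
    datasets: list[str] = field(default_factory=list)   # training datasets used
    intended_use: str = ""                              # what the model is meant for
    limitations: str = ""                               # biases, ethical considerations


def missing_sections(card: ModelCardDraft) -> list[str]:
    """Return the names of sections that are still empty."""
    return [f.name for f in fields(card) if not getattr(card, f.name)]


draft = ModelCardDraft(
    model_name="example-model",       # placeholder name, not a real model
    datasets=["example-corpus"],      # placeholder dataset
    intended_use="Summarizing library catalog records",
)
print(missing_sections(draft))  # lists the sections still left blank
```

A check like this only confirms that each section has been filled in. Whether the documentation is accurate is a separate question that the sketch does not address.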

Padilla says he’d “love to see more uptake of this” — and “possibly even a requirement” to upload them to Hugging Face. “Perhaps that’s around the corner.”

“Without transparency, you have no assurance of knowledge integrity,” Padilla tells the audience. “And if you have no knowledge integrity — then why bother be involved in something like this at all? I really think it weakens the value proposition of the entire thing.”

How Should AI Be Held to Account?

Open AI should also very much be accountable, Padilla believes, meaning that it’s developed (and used) according to specific community needs.

For “accountability,” Padilla applauded policy, regulation and executive orders “attempting to enshrine certain protections and safeties to guide the development and implementation of AI,” including the White House’s October 2023 “Executive Order on Safe, Secure, and Trustworthy Artificial Intelligence.”

But Padilla also cited the real-world accountability principles espoused by the Distributed AI Research Institute (founded by AI researcher Timnit Gebru), on the organization’s Research Philosophy page.

One of their aims is to reduce “the distance between researchers and community collaborators.” Another is to focus on trust and time, to forge “meaningful relationships with communities.” In particular, they are talking about minoritized or marginalized communities that are rarely at the center of AI research, but are often on the receiving end of the negative impacts of AI research that does not take their lived experience into account.

“I think DAIR provides us a nice roadmap,” Padilla said.

Sustainability

Padilla said open AI should be adopted in a “sustainable” way, with an awareness of “interdependence, threats and opportunities.”

To that end, Padilla proposes three ways of looking at it. To demonstrate what he calls the “exploded” view of interdependence, Padilla showed a 2018 data visualization exhibited at New York’s Museum of Modern Art which (according to its gallery label) “analyzes the vast networks that underpin the ‘birth, life, and death’ of a single Amazon Echo smart speaker.” It moves from the periodic table of elements, Padilla told the audience, “up to the strip mining that needs to happen to gain the raw materials to produce a Dot, up to smelters and refiners and then favorable labor conditions, up to the component manufacturing and so forth.

“I think it’s one way to kind of get a sense of the interdependency that’s at play when we adopt, not just AI, but any technology.”

There’s also a “systems” view. Padilla cites world-systems theory, which explores how different parts of the world get relegated to their roles in the production of what Padilla calls “highly polished commodity products.” And Padilla also provided an example of the kinds of questions explored in the “replacement” view. In October, a U.S. House subcommittee on technology oversight heard testimony from Emily M. Bender, a linguistics professor at the University of Washington.

“I find that the phrase ‘artificial intelligence’ is best understood as a marketing term,” Bender had said, “and one which only muddies the waters. It is clearer to talk about automation.” This leads to different questions, including:

  • What is being automated?
  • Who is automating it and why?
  • Who benefits from that automation?
  • Who is being harmed and what recourse do they need?
  • Who has accountability for the functioning of the automated system?
  • What existing regulations already apply to the activities where the automation is being used?

The arrival of AI shouldn’t just be about worrying about which jobs will be replaced. Ideally, we also want open AI to have an “affirmative” impact, Padilla believes — supporting the evolution of roles in organizations.

Marketing Terms vs. Technical Descriptors

Padilla had opened his talk with a damning observation from a paper written last August. While earning his PhD at Carnegie Mellon, David Gray Widder had teamed up with Sarah West, managing director of the AI Now research institute, and Meredith Whittaker, president of the Signal Foundation (a nonprofit focused on open source privacy technology and the developer of the Signal messaging app).

“We find that the terms ‘open’ and ‘open source’ are used in confusing and diverse ways,” the researchers wrote, “often constituting more aspiration or marketing than technical descriptor, and frequently blending concepts from both open source software and open science.”

“This complicates an already complex landscape, in which there is currently no agreed on definition of ‘open’ in the context of AI, and as such the term is being applied to widely divergent offerings with little reference to a stable descriptor.”

So we’re currently living in a world where licensing terms are hard to assess, Padilla said, not to mention the jolt of discovering a tech stack that is highly proprietary — and expensive. “We think we know what open means and then we start experiencing kind of like this M.C. Escher-like experience….”

“I think it behooves us to have, you know, a more concrete sense of what we want from AI and the knowledge work that we do collectively together.”

David Cassel is a proud resident of the San Francisco Bay Area, where he's been covering technology news for more than two decades. Over the years his articles have appeared everywhere from CNN, MSNBC, and the Wall Street Journal Interactive...
TNS owner Insight Partners is an investor in: OpenAI.