
A 500-word document was released several weeks ago that will significantly impact the future of the internet. The Open Source Initiative (OSI) released a near-final definition of open source AI that will free up the broad community of AI developers to create a flourishing movement for AI innovation, much like with the creation of the internet itself. Open source software underpins the internet infrastructure and most applications in use today. That came to be because the open source pioneers defined it as software that would always be free to use and modify. This enabled the widespread adoption of open source software and the innovation that powers our digital lives.
It couldn’t come at a better time. We’re seeing a wave of AI models, many from the biggest tech companies, being touted as “open source” while failing to reflect the spirit of the original open source software definition. While this might feel like semantics, words matter. Sloppy language around open source AI can scupper trillions of dollars in future innovation and leave the terms of AI in the hands of a few big companies.
There is much to lose if there is not a genuinely open AI development ethos and community. A recent study from Harvard University shows that open source software has created around $8 trillion in economic value. All of this innovation rests on the assurances in the original open source definition written in 1998: that any software calling itself open source will always be free to use, study, modify, and share. This means that you can build a business, a government service — anything, really — on top of open source software without the fear that someone might charge you or change the terms of using that software in the future.
We will see these same benefits with AI, but only if developers can freely use, study, modify, and share all elements of an AI system. The phrase “all elements of an AI system” is critical here. AI and software have some crucial differences. An AI system includes software code, working AI models, and the underlying training data used to create those models. The OSI’s new definition asserts that the code and models must be open, and the data must be transparent and reproducible. If we want to unlock another era of creativity and innovation, we need AI labs, including the big commercial players, to embrace this definition before calling what they release “open source.” Without this, developers may avoid open models, and the whole open source ecosystem may stall early.
There has been a surge of large language models (LLMs) coming from the largest tech companies — Meta’s Llama being most notable — that have been touted as open source. These models make it easy to build AI applications without the exorbitant costs required to build them from scratch. We’ve seen valuable AI applications, from drug discovery to medical education, built on top of these models. This is absolutely a step in the right direction, but there’s a caveat: these AI models are not truly open.
In an Economist opinion piece earlier this week, Mark Zuckerberg and Spotify CEO Daniel Ek define open source AI as “models whose weights are released publicly with a permissive license” and cite Llama as an example. This narrow definition leaves the door open for companies like Meta to change course and stop releasing parts of their AI models if doing so no longer serves their interests. If that happened, developers who have built on top of these models might find their products inoperable or, at the very least, severely restricted: think disrupted services and stunted innovation. This raises real concerns about the long-term viability of applications built on top of these models and, more broadly, about the viability of a vibrant open source AI ecosystem.
In February, Mozilla and Columbia University convened leading experts to explore what openness should mean in the AI era. The resulting paper flagged the risks of narrow and sloppy uses of the term “open source” in AI. It also raised a red flag around “open-ish” licenses, such as the Llama license, which only grants free use for products with fewer than 700 million monthly users. Can you imagine building your startup on open software that would get locked down as soon as your business is successful? That’s what licenses like this would do.
The draft definition aims to address these risks — drawing clear boundaries around what counts as open source AI so that developers know what they can rely on. This will put wind in the sails of AI labs building open source AI models that won’t disappear or eventually close down.
Examples include EleutherAI’s GPT-NeoX-20B, released under the Apache 2.0 license, which allows anyone to use the model. Similarly, the Allen Institute’s OLMo model provides full access to the code, data, weights, and evaluation suite used in developing it, enabling researchers to study and refine it. Unlike Meta’s Llama, these models allow researchers to fully study and test the inner workings of AI systems and adapt them to their own needs.
It’s also worth noting that labs like Eleuther and AI2 are nonprofits, giving developers the confidence that these resources will remain available and up-to-date, ensuring the sustainability of their products built on top of these models. This same principle of enduring support has made open source projects like Linux and Apache so prevalent in servers worldwide. Developers know and trust that the Linux and Apache foundations will continue to keep their software operating in the public interest.
The work of these nonprofits has the potential to create an AI future that at once contributes to a broader public good and makes a genuinely open toolbox for the AI era. Policymakers, philanthropists, and the broader tech community should step up and support initiatives like these. The more prominent commercial players should take these projects as a model, changing their approach to one more in line with the new OSI definition. If we get this right, we can empower anyone, and any community, to shape, enjoy and trust AI. The future of our digital infrastructure and our ability to innovate depend on it.
The post Defining Open Source AI Will Solve a Million Headaches appeared first on The New Stack.