
RALEIGH, N.C. — The Open Source Initiative (OSI) officially released version 1.0 of its Open Source AI Definition (OSAID) on Oct. 28 at the 2024 All Things Open conference. It’s been a long, slow journey to this significant milestone in the effort to establish a clear standard for open source artificial intelligence (AI).
This release, which is unchanged from the first OSAID release candidate, comes after a two-year process involving extensive collaboration with academia, industry, and the broader open source community. The OSAID aims to provide a standard by which AI systems can be evaluated to determine whether they truly qualify as open source.
Spoiler alert: Many won’t. The open-wash releases from AI companies such as OpenAI and Meta don’t make the OSAID grade.
According to the new definition, for an AI model to be considered open source, it must:
- Provide sufficient information about its design to allow substantial recreation
- Disclose pertinent details about training data, including provenance and processing methods
- Allow usage for any purpose without permission
- Permit studying of the system’s inner workings
- Enable modification for any purpose
- Allow sharing of the original or modified version.
The definition also addresses the contentious issue of training data. While it doesn’t require the full release of datasets, it mandates “sufficiently detailed information about the data used to train the system” to enable recreation by skilled individuals.
That won’t be good enough for some people. For an AI project to be open source, they want all the data to be open as well.
The problem, as RedMonk analyst Stephen O’Grady pointed out in a recent post, is that “Source code … is a precisely and narrowly bounded subject area. AI projects are not. Their scope blends software, data, techniques, biases and more. AI is inarguably a fundamentally different asset than software alone.”
As a result, O’Grady wrote, “Idealists seeking to preserve and protect the bedrock principles of open source, for example, argue that any model that doesn’t require training data is compromising the four key freedoms that the original open source definition satisfies.
“The OSI, for its part, contends that in discussions with various AI researchers, their consensus opinion is that the weights are more important than the original training data. That position may or may not be correct. What is definitely true is that even if that assertion is correct, it is a nuanced position that is counterintuitive and requires a lengthy explanation.”
No matter where you land on this issue, this new standard could have significant implications for companies that have been marketing their AI models as “open source.”
In his All Things Open keynote, Stefano Maffulli, OSI’s Executive Director, explained that the OSI had started its long, hard journey to create the OSAID because “Companies and projects were calling themselves open or open source with the word AI next to it, and these have absolutely nothing to do with open source principles. We were forced into making decisions and taking action because also no one knew what open source AI was in the space, and regulators were even introducing the term open source AI into laws without providing any definition or any hint of what that meant. We were successful in explaining to regulators that open source needs a special treatment. But with that success comes also the responsibility to act.”
The European Union’s (EU) Artificial Intelligence Act is the most important law addressing open source AI to date. In an interview, Carlo Piana, the OSI’s chairman and an attorney, explained, “The AI Act has a definition of open source, but it hinges on the older definition of open source software; the OSAID should close the gap, increasing the requirements and establishing what is necessary for something to be really open.”
That said, while this release marks a stable version of the definition, the OSI acknowledges that further refinements may be necessary. The organization has established a committee to monitor the OSAID and propose amendments for future versions.
That does not mean that OSAID 1.0 is a beta release. It’s not.
Piana explained that being open to change is an “acknowledgment that our collective understanding of what AI does, what’s required to modify language models is limited now. The more we use it, the more we’ll understand. Right now our understanding is limited, and we don’t know yet what the technology will look like in one year, two years, or three years.” Thus, the OSAID leaves room for future revisions.
Other organizations have already endorsed the OSAID, including the Mozilla Foundation, the OpenInfra Foundation, Bloomberg Engineering, and SUSE.
Looking ahead, this release represents a significant step toward clarifying what constitutes open source AI. While not everyone will agree with the OSAID, and the field of AI is evolving rapidly, this definition provides a framework for developers, researchers, and policymakers to evaluate and create truly open AI systems. The impact of this standard on the AI industry and open source community will unfold in the coming months and years.
But the work of defining open source AI is far from done.
The post The Open Source AI Definition Is Out appeared first on The New Stack.