The definition of open AI (the term used internationally is “open source”) is an important issue in the regulation of the sector. The European AI Act relies on this term in particular to exempt the designers of such systems from certain obligations:
“Third parties that make AI tools, services, processes or components other than general-purpose AI models publicly available should not be required to comply with requirements targeting responsibilities along the AI value chain, in particular towards the provider that has used or integrated them, when these AI tools, services, processes or components are made accessible under a free and open-source license.”
In October, the Open Source Initiative (OSI) released version 1.0 of its definition of open source AI. Throughout the drafting of this definition, information about the data used to train AI models was at the center of discussions. Companies like Meta, which considers its Llama models to be the leaders in open-source AI, are entirely unwilling to give details of the data they use. On the other hand, some open source players like Linagora insisted that only AI models whose training data is known should be considered open.
A definition that is causing a stir on Debian’s side
Ultimately, the OSI 1.0 definition emphasizes the need for a “full description” of the data used to train the model, without requiring that this data be completely known. This definition convinced certain actors such as Hugging Face, Linagora, :probabl. and Mozilla, which have officially endorsed it.
But Debian developers have voiced fierce opposition to this definition on their mailing list. One of them, Mo Zhou, explained for example that “AI systems are software and (to quote Bruce Perens, author of DFSG [Debian Free Software Guidelines] and OSD [Open Source Definition] and founder of OSI) the training data is the source, so the OSAID [Open Source Artificial Intelligence Definition] is fundamentally incompatible with the OSD”. It is also worth remembering that the definition of open source was derived from the Debian Free Software Guidelines (DFSG), as the OSI itself explains at the end of its text.
Another Debian developer opposed to the OSI definition, the Australian Sam Johnston, has taken steps to build an organization around a definition that would clearly require the publication of training data. In particular, he has proposed a draft of an alternative definition, which is also based on the original definition of open source. It begins by asserting:
“Open source does not just mean access to the source, but the freedom for users to study, use, modify and share the program, for any purpose and without having to ask permission. In cases where the software relies on data – including databases, models or media – for its creation, modification or operation, such data is considered an integral part of the program and is subject to the same requirements.”
A new structure: the Alliance for Open Source
Our colleagues from Context obtained his proposal to participate in the AI Action Summit, organized by the Élysée on February 10 and 11, in which he explains that an “Alliance for Open Source” (AOS) is being set up. It would notably focus “on France as a strategic center to encourage collaboration and innovation in the field of Open Source”.
Surprisingly, although Linagora and :probabl. endorsed the OSI text, Sam Johnston explains that “recent productive meetings with Yann Lechelle, CEO of :probabl., and Alexandre Zapolsky of LINAGORA helped initiate an organized awareness effort to involve national free software groups as early adopters and collaborators, thus helping to lay the foundations for the formalization of AOS in the public arena”.
He also hopes to rally the April association, the Debian project, the Free Software Foundation Europe, OpenForum Europe, Linux Australia, the Free Software and Open Source Foundation for Africa and OpenUK, as well as international organizations such as the Software Freedom Conservancy, the Free Software Foundation, the Digital Public Goods Alliance, the Linux Foundation and the Apache Foundation.
With this new organization, Sam Johnston wants to “maintain the integrity of Open Source principles”.