James W. Marshall and ChatGPT have one thing in common: both started a “rush” that changed the world. The discovery of gold at Sutter’s Mill in 1848 drew some 300,000 people to California. The launch of ChatGPT, built on GPT-3.5, in November 2022 brought AI and large language models (LLMs) to the forefront and attracted millions of users around the world.
The rise of AI has quickly raised crucial questions: respect for copyright, algorithmic bias, ethics, data confidentiality, security and the impact on employment. The EU’s move to regulate AI via the AI Act is therefore timely. In this context, businesses around the world are exploring what AI can offer to optimize their operations and drive growth.
Do not hide the risks and side effects
AI is the new gold rush, but be careful not to fall into the trap of a digital Wild West! Too many companies are rushing in without understanding the dangers, even though they have a responsibility to use AI responsibly and ethically.
The risks are real: data leaks, algorithmic bias, reputational damage. The example of Microsoft’s chatbot Tay in 2016, with its racist and misogynistic outbursts, is a reminder of the potential dangers. Consumer concerns, reflected in a recent study in which 78% of respondents said they were worried about their data being used by AI, highlight the importance of a careful and transparent approach.
AI is already widely used, but often without governance, as was the case with the rushed adoption of the cloud. This lack of control can lead to costly errors. To avoid repeating the same mistakes, companies must supervise the use of AI. This requires internal regulation, strict access control and clear usage policies. Companies like Amazon and JPMC have already taken action by restricting access to ChatGPT, planning a gradual and controlled reintroduction once safeguards are in place.
It is essential that companies clearly determine what data their AI projects can use and how. A role-based access control system, associating each role with specific tasks and permissions for each data source, provides a scalable solution. This system ensures that only individuals with the necessary privileges can access data, in compliance with legal regulations and geographic requirements, including data sovereignty.
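To make the idea concrete, here is a minimal sketch of such a role-based mapping between roles, data sources and permitted operations. The role names, data sources and permissions are illustrative assumptions, not a reference to any particular product or framework.

```python
# Minimal sketch of role-based access control over AI training data sources.
# Role names, data sources and permissions are illustrative assumptions.

# Each role maps a data source to the set of operations it is allowed to perform.
ROLE_PERMISSIONS = {
    "data_scientist_eu": {
        "crm_eu": {"read"},               # EU-hosted data only (data sovereignty)
        "support_tickets_eu": {"read"},
    },
    "ml_platform_admin": {
        "crm_eu": {"read", "export"},
        "crm_us": {"read", "export"},
    },
}

def can_access(role: str, data_source: str, operation: str) -> bool:
    """Allow an operation only if the role explicitly grants it on that source."""
    return operation in ROLE_PERMISSIONS.get(role, {}).get(data_source, set())

# Example: an EU data scientist can read EU CRM data but cannot touch US data.
assert can_access("data_scientist_eu", "crm_eu", "read")
assert not can_access("data_scientist_eu", "crm_us", "read")
```

Because every access decision goes through a single check, such a system scales with new roles and data sources and leaves a clear record of who was allowed to use what.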
An often overlooked but crucial aspect is the traceability of the data used to train AI models. Knowing what data was used, and in what order, is essential to understanding how an AI model behaves and what biases it may carry. A lack of transparency here can have considerable legal, moral and ethical consequences, particularly if the AI makes a decision with serious repercussions. In the event of a dispute, the traceability of AI training will be a key element. Maintaining a complete history of training versions is therefore imperative.
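One simple way to keep such a history is an append-only lineage log recording which dataset version was used for each training run, and in what order. The sketch below illustrates the idea; the file paths and field names are illustrative assumptions, not a specific tool.

```python
# Minimal sketch of a training lineage log: an append-only record of which
# dataset version was used for each training run. Paths and field names are
# illustrative assumptions.

import hashlib
import json
import time

def dataset_fingerprint(path: str) -> str:
    """Hash the dataset file so an audit can later prove exactly what was used."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def log_training_run(log_path: str, model_version: str, dataset_path: str) -> None:
    """Append one JSON line per training run; the log itself becomes the audit trail."""
    entry = {
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "model_version": model_version,
        "dataset": dataset_path,
        "dataset_sha256": dataset_fingerprint(dataset_path),
    }
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
```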
Promote the transparency of learning processes and “reversibility”
Classifying and documenting training data is essential for the transparency and quality of AI learning. But even with the best intentions, the complexity and time required to implement AI training processes can open the door to risks and abuses.
Take the example of Tesla, which has been training its AI for autonomous driving for years. How can that training be protected effectively against errors, loss, theft or manipulation? And how can intellectual property be respected in AI training, as illustrated by the New York Times lawsuit alleging that its articles were used without authorization to train LLMs? A responsible and governed approach is essential.
To date, there is no technology that can accurately record the changes an AI model undergoes as it is trained on new data. If a model is trained on bad data, for example copyrighted content, it cannot simply be restored to a previous state.
Workarounds, inspired by IT security practices, are therefore necessary. System snapshots in particular, which allow reverting to an earlier version, offer an alternative, even if some recent training is lost. Businesses should consider this approach to managing AI risk.
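Applied to model training, the snapshot idea amounts to copying the model artifact aside before each new round of training, so it can be rolled back if that round’s data later turns out to be problematic. The sketch below illustrates this under assumed file paths; no specific framework or product is implied.

```python
# Minimal sketch of the snapshot idea applied to model training: checkpoint the
# model before each new round of training, and roll back if that round's data
# proves problematic (e.g. copyrighted content). Paths are illustrative.

import shutil
from pathlib import Path

CHECKPOINT_DIR = Path("checkpoints")

def snapshot(model_path: Path, version: str) -> Path:
    """Save a copy of the current model before training on new data."""
    CHECKPOINT_DIR.mkdir(exist_ok=True)
    target = CHECKPOINT_DIR / f"model_{version}.bin"
    shutil.copy2(model_path, target)
    return target

def rollback(model_path: Path, version: str) -> None:
    """Restore a known-good snapshot; any training done since then is lost."""
    shutil.copy2(CHECKPOINT_DIR / f"model_{version}.bin", model_path)
```

As with backup strategies in IT security, the trade-off is explicit: some recent work is sacrificed in exchange for the ability to return to a trusted state.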
By Laurent Garcia, Sales Director Southern Europe at Cohesity