Alexis Conneau, the “voice” of ChatGPT, sets out to conquer emotional AI

A newcomer has entered the artificial intelligence arena. WaveForms AI, a young company founded by Alexis Conneau, formerly of OpenAI, and Coralie Lemaitre, formerly of Google, announced on Monday its first funding round: $40 million from a16z, the emblematic Silicon Valley fund run by Marc Andreessen and Ben Horowitz.

The deal already values the five-person start-up at $200 million. Its project is certainly appealing to Silicon Valley: WaveForms AI wants to make our interactions with artificial intelligence as natural and emotional as those between humans.

Alexis Conneau, the “voice” of ChatGPT

The young company owes this sum to the pedigree of its founding team. Alexis Conneau is the French engineer who gave ChatGPT a voice. During his years at OpenAI, the engineer, now in his thirties, led the development of GPT-4o’s “Advanced Voice Mode”.

This feature, unveiled last May, impressed users with how fluidly they could converse with an AI, a far cry from the stilted exchanges we have with Siri and Alexa. Unlike previous technologies, OpenAI’s audio model works “end-to-end”: it processes voice in a single stream, without intermediate steps such as speech recognition or transcription.

The result: extremely short response times and unprecedented fluidity. Other players have achieved this too, such as the French laboratory Kyutai with its voice AI Moshi. But according to Alexis Conneau, OpenAI’s major innovation lies in the development of “audio intelligence”, that is, a model capable of adapting its responses to users’ emotional fluctuations without being specifically trained to do so.

WaveForms wants to reach the Turing moment of voice AI

However, the engineer believes there is still enormous room for improvement. “The Turing moment of voice [the moment when AI voices sound as natural as human ones, ed.] has not yet been reached,” he says. AIs still lack the emotional nuance they would need to fool us completely.

The entrepreneur says the $40 million raised will allow the company to develop audio models that outperform OpenAI’s on the emotional front. “In the coming months and years, we will see audio models improve the way text models have over the last two years, by scaling up model size and training data,” he believes.

Ultimately, WaveForms AI aims to develop what the company calls “emotional general intelligence” (EGI), as distinct from artificial general intelligence (AGI), an AI that would surpass all human capabilities.

“Most companies, especially the AI giants [Meta, Google, Microsoft, ed.] but also newcomers like Safe Superintelligence [the start-up of OpenAI cofounder Ilya Sutskever, ed.], are focusing on developing an AGI, which will remain a cold, logical AI. But what will really make the difference is the quality of interaction, what we now call ‘UX’ (user experience) for websites and applications. That is where a company can gain market share,” explains Alexis Conneau.

Reaching social media users who are not yet familiar with AI

Audio is just the first step, he says. Other technological layers will be added to make the experience with AI even more “immersive”. WaveForms AI’s goal is to reach the “3 to 5 billion people connected to social networks like Instagram, TikTok and Facebook” who do not yet use AI tools. “They are very different from the 300 to 500 million people who use ChatGPT, Midjourney and others today,” he believes.

The company does not have a product yet. Alexis Conneau, who plans to target consumers first rather than businesses, envisions his technology enabling private tutors. Beyond being experts in a subject, as some AIs already are, they would also be endowed with “empathy, kindness and patience,” the executive says. Or, at least, they would be able to imitate these human qualities to perfection. He also has in mind “entertainment” applications and customer service.

To explain his work, the engineer often refers to the film Her by Spike Jonze, in which the hero ends up falling in love with the AI assistant he constantly talks to. This dystopian film serves as something of a holy grail for some companies. “It’s an inspiring film because it embodies a future that we want to avoid, a future where interactions with AI replace human interactions. What we imagine is rather complementarity, in the same way that you interact with your television or your streaming platform at certain times of the day,” the young entrepreneur explains. Fine, but Netflix doesn’t try to resemble a human as closely as possible…

A risky ethical bet

Moreover, the risk of becoming too attached to virtual “companions” already exists, despite their imperfections, as news stories regularly remind us. Last October, the mother of an American teenager filed a lawsuit against Character.ai, a company offering personalized chatbots that also trades on the idea of empathetic, “emotionally” intelligent AI. She accuses the company of driving her son to suicide by “intentionally” designing “generative AI systems with anthropomorphic qualities in order to blur the lines between fiction and reality.”

What will happen when the “Turing moment” of voice is reached, as Alexis Conneau promises? The young engineer, who has worked at Facebook, is aware of the risks of excessive use of digital platforms. He says he is already considering mechanisms that would limit how long a user converses with an AI, notably by relying on a business model that does not depend on time spent in the app. It remains to be seen whether this promise will be kept. The answer should come within a few months, when the company launches its first product.
