DeepL, Europe's standout AI-assisted translation company, has just unveiled its first foray into voice.
The German firm presented two new offerings, DeepL Voice Dialogue and DeepL Voice Meeting, during a promotional event (DeepL Dialogues) on May 13 in Berlin. Both tools are capable, on paper, of translating conversations between different languages in real time.
The Meeting version aims to translate the contributions of participants who each speak a different language and display them as subtitles, in the language each listener chooses individually. The Dialogue version is a mobile app for face-to-face conversations.
As with its other products (the Translator and the Write rewriting tool), DeepL relies on its own AI research and its own models, notes Jarek Kutylowski, CEO and founder of the company. The models were trained on datasets with different emphases.
“Real-time speech translation poses other challenges [than those of written translation]: incomplete information, pronunciation problems and latency are factors that can lead to inaccurate translations,” emphasizes Jarek Kutylowski. “These same elements can lead to misunderstandings […]. So we designed a solution that takes this into account from the start.”
After a beta test phase, DeepL Voice is now officially available. The tool supports thirteen spoken languages (English, German, Japanese, Korean, Swedish, Dutch, French, Turkish, Polish, Portuguese, Russian, Spanish and Italian), with translated subtitles available in all 33 languages supported by DeepL Translator.
“I have already tested other tools, but they generally only support one language in meetings,” says Christine Aubry, internationalization coordinator at Brioche Pasquier, who took part in the DeepL Voice beta phase. For her, “DeepL Voice is different and by far the most complete tool.”
An increasingly competitive AI translation market
DeepL does not (yet?) do voice-to-voice translation, but rather speech-to-text with translation.
In this segment, Samsung (in its high-end models with Galaxy AI), Google (in its Translate mobile app) and videoconferencing vendors (Webex, Zoom) have launched similar translated-subtitle features.
Another player, OpenAI, is exploring the new horizon of instant oral translation.
The technical particularity of “advanced voice mode” (OpenAI's name for the feature) is that it does not break the translation process into three stages, speech-to-text, translation, then text-to-speech, but entrusts everything to a single model to reduce latency in the dialogue.
The philosophy is not exactly the same as that of DeepL and the videoconferencing platforms' subtitles, but the targeted need seems quite similar: collaborating in real time with several people across different languages. The future will tell which option prevails, the one that keeps text or the one that switches to voice, depending on ergonomics and price.
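To make the architectural difference concrete, here is a minimal illustrative sketch, contrasting a cascaded three-stage pipeline with a single end-to-end speech-to-speech model. All function names and the toy lookup tables are hypothetical stand-ins, not the actual APIs of DeepL or OpenAI; real systems would call ASR, MT and TTS models here.

```python
def speech_to_text(audio: bytes) -> str:
    """Hypothetical ASR stage: transcribes source-language audio."""
    return "bonjour le monde"  # pretend transcription of French audio

def translate(text: str, target: str) -> str:
    """Hypothetical text-translation stage."""
    lookup = {("bonjour le monde", "en"): "hello world"}
    return lookup.get((text, target), text)

def text_to_speech(text: str) -> bytes:
    """Hypothetical TTS stage: synthesizes target-language audio."""
    return text.encode("utf-8")  # pretend waveform

def cascaded_pipeline(audio: bytes, target: str) -> bytes:
    # Three sequential inference steps: each adds latency, and an
    # early transcription error propagates through the later stages.
    transcript = speech_to_text(audio)
    translated = translate(transcript, target)
    return text_to_speech(translated)

def end_to_end_model(audio: bytes, target: str) -> bytes:
    # A single model maps source speech directly to target speech:
    # one inference step, no exposed intermediate text.
    table = {(b"<fr audio>", "en"): b"<en audio>"}
    return table[(audio, target)]
```

The cascade exposes the intermediate transcript (useful for subtitles, DeepL Voice's output), while the end-to-end approach trades that visibility for lower latency.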
The market is in any case increasingly competitive, since the major LLMs (GPT-4o, Claude, Mistral) can now translate text while following rules set by users to personalize the output. An ever-larger encroachment on DeepL's historic turf.
For its part, to avoid being swallowed up, DeepL has stepped up its pace of new features over the past year, in particular with the release of an LLM to power its translator. Following its latest fundraising round, the company is valued at $2 billion.