Nvidia has just unveiled Fugatto, a new and somewhat unusual AI audio synthesis model that is apparently capable of creating entirely original sounds that don’t exist anywhere else.
Audio synthesis systems are nothing new; there are already many generative models capable of producing larger-than-life speech or very convincing sequences of musical notes from a simple text prompt, much as ChatGPT and others do with text. But with Fugatto, Nvidia intends to push the limits of the concept. The model is based on a new proprietary training method that allows it to “transform any mix of music, voices and noises” in order to synthesize “completely new sounds.”
A meowing saxophone and a singing ambulance
On the project’s GitHub page, Nvidia presents some rather conventional examples, such as a rap song with entirely synthetic lyrics. The second category, called “Emerging sounds,” contains some much more… original examples. On the menu: a saxophone that barks or meows, a typewriter that whispers, a talking dog, ambulance sirens “singing” in chorus, and even a strange violin sound derived from a baby’s laughter.
Most of these examples are downright strange and, admittedly, not particularly convincing. But from a strictly technical point of view, this is a fairly exciting innovation. There are already plenty of models capable of hybridizing and transforming images or text in this way, but to our knowledge, this is the first time an AI model has been able to manipulate sound like this.
That said, it has not been long since large language models (LLMs) like ChatGPT, or image generators like DALL-E and Midjourney, began offering convincing results. Just a few years ago, they were more or less at the same stage as Fugatto; most of the time, they tended to spit out sentences that made no sense or images that looked more like pixel mush than coherent visuals.
Fugatto should therefore be seen as a very interesting proof of concept that has yet to reveal its full potential. Ultimately, this new tool could make it possible to create particularly exotic abstract soundscapes, in the same way that modern image generators can create objects and landscapes that do not exist by reworking photographs of the real world.
“We wanted to create a model that could understand and generate sounds like humans do,” explains engineer Rafael Valle in Nvidia’s press release. “Fugatto is our first step towards a future where unsupervised multi-task learning is applied to audio synthesis and transformation.”
Unfortunately, the general public cannot yet experiment with Fugatto. For the moment, it is limited to a promotional video and a research paper accompanied by the few examples cited above. It will therefore be worth keeping an eye on this intriguing tool while waiting for it to be made available to the public.