The magicians at NVIDIA have just unveiled a technology that will shake up the world of audio. Its name? Fugatto, an artificial intelligence model that transforms any sound in response to a simple text request.
This versatile AI can juggle all types of audio: voices, music, sound effects… It can generate new sounds, modify existing ones, or even invent sounds that do not exist in nature.
Have you ever dreamed of making a trumpet meow? Of giving your voice an Italian accent? Or maybe of turning your old acoustic demo into a supercharged electro track? Well, Fugatto can do it, and much more!
The principle is surprisingly simple: you provide audio and/or a text description of what you want, and the AI takes care of the rest. For example, you could ask it to “Make this guitar sound like it’s being played underwater” or “Transform that voice into that of a melancholy robot”. And the most fascinating thing is that Fugatto handles these poetic instructions remarkably well!
What makes this technology truly impressive is its versatility. Unlike other AI models, which specialize either in music (hello Suno) or in voice, Fugatto is designed to handle every kind of audio. According to the reported tests, it matches or outperforms specialized models on their respective tasks while remaining far more flexible.
The potential applications are endless… Music producers will be able to quickly prototype different arrangements, video game creators will be able to generate dynamic soundscapes that adapt to gameplay, advertising agencies will be able to easily adapt their spots with different accents, and app developers will be able to create personalized voice assistants.
The true technical prowess of Fugatto lies in its ability to combine instructions that it never saw together during training. For example, you can ask it to create the sound of a thunderstorm that gradually transforms into birdsong or electro music.
This versatility is based on a sophisticated architecture with 2.5 billion parameters, trained on more than 50,000 hours of audio data. The team of researchers, led by Rafael Valle, developed an innovative approach called ComposableART, which allows fine control over every aspect of audio generation.
The model also offers interpolation, which lets you dial in the intensity of an effect precisely. Do you want a light Marseille accent rather than a strong one? Or a voice that gradually changes from happy to sad? This model can do it with remarkable finesse.
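To make the idea of dialing an effect between two extremes more concrete, here is a minimal Python sketch of linear interpolation between two vectors. It is only an illustration of the general principle, not Fugatto's actual method or API: the vectors `happy` and `sad`, and the helpers `interpolate` and `schedule`, are hypothetical names invented for this example.

```python
def interpolate(a, b, t):
    """Linearly blend two equal-length vectors: t=0 returns a, t=1 returns b."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must be between 0 and 1")
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


def schedule(steps):
    """Evenly spaced blend weights from 0 to 1 for a gradual transition."""
    if steps < 2:
        raise ValueError("need at least two steps")
    return [i / (steps - 1) for i in range(steps)]


# Hypothetical stand-ins for the internal representations of two instructions.
# A real model would use much larger vectors produced by its own encoder.
happy = [1.0, 0.0, 0.5]
sad = [0.0, 1.0, 0.5]

# A fixed weight gives a "light" or "strong" effect;
# sweeping the weight over time gives a gradual change.
for t in schedule(5):
    print(t, interpolate(happy, sad, t))
```

A small weight corresponds to a light effect, a weight near 1 to a strong one, and stepping through the schedule mimics a voice drifting from one emotion to the other.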
The diversity of the international team that developed this technology, with researchers from India, Brazil, China, Jordan and South Korea, has greatly contributed to the model’s multilingual and multi-accent capabilities. I would have liked to test this thing, but NVIDIA hasn’t announced a public release date yet… Too bad!
In the meantime, alternatives already exist: Meta offers an open-source audio development kit, and Google has its own text-to-music model called MusicLM.
As you will have gathered, Fugatto is a major breakthrough that could well transform the way we create and manipulate sound. I’m really looking forward to trying it!
Learn more about Fugatto