A powerful new tool for sound synthesis and audio processing

Wednesday 27th November 2024 11:19 AM

Nvidia has just unveiled a brand new AI model capable of generating and manipulating sound from simple text prompts. Although the tool is not yet accessible to the public, a first glimpse suggests enormous potential for sound design.

Chip giant Nvidia continues to blaze its trail in the field of generative artificial intelligence. For several years now, the company has been at the forefront thanks to its graphics cards and data center chips, which are particularly popular for training and running inference on the models that underlie consumer generative AI applications.

But the company is not just a hardware designer, far from it. An at least equally important part of its success comes from the vast software ecosystem that the firm has developed over the years. In the graphic design, 3D modeling, animation and special effects sector, its RTX application platform is omnipresent and largely dominant.

While Nvidia was already making intensive use of various artificial intelligence technologies to improve graphics rendering in video games, notably with its famous DLSS, the company is no longer confining itself to images. After announcing, last June, a suite of tools to “bring to life” larger-than-life virtual characters, Nvidia has just unveiled a project that could shake up another sector: sound.

Nvidia Fugatto: an AI model for generating and manipulating sound

The newcomer to Nvidia’s large software family is called Fugatto, short for Foundational Generative Audio Transformer Opus 1. This poetic name is also almost certainly a reference to fugato, the term designating a musical passage written in the style of a fugue, a composition technique whose principles have some resonance with those of artificial intelligence models.

Fugatto therefore presents itself as a foundation model dedicated to sound generation and transformation, based on textual queries expressed in natural language. This principle is reminiscent of other applications oriented towards musical creation, such as Suno. But where other solutions mainly aim to create complete, ready-to-use songs, Fugatto takes a slightly different direction.

Nvidia’s project actually seems to be geared more towards audio synthesis, sound design and sound processing in general. Rather than a sort of autonomous, AI-powered digital audio workstation, Fugatto is positioned as a new, ultra-flexible tool in the sound and music production chain, alongside plugins and other virtual instruments.

For example, Fugatto allows you to extract certain sound components from an audio file, isolating voices, instruments or background noises from a recording in order to rework them separately or integrate them into another project. But the model can also transform audio files in astonishing ways, by applying a specific accent or intonation to a vocal recording, or by modulating the timbre of an instrument to make it “meow”, “howl” or even “roar”.

And of course, Fugatto is capable of generating entirely new sounds from instructions written in natural language. In the presentation video, we see (or rather hear) that the model can generate complex and evolving soundscapes, such as an approaching train that gradually transforms into a symphony orchestra, or a thunderstorm that slowly fades into birdsong.

These few examples should be enough to arouse the interest of any lover of musical creation or sound design. While some enjoy spending hours manipulating their favorite wavetable synth to create unique sound textures, others prefer to focus on aspects like composition or arrangement, and the arrival of a tool like Fugatto should therefore sound like a blessing to their ears.

But professional sound engineers and amateur waveform tinkerers could also find what they’re looking for. After creating a complex patch on their favorite synth and writing a few well-crafted melodic patterns, they would only need to send everything to Fugatto and give it some instructions to radically transform their sound samples, before reimporting everything into their sequencer.

Great possibilities lie ahead, then, but they remain hypothetical for the moment. Everything will depend on the distribution model chosen by Nvidia: will the model be able to run locally, on an RTX graphics card for example, or will it only work online? Will it be just a standalone app, or will it be possible to integrate it into your sequencer in the form of plugins? And if so, what formats will be offered (CLAP, VST, AAX, etc.)?

So many questions remain unanswered at this stage. For now, Fugatto is an impressive generative artificial intelligence model project with no announced release date. We will therefore have to wait a while longer and follow Nvidia’s future announcements to learn more, perhaps at CES in January 2025.