Thanks to AI, Nvidia will let you hear sounds never heard before

Tuesday 26th November 2024 12:07 PM

A team of generative AI researchers at Nvidia has created a veritable Swiss Army knife for audio, which lets users control audio output with nothing more than a text command.

While some AI models can compose a song or modify a voice, none have the dexterity of this new model.

Named Fugatto (short for Foundational Generative Audio Transformer Opus 1), it generates or transforms any mixture of music, voices and sounds described in prompts that combine text and audio files.

Imagine a meowing trumpet!

For example, it can create a music sample from text, remove or add instruments in an existing song, change the accent or emotion of a voice, and even let people produce sounds they have never heard before.

Nvidia says its new AI music editor can create “never-heard-before sounds,” like a meowing trumpet. The tool, called Fugatto, is capable of generating music, sounds and speech from text and audio input that it has never been trained on.

Screenshot of a simple text command to create crazy melodies!

Nvidia (YouTube)

Or a saxophone that howls, barks, then electronic music with barking dogs

As Nvidia's demonstration video shows (at 2 min 38 s), Fugatto can compose songs from completely whimsical prompts, such as “Create a saxophone that howls, barks, then electronic music with dogs barking”.

It can even transform the sound of a person’s voice, changing its accent or giving it a different tone, such as angry or calm. It can also edit music: Fugatto can isolate the vocals in a song, add instruments, and even change a melody by replacing a piano with an opera singer.

Several other AI audio tools already exist, but according to a comparison table in a document published by Nvidia, none of them can create completely new and unique sounds.

Comparison of the Fugatto audio generator with its competitors

Nvidia

To create Fugatto, Nvidia researchers had to gather a dataset containing millions of audio samples. They then created instructions “that significantly expanded the range of tasks the model could perform, while achieving more accurate performance and enabling new tasks without requiring additional data.”

Nvidia does not say when – or if – the tool will be widely available.