When AI adds sound to the picture

Google’s DeepMind lab has made a significant breakthrough in the field of generative artificial intelligence. Its researchers have just developed a system called V2A, capable of producing soundtracks, sound effects and dialogue to accompany videos.

Until now, existing AI models could generate videos, but the results remained silent, with no sound to accompany them. DeepMind has filled this gap with its V2A system, short for “video-to-audio”. This technological advance could well revolutionize the world of audiovisual production.

The V2A system is based on an AI model trained on a large dataset consisting of sounds, dialogue transcripts and video footage. This extensive training allows it to analyze the raw pixels of a video and generate a perfectly synchronized audio accompaniment.

Whether it is a musical soundtrack, sound effects or even dialogue, this AI can create everything needed to match the visual content. Most surprising of all, this audio generation can be carried out without any prior text description.

Current limitations

Although this technology opens up promising prospects, particularly in the field of audiovisual heritage preservation, its quality is not yet perfect. DeepMind acknowledges that the audio its AI generates still lacks naturalness and realism.

The system particularly struggles to process videos that are degraded or contain artifacts. Improvements are therefore still necessary before possible large-scale distribution. In fact, DeepMind does not plan to make V2A accessible to the general public for the moment.

The company also wants to conduct in-depth assessments of the safety and potential ethical impacts of its powerful system. It could easily be misused to produce content that is parodic, defamatory or that infringes copyright without the consent of the rights holders. Consultations are underway with audiovisual media professionals.

Audiovisual jobs under threat

Beyond the technical challenges, V2A and similar technologies raise questions about their future influence on the film and audiovisual industry. If these tools were to become widespread, they could threaten many creative professions linked to audiovisual production.

Film composers, Foley artists and sound effects creators, and even dubbing actors could see their services rendered largely superfluous by AI systems capable of automatically generating these audio elements. These professions would then face a risk of deskilling and massive job losses.

Faced with these threats, the industry will have to prepare and think about a regulatory and legal framework governing the use of these technologies. Measures to protect employment and intellectual property must be put in place.
