
What does a rapping Mona Lisa look like?

Answer: Thanks to AI, now we know.

Image: Seen from behind, a depiction of Leonardo da Vinci painting the Mona Lisa in his workshop. (Shutterstock)
A new AI model from Microsoft Research Asia has given us something you probably thought you’d never see — the Mona Lisa rapping. The model can generate deepfake videos from nothing more than a single still image and an audio track.

This isn’t the first AI model of its kind, but it is more realistic and accurate than previous versions. Called VASA-1, the model was trained on footage of 6,000 talking faces from the VoxCeleb2 data set. Given just a single headshot of a person and an audio clip, it can create a realistic video of that person lip syncing the supplied audio.

It generates video at 512×512 pixels and 40 frames per second “with negligible starting latency.” Settings for facial dynamics and head poses can also be adjusted to achieve specific effects, such as particular emotions, expressions and gaze directions. “Such technology holds the promise of enriching digital communication, increasing accessibility for those with communicative impairments, transforming education methods with interactive AI tutoring, and providing therapeutic support and social interaction in health care,” said a paper describing the technology.