News

Watch Microsoft’s new AI create a rapping Mona Lisa from a single photo and audio track

Microsoft's new VASA-1 model can create realistic, human-like deepfakes from a single photo and an audio track. Imagine interacting with online avatars so realistic that you can't tell them apart from a human being.

The company says that "VASA-1 is capable of not only producing lip movements that are exquisitely synchronised with the audio, but also capturing a large spectrum of facial nuances and natural head motions that contribute to the perception of authenticity and liveliness". A user can steer the direction of the avatar's gaze (forward-facing, leftwards, rightwards, or upwards) and even the emotion it expresses (neutral, happiness, anger, or surprise).

And the proof is certainly in the pudding. Using a photo of the Mona Lisa and an audio track of "Paparazzi", VASA-1 created a rapping Mona Lisa, with the video generating some seven million views. As one widely shared post put it: "Microsoft just dropped VASA-1. This AI can make a single image sing and talk from an audio reference expressively. Similar to EMO from Alibaba. 10 wild examples: 1. Mona Lisa rapping Paparazzi".

Microsoft has released a few samples of what VASA-1 can do, and it definitely seems to open the door to the rise of deepfakes. In fact, Microsoft itself acknowledges this possibility, saying that it won't release an online demo, API, product, additional implementation details, or any related offerings until the company is certain that the technology will be used responsibly.

Source: ,