Video Retalking

Announcement: The Sieve team is looking for beta testers of a newer, higher-quality lipsync app. Please reach out to contact@sievedata.com if you're interested in participating and providing feedback.

This is an optimized version of VideoReTalking, an audio-based lip synchronization model for talking head video editing in the wild. Use it to sync a speaker's lips in a video to a given audio track.

Note 1: Processing time depends on video resolution and length, but as a general rule of thumb it takes ~13 seconds to generate one second of video.
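
As a rough illustration of the rule of thumb in Note 1, the sketch below estimates processing time from clip length. The function name and the constant are hypothetical helpers written for this example, not part of any Sieve API, and the result is only an approximation because actual time also varies with resolution. For a 60-second clip it gives about 780 seconds, or roughly 13 minutes.

```python
# Hypothetical helper: applies the ~13x rule of thumb from Note 1.
# Actual processing time also depends on resolution, so treat this as a rough guide.
SECONDS_PER_VIDEO_SECOND = 13.0


def estimate_processing_seconds(video_seconds: float,
                                factor: float = SECONDS_PER_VIDEO_SECOND) -> float:
    """Return an approximate processing time in seconds for a clip."""
    if video_seconds < 0:
        raise ValueError("video_seconds must be non-negative")
    return video_seconds * factor


if __name__ == "__main__":
    clip_length = 60  # seconds of input video
    estimate = estimate_processing_seconds(clip_length)
    print(f"~{estimate:.0f} s (~{estimate / 60:.0f} min) for a {clip_length} s clip")
```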

Note 2: To speed up inference, you can enable boolean options that halve the resolution and the frame rate (fps) of the output video.

Other tips:

  • Ensure there are no abrupt scene cuts in the video
  • Ensure there is only a single person in the video
  • Ensure the person is facing the camera
  • Ensure the person is not wearing any accessories that cover the mouth (e.g. mask, scarf, etc.)
  • Ensure the person is not moving their head too much
  • Ensure the person is at most an arm's length from the camera