Bringing world-class lipsync to developers with sync.
We discuss our partnership with sync. to bring their new sync-1.9.0-beta model into the Sieve ecosystem.
by Mokshith Voodarla

sync. has established itself as a leading AI video company, currently focused on building the best lipsyncing models in the world. The SF-based startup is the same research team behind the original Wav2Lip model, which has been starred over 11,000 times on GitHub.

Recently, sync. partnered with Sieve to bring their newly announced sync-1.9.0-beta model into the Sieve ecosystem. The model is available as a new backend option in Sieve's lipsync pipeline and as a lipsync_engine option in Sieve's dubbing pipeline.
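If you want to try it from code, a call to the lipsync pipeline might look like the minimal sketch below, which uses Sieve's Python client. The function slug and the file, audio, and backend parameter names are assumptions based on the description above, so check the pipeline's page on Sieve for the exact interface.

```python
# Minimal sketch (not an official snippet): run Sieve's lipsync pipeline with
# the sync-1.9.0-beta backend. Function slug and parameter names are assumptions.
import sieve

video = sieve.File(path="speaker.mp4")   # source video to re-animate
audio = sieve.File(path="dub.wav")       # target audio to sync the face to

lipsync = sieve.function.get("sieve/lipsync")
output = lipsync.run(
    file=video,
    audio=audio,
    backend="sync-1.9.0-beta",           # select the new sync. model
)
print(output.path)                        # local path to the lipsynced video
```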

A new standard for lipsync quality

Prior lipsyncing models typically replace only the lower half of the face, leaving the rest untouched in an effort to preserve realism. While this makes sense in theory, it often leads to unnatural results because the whole face, not just the mouth, moves in complex, interdependent ways when speaking. By contrast, sync-1.9.0-beta is a completely new architecture that replaces the entire face based on the target audio, allowing for more accurate and lifelike animations.

Below are some results comparing sync-1.9.0-beta (left) with SieveSync (right), a MuseTalk-based approach that focuses solely on the lower half of the face.

It's a significant leap in realism and quality—particularly impressive given that the models are zero-shot, requiring no additional training. Notable improvements include:

  • More accurate visual synchronization with target audio
  • Better handling of complex facial features (e.g., beards and other facial hair)
  • Reduced blurriness and higher output resolution
"We're excited to integrate our models into the Sieve ecosystem. The ability to edit the recorded word in video is powerful, but it compounds massively when composed into workflows with robust video editing primitives. We're excited to see what developers can create with our combined capabilities."
Prady Modukuru, CEO of sync.

Enabling world-class video translation

With this enhanced realism, we're thrilled to integrate these models into our state-of-the-art dubbing pipeline, which now supports sync-1.9.0-beta as a lipsync engine. Below is part of a podcast by Pieter Abbeel dubbed into Spanish.
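In code, selecting this engine in the dubbing pipeline might look like the following minimal sketch. Apart from the lipsync_engine option named above, the function slug and the other parameter names are illustrative assumptions; refer to the dubbing pipeline's documentation for the authoritative interface.

```python
# Minimal sketch: run Sieve's dubbing pipeline with sync-1.9.0-beta as the
# lipsync engine. Only lipsync_engine is named in the post; the rest of the
# call is an assumption about the interface.
import sieve

source = sieve.File(path="podcast_clip.mp4")

dubbing = sieve.function.get("sieve/dubbing")
result = dubbing.run(
    source_file=source,
    target_language="spanish",
    enable_lipsyncing=True,
    lipsync_engine="sync-1.9.0-beta",
)
print(result)  # one or more dubbed output files, depending on the pipeline
```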

Multi-speaker support

Many videos feature multiple people on screen, which often confuses lipsync models because it's unclear whose face should be synced to the given audio. Sieve's lipsync pipeline tackles this with an enable_multispeaker option that uses active speaker detection to automatically identify and apply lipsync to the correct person. Below is a demo where two speakers appear simultaneously and Sieve automatically selects the right speaker for lipsyncing.
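In the API this is just one extra flag on the same call, roughly as sketched below; enable_multispeaker is the option named above, while the remaining names carry over the assumptions from the earlier lipsync sketch.

```python
# Minimal sketch of the multi-speaker case: enable_multispeaker turns on active
# speaker detection so only the detected speaker's face is synced. Other names
# are assumptions, as in the earlier sketch.
import sieve

lipsync = sieve.function.get("sieve/lipsync")
output = lipsync.run(
    file=sieve.File(path="two_speakers.mp4"),
    audio=sieve.File(path="active_speaker_line.wav"),
    backend="sync-1.9.0-beta",
    enable_multispeaker=True,
)
print(output.path)
```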

Looking forward

Our partnership with sync. is just one example of our commitment to creating production-ready AI video pipelines for a wide range of use cases, including those powered by high-quality lipsyncing. We're excited to see developers explore these possibilities, and we look forward to sharing more partnership updates soon!