sync. has established itself as a leading AI video company, currently focused on building the best lip-syncing models in the world. The SF-based startup is the same research team behind the original Wav2Lip model, which has been starred over 11,000 times on GitHub.
Recently, sync. partnered with Sieve to bring their newly announced sync-1.9.0-beta model into the Sieve ecosystem. The model is available in Sieve's lipsync pipeline as a new backend option, as well as in Sieve's dubbing pipeline as a lipsync_engine.
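For reference, here's a minimal sketch of what calling the lipsync pipeline might look like with Sieve's Python client. The function path follows Sieve's usual naming, and the parameter names other than backend are illustrative assumptions; check the pipeline's docs for the exact signature.

```python
import sieve

# Minimal sketch: run Sieve's lipsync pipeline with the new backend.
# Parameter names here are illustrative; only "backend" is named in this post.
lipsync = sieve.function.get("sieve/lipsync")

output = lipsync.run(
    file=sieve.File(path="input_video.mp4"),    # source video to re-animate
    audio=sieve.File(path="target_audio.wav"),  # speech to sync the face to
    backend="sync-1.9.0-beta",                  # the new sync. backend option
)
print(output)
```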
A new standard for lipsync quality
Prior lip-syncing models typically replace only the lower half of the face to maintain realism. While this makes sense in theory, it often leads to unnatural results because the face moves in complex, interdependent ways when speaking. By contrast, sync-1.9.0-beta is a completely new architecture that replaces the entire face based on the target audio, allowing for more accurate and lifelike animations.
Below are some results comparing sync-1.9.0-beta (left) with SieveSync (right), a MuseTalk-based approach that focuses solely on the lower half of the face.
It's a significant leap in realism and quality, particularly impressive given that the model is zero-shot, requiring no additional training. Notable improvements include:
- More accurate visual synchronization with target audio
- Better handling of complex facial features (e.g., beards and other facial hair)
- Reduced blurriness and higher output resolution
Enabling world-class video translation
With this enhanced realism, we're thrilled to integrate the model into our state-of-the-art dubbing pipeline, which now supports sync-1.9.0-beta as a lipsync engine. Below is part of a podcast by Pieter Abbeel dubbed into Spanish.
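Here's a hedged sketch of selecting the new engine in the dubbing pipeline with Sieve's Python client. Only lipsync_engine is named in this post; the function path and the other parameter names are assumptions, so consult the dubbing pipeline's docs for the exact interface.

```python
import sieve

# Minimal sketch: dub a video and lipsync it with sync-1.9.0-beta.
# "lipsync_engine" is named in this post; the remaining parameter
# names are assumptions.
dubbing = sieve.function.get("sieve/dubbing")

output = dubbing.run(
    source_file=sieve.File(path="podcast_clip.mp4"),
    target_language="spanish",          # assumed parameter name
    enable_lipsyncing=True,             # assumed flag to turn lipsync on
    lipsync_engine="sync-1.9.0-beta",   # the engine option described above
)
print(output)
```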
Multi-speaker support
Many videos feature multiple people on screen, which often confuses lip-sync models because it's unclear who should be synced to the given audio. Sieve's lipsync pipeline tackles this with an enable_multispeaker option that uses active speaker detection to automatically identify and apply lipsync to the correct person. Below is a demo where two speakers appear simultaneously, and Sieve automatically selects the right speaker for lipsyncing.
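In code, enabling this looks like a single extra flag on the lipsync call. As before, enable_multispeaker is named in this post while the surrounding parameter names are assumptions drawn from the client's usual conventions.

```python
import sieve

# Minimal sketch: lipsync a clip with multiple people on screen.
# "enable_multispeaker" is named in this post; other parameter
# names are assumptions.
lipsync = sieve.function.get("sieve/lipsync")

output = lipsync.run(
    file=sieve.File(path="two_speakers.mp4"),
    audio=sieve.File(path="dialogue.wav"),
    backend="sync-1.9.0-beta",   # assumed: same backend option as above
    enable_multispeaker=True,    # active speaker detection picks the speaker
)
print(output)
```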
Looking forward
Our partnership with sync. is just one example of our commitment to creating production-ready AI video pipelines for a wide range of use cases, including those powered by high-quality lip-syncing. We’re excited to see developers explore these possibilities, and we look forward to sharing more partnership updates soon!