
Lipsync (Alpha Release)

A comprehensive solution for video lipsyncing with a suite of model and enhancement options.

Scroll down for examples!

Available backends include:

  • MuseTalk: This backend uses the MuseTalk model combined with CodeFormer (optional but recommended) to sync the lips in the driver video/image with the provided audio and restore the face.

  • Video Retalking: This backend uses the Video Retalking model combined with GPEN and GFPGAN to sync the lips in the driver video/image with the provided audio.

Important Notes:

LivePortrait as an enhancer is currently being improved and cannot be used in its current state. If "liveportrait" is passed as the enhance parameter, it will be automatically changed to "default". Improvements are coming soon!
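The fallback described above can be pictured with a minimal sketch. The function name `resolve_enhance` is hypothetical and is not part of the service's API; the sketch also assumes that every value other than "liveportrait" is passed through unchanged, which the note does not state.

```python
def resolve_enhance(enhance: str) -> str:
    """Return the enhancer value that would actually be used (illustrative only)."""
    # Per the note above, "liveportrait" is currently replaced by "default".
    if enhance == "liveportrait":
        return "default"
    # Assumption: any other value is left as-is.
    return enhance
```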

Processing time depends on the video's resolution and length, and on how much of the video contains a detected valid speaker.

MuseTalk is preferred for better overall face fidelity; Video Retalking is preferred for better lip movement and resolution. MuseTalk runs at 25 FPS, whereas Video Retalking can handle higher frame rates.

We are actively working on a better offering that gives you the best of both worlds. Stay tuned!

Tips for better performance:

  • Ensure there is only a single primary speaker in the video
  • Ensure the person is facing the camera
  • Ensure the person is not wearing any accessories that cover the mouth (e.g. mask, scarf, etc.)
  • Ensure the person is not moving their head too much
  • Ensure the person's face is not very small in the frame
  • The MuseTalk backend may perform unreliably if the person has a lot of facial hair

Information on the cut_by parameter:

  • The duration of the audio file always supersedes the duration of the video file.
  • When cut_by is set to audio and the video is shorter than the audio, the video is played to the end, then played backwards to the start, and so on until it matches the duration of the audio.
  • When cut_by is set to video and the video is shorter than the audio, the audio is cut off when the video ends.
  • When cut_by is set to shortest, the shorter of the two files determines the duration, and the longer file is cut off accordingly.
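The forward-then-backward looping used when the video is shorter than the audio can be sketched as a "ping-pong" frame index. This is an illustration of the idea, not the service's implementation: the function names are hypothetical, and the sketch assumes the first and last frames are not repeated when the playback direction flips.

```python
from typing import List


def pingpong_index(i: int, n_source: int) -> int:
    """Map output frame i to a source frame, playing forward then backward."""
    if n_source < 1:
        raise ValueError("n_source must be at least 1")
    if n_source == 1:
        return 0
    # One full cycle goes 0..n-1 and back to 1 (end frames are not doubled).
    period = 2 * (n_source - 1)
    pos = i % period
    return pos if pos < n_source else period - pos


def pingpong_frames(n_source: int, n_output: int) -> List[int]:
    """Source frame indices needed to fill n_output frames of audio."""
    return [pingpong_index(i, n_source) for i in range(n_output)]


print(pingpong_frames(4, 10))  # [0, 1, 2, 3, 2, 1, 0, 1, 2, 3]
```

Here a 4-frame clip is stretched to 10 output frames; in practice `n_output` would be derived from the audio duration and the frame rate.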

Pricing

Our video processing service uses the following pricing structure:

MuseTalk Backend

| Resolution | Price per Minute (Enhance: True) | Price per Minute (Enhance: False) |
| --- | --- | --- |
| > 720p (capped at 1080p) | $0.50 | $0.20 |
| ≤ 720p | $0.40 | $0.16 |

Examples:

| Video Resolution | Duration | Price (Enhance: True) | Price (Enhance: False) |
| --- | --- | --- | --- |
| 1080p | 1 minute | $0.50 | $0.20 |
| 720p | 1 minute | $0.40 | $0.16 |
| 480p | 1 minute | $0.40 | $0.16 |

Notes:

  • Any content above 1080p will be downsampled to 1080p
  • The "Enhance" option applies additional processing for improved quality
  • Prices are subject to change. Please refer to our latest documentation for the most up-to-date pricing information.
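As a rough way to estimate a MuseTalk job's cost from the table above, the rates can be written as a small function. This is a sketch under stated assumptions: the name `musetalk_price` is hypothetical, resolution is taken to mean frame height in pixels, and the per-minute rate is assumed to be prorated linearly by duration. The actual billing granularity is not stated here.

```python
def musetalk_price(height_px: int, duration_s: float, enhance: bool) -> float:
    """Estimate the MuseTalk price in USD from the per-minute rates above."""
    # Assumption: "> 720p" is judged by frame height.
    above_720p = height_px > 720
    if enhance:
        rate_per_minute = 0.50 if above_720p else 0.40
    else:
        rate_per_minute = 0.20 if above_720p else 0.16
    # Assumption: cost scales linearly with duration.
    return rate_per_minute * duration_s / 60.0


# One minute of 1080p video with enhancement enabled.
print(round(musetalk_price(1080, 60, True), 2))  # 0.5
```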

Video Retalking Backend

Prices are approximate and may vary. The rates in this table apply to the duration of the generated content.

| Enhance | Price per Second |
| --- | --- |
| True | $0.015 |
| False | $0.01 |
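The per-second rates above translate into a simple estimate. The function name `video_retalking_price` is hypothetical, and because the prices are described as approximate, the result should be read as a ballpark figure rather than an exact charge.

```python
def video_retalking_price(output_duration_s: float, enhance: bool) -> float:
    """Estimate the Video Retalking price in USD for the generated content."""
    # Rates taken from the table above (USD per second of output).
    rate_per_second = 0.015 if enhance else 0.01
    return rate_per_second * output_duration_s


# Ten seconds of generated content with enhancement enabled.
print(round(video_retalking_price(10, True), 3))  # 0.15
```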

Examples

The examples below are best viewed on a computer or in landscape orientation.

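| Driving Video | Driving Audio | Output | Backend | Enhance | Price | Sieve Job |
| --- | --- | --- | --- | --- | --- | --- |
| (video) | (audio) | (video) | MuseTalk | True | $1.02 | Here |
| (video) | (audio) | (video) | MuseTalk | False | $0.043 | Here |
| (video) | (audio) | (video) | MuseTalk | True | $0.126 | Here |
| (video) | (audio) | (video) | Video Retalking | True | $0.084 | Here |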