EchoMimic
This is an implementation of EchoMimic, a lifelike audio-driven portrait animation model, using the accelerated inference option.
Usage
- `source_image` and `driving_audio` are required.
- `source_image` can be a video or an image. If a video, only the first frame is used.
- `driving_audio` can be a video or an audio file. If audio, it is used directly as the driving audio; if video, only the audio track is used.
- `output_width` and `output_height` default to 512. Keeping them at 512 is recommended to maintain the quality of the animation.
- `video_length` defaults to -1, which uses the length of the driving audio.
- `facemask_dilation_ratio` and `facecrop_dilation_ratio` default to 0.1 and 0.5 respectively. Increasing these produces a larger crop of the source image, but can also introduce artifacts around the face.
Note: The following parameters can be sensitive and require more experimentation for the best results. Default values are recommended.
- `context_frames` and `context_overlap` default to 12 and 3 respectively.
- `cfg` defaults to 1.0.
- `steps` defaults to 6.
- `fps` defaults to 24.
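As a sketch, the inputs above could be assembled into a request payload like this. The helper name `build_echomimic_input` is an assumption for illustration; only the parameter names and defaults come from the documentation above.

```python
# Hypothetical helper (not part of the official API) that assembles an
# EchoMimic input dict using the documented parameter names and defaults.
def build_echomimic_input(source_image, driving_audio, **overrides):
    """Return the input payload; unspecified parameters keep their defaults."""
    payload = {
        "source_image": source_image,    # image, or video (first frame used)
        "driving_audio": driving_audio,  # audio, or video (audio track used)
        "output_width": 512,
        "output_height": 512,
        "video_length": -1,              # -1 = match the driving-audio length
        "facemask_dilation_ratio": 0.1,
        "facecrop_dilation_ratio": 0.5,
        "context_frames": 12,
        "context_overlap": 3,
        "cfg": 1.0,
        "steps": 6,
        "fps": 24,
    }
    payload.update(overrides)
    return payload

# Override only what you need; everything else stays at the recommended default.
inputs = build_echomimic_input("portrait.png", "speech.wav", fps=30)
```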
Pricing
This model runs on a single A100 40GB GPU, which is priced at $4.20 per hour. Check out our pricing page for more information.
Inference time is approximately 5 seconds per second of audio.
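From those two figures, a back-of-the-envelope estimate of the cost per second of audio can be computed like this:

```python
# Approximate cost per second of driving audio, from the numbers above:
# $4.20/hour for the A100 40GB, and ~5 s of inference per 1 s of audio.
GPU_PRICE_PER_HOUR = 4.20
INFERENCE_SECONDS_PER_AUDIO_SECOND = 5

price_per_second = GPU_PRICE_PER_HOUR / 3600
cost_per_audio_second = price_per_second * INFERENCE_SECONDS_PER_AUDIO_SECOND
print(f"~${cost_per_audio_second:.4f} per second of audio")  # ~$0.0058

# For example, a 30-second clip:
print(f"~${cost_per_audio_second * 30:.2f} for a 30 s clip")
```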