AI Dubbing
Sieve's AI Dubbing is an entirely automated dubbing API built for developers. It can take an input video or audio file and seamlessly dub it across many languages.
Note: The API currently accepts files up to 15 minutes long. If you need to dub longer files, please contact us (sales@sievedata.com) for access.
For pricing, see the Pricing section below.
For more information on Lipsync backends and enhancements, click here.
Key Features
- Speaker Style Preservation: Preserve the tone and style of the original speaker.
- Multiple Speaker Support: Assigns a distinct voice to each speaker in multi-speaker content.
- Broad Range of Languages: Support for 29 popular languages with the elevenlabs voice engine, and 100+ with openai (see Supported Languages below).
- Background Noise Addition: Add original background noise back to the dubbed audio for a more natural-sounding dub.
- Language Styles: You can specify language styles such as "informal French," "Shakespearean English," "Brazilian Portuguese," etc. (only available with `gpt4` translation).
- Multi-Step Dubbing: You can change the dubbing mode to `translation-only` to get the translations as a dictionary, edit them, and then proceed with the dubbing process (read more here).
- Faster Than Realtime: Faster-than-realtime processing of dubs.
- Choose Voice Engines: Pick from various voice engines depending on your cost, quality, and speed requirements.
- Safe Words: Specify safe words you don't want to translate, such as names or places.
- Translation Dictionary: Customize translations by specifying mappings for specific words or phrases to control the output dub.
- Multi-language Inputs: Specify multiple target languages as a comma-separated list to get multiple language dubs simultaneously.
- Metadata Output: Option to output transcription, translation, and diarization metadata along with the dubbed video or audio.
- Lipsyncing: The `enable_lipsyncing` option syncs the lips of the source video to the dubbed audio, which is useful for creating more natural-looking dubs. This feature works only on single-speaker videos and can perform unreliably on videos with multiple speakers.
- Preserve Background Audio: You can choose whether or not to preserve the background audio. By default, the `preserve_background_audio` parameter is set to `True`; for content such as educational videos, where the background audio is less important, it can be set to `False` to ignore the background audio.
- Segment Editing: Edit specific segments of videos by specifying translations of your choice (read more here).
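To see how these options fit together, here is a minimal sketch using the Sieve Python client. It assumes the dubbing function is published as `sieve/dubbing`; the parameter names mirror the features above and may differ from the actual function signature, so treat this as illustrative rather than authoritative.

```python
# Minimal sketch, assuming the Sieve Python client and a published
# "sieve/dubbing" function; the parameter names below mirror the features
# described above and may differ from the actual signature.
import sieve

source = sieve.File(url="https://example.com/talk.mp4")  # hypothetical input file
dubbing = sieve.function.get("sieve/dubbing")

output = dubbing.run(
    source,
    target_language="spanish,french",   # comma-separated list for multiple dubs
    voice_engine="elevenlabs",          # or "openai", depending on cost/quality/speed
    translation_engine="gpt4",          # or "seamless"
    enable_lipsyncing=False,            # reliable on single-speaker videos only
    preserve_background_audio=True,     # default; set False for e.g. educational content
    safe_words="Sieve, New York",       # words that should not be translated
)
```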
Pricing
A minimum billing duration of 10 seconds applies - any audio shorter than this will be billed at the 10-second rate.
Dubbing (Price per Minute)
| Voice Engine | Translation Engine | Standard Price | Transcript Provided | Translation Provided |
|---|---|---|---|---|
| elevenlabs | gpt4 | $0.535 | $0.487 | $0.350 |
| elevenlabs | seamless | $0.518 | $0.470 | $0.333 |
| openai | gpt4 | $0.403 | $0.325 | $0.253 |
| openai | seamless | $0.386 | $0.308 | $0.236 |
Lipsync is billed separately, at $0.50 per minute of audio on default settings, in addition to the dubbing price. Pricing information for lipsyncing can be found here.
Notes:
- The prices above are the cost for dubbing per minute of audio.
- The Standard Price column indicates the prices when no transcript is provided in the `edit_segments` parameter.
- The Transcript Provided column indicates the prices if the entire transcript for the audio is provided in the `edit_segments` parameter.
- The Translation Provided column indicates the prices if you provide translations for the segments in the `edit_segments` parameter.
- If you are editing a segment using the `edit_segments` parameter, you are only charged for the duration of the segment you are editing.
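As a hedged illustration of how the per-minute rates and the 10-second minimum combine, here is a small worked example; the 3.5-minute duration and engine choices are hypothetical.

```python
# Hypothetical estimate: a 3.5-minute dub with elevenlabs + gpt4 at the
# standard rate (no transcript or translation provided), plus optional lipsync.
minutes = 3.5
dubbing_rate = 0.535    # $/minute, elevenlabs + gpt4, Standard Price column
lipsync_rate = 0.5      # $/minute, billed separately when lipsync is enabled

dubbing_cost = minutes * dubbing_rate                       # 3.5 * 0.535 ≈ $1.87
total_with_lipsync = dubbing_cost + minutes * lipsync_rate  # + 3.5 * 0.5 ≈ $3.62

# The 10-second minimum means a 4-second clip is still billed as 10 seconds:
# (10 / 60) * 0.535 ≈ $0.09 at the same rate.
```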
Translation Only (Price per Minute)
| Translation Engine | Standard Price | Transcript Provided |
|---|---|---|
| gpt4 | $0.117 | $0.051 |
| seamless | $0.100 | $0.034 |
Note:
- The Standard Price refers to the price when no transcript is provided in the `edit_segments` parameter.
- The Transcript Provided column indicates the prices if the entire transcript for the audio is provided in the `edit_segments` parameter; in that case, only the provided segments are translated.
Using Your Own Keys
By default, you will be billed based on the length of content you submit. However, if you'd like, you can provide your own OpenAI and ElevenLabs API keys to be billed at your own rates for those services externally, while being charged for the rest of the pipeline via Sieve.
Here's how much it would cost you to use your own keys:
Dubbing Cost (Price per Minute)
| Voice Engine | Translation Engine | Standard Price | Transcript Provided | Translation Provided |
|---|---|---|---|---|
| elevenlabs | gpt4 | $0.378 | $0.300 | $0.238 |
| elevenlabs | seamless | $0.361 | $0.280 | $0.221 |
| openai | gpt4 | $0.377 | $0.300 | $0.238 |
| openai | seamless | $0.361 | $0.280 | $0.221 |
Translation Only Cost (Price per Minute)
| Translation Engine | Standard Price | Transcript Provided |
|---|---|---|
| gpt4 | $0.109 | $0.042 |
| seamless | $0.100 | $0.034 |
To use your own keys, enter your API keys in the Secrets section:
- OpenAI Secret: `OPENAI_API_KEY`
- ElevenLabs Secret: `ELEVEN_LABS_API_KEY`
Translation Dictionary Usage
If you wish to translate certain words into specific words of your choice, you can pass a JSON object with the required word mappings rather than letting the backend decide for itself.
For Example:
```json
{
  "China": "Africa",
  "New York": "NYU"
}
```
Note: You can specify the desired replacement word either in the target language or in the source language; in the latter case, it will be translated automatically for you.
You can also use the translation dictionary to edit your speech by replacing certain words or phrases with desired words or phrases.
For Example:
```json
{
  "Oh my God!": "Oh Lord!",
  "It's bad news": "It's great news"
}
```
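The sketch below shows one way such a mapping might be passed to the dubbing call; the `translation_dictionary` parameter name is inferred from the feature name above and may differ from the actual signature.

```python
# Sketch only: assumes a "translation_dictionary" parameter on the
# sieve/dubbing function that accepts the word mappings shown above
# (it may expect a JSON string rather than a dict).
import json
import sieve

word_mappings = {
    "China": "Africa",
    "New York": "NYU",
    "Oh my God!": "Oh Lord!",
}

dubbing = sieve.function.get("sieve/dubbing")
output = dubbing.run(
    sieve.File(url="https://example.com/talk.mp4"),     # hypothetical input
    target_language="french",
    translation_dictionary=json.dumps(word_mappings),   # serialized word/phrase mappings
)
```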
Output Modes
The API supports two output modes, controlled by the `output_mode` parameter:
- `translation-only`: Returns just the translated text without generating audio. This is useful for previewing or validating translations before dubbing.
- `voice-dubbing`: Returns the fully dubbed video/audio with translations spoken by the selected voice engine.
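A minimal two-pass workflow along these lines might look as follows, assuming the Sieve Python client and a `sieve/dubbing` function that exposes the `output_mode` parameter described above.

```python
# Sketch only: assumes the sieve/dubbing function and an "output_mode"
# parameter matching the two modes described above.
import sieve

dubbing = sieve.function.get("sieve/dubbing")
source = sieve.File(url="https://example.com/lesson.mp4")  # hypothetical input

# Preview or validate the translations without generating any audio.
preview = dubbing.run(source, target_language="german",
                      output_mode="translation-only")

# Once the translations look right, produce the fully dubbed media.
dubbed = dubbing.run(source, target_language="german",
                     output_mode="voice-dubbing")
```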
Edit Segments
The `edit_segments` parameter allows you to selectively dub or edit specific portions of the media. This is useful for:
- Fixing translations in an already dubbed video
- Only dubbing certain segments while leaving others untouched
- Adding custom translations for specific segments
The parameter accepts a list of segment objects with the following structure:
```json
[
  {
    "start": 0,
    "end": 10,
    "translation": "Hello, how are you?"
  }
]
```
Notes:
- `start`: The start time of the segment in seconds. (required)
- `end`: The end time of the segment in seconds. (required)
- `translation`: The translation for the segment. (optional)
- `text`: The text for the segment. (optional)
- `speaker`: The speaker for the segment. (optional)
- The `start` and `end` times should specify which segments you'd like to edit. If the translation is too short or too long relative to the duration of the segment, the audio quality may be degraded.
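Putting this together, a hedged sketch of selectively re-dubbing one portion of a file with `edit_segments` might look like this; the client usage and function name are assumptions, while the segment fields follow the structure documented above.

```python
# Sketch: re-dub only seconds 12-20 with a corrected translation. Per the
# pricing notes, you are charged only for the duration of the edited segment.
import sieve

dubbing = sieve.function.get("sieve/dubbing")
source = sieve.File(url="https://example.com/talk.mp4")  # hypothetical input

segments = [
    {
        "start": 12,
        "end": 20,
        "translation": "Bonjour à tous, merci d'être venus.",  # replacement translation
    }
]

output = dubbing.run(source, target_language="french", edit_segments=segments)
```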
Supported Languages
Sieve's AI Dubbing supports various languages depending on the `voice_engine` you use. If you use `elevenlabs` for TTS, you can only dub into the 29 languages listed below. If you use `openai` for TTS, you can dub into 100+ languages, though voice cloning is not supported for `openai`.
- 🇺🇸 English
- 🇮🇳 Hindi
- 🇵🇹 Portuguese
- 🇨🇳 Mandarin (Chinese)
- 🇪🇸 Spanish
- 🇫🇷 French
- 🇩🇪 German
- 🇯🇵 Japanese
- 🇦🇪 Arabic
- 🇷🇺 Russian
- 🇰🇷 Korean
- 🇮🇩 Indonesian
- 🇮🇹 Italian
- 🇳🇱 Dutch
- 🇹🇷 Turkish
- 🇵🇱 Polish
- 🇸🇪 Swedish
- 🇵🇭 Tagalog (Filipino)
- 🇲🇾 Malay
- 🇷🇴 Romanian
- 🇺🇦 Ukrainian
- 🇬🇷 Greek
- 🇨🇿 Czech
- 🇩🇰 Danish
- 🇫🇮 Finnish
- 🇧🇬 Bulgarian
- 🇭🇷 Croatian
- 🇸🇰 Slovak
- 🇮🇳 Tamil