How to Build an AI-Powered Dubbing Studio for Your Media Company

AI-powered dubbing is transforming content localization by cutting turnaround times and costs. By combining AI tools with a human-in-the-loop review process, media companies can build a dubbing studio tailored to their production needs. This post shows how to use Sieve's dubbing pipeline and its human-in-the-loop features to build a streamlined dubbing editor.

Key Human-in-the-Loop Features for Dubbing

1. Multiple Output Modes

The output_mode parameter selects what the pipeline produces. The translation-only mode returns translated text without generating audio, which makes it ideal for reviewing or editing translations before finalizing the dub. Use output_mode to switch between:

translation-only: Text translations only
voice-dubbing: Fully dubbed output with spoken translations

2. Editable Segments with `edit_segments`

The edit_segments parameter lets you edit specific portions of the media selectively. This is particularly useful for:

Fixing translations in pre-dubbed videos
Dubbing specific segments while keeping others intact
Adding custom translations for selected segments

The parameter accepts a list of segment objects with the following structure:

[
  {
    "start": 0,
    "end": 10,
    "translation": "Hello, how are you?"
  }
]
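Before sending segments back to the pipeline, a studio editor can catch malformed entries early. The sketch below is a hypothetical helper, not part of Sieve's API: `validate_segments` checks only the three keys shown above, and it assumes that start and end share one time unit (this post does not state the unit; the example suggests seconds) and that segments should be sorted and non-overlapping.

```python
from typing import Any


def validate_segments(segments: list[dict[str, Any]]) -> list[str]:
    """Return human-readable problems in an edit_segments payload (empty if none).

    Hypothetical helper: assumes start/end use the same unit and that
    segments are listed in time order without overlaps.
    """
    problems: list[str] = []
    previous_end = None
    for i, seg in enumerate(segments):
        # Every segment needs the three keys shown in the structure above
        missing = {"start", "end", "translation"} - set(seg)
        if missing:
            problems.append(f"segment {i}: missing keys {sorted(missing)}")
            continue
        start, end = seg["start"], seg["end"]
        if start >= end:
            problems.append(f"segment {i}: start {start} is not before end {end}")
        # A segment starting before the previous one ended is out of order or overlapping
        if previous_end is not None and start < previous_end:
            problems.append(f"segment {i}: overlaps or precedes the previous segment")
        if not str(seg["translation"]).strip():
            problems.append(f"segment {i}: empty translation")
        previous_end = end
    return problems
```

An empty list means the payload passed these basic checks; the pipeline may still enforce rules of its own.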

Additional Features for Media Companies

Speaker Style Preservation: Maintain natural voice quality and tone during dubbing using ElevenLabs voice-cloning TTS for an authentic experience.
Multi-Speaker Support: Handle videos with multiple speakers by assigning a distinct voice to each speaker, which is ideal for movies and interviews.
Scalable Translations: Translate into 29 languages simultaneously, making it easier to localize content for global audiences.
Background Audio Retention: Preserve the original background audio in dubbed content for seamless, natural-sounding results. Background scores are essential for conveying emotions in movies.
Safe Words: Specify words you don't want to translate, such as names, places, or specific terms, ensuring consistency across all outputs.
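In the examples in this post, safe words are passed as a single comma-separated string such as "Missy, Sheldon, Mary". The sketch below shows two hypothetical helpers a studio might wrap around that convention: one builds the string from a list, and one flags safe words that appear in a source line but not verbatim in its translation. The second check assumes safe words are kept unchanged in the output, and it uses plain substring matching, so a name like "Mary" would also match inside "Maryland".

```python
def format_safewords(words: list[str]) -> str:
    """Join safe words into the comma-separated string used in this post's examples."""
    seen: set[str] = set()
    cleaned: list[str] = []
    for word in words:
        word = word.strip()
        # Skip blanks and duplicates while preserving the original order
        if word and word not in seen:
            seen.add(word)
            cleaned.append(word)
    return ", ".join(cleaned)


def missing_safewords(source_text: str, translation: str, safewords: str) -> list[str]:
    """Safe words present in the source line but absent from its translation.

    Naive substring check, intended as a reviewer hint rather than a guarantee.
    """
    words = [w.strip() for w in safewords.split(",") if w.strip()]
    return [w for w in words if w in source_text and w not in translation]
```

For example, `format_safewords(["Missy", " Sheldon", "Mary", "Missy"])` yields "Missy, Sheldon, Mary", ready to pass as safewords.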

Multi-Step Dubbing Workflow

Using the features outlined above, we can create a professional-quality dubbing studio with the Sieve Dubbing pipeline in a two-step process:
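The two steps can be sketched as one function that calls the pipeline twice with a human review in between. This is an illustrative sketch, not Sieve's API: `two_step_dub` and the `review` hook are hypothetical, and `run` is assumed to be any callable that accepts the same arguments as dubbing.run in the examples that follow. Engine settings are left at the pipeline's defaults here for brevity.

```python
from typing import Any, Callable

Segment = dict[str, Any]


def two_step_dub(
    run: Callable[..., Any],
    source_file: Any,
    target_language: str,
    review: Callable[[Any], list[Segment]],
    safewords: str = "",
) -> Any:
    """Translation preview, human review, then final voice dubbing (hypothetical)."""
    # Step 1: text only, so no audio is generated before review
    preview = run(source_file, target_language,
                  output_mode="translation-only", safewords=safewords)
    # Human-in-the-loop: the editor returns segments covering the whole video
    edited_segments = review(preview)
    # Step 2: dub using the reviewed translations
    return run(source_file, target_language,
               output_mode="voice-dubbing",
               edit_segments=edited_segments,
               safewords=safewords)
```

Injecting `run` keeps the workflow testable with a stub; in production you would pass `sieve.function.get("sieve/dubbing").run`.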

1. Translation Preview

Use translation-only mode to generate and review translations. Make necessary edits to ensure accuracy.

import sieve

source_file = sieve.File(url="https://storage.googleapis.com/sieve-prod-us-central1-public-file-upload-bucket/99d82ab9-7214-47b3-98f3-05367c2180dc/5ead25ea-a1b5-42be-909e-d4ca179fe9ed-input-source_file.mp4")
target_language = "hindi"
translation_engine = "gpt4"
voice_engine = "elevenlabs (voice cloning)"
transcription_engine = "whisper-zero"

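# Stop after translation so a reviewer can check the text before any audio is generated
output_mode = "translation-only"
# Comma-separated names that must not be translated
safewords = "Missy, Sheldon, Mary"

dubbing = sieve.function.get("sieve/dubbing")
output = dubbing.run(source_file, target_language,
                     translation_engine=translation_engine,
                     voice_engine=voice_engine,
                     transcription_engine=transcription_engine,
                     output_mode=output_mode,
                     safewords=safewords)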
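A dubbing editor usually shows reviewers a timestamped list of lines to check. The sketch below assumes the preview output has already been converted into a list of segment dicts with start and end in seconds and a translation key, matching the edit_segments structure; the exact shape of the translation-only output is not documented in this post, and `review_sheet` and `format_timestamp` are hypothetical names.

```python
from typing import Any


def format_timestamp(seconds: float) -> str:
    """Render seconds as MM:SS.mmm for a reviewer-facing table."""
    total_ms = round(seconds * 1000)
    minutes, rem_ms = divmod(total_ms, 60_000)
    secs, ms = divmod(rem_ms, 1000)
    return f"{minutes:02d}:{secs:02d}.{ms:03d}"


def review_sheet(segments: list[dict[str, Any]]) -> str:
    """One line per segment: index, time range, and translation."""
    lines = []
    for i, seg in enumerate(segments):
        span = f"{format_timestamp(seg['start'])}-{format_timestamp(seg['end'])}"
        lines.append(f"[{i}] {span}  {seg['translation']}")
    return "\n".join(lines)
```

The index column gives reviewers a stable way to refer to a line when they submit corrections.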

2. Final Dubbing

Feed the edited translations back using the edit_segments parameter and select voice-dubbing as the output mode. Ensure you provide translations for the entire duration of the media, not just the edited portions.
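Because the final pass needs translations for the whole video, reviewer corrections have to be merged back into the full preview. A minimal sketch, assuming each edit reuses the exact start and end of a preview segment (the helper `merge_edits` is hypothetical): unedited segments keep their preview text, and an edit whose boundaries match no preview segment raises an error instead of being silently dropped.

```python
from typing import Any

Segment = dict[str, Any]


def merge_edits(full: list[Segment], edits: list[Segment]) -> list[Segment]:
    """Overlay reviewer edits onto the full translation, matching on (start, end)."""
    edits_by_span = {(e["start"], e["end"]): e["translation"] for e in edits}
    known_spans = {(s["start"], s["end"]) for s in full}
    unknown = set(edits_by_span) - known_spans
    if unknown:
        raise ValueError(f"edits do not match any preview segment: {sorted(unknown)}")
    # Build new dicts so the preview list is left unchanged
    return [
        {**seg, "translation": edits_by_span.get((seg["start"], seg["end"]), seg["translation"])}
        for seg in full
    ]
```

The merged list still covers every preview segment, so it can be passed as edit_segments in the final dubbing call.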

source_file = sieve.File(url="https://sieve-prod-us-central1-persistent-bucket.storage.googleapis.com/0a27f1ed-b241-4a1e-8b3c-e8aff3b8379c/cae53169-5ec9-4ed9-88c1-fc5e3859dccf/f770f4a8-e86b-46f0-ac15-fb1b4d1dc9bb/tmpx6upc6mh.mp4")
target_language = "hindi"

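# Generate dubbed audio this time
output_mode = "voice-dubbing"
# Reviewed translations from step 1, as segments in the format shown earlier,
# covering the entire video rather than only the edited portions
edit_segments = output
# Keep the original background score under the dubbed dialogue
preserve_background_audio = True
safewords = "Missy, Sheldon, Mary"

# Lip-sync the video to the dubbed audio
enable_lipsyncing = True
lipsync_backend = "sievesync"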

dubbing = sieve.function.get("sieve/dubbing")
output = dubbing.run(source_file, target_language, output_mode=output_mode,
                     edit_segments=edit_segments,
                     preserve_background_audio=preserve_background_audio,
                     safewords=safewords,
                     enable_lipsyncing=enable_lipsyncing,
                     lipsync_backend=lipsync_backend)

for output_object in output:
    print(output_object)

Real-World Applications

Here are examples of TV series and movie clips dubbed using the pipeline above. The original background score was preserved in each dubbed video, and character names were specified as safe words so they would not be translated.

Hindi-dubbed version of a clip from the TV series Young Sheldon.

Mandarin-dubbed version of a clip from the movie The Intern.

Conclusion

Building an AI-powered dubbing studio with Sieve's dubbing pipeline enables media companies to streamline localization workflows. By leveraging features like human-in-the-loop translation, multi-speaker support, and background audio retention, you can efficiently produce high-quality dubbed content.

For personalized support or to book a demo, email us at contact@sievedata.com.