Building an AI dubbing app for YouTube videos

YouTube recently introduced a feature in YouTube Studio that lets creators automatically generate AI audio tracks for their videos in different languages. In this post, we’ll build a high-quality, customizable version of the same concept using Sieve.

Automatically downloading YouTube videos

Disclaimer: Bulk-downloading videos for most content libraries is against TOS. This is for demo purposes only.

There are many open-source tools available for downloading YouTube videos, such as yt-dlp. Below, we’ll use Sieve’s implementation, which means we don’t need to worry about constantly breaking dependencies, download speeds, and similar maintenance issues. Before you get started, create a Sieve account and install the Python package.

import sieve

youtube_downloader = sieve.function.get("sieve/youtube_to_mp4")
video_url = "https://www.youtube.com/watch?v=3OmfTIf-SOU"
downloaded_video = youtube_downloader.run(video_url)

# print out the video path
print(downloaded_video.path)
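
Before sending a URL to the downloader, it can help to check that it actually looks like a YouTube link. The following is a minimal sketch using only the Python standard library. The helper name `extract_video_id` and the list of accepted hostnames are assumptions made for illustration and are not part of Sieve's API.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Hostnames accepted by this sketch (an assumption, not an exhaustive list).
YOUTUBE_HOSTS = {"www.youtube.com", "youtube.com", "m.youtube.com", "youtu.be"}


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID from a YouTube watch or short URL, else None.

    Hypothetical helper: it only inspects the URL string and never
    contacts YouTube, so it cannot tell whether the video exists.
    """
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return None

    host = (parsed.hostname or "").lower()
    if host not in YOUTUBE_HOSTS:
        return None

    # Short links carry the ID in the path: https://youtu.be/<id>
    if host == "youtu.be":
        video_id = parsed.path.lstrip("/")
        return video_id or None

    # Standard links carry the ID in the "v" query parameter.
    if parsed.path == "/watch":
        values = parse_qs(parsed.query).get("v")
        return values[0] if values else None

    return None
```

With a helper like this, the call to `youtube_downloader.run(video_url)` can be skipped whenever `extract_video_id(video_url)` returns `None`, which avoids submitting a job for an obviously malformed link.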

Dubbing a YouTube video programmatically

Sieve has an out-of-the-box pipeline to dub video and audio files, which can be accessed via REST API or Python SDK. Unlike consumer-focused solutions, sieve/dubbing comes with the features developers care about.

Comprehensive Feature Set: Offers advanced features like speaker style preservation, multi-speaker support, safe words, and translation dictionaries, allowing developers fine-grained control over dubbing.
Customizable and Scalable: Supports multi-language inputs, multiple voice engines, and flexible output modes, enabling seamless adaptation to diverse use cases.
Cost and Time Efficiency: Faster-than-realtime processing, transparent pricing, and the ability to use third-party API keys reduce costs and accelerate workflows.
Developer-Friendly Tools: API parameters like edit_segments, preserve_background_audio, and multi-step dubbing provide precision and flexibility for tailored outputs.
Extensive Language Support: Covers 100+ languages with OpenAI and 29 high-quality languages with ElevenLabs, ensuring wide applicability across global audiences.

We will write a simple implementation below. To explore the full feature set or integrate via REST API instead of Python, check out the guide here.

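# Reuses `sieve` and `downloaded_video` from the download step above
dubber = sieve.function.get("sieve/dubbing")
dubbed_video = dubber.run(downloaded_video, target_language="spanish")

# print out the dubbed video path
print(dubbed_video.path)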
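
To dub the same video into several languages, the single call above can be wrapped in a loop. The sketch below is one possible way to do this and is not taken from Sieve's documentation. The function name `dub_into_languages` is hypothetical, and it takes the dubbing call as a plain callable (`dub_fn`), so it makes no assumptions about the SDK beyond the `target_language` keyword shown above.

```python
from typing import Any, Callable, Dict, Iterable, Tuple


def dub_into_languages(
    video: Any,
    languages: Iterable[str],
    dub_fn: Callable[..., Any],
) -> Tuple[Dict[str, Any], Dict[str, Exception]]:
    """Run dub_fn once per language and collect the results.

    Hypothetical helper: dub_fn is expected to accept the video as its
    first argument and a target_language keyword, like dubber.run above.
    A failure in one language is recorded and does not stop the others.
    """
    results: Dict[str, Any] = {}
    errors: Dict[str, Exception] = {}

    for language in languages:
        try:
            results[language] = dub_fn(video, target_language=language)
        except Exception as exc:  # keep going so one bad language is not fatal
            errors[language] = exc

    return results, errors
```

Assuming `dubber` and `downloaded_video` are defined as above, a call might look like `results, errors = dub_into_languages(downloaded_video, ["spanish", "french"], dubber.run)`. The value `"french"` is an assumption here, since only `"spanish"` appears in this post; check the dubbing guide for the exact language names it accepts.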

Below is the output of the generated dub. Pretty cool!

Conclusion

AI dubbing has made it possible to localize video content at a fraction of the traditional cost and time. With just a few lines of code, developers can now programmatically translate and dub videos into multiple languages while preserving speaker style and audio quality. As platforms like YouTube reach more international viewers, AI dubbing will play an important role in making video content accessible across languages.