Enabling human-in-the-loop video and audio dubbing systems on Sieve
We introduce new features of the Sieve dubbing pipeline that enable human-in-the-loop experiences to be built on top.
by Ahmed Hanzala

Over the past few months, Sieve's developer-focused AI dubbing offering has grown quickly. One request has come up constantly: features that let developers build human-in-the-loop experiences on top of it.

While fully automated dubbing is great, languages carry enough nuance across culture, tone, and context that it always helps to let humans edit the output as needed. These features serve product teams integrating more controllable dubbing into their products, as well as software teams at media companies building highly custom internal tools that make their human workforce more efficient.

Dubbing Editing Features

New Output Modes

The API supports two output modes controlled by the output_mode parameter:

  • translation-only: Returns just the translated text without generating audio. Useful for previewing or validating translations before dubbing.
  • voice-dubbing: Returns the fully dubbed video/audio with translations spoken by the selected voice engine.
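One natural way to use the two modes is a review-then-dub workflow: request the translation first, let a person check it, then request the dubbed media. The sketch below only builds the parameters for each pass and makes no API call. The `output_mode` name and its two values come from this post; `build_dubbing_params` is a hypothetical helper, and `target_language` is an assumed parameter name used purely for illustration.

```python
import json

# The two mode names come from the post; everything else here is illustrative.
OUTPUT_MODES = {"translation-only", "voice-dubbing"}


def build_dubbing_params(output_mode, target_language):
    """Return a dict of parameters for one dubbing job.

    `target_language` is an assumed parameter name, not confirmed by the post.
    """
    if output_mode not in OUTPUT_MODES:
        raise ValueError(f"output_mode must be one of {sorted(OUTPUT_MODES)}")
    return {"output_mode": output_mode, "target_language": target_language}


# Pass 1: ask for text only so a reviewer can check the translation.
preview_params = build_dubbing_params("translation-only", "spanish")

# Pass 2: once the text is approved, ask for the dubbed media.
dub_params = build_dubbing_params("voice-dubbing", "spanish")

print(json.dumps(preview_params, indent=2))
print(json.dumps(dub_params, indent=2))
```

Rejecting unknown mode names locally catches typos such as `voice_dubbing` before a job is submitted.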

Using custom transcripts and translations

The edit_segments parameter allows you to selectively dub or edit specific portions of the media. This is useful for:

  • Fixing translations in an already dubbed video
  • Only dubbing certain segments while leaving others untouched
  • Adding custom translations for specific segments

The parameter accepts a list of segment objects with the following structure:

[
  {
    "start": 0,
    "end": 10,
    "translation": "Hello, how are you?"
  }
]

Note:

  • start: The start time of the segment in seconds. (required)
  • end: The end time of the segment in seconds. (required)
  • translation: The translation for the segment. (optional)
  • text: The source-language transcript text for the segment. (optional)
  • speaker: The speaker for the segment. (optional)
  • Set the start and end times precisely so they match the segments you'd like to edit. If the translation is too short or too long relative to the duration of the segment, the audio quality may be degraded.
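Because human editors will be producing these segment lists, it can help to check them before submitting a job. The sketch below is a local sanity check under stated assumptions, not part of the Sieve API: `validate_edit_segments` and `chars_per_second` are hypothetical helpers, and the only facts taken from this post are the field names and which of them are required.

```python
REQUIRED_KEYS = ("start", "end")  # marked required in the field list above


def validate_edit_segments(segments):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    for i, seg in enumerate(segments):
        missing = [k for k in REQUIRED_KEYS if k not in seg]
        if missing:
            problems.append(f"segment {i}: missing {', '.join(missing)}")
            continue
        if not all(isinstance(seg[k], (int, float)) for k in REQUIRED_KEYS):
            problems.append(f"segment {i}: start and end must be numbers")
            continue
        if seg["start"] < 0 or seg["end"] <= seg["start"]:
            problems.append(f"segment {i}: need 0 <= start < end")
    return problems


def chars_per_second(segment):
    """Rough pacing figure: translation length divided by segment duration.

    Assumes the segment already passed validate_edit_segments. What counts as
    too fast or too slow is left to the caller; the post gives no threshold.
    """
    duration = segment["end"] - segment["start"]
    return len(segment.get("translation", "")) / duration


segments = [{"start": 0, "end": 10, "translation": "Hello, how are you?"}]
print(validate_edit_segments(segments))
print(chars_per_second(segments[0]))
```

A pacing figure like this gives editors a quick signal when a rewritten translation is much longer or shorter than the segment can comfortably hold, which is the case the note above warns about.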

Conclusion

Sieve now gives developers the best of both worlds in a single place: the highest-quality automated dubbing pipeline and the most features for building human-in-the-loop augmentations. We’re excited to see what this enables! You can get started with the pipeline here, or book a demo with our team to learn more.