Blog
Our latest product updates and thoughts on state-of-the-art AI capabilities.
A comprehensive guide to implementing robust ball tracking in sports videos using SAM 2, with practical solutions for handling scene changes, false positives, and dynamic camera movements.
We discuss a new suite of moderation pipelines available on Sieve designed for ease of use, customization, and cost-effectiveness.
We discuss various approaches to building a high-performance YouTube video summarizer. Some take visual elements into account, while others focus on audio.
Transforming YouTube Videos into NotebookLM-like Conversational Avatars
by Akshara Soman • 5 min read
We introduce new features of the Sieve dubbing pipeline that enable human-in-the-loop experiences to be built on top.
We walk through building a simple app to download and dub YouTube videos.
Bringing world-class audio enhancement to developers with ai|coustics
by Mokshith Voodarla • 2 min read
We discuss a new pipeline for removing backgrounds from video that offers high-quality outputs on complex scenes as well as a fast option for simpler videos.
Introducing Portrait Avatars: generate talking head videos from images and audio
by Gaurang Bharti • 2 min read
A practical guide to removing background noise from videos using traditional signal processing, advanced AI models for noise suppression, and intelligent source separation methods.
Speaker Recognition Guide: How to Detect Speakers in Video and Audio
by Mokshith Voodarla • 2 min read
We discuss the latest updates to Sieve's dubbing pipeline and how it offers the best speech quality, translation controls, and pricing for developers.
We discuss Kaiber's launch of Superstudio and how they use Sieve's infrastructure to power their AI video workloads.
We discuss a new gaze redirection pipeline designed to make the eyes in talking head videos look directly at the camera.
How Sieve powers Kapwing's new AI avatar tool - enabling creators to generate automatic talking head videos in a few clicks.
We discuss a new zero-shot lipsync pipeline built with MuseTalk, LivePortrait, and CodeFormer designed to preserve more realism than existing solutions.
How Scenery approaches human-centric AI video understanding with Sieve
by Mokshith Voodarla • 2 min read
We discuss a partnership between VEED and Sieve to launch VEED Clips, a new AI-powered video clipping tool.
SAM 2 can't natively take in text prompts. We discuss various ways to build pipelines around SAM 2 to accomplish text-prompted segmentation.
Learn about Meta's SAM 2 (Segment Anything Model 2) and how Sieve's optimized implementation runs 2x faster. Explore use cases, benchmarks, and how to use SAM 2.
MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting
by Gaurang Bharti • 4 min read
We walk through using the Sieve API to download and dub an entire Khan Academy course in under 10 minutes.
We discuss the launch of Sieve’s Dubbing API, the first AI dubbing solution purpose-built for developers.
Introducing Autocrop 1.0: Format videos into different aspect ratios with AI editing
by Mokshith Voodarla • 3 min read
We discuss the importance of AI in video communication and why Zight chose Sieve to power their new AI features.
We do a deep dive into building an intricate algorithm on top of LLMs to accurately identify and extract highlights from long-form video content.
We discuss the first time computers drastically changed video creation and how it’s changing once again because of new AI models.
Introducing Describe: Incredibly descriptive audiovisual summaries for videos
by Gaurang Bharti • 5 min read
In this post, we build an app that adds sound effects to stock videos using vision language models and audio generation models.
In this post, we discuss support for GPU sharing on Sieve and how it enables faster, more cost-effective AI models.
In this post, we discuss active speaker detection as a deep learning task and how we built a solution that performs ~90% faster than other solutions.
In this post, we discuss the commoditization of audio transcription and a new Sieve offering around it that is 5x cheaper than other providers while still maintaining speed and accuracy.
We compare current lip-syncing solutions, such as OpenRetalker’s Video Retalking and SieveSync, to arrive at a performant, production-ready lipsync solution.
Learn how to use an open-source AI audio enhancement app for your vlogs and other media, one that rivals the best APIs on the market. Try it for yourself!
In this blog post, we go through the process of generating video chapter titles with OpenAI's Whisper + GPT-3 models and an open-source text segmentation technique!
Learn about the specialized pipelines in the Sieve toolkit for creating realistic AI avatars, including Portrait Avatar, LivePortrait, and Lipsync. This blog provides a detailed discussion of strengths, limitations, and use cases.
The explosion of rich data, the Sieve public beta, our ~$4M seed round, and how we enable developers to build amazing experiences with video + AI.