Blog
Our latest product updates and thoughts on state-of-the-art AI capabilities.
Building a robust ball tracking system for sports with SAM 2
by Dikshant Shah • 15 min read
A comprehensive guide to implementing robust ball tracking in sports videos using SAM 2, with practical solutions for handling scene changes, false positives, and dynamic camera movements.
Introducing the Sieve Moderation Suite
by Ahmed Hanzala • 4 min read
We discuss a new suite of moderation pipelines available on Sieve designed for ease of use, customization, and cost-effectiveness.
Building the Fastest YouTube Video Summarizer in the World
by Akshara Soman • 5 min read
We discuss various approaches to building a high-performance YouTube video summarizer. Some take visual elements into account, while others focus on audio.
Transforming YouTube Videos into NotebookLM-like Conversational Avatars
by Akshara Soman • 5 min read
Learn how we built a pipeline to turn YouTube videos into conversational podcasts using various functions readily available on Sieve.
Enabling human-in-the-loop video and audio dubbing systems on Sieve
by Ahmed Hanzala • 2 min read
We introduce new features of the Sieve dubbing pipeline that enable human-in-the-loop experiences to be built on top.
Building an AI dubbing app for YouTube videos
by Mokshith Voodarla • 2 min read
We walk through building a simple app to download and dub YouTube videos.
Bringing world-class audio enhancement to developers with ai|coustics
by Mokshith Voodarla • 2 min read
We discuss the partnership between Sieve and ai|coustics to bring world-class audio enhancement to developers.
Introducing Vanish: the best way to remove backgrounds from video
by Jacob Marshall • 2 min read
We discuss a new pipeline for removing backgrounds from video that offers high-quality outputs on complex scenes as well as a fast option for simpler videos.
Introducing Portrait Avatars: generate talking head videos from images and audio
We discuss a new pipeline for generating talking head videos from images and audio as well as partnerships with Hedra Labs and Infinity AI.
A developer’s guide to background noise removal in video and audio
by Ahmed Hanzala • 3 min read
A practical guide to removing background noise from videos using traditional signal processing, advanced AI models for noise suppression, and intelligent source separation methods.
Speaker Recognition Guide: How to Detect Speakers in Video and Audio
by Mokshith Voodarla • 2 min read
A guide to implementing speaker recognition in video and audio using diarization and active speaker detection techniques.
Dubbing 2.0: The highest quality AI dubbing solution for developers
by Ahmed Hanzala • 5 min read
We discuss the latest updates to Sieve's dubbing pipeline and how it offers the best speech quality, translation controls, and pricing for developers.
Kaiber partners with Sieve to launch Superstudio
by Mokshith Voodarla • 3 min read
We discuss Kaiber's launch of Superstudio and how they use Sieve's infrastructure to power their AI video workloads.
Eye Contact 1.0: eye gaze redirection for developers
by Gaurang Bharti • 2 min read
We discuss a new gaze redirection pipeline designed to make the eyes in talking head videos look directly at the camera.
Kapwing partners with Sieve to launch AI Personas
by Mokshith Voodarla • 3 min read
How Sieve powers Kapwing's new AI avatar tool, enabling creators to automatically generate talking head videos in a few clicks.
SieveSync: realistic, zero-shot lipsync for developers
by Gaurang Bharti • 5 min read
We discuss a new zero-shot lipsync pipeline built with MuseTalk, LivePortrait, and CodeFormer designed to preserve more realism than existing solutions.
How Scenery approaches human-centric AI video understanding with Sieve
by Mokshith Voodarla • 2 min read
We discuss how Scenery uses Sieve to run human-centric video understanding workloads that power features like AI Shorts.
VEED partners with Sieve to launch VEED Clips
by Mokshith Voodarla • 3 min read
We discuss a partnership between VEED and Sieve to launch VEED Clips, a new AI-powered video clipping tool.
Exploring ways to text prompt SAM 2
by Lachlan Gray • 6 min read
SAM 2 can't natively take in text prompts. We discuss various ways to build pipelines around SAM 2 to accomplish text-prompted segmentation.
The fastest way to run Meta's SAM 2 (Segment Anything Model 2)
by Jacob Marshall • 6 min read
Learn about Meta's SAM 2 (Segment Anything Model 2) and how Sieve's optimized implementation runs 2x faster. Explore use cases, benchmarks, and how to use SAM 2.
MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting
In this blog, we dive into MuseTalk, a state-of-the-art zero-shot lipsyncing model. We cover how it works, its pros and cons, and how to run it on Sieve.
Dubbing an entire Khan Academy course in 10 minutes
by Mokshith Voodarla • 5 min read
We walk through using the Sieve API to download and dub an entire Khan Academy course in under 10 minutes.
Introducing Sieve Dubbing 1.0: AI Dubbing for Developers
by Mokshith Voodarla • 4 min read
We discuss the launch of Sieve’s Dubbing API, the first AI dubbing solution purpose-built for developers.
Introducing Autocrop 1.0: Format videos into different aspect ratios with AI editing
We discuss the launch of Autocrop 1.0, a new API that allows you to format videos into different aspect ratios with AI editing.
Zight and Sieve: Using AI to build better video communication
by Mokshith Voodarla • 4 min read
We discuss the importance of AI in video communication and why Zight chose Sieve to power their new AI features.
Finding highlights in long-form video content automatically
by Gaurang Bharti • 4 min read
We do a deep dive into building an intricate algorithm on top of LLMs to accurately identify and extract highlights from long-form video content.
How developers are changing video creation once again with AI
by Mokshith Voodarla • 5 min read
We discuss the first time computers drastically changed video creation and how it’s changing once again because of new AI models.
Introducing Describe: Incredibly descriptive audiovisual summaries for videos
We discuss the launch of Describe along with the challenges and approaches to generating audiovisual descriptions of videos.
Adding Sound Effects to Stock Videos with AI
by Mokshith Voodarla • 2 min read
In this post, we build an app that adds sound effects to stock videos using vision language models and audio generation models.
Introducing GPU sharing on Sieve
by Gaurav Rao • 4 min read
In this post, we discuss support for GPU sharing on Sieve and how it enables faster, more cost-effective AI models.
Fast, efficient active speaker detection on videos
by Mokshith Voodarla • 5 min read
In this post, we discuss active speaker detection as a deep learning task and how we built a solution that performs ~90% faster than other solutions.
Announcing the most cost-effective audio transcription API
by Mokshith Voodarla • 5 min read
In this post, we discuss the commoditization of audio transcription and a new Sieve offering around it that is 5x cheaper than other providers while still maintaining speed and accuracy.
Improving on open-source for fast, high-quality AI lipsyncing
by Abhinav Ayalur • 3 min read
We discuss current lipsyncing solutions such as OpenRetalker’s Video Retalking and SieveSync, and what it takes to get a performant, production-ready lipsyncing solution.
State of the art audio enhancement in 5 minutes
by Abhi Upadhyay • 4 min read
Learn how to use an open-source AI audio enhancement app for your vlogs and other media, one that rivals the best APIs on the market. Try it for yourself!
Automatically generating video chapter titles with AI
by Abhi Upadhyay • 4 min read
In this blog post, we go through the process of generating video chapter titles with OpenAI's Whisper + GPT-3 models and an open-source text segmentation technique!
Building realistic video AI avatars in an hour from scratch
by Akshara Soman • 4 min read
Learn about the specialized pipelines in the Sieve toolkit for creating realistic AI avatars, including Portrait Avatar, LivePortrait, and Lipsync. This blog provides a detailed discussion of strengths, limitations, and use cases.
Sieve's Video AI API Beta and ~$4M Raise
by Mokshith Voodarla • 2 min read
The explosion of rich data, the Sieve public beta, our ~$4M seed round, and how we enable developers to build amazing experiences with video + AI.