How Scenery approaches human-centric AI video understanding with Sieve
We discuss how Scenery uses Sieve to run human-centric video understanding workloads that power features like AI Shorts.
by Mokshith Voodarla
This partnership was also announced on Scenery's product launch blog. You can read it here.

Since first launching in 2022, Scenery has become a popular cloud video editing and collaboration platform. Backed by Greylock, they are trusted by HubSpot, Ecamm, and universities like UCSB to help teams work smarter together on video.

Scenery recently partnered with Sieve to build various human-centric video understanding workloads to launch features like AI Shorts. This collaboration enables Scenery to implement AI systems that perform tasks like active speaker detection, face tracking, scene detection, and more - all of which are then used to automate various editing tasks.

The launch

Scenery's new AI Shorts feature automates the process of editing social clips by integrating directly with the content pipeline after recording or broadcast. To enable quick and accurate processing of these clips, Scenery recently tapped Sieve as its AI partner.

Scenery's AI Shorts Screenshot

“Special thanks to our partners at Sieve, whose video AI infrastructure played a key part in strengthening the power and speed of this update to Scenery.”
Mike Folgner, CEO and Co-founder, Scenery

Human-centric video AI analysis with Sieve

Before landing on Sieve, Scenery assessed various open-source models and third-party vendors for active speaker detection, face detection, scene detection, and other AI-driven features. With Sieve, Scenery is able to meet these needs in a way that allows for rapid testing, iteration, deployment, and scale in a cost-effective manner.

Scenery's AI Shorts Screenshot

Sieve’s platform helped Scenery create a proof of concept quickly and gave them a solution that was faster, higher quality, and more cost-effective than running similar pipelines using off-the-shelf open-source models.

This was because Sieve's platform combines multiple models into pipelines that perform each of these tasks more effectively. Specifically, Scenery's active speaker detection and face tracking pipelines on Sieve processed video over 90% faster than a comparable self-hosted approach.
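
To give a sense of what this looks like from the developer side, here is a minimal sketch of calling a hosted pipeline through Sieve's Python client. The pipeline name used below is a hypothetical placeholder; the post does not name the exact pipelines Scenery runs.

```python
import sieve

# Point the client at a clip; sieve.File accepts a URL or a local path.
video = sieve.File(url="https://example.com/clip.mp4")

# "sieve/active-speaker-detection" is an assumed placeholder name for an
# active speaker detection pipeline hosted on Sieve.
asd = sieve.function.get("sieve/active-speaker-detection")

# run() blocks until the pipeline finishes and returns its output
# (e.g. per-segment speaker annotations).
result = asd.run(video)
print(result)
```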

Sieve's purpose-built infrastructure for video AI provides burst capacity for parallel video processing and native GPU sharing, which makes it more efficient to run models like TalkNet. You can read this blog post to better understand the technical details of these optimizations.
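
As a rough illustration of that burst capacity, the same client can fan many clips out at once by submitting jobs asynchronously and collecting results as they finish. Again, the pipeline and clip names below are assumptions made for the example.

```python
import sieve

# Hypothetical face tracking pipeline name, used only for illustration.
face_tracking = sieve.function.get("sieve/face-tracking")

clip_urls = [
    "https://example.com/clip-1.mp4",
    "https://example.com/clip-2.mp4",
    "https://example.com/clip-3.mp4",
]

# push() submits each job without blocking, so all clips can be processed
# in parallel on Sieve's infrastructure.
jobs = [face_tracking.push(sieve.File(url=url)) for url in clip_urls]

# result() waits for each job to complete and returns its output.
results = [job.result() for job in jobs]
```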

What's next?

A key benefit of the Sieve experience is that it isn't a single solution to a single problem; it's an AI toolkit for developers and product teams solving all sorts of problems around video, from flexible deployment infrastructure to readily available building blocks. To this end, Scenery is continuing to partner with Sieve on new AI use cases using Sieve-optimized models like SAM 2 along with a few custom pipelines. The future of collaborative video creation is exciting!