Eye Contact 1.0: eye gaze redirection for developers
We discuss a new gaze redirection pipeline designed to make the eyes in talking head videos look directly at the camera.
by Gaurang Bharti
If you just want to try it out, you can do so here.

We’ve just released the first generally available API for eye contact correction. Unlike other solutions that are either low-quality or require desktop SDKs, our pipeline is easy to integrate with just one API call and runs incredibly fast.
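
To give a sense of what that looks like, here is a minimal sketch using Sieve's Python client. The function slug and input handling are assumptions for illustration; the API guide linked at the end of this post has the exact interface.

```python
import sieve

# Assumed function slug, shown only to illustrate the single-call
# integration pattern; consult the API guide for exact names and parameters.
eye_contact = sieve.function.get("sieve/eye-contact-correction")

input_video = sieve.File(path="talking_head.mp4")
output_video = eye_contact.run(input_video)  # blocks until the job finishes

print(output_video.path)  # local path to the gaze-corrected video
```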

This kind of capability is especially useful for applications like screen recording, video editing, broadcasting, and more.

How gaze redirection works

Gaze redirection starts with face tracking, which identifies 2D facial landmarks and the head's position in 3D space. Once the head pose is determined, the system normalizes the face and crops a region around the eyes, known as the eye patch, from the video frame. This patch is fed into a gaze redirection network that estimates the current gaze angle and modifies the eye image to simulate direct eye contact. The modified eye patch is then seamlessly blended back into the original frame using inverse transformations.
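
As a rough sketch of the crop, normalization, and blend-back steps, the snippet below uses OpenCV to warp an eye region into a canonical patch and paste the corrected result back with the inverse transform. The patch size, the canonical eye positions, and the `redirect_fn` stand-in for the redirection network are assumptions for illustration; the actual pipeline also uses the full 3D head pose and landmark set from its face tracker.

```python
import cv2
import numpy as np

PATCH_W, PATCH_H = 256, 96  # assumed resolution of the normalized eye patch

def redirect_gaze(frame, left_eye, right_eye, redirect_fn):
    """Illustrative sketch: normalize and crop the eye region, run a
    redirection model on it, then blend the result back with the inverse warp.

    left_eye / right_eye: (x, y) eye centers from any face tracker.
    redirect_fn: stand-in for the gaze redirection network; it takes a
        PATCH_H x PATCH_W patch and returns the gaze-corrected patch.
    """
    # Fit a similarity transform mapping the eye centers to fixed
    # canonical positions inside the patch (the "normalization" step).
    src = np.float32([left_eye, right_eye])
    dst = np.float32([[PATCH_W * 0.3, PATCH_H * 0.5],
                      [PATCH_W * 0.7, PATCH_H * 0.5]])
    M, _ = cv2.estimateAffinePartial2D(src, dst)

    # Crop the normalized eye patch and apply the redirection model.
    patch = cv2.warpAffine(frame, M, (PATCH_W, PATCH_H))
    corrected = redirect_fn(patch)

    # Warp the corrected patch back into frame coordinates and paste it
    # over the original pixels (a real pipeline would feather the seam).
    M_inv = cv2.invertAffineTransform(M)
    h, w = frame.shape[:2]
    back = cv2.warpAffine(corrected, M_inv, (w, h))
    mask = cv2.warpAffine(np.full_like(corrected, 255), M_inv, (w, h))
    out = frame.copy()
    out[mask > 0] = back[mask > 0]
    return out
```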

This is very similar to the process we follow in SieveSync, our flagship zero-shot lipsync pipeline.

To maintain natural behavior, gaze redirection is adjusted as the user's gaze moves farther from center. The redirection effect diminishes when head rotations go beyond a certain limit, where continuing redirection would appear unnatural. The system also accounts for blinks and occlusions, like when an object briefly covers the eyes, by temporarily turning off the redirection effect when the confidence of eye tracking is low. The entire process is designed for real-time application with minimal latency, providing smooth and natural gaze adjustments in video feeds.
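
A simplified sketch of that attenuation logic might look like the following, with the thresholds and the linear fade chosen purely for illustration rather than taken from the production pipeline:

```python
import numpy as np

# Assumed thresholds for illustration; the real pipeline tunes these values.
FADE_START_DEG = 15.0       # head rotation at which the fade begins
MAX_ROTATION_DEG = 25.0     # beyond this, redirection is fully disabled
MIN_TRACK_CONFIDENCE = 0.6  # below this, treat the eyes as blinking/occluded

def redirection_strength(head_yaw_deg, head_pitch_deg, track_confidence):
    """Return a 0..1 weight for blending the redirected eye patch with the
    original pixels, so the effect fades out instead of snapping off."""
    if track_confidence < MIN_TRACK_CONFIDENCE:
        return 0.0  # blink or occlusion: temporarily disable redirection

    # Fade linearly as the head rotates farther away from the camera.
    rotation = max(abs(head_yaw_deg), abs(head_pitch_deg))
    fade = (MAX_ROTATION_DEG - rotation) / (MAX_ROTATION_DEG - FADE_START_DEG)
    return float(np.clip(fade, 0.0, 1.0))

# A moderately rotated head with confident tracking keeps the full effect;
# a strongly rotated head or a blink removes it.
print(redirection_strength(10.0, 2.0, 0.9))  # 1.0
print(redirection_strength(20.0, 0.0, 0.9))  # 0.5
print(redirection_strength(40.0, 0.0, 0.9))  # 0.0
print(redirection_strength(5.0, 0.0, 0.3))   # 0.0
```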

Much of this work drew heavy inspiration from NVIDIA Broadcast and from LivePortrait’s ability to control specific regions of a face.

Conclusion

Developers — you can try it on your own videos here via our playground or find an API guide here to integrate it into your own applications. We’re excited by the use cases this API enables for companies working with talking head videos. If you have any questions, feel free to email us at contact@sievedata.com or join our Discord!