Building an Automated Background and Caption Effects Pipeline
Learn how to build a production-ready pipeline that programmatically enhances videos with AI-powered background replacement and dynamic captions.
/blog-assets/authors/dikshant.jpeg
by Dikshant Shah
Cover Image for Building an Automated Background and Caption Effects Pipeline

While reels are crucial for social media presence, manually editing them is time-consuming and repetitive. This tutorial walks through building an automated video processing pipeline that leverages AI to handle two key aspects:

  1. Programmatic background replacement using computer vision
  2. Dynamic caption generation and rendering with precise timing control

Technical Benefits of Automation

  1. Eliminates repetitive video editing tasks through code
  2. Ensures consistent visual quality across all outputs
  3. Provides precise control over caption timing and styling
  4. Enables batch processing of multiple videos
  5. Creates a reusable foundation for future video processing needs

Here are a few previews of the pipeline in action:

Running the pipeline

The complete code for this pipeline is available in a GitHub repository. To get started running it locally or deploying the pipeline as an API on Sieve, you can follow the steps in the README.

Building the pipeline from scratch

Let's break down the implementation of each major component.

Step 1: AI-Powered Background Enhancement

To replace a reel's background with an abstract video or image, it's crucial to ensure the background matches the reel's aspect ratio. In this example, we're working with reels of 9:16 aspect ratio, so the abstract video or image must be cropped accordingly for a seamless fit. If you need to edit reels that aren't already in 9:16 aspect ratio, you can use the Sieve function sieve/autocrop to automatically crop them while maintaining focus on faces.
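If your source reels are not already 9:16, a call to sieve/autocrop might look like the sketch below. The exact parameters the function accepts are not covered in this tutorial, so treat this as illustrative rather than a definitive signature:

# Hypothetical sieve/autocrop usage to crop a wide reel to vertical (illustrative only)
autocrop = sieve.function.get("sieve/autocrop")
autocrop_output = autocrop.push(sieve.File(path="path_to_your_wide_reel.mp4"))
cropped_reel = next(autocrop_output.result())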

import cv2
import mimetypes
import subprocess
import sieve

# Crops background image into 9:16 resolution
def crop_image_vertical(image_path):
    image = cv2.imread(image_path)
    height, width = image.shape[:2]
    if abs(width / height - 9 / 16) < 0.01:
        return image_path

    new_width = int(height * 9 / 16)
    start_x = (width - new_width) // 2
    start_y = 0
    cropped_image = image[start_y:start_y + height, start_x:start_x + new_width]
    output_path = "background.png"
    cv2.imwrite(output_path, cropped_image)
    return output_path

# Crops background video into 9:16 resolution
def crop_video_vertical(file_path):
    ffprobe_cmd = [
        "ffprobe",
        "-v", "error",
        "-select_streams", "v:0",
        "-show_entries", "stream=width,height",
        "-of", "csv=p=0",
        file_path
    ]
    result = subprocess.run(ffprobe_cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)
    if result.returncode != 0:
        raise ValueError(f"Error getting video info: {result.stderr}")

    width, height = map(int, result.stdout.strip().split(","))
    if abs(width / height - 9 / 16) < 0.01:
        return file_path

    new_width = min(width, height * 9 // 16)
    new_height = min(height, width * 16 // 9)

    x_offset = (width - new_width) // 2
    y_offset = (height - new_height) // 2
    output_file = "background.mp4"
    ffmpeg_cmd = [
        "ffmpeg",
        "-y",
        "-i", file_path,
        "-vf", f"crop={new_width}:{new_height}:{x_offset}:{y_offset}",
        "-c:v", "libx264",
        "-c:a", "aac",
        "-strict", "experimental",
        output_file
    ]
    result = subprocess.run(ffmpeg_cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)
    if result.returncode != 0:
        raise ValueError(f"Error processing video: {result.stderr}")
    return output_file


background_media_path = 'path_to_your_video_or_image_background'
mime, _ = mimetypes.guess_type(background_media_path)
if mime and mime.startswith('image'):
    background_media_type = 'Image'
elif mime and mime.startswith('video'):
    background_media_type = 'Video'
else:
    background_media_type = 'Unknown'

if background_media_type == 'Image':
    background_media_file = crop_image_vertical(background_media_path) # Convert wide images to a 9:16 background image
elif background_media_type == 'Video':
    background_media_file = crop_video_vertical(background_media_path) # Convert wide videos into a 9:16 background video

Now that we've cropped the provided background video or image to a 9:16 aspect ratio, we'll use the sieve/background-removal function to replace the reel's background with it.

file = sieve.File(path="path_to_your_short_form_content")

background_removal = sieve.function.get("sieve/background-removal")
background_removal_output = background_removal.push(
    file,
    backend="parallax",
    background_media=sieve.File(path = background_media_file),
    output_type="masked_frame",
    video_output_format="mp4",
    vanish_allow_scene_splitting=True
)

print('Changing up the background...')

background_removal_output_object = next(background_removal_output.result()) # Video with the background replaced

Step 2: AI-Powered Transcript Generation

Captions can significantly enhance a reel's engagement. We'll generate them using the transcript extracted from the reel using the Sieve function sieve/transcribe, which offers fast, high-quality speech transcription with word-level timestamps. These timestamps are then used to seamlessly overlay captions onto the video.

transcribe = sieve.function.get("sieve/transcribe")

transcription_output = transcribe.push(file = file, backend = "stable-ts-whisper-large-v3-turbo", word_level_timestamps = True, segmentation_backend = 'none')
print('Transcription starting...')

transcript = next(transcription_output.result())
print('Transcription completed')
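The downstream code in Step 3 only relies on the transcript having roughly the following shape: a list of segments, each containing word entries with start and end times in seconds. The example below is illustrative; the actual output of sieve/transcribe may carry additional fields.

# Illustrative transcript structure (values made up); Step 3 reads segments -> words -> word/start/end
example_transcript = {
    "segments": [
        {
            "words": [
                {"word": "Hello", "start": 0.32, "end": 0.58},
                {"word": "everyone", "start": 0.60, "end": 1.04},
            ]
        }
    ]
}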

Step 3: Prepare the Transcript

We'll use the React-based framework Remotion to overlay captions on video reels. Remotion enables programmatic video creation with React components, animations, and effects, offering features like interpolation, keyframes, and customizable rendering for a seamless code-driven workflow.

Since Remotion renders videos frame by frame, word-level timestamps (in seconds) must be converted into frame units to ensure accurate caption synchronization. A caption is displayed whenever the current frame being rendered falls within its start and end frame range.
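For example, at 30 fps a word spoken from 1.2 s to 1.5 s maps to frames int(1.2 × 30) = 36 through int(1.5 × 30) = 45, so its caption is shown while those frames are rendered.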

The following code processes the transcript by creating a unified list of words, converting their timestamps into frame units, merging consecutive words into a single caption object, and adjusting their timings. Merging consecutive words prevents the rapid caption flashing that can occur when displaying captions word by word; grouping them makes the captions smoother and more readable (a short example follows the parameter list below). The grouping is controlled by the following parameters:

  • max_subtitle_words: Max words per caption
  • max_subtitle_words_overlap: Max start time gap for grouping words
  • max_subtitle_characters: Max characters per caption for clarity
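
For instance, two adjacent word entries such as the ones below would be merged into one caption object spanning both words (timestamps are still in seconds at this stage; the values are illustrative):

# Illustrative example: two adjacent word entries...
words_before = [
    {"word": "Hello", "start": 0.32, "end": 0.58},
    {"word": "everyone", "start": 0.60, "end": 1.04},
]
# ...are merged into a single caption object that keeps the first word's start and the last word's end
merged_caption = {"word": "Hello everyone", "start": 0.32, "end": 1.04}

The merging and frame-conversion code: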
# Merges consecutive words together as a single caption object
def merge_consecutive_subtitles(subtitles, max_subtitle_words, max_subtitle_words_overlap, max_subtitle_characters):
    merged_data = []
    current_segment = subtitles[0]
    consecutive_count = 1

    for i in range(1, len(subtitles)):
        next_segment = subtitles[i]

        if (next_segment['start'] - current_segment['start'] <= max_subtitle_words_overlap and
            len(current_segment['word'] + " " + next_segment['word']) <= max_subtitle_characters and
            consecutive_count < max_subtitle_words and
            not current_segment['word'].strip().endswith('.')):
            # End timestamp of the last merged word is the new end timestamp of the new caption object
            current_segment['end'] = next_segment['end']
            current_segment['word'] += f" {next_segment['word']}"
            consecutive_count += 1
        else:
            merged_data.append(current_segment)
            current_segment = next_segment
            consecutive_count = 1

    merged_data.append(current_segment)
    return merged_data

# Prepares the transcript for Remotion captioning
def prepare_transcript(transcript, fps, max_subtitle_words, max_subtitle_words_overlap, max_subtitle_characters):
    subtitles = []
    for segment in transcript['segments']:
        for segment_word in segment['words']:
            subtitles.append(segment_word)

    subtitles = merge_consecutive_subtitles(subtitles, max_subtitle_words, max_subtitle_words_overlap, max_subtitle_characters)

    # convert subtitles timestamps into frames
    subtitles_in_frames = []
    for subtitle in subtitles:
        temp = {
            'start': int(subtitle['start'] * fps),
            'end': int(subtitle['end'] * fps),
            'word': subtitle['word'].strip()
        }
        subtitles_in_frames.append(temp)

    return subtitles_in_frames

video_file = background_removal_output_object.path
fps = int(get_fps(video_file))
duration = int(get_duration(video_file) * fps)

# Merges up to two consecutive words spoken within 0.6 seconds of each other, provided the combined length does not exceed 12 characters.
data_subtitles = prepare_transcript(transcript = transcript, fps = fps, max_subtitle_words = 2, max_subtitle_words_overlap = 0.6, max_subtitle_characters = 12)
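The get_fps and get_duration helpers used above are not defined in this snippet; they simply probe the background-replaced video. A minimal sketch using ffprobe (assuming ffprobe is on your PATH; the repository's implementation may differ):

# Sketch of the ffprobe-based helpers referenced above
def get_fps(video_path):
    cmd = ["ffprobe", "-v", "error", "-select_streams", "v:0",
           "-show_entries", "stream=r_frame_rate", "-of", "csv=p=0", video_path]
    result = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)
    num, den = result.stdout.strip().split("/")
    return float(num) / float(den)

def get_duration(video_path):
    cmd = ["ffprobe", "-v", "error", "-show_entries", "format=duration",
           "-of", "csv=p=0", video_path]
    result = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)
    return float(result.stdout.strip())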

Step 4: Setup Remotion

The code snippet below demonstrates how to set up Remotion. Ensure that Node.js and npm are installed on your system before proceeding.

npx create-video@latest

When the code prompts What would you like to name your video?, enter captions.

When prompted with Choose a template, select the Hello World template.

When asked Add TailwindCSS?, choose No.

After completing these steps, the following file structure will be created.

captions/
    public/
    src/
    .eslintrc
    .gitignore
    .prettierrc
    package.json
    README.md
    remotion.config.ts
    tsconfig.json

Now install the dependencies:

cd captions
npm i
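Optionally, you can open the Remotion Studio at any point to preview compositions in the browser while you build them (the command below assumes a recent Remotion version):

npx remotion studio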

Step 5: Create a Remotion Composition

In Remotion, each video is represented as a composition, consisting of various components. These compositions and components are typically organized within the src/ directory. For our project, the captioned reel—our desired final output—will be represented as a composition, with the captions and background video included as components.

To begin, we need to create a composition for the captioned reel. Start by cleaning up the src/ directory: delete all files and folders, leaving only the index.ts and Root.tsx files. This will create a clean structure for adding our custom composition. Next, create a new file named CaptionedVideo.tsx to represent the captioned reel, and add the following code inside it.

import { AbsoluteFill, OffthreadVideo, staticFile } from 'remotion';
import { Caption } from './Caption';

export type CaptionedVideoProps = {
  data_subtitles: any[];
  video_file: string;
  fps: number;
  durationInFrames: number;
};

export const CaptionedVideo = ({
  data_subtitles,
  video_file,
}: CaptionedVideoProps) => {
  return (
    <AbsoluteFill>
      <Caption data_subtitles={data_subtitles} />
      <OffthreadVideo src={staticFile(video_file)} />
    </AbsoluteFill>
  );
};

In the code above, OffthreadVideo is a component provided by the Remotion package that renders our background-replaced video. Caption is a custom component that handles the captions.

Next, create a new file named Caption.tsx inside the src/ directory and add the following code.

import { AbsoluteFill, interpolate, useCurrentFrame } from 'remotion';
import './fonts.css';

export type CaptionProps = {
  data_subtitles: Record<string, unknown>[];
};

export const Caption = ({ data_subtitles }: CaptionProps) => {
  const frame = useCurrentFrame();
  const typed_data_subtitles: any[] = data_subtitles;

  let subtitle = '';
  let startFrame = 0;
  let subtitleIndex = -1;

  for (let i = 0; i < typed_data_subtitles.length; i++) {
    const data_subtitle = typed_data_subtitles[i];
    if (frame >= data_subtitle['start'] && frame <= data_subtitle['end']) {
      subtitle = data_subtitle['word'];
      startFrame = data_subtitle['start'];
      subtitleIndex = i;
      break;
    }
  }

  const scalingFrequency = 4;
  const shouldScale = subtitleIndex % scalingFrequency === 0;

  const scale = shouldScale
    ? interpolate(frame, [startFrame, startFrame + 3], [0.5, 1], {
        extrapolateLeft: 'clamp',
        extrapolateRight: 'clamp',
      })
    : 1;

  const colors = [
    '#b6e243', // Conifer
    '#43d2e2', // Picton Blue
    '#00FF00', // Green
    '#FFFF00', // Yellow
    '#FFFFFF', // White
  ];
  const color = colors[subtitleIndex % colors.length];

  return (
    <AbsoluteFill style={{ zIndex: 1 }}>
      <div
        style={{
          justifyContent: 'center',
          alignItems: 'center',
          display: 'flex',
          flexDirection: 'column',
        }}
      >
        <span
          style={{
            position: 'fixed',
            paddingLeft: '10px',
            paddingRight: '10px',
            bottom: '150px',
            color: color,
            fontSize: '120px',
            fontFamily: 'Crimson Text',
            textShadow: `
                            4px 4px 4px black,
                            0 0 40px ${color}, 
                            0 0 50px ${color}, 
                            0 0 60px ${color}
                        `,
            transform: `scale(${scale})`, // Apply the conditional scale
          }}
        >
          <i>{subtitle}</i>
        </span>
      </div>
    </AbsoluteFill>
  );
};

The code above uses the data_subtitles prop, which is the processed transcript containing the captions to display along with their corresponding frame numbers. Based on this data, the code iterates through the prop to determine the appropriate subtitle string to render for each frame, effectively adding captions to the video.

Also create a fonts.css file inside src/ to load the required font, as shown below:

@import url('https://fonts.googleapis.com/css2?family=Crimson+Text:ital,wght@0,400;0,600;0,700;1,400;1,600;1,700&display=swap');

Step 6: Registering the Remotion Composition

Remotion enables you to render any composition within a Node.js runtime environment. To achieve this, register the composition in the Root.tsx file, which Remotion loads through the project's entry point.

import React from 'react';
import { Composition } from 'remotion';
import { CaptionedVideo } from './CaptionedVideo';

export const RemotionRoot: React.FC = () => {
  return (
    <>
      <Composition
        id="CaptionedVideo"
        component={CaptionedVideo}
        durationInFrames={2}
        fps={2}
        width={720}
        height={1280}
        defaultProps={{
          video_file: 'test.mp4',
          data_subtitles: [],
          fps: 2,
          durationInFrames: 2,
        }} // placeholder props get overwritten by calculateMetadata below
        calculateMetadata={async ({ props }) => {
          return {
            fps: props.fps,
            durationInFrames: props.durationInFrames,
          };
        }}
      />
    </>
  );
};

To render the CaptionedVideo composition, all the necessary props must be passed to it.
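For reference, the index.ts file kept from the template is what exposes this root to Remotion; it typically contains just the registration call:

import { registerRoot } from 'remotion';
import { RemotionRoot } from './Root';

registerRoot(RemotionRoot);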

Step 7: Create an API Endpoint for Remotion

To transfer data from Python to the Remotion ecosystem, we use REST APIs. Specifically, we create a Node.js script (server.js) within the captions/ directory, which sets up an API to enable the Python code to send the required parameters to the composition and trigger the rendering process. The parameters to be sent are the props needed by the CaptionedVideo to render the captioned reel. These include the name of the background-replaced reel video, the previously generated transcript, the frames per second (FPS), and the duration of the reel in frames.

The server.js script accepts all these prop values via an API endpoint available at: http://localhost:4505/caption-video.

const { bundle } = require('@remotion/bundler');
const { renderMedia, selectComposition } = require('@remotion/renderer');
const path = require('path');
const express = require('express');
const cors = require('cors');
const bodyParser = require('body-parser');

const app = express();
app.use(bodyParser.json({ limit: '50mb' }));
app.use(cors());

app.post('/caption-video', async (req, res) => {
  try {
    const { video_file, data_subtitles, fps, durationInFrames } = req.body;

    if (!video_file || !data_subtitles || !fps || !durationInFrames) {
      return res
        .status(400)
        .send({ error: 'Missing required input properties' });
    }

    const bundleLocation = await bundle({
      entryPoint: path.resolve('./src/index.ts'),
      webpackOverride: (config) => config,
    });

    console.log(
      `Rendering a video at ${fps}fps of length ${durationInFrames}frames`
    );

    //Code to render the composition
    const compositionId = 'CaptionedVideo';
    const inputProps = { video_file, data_subtitles, fps, durationInFrames };

    const composition = await selectComposition({
      serveUrl: bundleLocation,
      id: compositionId,
      inputProps,
    });

    const outputLocation = `out/${video_file}`;
    await renderMedia({
      composition,
      serveUrl: bundleLocation,
      codec: 'h264',
      outputLocation,
      inputProps,
    });
    console.log(`Rendering done!`);

    res.send({ message: 'Render done!', outputLocation });
  } catch (error) {
    console.error('Error rendering video:', error);
    res
      .status(500)
      .send({ error: 'Failed to render video', details: error.message });
  }
});

const PORT = 4505;
app.listen(PORT, '0.0.0.0', () => {
  console.log(`Server is running on http://localhost:${PORT}`);
});

Since we are using the express, cors, and body-parser packages to create the API, we need to install them as dependencies and then run the script.

npm install express cors body-parser
node server.js
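Once the server is running, you can sanity-check the endpoint before wiring it up to Python, for example with curl (assuming a test.mp4 already exists in public/; the prop values are placeholders):

curl -X POST http://localhost:4505/caption-video \
  -H "Content-Type: application/json" \
  -d '{"video_file": "test.mp4", "data_subtitles": [{"start": 0, "end": 30, "word": "Hello"}], "fps": 30, "durationInFrames": 60}'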

Step 8: Rendering through the Remotion API

The API at http://localhost:4505/caption-video is now active and renders a video based on the props passed to the Remotion composition. To use it, send a POST request with a payload containing the necessary props for the CaptionedVideo composition. The composition accesses the background-replaced reel file by its name, which must be saved in the public/ directory for rendering, so make sure the reel file is moved there. Additionally, verify that your Python script is located alongside the captions folder in the same directory.

import os
import shutil
import requests

payload = {
    "video_file": os.path.basename(video_file),
    "fps": fps,
    "durationInFrames": duration,
    "data_subtitles": data_subtitles,
}

# Move the background changed video to remotion public folder for captioning
remotion_folder = 'captions'
remotion_public_folder = f"{remotion_folder}/public/"
os.makedirs(remotion_public_folder, exist_ok=True)
shutil.move(background_removal_output_object.path, remotion_public_folder)

# Make post request to remotion server
headers = {
    "Content-Type": "application/json"
}
url = "http://localhost:4505/caption-video"

print("Remotion server called")
response = requests.post(url=url, json=payload, headers=headers)

if response.status_code == 200:
    print("Request captioning successful")
    print("Reel creation completed")
    print(f"File Saved at {remotion_folder}/{response.json().get('outputLocation')}")
else:
    print(f"Remotion captioning failed {response.status_code}")
    print(response.text)
    print("Reel captioning failed")

The finalized captioned video is then rendered out to the captions/out folder.

Overview of the Technologies Used

Pipeline Task                                 Technology
Changes the reel background                   sieve/background-removal
Extracts the video transcript                 sieve/transcribe
Communication between Python and Remotion     express
Burns the captions onto the reel              remotion

Output Preview

Below is the output (on the right) generated by the script, displayed alongside its original form (on the left) for comparison.

Variants of Caption Component

The example above illustrates one type of caption that can be created using the Remotion framework. Additional examples of captions produced with the same technique are provided below. By leveraging Remotion's capabilities, we can utilize React to design captions in virtually any style or format.

Variant 1: Typing Caption

Below is the output generated using the above technique, but with a redesign of the caption component. It is displayed on the right and compared with its original form shown on the left.

Variant 2: Background Tracking Caption

Below is another output generated using the above technique, but with a redesign of the caption component. It is displayed on the right and compared with its original form shown on the left.

Code for the Variants

To generate these new variants, update the code for the prepare_transcript and merge_consecutive_subtitles functions as shown below:

def merge_consecutive_subtitles(subtitles, max_subtitle_words, max_subtitle_words_overlap, max_subtitle_characters):
    merge_consecutive_subtitles_tuples = []
    temp_tuple = []
    current_char_count = 0

    for i in range(len(subtitles)):
        word_length = len(subtitles[i]['word'])

        if (not temp_tuple or
            (subtitles[i]['start'] - temp_tuple[-1]['start'] <= max_subtitle_words_overlap and
             len(temp_tuple) < max_subtitle_words and
             current_char_count + word_length <= max_subtitle_characters and
             not temp_tuple[-1]['word'].strip().endswith('.'))):
            temp_tuple.append(subtitles[i])
            current_char_count += word_length
        else:
            merge_consecutive_subtitles_tuples.append(tuple(temp_tuple))
            temp_tuple = [subtitles[i]]
            current_char_count = word_length

    if temp_tuple:
        merge_consecutive_subtitles_tuples.append(tuple(temp_tuple))

    return merge_consecutive_subtitles_tuples

def prepare_transcript(transcript, fps, max_subtitle_words, max_subtitle_words_overlap, max_subtitle_characters):
    subtitles = []
    for segment in transcript['segments']:
        for segment_word in segment['words']:
            subtitles.append(segment_word)

    subtitles = merge_consecutive_subtitles(subtitles, max_subtitle_words, max_subtitle_words_overlap, max_subtitle_characters)

    # convert subtitles timestamps into frames
    subtitles_in_frames = []
    for segment in subtitles:
        converted_segment = []
        for entry in segment:
            start_frame = int(entry['start'] * fps)
            end_frame = int(entry['end'] * fps)
            converted_segment.append({'start': start_frame, 'end': end_frame, 'word': entry['word'].strip()})
        subtitles_in_frames.append(list(converted_segment))
    return subtitles_in_frames
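
With this change, data_subtitles becomes a list of word groups rather than a flat list of merged captions, which is the shape both variant components below expect. An illustrative example of the resulting structure (frame numbers are made up):

# Illustrative data_subtitles shape for the variants: a list of groups of word-level entries in frame units
data_subtitles = [
    [
        {"start": 10, "end": 18, "word": "Hello"},
        {"start": 19, "end": 30, "word": "everyone"},
    ],
    [
        {"start": 35, "end": 48, "word": "welcome"},
    ],
]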

Additionally, update the fonts.css file by appending the new font to the end of the file:

@import url('https://fonts.googleapis.com/css2?family=Poppins:ital,wght@0,100;0,200;0,300;0,400;0,500;0,600;0,700;0,800;0,900;1,100;1,200;1,300;1,400;1,500;1,600;1,700;1,800;1,900&display=swap');

Building the Typing Caption Variant

For the first caption variant, use the Caption Component provided below instead of the one mentioned earlier. Save it in a TypingCaption.tsx file:

import { AbsoluteFill, useCurrentFrame } from 'remotion';
import './fonts.css';

export type Subtitle = {
  start: number;
  end: number;
  word: string;
};

export type TypingCaptionProps = {
  data_subtitles: Subtitle[][];
};

export const TypingCaption = ({ data_subtitles }: TypingCaptionProps) => {
  const frame = useCurrentFrame();

  const activeSubtitleGroup = data_subtitles.find((group) => {
    const groupStart = group[0]?.start;
    const groupEnd = group[group.length - 1]?.end;
    return frame >= groupStart && frame <= groupEnd;
  });

  return (
    <AbsoluteFill style={{ zIndex: 1 }}>
      <div
        style={{
          justifyContent: 'center',
          alignItems: 'center',
          display: 'flex',
          flexDirection: 'column',
        }}
      >
        {activeSubtitleGroup && (
          <span
            style={{
              position: 'fixed',
              bottom: '150px',
              color: 'white',
              fontSize: '43px',
              fontFamily: 'Poppins',
              borderRadius: '10px',
              backgroundColor: 'white',
              paddingTop: '1px',
              paddingLeft: '0px',
              paddingRight: '10px',
            }}
          >
            {activeSubtitleGroup.map((subtitle) => {
              const progress =
                (frame - subtitle.start) / (subtitle.end - subtitle.start);
              const opacity = Math.min(Math.max(progress, 0.2), 1); // Fade each word in, starting at 0.2 opacity and increasing to 1

              const color = 'black';

              return (
                <span key={subtitle.start} style={{ paddingLeft: '10px' }}>
                  <span
                    style={{
                      color: color,
                      opacity: opacity,
                      padding: '1px 2px',
                      fontWeight: 420,
                    }}
                  >
                    {subtitle.word.toLowerCase()}
                  </span>
                </span>
              );
            })}
          </span>
        )}
      </div>
    </AbsoluteFill>
  );
};

Update the parameters for the prepare_transcript call as follows:

data_subtitles = prepare_transcript(transcript = transcript, fps = fps, max_subtitle_words = 6, max_subtitle_words_overlap = 2.5, max_subtitle_characters = 28)

Replace the simpler Caption component in CaptionedVideo.tsx with this variant, and make sure to import the new component as well:

import { TypingCaption } from './TypingCaption';

{/* <Caption data_subtitles={data_subtitles} /> */}
<TypingCaption data_subtitles={data_subtitles} />

Building the Background Tracking Caption Variant

For the second caption variant, use the Caption Component provided below instead of the one mentioned earlier. Save it in a BackgroundTrackingCaption.tsx file:

import { AbsoluteFill, useCurrentFrame, interpolate } from 'remotion';
import './fonts.css';

export type Subtitle = {
  start: number;
  end: number;
  word: string;
};

export type BackgroundTrackingCaptionProps = {
  data_subtitles: Subtitle[][];
};

export const BackgroundTrackingCaption = ({
  data_subtitles,
}: BackgroundTrackingCaptionProps) => {
  const frame = useCurrentFrame();

  const activeSubtitleGroup = data_subtitles.find((group) => {
    const groupStart = group[0]?.start;
    const groupEnd = group[group.length - 1]?.end;
    return frame >= groupStart && frame <= groupEnd;
  });

  return (
    <AbsoluteFill style={{ zIndex: 1 }}>
      <div
        style={{
          justifyContent: 'center',
          alignItems: 'center',
          display: 'flex',
          flexDirection: 'column',
        }}
      >
        {activeSubtitleGroup && (
          <span
            style={{
              position: 'fixed',
              paddingLeft: '10px',
              paddingRight: '10px',
              bottom: '150px',
              color: 'white',
              fontSize: '55px',
              fontFamily: 'Poppins',
              textShadow: '4px 4px 4px black',
              borderRadius: '10px',
              fontWeight: 'bold',
            }}
          >
            {activeSubtitleGroup.map((subtitle) => {
              const isActive = frame >= subtitle.start && frame <= subtitle.end;
              const scale = isActive
                ? interpolate(
                    frame,
                    [subtitle.start, subtitle.start + 5],
                    [1, 1.08],
                    {
                      extrapolateRight: 'clamp',
                    }
                  )
                : 1;

              return (
                <span key={subtitle.start} style={{ paddingLeft: '20px' }}>
                  <span
                    style={{
                      position: 'relative',
                      padding: '1px 0px',
                      borderRadius: '10px',
                    }}
                  >
                    <span
                      style={{
                        position: 'absolute',
                        top: '0',
                        left: '-3px',
                        right: '-3px',
                        bottom: '0',
                        backgroundColor: isActive ? '#764dea' : 'transparent',
                        borderRadius: '10px',
                        zIndex: -1, // send the background behind the text
                        transform: `scale(${scale})`,
                      }}
                    />
                    {subtitle.word.toUpperCase()}
                  </span>
                </span>
              );
            })}
          </span>
        )}
      </div>
    </AbsoluteFill>
  );
};

Update the parameters for the prepare_transcript call as follows:

data_subtitles = prepare_transcript(transcript = transcript, fps = fps, max_subtitle_words = 3, max_subtitle_words_overlap = 1, max_subtitle_characters = 18)

Replace the simpler Caption component in CaptionedVideo.tsx with this variant, and be sure to import the new component as well:

import { BackgroundTrackingCaption } from './BackgroundTrackingCaption';

{/* <Caption data_subtitles={data_subtitles} /> */}
<BackgroundTrackingCaption data_subtitles={data_subtitles} />

Future Enhancements

The current implementation effectively transforms unedited reels into captivating, social media-optimized content. However, there are several opportunities to expand and enhance its capabilities:

  • Expand Resolution Support: Enable captioning for videos in a wider range of resolutions
  • Incorporate Additional Effects: Add features like flame transitions and zoom in/out effects to boost viewer engagement
  • Introduce Language Support: Provide multi-language dubbing and captioning options to appeal to a global audience

Conclusion

This tutorial demonstrated how to create an automated Reels editing pipeline, saving time while improving video quality and audience engagement. If you're ready to integrate automated Reels editing into your application, join our vibrant Discord community for insights and support. For professional assistance, feel free to reach out to us at contact@sievedata.com.