Transcript Analysis
This app extracts key information from video or audio files, automatically generating titles, chapters, summaries, and tags to improve the accessibility and discoverability of your media content. It is particularly useful for content creators, marketers, and educators, providing insights and structured data that can improve content engagement and understanding.
For pricing, see the Pricing section below.
For detailed notes, see the Notes section.
Key Features
- Title Generation: Creates a concise and relevant title for your media file based on its content, with a maximum length of 10 words.
- Chapter Generation: Automatically divides the content into chapters, making it easier to navigate through different sections of the video or audio.
- Summary Creation: Produces a summary of the main points covered in the media file, with a customizable maximum sentence length.
- Tagging: Generates tags related to the content, aiding in searchability and categorization.
- Audio Denoising: Improves transcription accuracy by denoising audio tracks before processing.
- Highlight Extraction: Identifies and extracts highlights based on predefined search phrases, with adjustable maximum duration.
- Custom Chapters: Allows for custom chapter titles to be provided, with options for manual or LLM-generated chapters. Check out the Custom Chapters section for more details.
- Target Language: Supports analysis in multiple languages.
- Customization: Lets you pick different models for each part of the pipeline, choose between different LLMs and transcription models, and customize outputs through natural-language prompts.
- Custom Vocabulary: Lets you supply a custom vocabulary for audio transcription, which can be used to find and replace or correct the spelling of words. See Custom Vocabulary for more details.
- Bring your own Keys: Use your own LLM provider keys, or use ours. See API Keys for more details.
Comparing Backends
LLM Backends
Model | Speed | Accuracy | Notes |
---|---|---|---|
🚀 gpt-4o-mini | ⚡⚡⚡ | ⭐⭐ | Fastest, most cost-effective, but potentially less detailed |
🔄 gpt-4o-2024-08-06 | ⚡⚡ | ⭐⭐⭐ | Balanced performance, up-to-date knowledge |
🏆 gpt-4o-2024-05-13 | ⚡⚡ | ⭐⭐⭐ | Most capable, highest accuracy, slightly older version |
🎻 claude-3-5-sonnet-20241022 | ⚡⚡ | ⭐⭐⭐ | Latest Claude model, balanced performance |
Transcription Backends
Model | Speed | Accuracy | Notes |
---|---|---|---|
🏎️ groq-whisper | ⚡⚡⚡ | ⭐⭐⭐ | Fastest, duration-based pricing |
💰 stable-ts | ⚡⚡ | ⭐⭐⭐ | Compute-based pricing, cost-effective for longer audio |
🎯 whisper-timestamped | ⚡⚡ | ⭐⭐⭐⭐ | Accurate timestamps, compute-based pricing |
* Compute-based pricing: costs are calculated based on processing time taken.
Pricing
This function combines multiple models, including transcription models and various LLMs. The pricing is calculated based on the combined usage of all models involved in the process.
LLM Model | Transcription Model | Total Cost per Hour of Audio |
---|---|---|
gpt-4o-mini | groq-whisper | $0.3272 |
gpt-4o-mini | whisper-timestamped | $0.2537 |
gpt-4o-mini | stable-ts | $0.2412 |
gpt-4o-2024-08-06 | groq-whisper | $0.497 |
gpt-4o-2024-08-06 | whisper-timestamped | $0.4235 |
gpt-4o-2024-08-06 | stable-ts | $0.411 |
gpt-4o-2024-05-13 | groq-whisper | $0.581 |
gpt-4o-2024-05-13 | whisper-timestamped | $0.5075 |
gpt-4o-2024-05-13 | stable-ts | $0.495 |
claude-3-5-sonnet-20241022 | groq-whisper | $0.533 |
claude-3-5-sonnet-20241022 | whisper-timestamped | $0.459 |
claude-3-5-sonnet-20241022 | stable-ts | $0.447 |
Notes:
- These are average costs based on default settings; costs may vary based on additional parameters selected.
- The prices in the table above include a $0.2 base charge per hour of audio. This is our fee for running analytics and processing on your video or audio.
- This app utilizes the `speech_transcriber` function for transcription. As a result:
  - The costs shown in the table above represent the combined price for both this app and the `speech_transcriber` function.
  - In your Usage page, you'll see separate entries for this app and the `speech_transcriber` app.
  - The cost displayed for this app in the Usage page will appear lower than the total cost, as the transcription cost is billed under the `speech_transcriber` app.
Example 1: groq-whisper + gpt-4o-mini
If `groq-whisper` is selected as the transcription backend and `gpt-4o-mini` as the `llm_backend`, the cost for a 30-minute video file (generating a summary, title, and tags) is calculated as:
total_cost = video_duration_in_mins/60 * llm_transcription_cost
total_cost = 30/60 * $0.3272 = $0.1636
Example 2: stable-ts + gpt-4o-mini
If `stable-ts` is selected as the transcription backend and `gpt-4o-mini` as the backend for the rest of the pipeline, the cost for a 30-minute video file (generating a summary, title, and tags) is calculated as:
total_cost = video_duration_in_mins/60 * llm_transcription_cost
total_cost = 30/60 * $0.2412 = $0.1206
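To make the arithmetic concrete, here is a minimal Python sketch of the same calculation, using a few rates copied from the table above. The rate table and helper are illustrative only, not an official cost calculator.

```python
# Illustrative cost estimator based on the pricing table above.
RATES_PER_HOUR = {
    ("gpt-4o-mini", "groq-whisper"): 0.3272,
    ("gpt-4o-mini", "stable-ts"): 0.2412,
    ("gpt-4o-2024-08-06", "groq-whisper"): 0.497,
}

def estimate_cost(duration_mins: float, llm: str, transcription: str) -> float:
    """total_cost = video_duration_in_mins / 60 * llm_transcription_cost"""
    return duration_mins / 60 * RATES_PER_HOUR[(llm, transcription)]

print(estimate_cost(30, "gpt-4o-mini", "groq-whisper"))  # 0.1636
print(estimate_cost(30, "gpt-4o-mini", "stable-ts"))     # 0.1206
```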
Output Formats
Check out the example outputs for full details on the output formats.
The function returns outputs in separate sections, indexed from 0 onwards. The number and order of sections depend on which parameters you've selected.
Here's the full list of possible outputs in their default order:
- Video Metadata (always included)
- Transcript (always included)
- Summary (if `generate_summary` is True)
- Title (if `generate_title` is True)
- Tags (if `generate_tags` is True)
- Highlights (if `generate_highlights` is True)
- Sentiment Analysis (if `generate_sentiments` is True)
- Chapters (if `generate_chapters` is True)
Note: If you set any of these parameters to False, the subsequent sections shift up in the index. For example, if `generate_summary` is False, the Title (if generated) will be at index 2 instead of 3.
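A small Python sketch makes the indexing rule explicit (the section names are shorthand for the outputs listed above):

```python
def output_indices(generate_summary=True, generate_title=True,
                   generate_tags=True, generate_highlights=False,
                   generate_sentiments=False, generate_chapters=True):
    sections = [
        ("video_metadata", True),   # always included
        ("transcript", True),       # always included
        ("summary", generate_summary),
        ("title", generate_title),
        ("tags", generate_tags),
        ("highlights", generate_highlights),
        ("sentiments", generate_sentiments),
        ("chapters", generate_chapters),
    ]
    enabled = [name for name, on in sections if on]
    return {name: index for index, name in enumerate(enabled)}

# With generate_summary=False, "title" lands at index 2 instead of 3:
print(output_indices(generate_summary=False))
```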
Highlights
Highlights are returned as an array of objects, each containing:
{
"highlights": [
{
"title": "Final Countdown and Escape",
"score": 95,
"start_time": 471.81,
"end_time": 588.23,
"start_timecode": "0:07:51.810000",
"end_timecode": "0:09:48.230000",
"duration": 116.42
},
{
"title": "Laser Maze Tension",
"score": 90,
"start_time": 304.71,
"end_time": 325.67,
"start_timecode": "0:05:04.709000",
"end_timecode": "0:05:25.670000",
"duration": 20.96
}
]
}
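For instance, a payload shaped like the example above can be filtered by score in a few lines of Python (the file name is hypothetical):

```python
import json

# Load a saved highlights payload (hypothetical file name).
with open("highlights.json") as f:
    payload = json.load(f)

# Keep clips scoring at least 90, ordered by when they start.
top = sorted(
    (h for h in payload["highlights"] if h["score"] >= 90),
    key=lambda h: h["start_time"],
)
for h in top:
    print(f"{h['title']}: {h['start_timecode']} -> {h['end_timecode']} ({h['duration']:.2f}s)")
```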
Sentiment Analysis
Sentiment analysis results are returned as an array of objects for each segment:
{
"sentiment_analysis": [
{
"start_time": 0,
"end_time": 2,
"text": "I just started this 10 minute timer.",
"sentiment": "neutral",
"score": 50
},
{
"start_time": 2,
"end_time": 5,
"text": "And when it hits zero, that room all the way over there,",
"sentiment": "neutral",
"score": 50
}
]
}
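As an example, segments shaped like this can be aggregated in a few lines of Python (the file name is hypothetical):

```python
import json
from collections import Counter

with open("sentiment.json") as f:  # hypothetical saved output
    payload = json.load(f)

segments = payload["sentiment_analysis"]
label_counts = Counter(seg["sentiment"] for seg in segments)
average_score = sum(seg["score"] for seg in segments) / len(segments)
print(label_counts)   # e.g. Counter({'neutral': 2})
print(average_score)  # e.g. 50.0
```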
Chapters
Chapters are returned as an array of objects, each representing a chapter:
{
"chapters": [
{
"title": "Introduction & Setup",
"start_time": 0,
"timecode": "00:00:00"
},
{
"title": "Starting the Challenge",
"start_time": 22.76,
"timecode": "00:00:22"
}
]
}
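One common use is turning this payload into a YouTube-style chapter list. A minimal sketch, assuming the payload shape shown above:

```python
def format_chapters(payload: dict) -> str:
    """Render chapters as 'HH:MM:SS Title' lines, one per chapter."""
    lines = []
    for ch in payload["chapters"]:
        t = int(ch["start_time"])  # truncate to whole seconds, as the timecode field does
        lines.append(f"{t // 3600:02d}:{(t % 3600) // 60:02d}:{t % 60:02d} {ch['title']}")
    return "\n".join(lines)

# For the example above this yields:
# 00:00:00 Introduction & Setup
# 00:00:22 Starting the Challenge
```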
API Keys
By default, the `transcript-analysis` function runs without requiring any API keys. If you provide your own API keys, you'll only be charged the compute-based fee outlined in the Pricing section above. See the Sieve documentation for details on setting environment variables.
OpenAI
Set the `OPENAI_API_KEY` environment variable to your OpenAI API key.
Azure OpenAI
Set `AZURE_OPENAI_API_KEY` to your Azure OpenAI API key.
Set `AZURE_OPENAI_ENDPOINT` to your organization's Azure OpenAI endpoint (ex. `https://my-organization.openai.azure.com/`).
Additionally, you'll need to provide the Azure OpenAI model deployment name for each model you'd like to use.
Model | Environment Variable |
---|---|
gpt-4o-2024-08-06 | AZURE_OPENAI_GPT_4O_2024_08_06_MODEL_NAME |
gpt-4o-2024-05-13 | AZURE_OPENAI_GPT_4O_2024_05_13_MODEL_NAME |
gpt-4o-mini | AZURE_OPENAI_GPT_4O_MINI_MODEL_NAME |
Azure OpenAI deployment information can be found in the Azure OpenAI documentation.
Anthropic
Set the `ANTHROPIC_API_KEY` environment variable to your Anthropic API key.
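If you're bringing your own keys, a quick sanity check before running can save a failed job. A minimal sketch; which variables are required depends on the provider you use:

```python
import os

# Adjust to your provider: OPENAI_API_KEY, ANTHROPIC_API_KEY,
# or the AZURE_OPENAI_* variables listed above.
required = ["OPENAI_API_KEY"]
missing = [name for name in required if not os.environ.get(name)]
if missing:
    raise RuntimeError(f"Missing environment variables: {missing}")
```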
Notes
Custom Chapters
Custom chapters allow you to have more control over how your content is segmented, ensuring that specific topics or sections are always included in the chapter breakdown.
What are Custom Chapters?
Custom chapters are user-defined topics or sections that you want to be identified and timestamped in your content. This feature allows you to guide the AI in creating a more relevant and accurate chapter structure.
Why Use Custom Chapters?
- Ensure Important Topics are Covered: Guarantee that key subjects are always included in the chapter list, even if they're brief.
- Maintain Consistency: Use a similar chapter structure across multiple videos or podcasts in a series.
- Combine AI and Human Insight: Leverage your knowledge of the content while still benefiting from AI-powered timestamp identification.
How to Use Custom Chapters:
- Create a list of chapter titles or topics you want to include.
- Pass this list to the `custom_chapters` parameter.
- Choose a `custom_chapter_mode`:
  - "strict": Only use your provided chapters.
  - "extended" (default): Use your chapters and allow the AI to add additional relevant chapters.
Example:
custom_chapters = ["Introduction", "Problem Statement", "Methodology", "Results", "Conclusion"]
custom_chapter_mode = "extended"
In this example, the AI will ensure these five chapters are included, and may add other relevant chapters it identifies in the content.
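A sketch of passing these values to the function, assuming the standard Sieve Python client pattern and that the function is published as `sieve/transcript-analysis`; the input path is hypothetical:

```python
import sieve

transcript_analysis = sieve.function.get("sieve/transcript-analysis")
output = transcript_analysis.run(
    sieve.File(path="lecture.mp4"),  # hypothetical input file
    custom_chapters=["Introduction", "Problem Statement", "Methodology",
                     "Results", "Conclusion"],
    custom_chapter_mode="extended",
)
```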
By using custom chapters, you can ensure your content is organized in a way that best serves your audience and highlights the most important aspects of your material.
Parameters:
- `file`: Video or audio file (required).
- `transcription_backend`: Backend to use for transcription. Options: "groq-whisper", "stable-ts", "whisper-timestamped". Default is "groq-whisper".
- `llm_backend`: LLM to use for processing the transcript. Options: "gpt-4o-2024-08-06", "gpt-4o-2024-05-13", "gpt-4o-mini". Default is "gpt-4o-2024-08-06".
- `prompt`: Custom prompt to guide the LLM's analysis. Influences the analysis, focus areas, and overall output of the generated content.
- `generate_summary`: Whether to generate a summary. Default is True.
- `generate_title`: Whether to generate a title. Default is True.
- `generate_chapters`: Whether to generate chapters. Default is True.
- `generate_tags`: Whether to generate tags. Default is True.
- `generate_highlights`: Whether to generate highlights. Default is False.
- `generate_sentiments`: Whether to perform sentiment analysis on each segment. Default is False.
- `custom_chapters`: List of custom chapters. Pass in a list of strings for custom chapter generation.
- `custom_chapter_mode`: Mode for custom chapters. Options: "extended" (default) or "strict".
- `max_summary_length`: Maximum number of sentences in the summary. Default is 5.
- `max_title_length`: Maximum number of words in the title. Default is 10.
- `min_chapter_length`: Minimum length of chapters in seconds. Default is 0.
- `num_tags`: Number of tags to generate. Default is 5.
- `target_language`: The target language of the output. If empty, uses the input file's language.
- `custom_vocabulary`: Custom vocabulary for audio transcription. A dictionary of words to add to the vocabulary; can be used to find and replace or correct the spelling of words.
- `speaker_diarization`: Whether to use speaker diarization to split audio into segments. Default is False.
- `use_vad`: Whether to use voice activity detection to split audio into segments. Default is False.
- `denoise_audio`: Whether to denoise audio before analysis. Improves transcription but slows processing. Default is False.
- `return_as_json_file`: Whether to return each output as a JSON file. Useful for efficient fetching of large payloads. Default is False.
- `use_azure`: Whether to use Azure OpenAI for OpenAI models instead of OpenAI's API. Default is False.
Custom Vocabulary
The `custom_vocabulary` parameter allows you to specify a custom set of words for find-and-replace operations in the transcript. This feature is useful for:
- Correcting common transcription errors
- Ensuring accurate transcription of technical terms or proper nouns
- Standardizing specific terminology in your transcripts
Example: Sieve is often transcribed as "Civ" by the Whisper models, so you can pass `custom_vocabulary = {"Civ": "Sieve"}` to the function to ensure that "Civ" is replaced with "Sieve" in the transcript.