You must have seen chapters in YouTube videos. Video chaptering is the process of dividing a video into distinct segments, each labelled with a specific title or chapter name, to enhance navigation and user experience. So, if you want to learn how to add chapters to a video, this article is for you. In this article, I’ll take you through the task of Video Chaptering using Python.
Video Chaptering: Getting Started
Video Chaptering involves using natural language processing (NLP) and machine learning techniques to automatically segment videos into coherent chapters based on their content. The expected result is a structured, easily navigable video that enhances the user experience by allowing viewers to quickly find and jump to specific sections of interest.
It works by transcribing the audio content of the video and analyzing the text for key topics, themes, and transitions. So, to get started with Video Chaptering, we need to collect the transcript of a video so that we can analyze it and divide it into chapters.
I’ll collect data from a YouTube video for this task, for which we need to use the YouTube Data API. You can follow the steps below to sign up and get access to the YouTube Data API (a quick check to verify your key follows the steps):
- Go to Google Cloud Console.
- Click on the project drop-down at the top, then “New Project”.
- Enter a project name and click “Create”.
- In the Google Cloud Console, navigate to “APIs & Services” > “Library”.
- Search for “YouTube Data API v3” and click on it.
- Click “Enable”.
- Go to “APIs & Services” > “Credentials”.
- Click “+ CREATE CREDENTIALS” and select “API key”.
- Copy the generated API key.
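Once you have the key, you can quickly confirm that it works with a minimal request. Below is a small sketch; it assumes the google-api-python-client package is installed (pip install google-api-python-client) and that you paste your own key and any public video ID in place of the placeholders:
from googleapiclient.discovery import build

API_KEY = 'Your API Key'  # placeholder: paste the key you copied above
VIDEO_ID = 'VIDEO_ID'     # placeholder: any public YouTube video ID

# build the YouTube Data API client and request the snippet of a public video
youtube = build('youtube', 'v3', developerKey=API_KEY)
response = youtube.videos().list(part='snippet', id=VIDEO_ID).execute()

# if the key is valid, this prints the video's title
print(response['items'][0]['snippet']['title'])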
If you find any issues while generating the API key, feel free to reach out to me on Instagram or LinkedIn.
Video Chaptering using Python
Now, let’s get started with video chaptering by collecting the data from a YouTube video using Python. Below is how we can collect data from a YouTube video by using the YouTube Data API and save the transcribed data into a CSV file:
import re
import csv
import pandas as pd
from googleapiclient.discovery import build
from youtube_transcript_api import YouTubeTranscriptApi
API_KEY = 'Your API Key'
def get_video_id(url):
    # extract video id from the URL
    video_id_match = re.search(r'(?:v=|\/)([0-9A-Za-z_-]{11}).*', url)
    return video_id_match.group(1) if video_id_match else None
def get_video_title(video_id):
    # build the YouTube service
    youtube = build('youtube', 'v3', developerKey=API_KEY)
    # fetch the video details
    request = youtube.videos().list(
        part='snippet',
        id=video_id
    )
    response = request.execute()
    # extract the title
    title = response['items'][0]['snippet']['title'] if response['items'] else 'Unknown Title'
    return title
def get_video_transcript(video_id):
    # fetch the transcript
    try:
        transcript = YouTubeTranscriptApi.get_transcript(video_id)
        return transcript
    except Exception as e:
        print(f"An error occurred: {e}")
        return []
def save_to_csv(title, transcript, filename):
    # save the title and transcript to a CSV file
    transcript_data = [{'start': entry['start'], 'text': entry['text']} for entry in transcript]
    df = pd.DataFrame(transcript_data)
    df.to_csv(filename, index=False)
    # save the title separately
    with open(filename, 'a', newline='') as file:
        writer = csv.writer(file)
        writer.writerow(['Title:', title])
def main():
    url = input('Enter the YouTube video link: ')
    video_id = get_video_id(url)
    if not video_id:
        print('Invalid YouTube URL.')
        return
    title = get_video_title(video_id)
    transcript = get_video_transcript(video_id)
    if not transcript:
        print('No transcript available for this video.')
        return
    filename = f"{video_id}_transcript.csv"
    save_to_csv(title, transcript, filename)
    print(f'Transcript saved to {filename}')

if __name__ == '__main__':
    main()

Enter the YouTube video link: https://youtu.be/71op1DQ2gyo?si=tvMFyTqlQiDDjBj2
Transcript saved to 71op1DQ2gyo_transcript.csv
The above code extracts the transcript of a YouTube video along with its title and saves it to a CSV file. It starts by extracting the video ID from a provided YouTube URL and then uses the YouTube Data API to fetch the video’s title. Next, it retrieves the video’s transcript using the YouTube Transcript API. The title and transcript data are then saved to a CSV file, with the transcript entries listed alongside their start times. If the transcript retrieval is successful, the file is saved with a name derived from the video ID.
Now, let’s explore this collected dataset:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.decomposition import NMF, LatentDirichletAllocation
# load the dataset
transcript_df = pd.read_csv("/content/71op1DQ2gyo_transcript.csv")
print(transcript_df.head())

   start                                    text
0 0.04 in this video I'm going to explain how
1 1.439 to train for Pure muscle growth and I'm
2 3.36 going to lay out five crucial
3 4.96 bodybuilding principles that must be
4 6.68 followed to maximize your muscular
transcript_df['start'] = pd.to_numeric(transcript_df['start'], errors='coerce')
print("Dataset Overview:")
print(transcript_df.info())
print("\nBasic Statistics:")
print(transcript_df.describe())

Dataset Overview:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 461 entries, 0 to 460
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 start 460 non-null float64
1 text 461 non-null object
dtypes: float64(1), object(1)
memory usage: 7.3+ KB
None
Basic Statistics:
start
count 460.000000
mean 445.849439
std 252.363844
min 0.040000
25% 223.509250
50% 452.100000
75% 665.970000
max 868.920000
Let’s have a look at the distribution of the text lengths in each row:
# distribution of text lengths
transcript_df['text_length'] = transcript_df['text'].apply(len)
plt.figure(figsize=(10, 5))
plt.hist(transcript_df['text_length'], bins=50, color='blue', alpha=0.7)
plt.title('Distribution of Text Lengths')
plt.xlabel('Text Length')
plt.ylabel('Frequency')
plt.show()
Now, let’s have a look at the most common words used in the video:
# most common words
vectorizer = CountVectorizer(stop_words='english')
word_counts = vectorizer.fit_transform(transcript_df['text'])
word_counts_df = pd.DataFrame(word_counts.toarray(), columns=vectorizer.get_feature_names_out())
common_words = word_counts_df.sum().sort_values(ascending=False).head(20)
plt.figure(figsize=(10, 5))
common_words.plot(kind='bar', color='green', alpha=0.7)
plt.title('Top 20 Common Words')
plt.xlabel('Words')
plt.ylabel('Frequency')
plt.show()
The next step is to perform topic modelling on this dataset to identify key topics and transitions:
# topic Modeling using NMF
n_features = 1000
n_topics = 10
n_top_words = 10
tf_vectorizer = CountVectorizer(max_df=0.95, min_df=2, stop_words='english')
tf = tf_vectorizer.fit_transform(transcript_df['text'])
nmf = NMF(n_components=n_topics, random_state=42).fit(tf)
tf_feature_names = tf_vectorizer.get_feature_names_out()
def display_topics(model, feature_names, no_top_words):
    topics = []
    for topic_idx, topic in enumerate(model.components_):
        topic_words = [feature_names[i] for i in topic.argsort()[:-no_top_words - 1:-1]]
        topics.append(" ".join(topic_words))
    return topics
topics = display_topics(nmf, tf_feature_names, n_top_words)
print("\nIdentified Topics:")
for i, topic in enumerate(topics):
    print(f"Topic {i + 1}: {topic}")

Identified Topics:
Topic 1: muscle growth maximize pure good target train going better need
Topic 2: week month pre order add body day doing build 10
Topic 3: tension target causes king rope exercise muscles technique example muscle
Topic 4: failure way closer rep going shy really set better high
Topic 5: bodybuilding program want pure new pre technique order use principles
Topic 6: reps weight tank adding type add effective really case free
Topic 7: exercises squat like bench month barbell cable high think squats
Topic 8: sets hard push need maximize volume pull recovery body minute
Topic 9: range motion need means using example use try usually shown
Topic 10: training important ll hypertrophy strength resistance deep volume pure stretch
In the above code, we are performing topic modelling on the text data using Non-negative Matrix Factorization (NMF). It starts by defining the number of features and topics and then uses CountVectorizer to convert the text data into a matrix of token counts while filtering out common English stop words. The NMF model is then fitted to this term-document matrix to identify a specified number of topics. The display_topics function extracts and prints the top words associated with each topic, which helps to interpret the main themes in the transcript.
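Since LatentDirichletAllocation is already imported alongside NMF, you can swap it in to compare the topics it finds; here is a minimal sketch that reuses the tf matrix and the display_topics function defined above:
# fit an LDA model on the same term-document matrix for comparison
lda = LatentDirichletAllocation(n_components=n_topics, random_state=42)
lda.fit(tf)

# reuse display_topics, since LDA also exposes its topics via components_
lda_topics = display_topics(lda, tf_feature_names, n_top_words)
print("\nLDA Topics:")
for i, topic in enumerate(lda_topics):
    print(f"Topic {i + 1}: {topic}")
The rest of the chaptering pipeline stays the same; only the fitted model changes.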
Now, we will assign topics to each text segment:
# get topic distribution for each text segment
topic_distribution = nmf.transform(tf)

# align the lengths by trimming the extra row in topic_distribution
topic_distribution_trimmed = topic_distribution[:len(transcript_df)]

# compute the dominant topic for each text segment
transcript_df['dominant_topic'] = topic_distribution_trimmed.argmax(axis=1)
In the above code, we are assigning topics to each text segment in the transcript based on the previously fitted NMF model. It starts by transforming the term-document matrix into a topic distribution for each text segment using the NMF model. It then ensures that the lengths of the topic distribution and the transcript dataframe match by trimming any extra rows in the topic distribution. Finally, it computes the dominant topic for each text segment by finding the topic with the highest value in the topic distribution and stores the result in a new column called ‘dominant_topic‘.
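Before moving on, it is worth sanity-checking the assignment by glancing at a few rows and at how the segments are spread across topics; a quick usage example:
# inspect the first few segments with their assigned dominant topic
print(transcript_df[['start', 'text', 'dominant_topic']].head(10))

# count how many segments fall under each topic
print(transcript_df['dominant_topic'].value_counts())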
Now, we will analyze the content of each text segment to manually identify logical breaks:
# analyze the content of each text segment to manually identify logical breaks
logical_breaks = []
for i in range(1, len(transcript_df)):
    if transcript_df['dominant_topic'].iloc[i] != transcript_df['dominant_topic'].iloc[i - 1]:
        logical_breaks.append(transcript_df['start'].iloc[i])

Next, we will consolidate the logical breaks into broader chapters:
# consolidate the logical breaks into broader chapters
threshold = 60 # seconds
consolidated_breaks = []
last_break = None
for break_point in logical_breaks:
    if last_break is None or break_point - last_break >= threshold:
        consolidated_breaks.append(break_point)
        last_break = break_point

Next, we need to merge consecutive breaks with the same dominant topic:
# merge consecutive breaks with the same dominant topic
final_chapters = []
last_chapter = (consolidated_breaks[0], transcript_df['dominant_topic'][0])
for break_point in consolidated_breaks[1:]:
    current_topic = transcript_df[transcript_df['start'] == break_point]['dominant_topic'].values[0]
    if current_topic == last_chapter[1]:
        last_chapter = (last_chapter[0], current_topic)
    else:
        final_chapters.append(last_chapter)
        last_chapter = (break_point, current_topic)
final_chapters.append(last_chapter)  # append the last chapter

In the above code, we are consolidating the chapter breaks by merging consecutive breaks that share the same dominant topic. It initializes the first chapter with the first break point and its corresponding topic. For each subsequent break point, it checks if the dominant topic of the current segment is the same as the previous one. If they match, it continues the current chapter; if they differ, it finalizes the current chapter and starts a new one with the current break point and topic. After processing all break points, it ensures the final chapter is added to the list of chapters. This results in a list of chapters where each chapter consists of continuous segments with the same dominant topic.
Now, let’s see the final video chapters along with their timestamps:
# Convert the final chapters to a readable time format
chapter_points = []
chapter_names = []
for i, (break_point, topic_idx) in enumerate(final_chapters):
    chapter_time = pd.to_datetime(break_point, unit='s').strftime('%H:%M:%S')
    chapter_points.append(chapter_time)
    # get the context for the chapter name
    chapter_text = transcript_df[(transcript_df['start'] >= break_point) & (transcript_df['dominant_topic'] == topic_idx)]['text'].str.cat(sep=' ')
    # extract key phrases to create a chapter name
    vectorizer = TfidfVectorizer(stop_words='english', max_features=3)
    tfidf_matrix = vectorizer.fit_transform([chapter_text])
    feature_names = vectorizer.get_feature_names_out()
    chapter_name = " ".join(feature_names)
    chapter_names.append(f"Chapter {i+1}: {chapter_name}")

# display the final chapter points with names
print("\nFinal Chapter Points with Names:")
for time, name in zip(chapter_points, chapter_names):
    print(f"{time} - {name}")

Final Chapter Points with Names:
00:00:01 - Chapter 1: failure going way
00:01:02 - Chapter 2: bodybuilding program want
00:02:02 - Chapter 3: motion need range
00:03:02 - Chapter 4: exercises fatigue high
00:04:04 - Chapter 5: hard push sets
00:05:06 - Chapter 6: hypertrophy ll training
00:06:08 - Chapter 7: failure really way
00:07:09 - Chapter 8: growth muscle target
00:09:15 - Chapter 9: biceps exercise tension
00:11:16 - Chapter 10: exercises guys month
00:12:19 - Chapter 11: bodybuilding program want
00:13:21 - Chapter 12: arm dedicated included
00:14:21 - Chapter 13: guys thank
In the above code, we are converting the final chapter break points into a readable time format and generating meaningful names for each chapter. For each chapter break point, it converts the time from seconds into a formatted time string and adds it to the chapter_points list. It then concatenates the text of all segments within the chapter to form the chapter text. Then, using TF-IDF vectorization, it extracts the three highest-scoring terms from this text to create a concise chapter name, which is appended to the chapter_names list. Finally, it prints the chapter points along with their generated names to provide a clear and readable structure for the video chapters.
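If you want to use these chapters in a YouTube video description, they need to be listed as a timestamp followed by a title. Below is a small sketch that writes the chapters we just generated to a text file in that layout; it simply reuses the chapter_points and chapter_names lists from above:
# write the chapters to a text file in a "timestamp title" layout
with open('chapters.txt', 'w') as f:
    for time, name in zip(chapter_points, chapter_names):
        f.write(f"{time} {name}\n")

# note: YouTube expects the first chapter to start at 00:00, so you may need to
# adjust the first timestamp manually before pasting the list into a description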
Summary
So, this is how Video Chaptering works. Video chaptering is the process of dividing a video into distinct segments, each labelled with a specific title or chapter name, to enhance navigation and user experience.
I hope you liked this article on Video Chaptering using Python. Feel free to ask valuable questions in the comments section below. You can follow me on Instagram for many more resources.