You must have seen chapters in YouTube videos. Video chaptering is the process of dividing a video into distinct segments, each labelled with a specific title or chapter name, to enhance navigation and user experience. So, if you want to learn how to add chapters to a video, this article is for you. In this article, I’ll take you through the task of Video Chaptering using Python.
Video Chaptering: Getting Started
Video Chaptering involves using natural language processing (NLP) and machine learning techniques to automatically segment videos into coherent chapters based on their content. The expected result is a structured, easily navigable video that enhances the user experience by allowing viewers to quickly find and jump to specific sections of interest.
It works by transcribing the audio content of the video and analyzing the text for key topics, themes, and transitions. So, to get started with Video Chaptering, we need to collect the transcript of a video so that we can analyze it and divide it into chapters.
I’ll collect data from a YouTube video for this task, for which we need to use the YouTube Data API. You can follow the steps below to sign up and get access to the YouTube Data API (a quick check to verify your key follows the steps):
- Go to Google Cloud Console.
- Click on the project drop-down at the top, then “New Project”.
- Enter a project name and click “Create”.
- In the Google Cloud Console, navigate to “APIs & Services” > “Library”.
- Search for “YouTube Data API v3” and click on it.
- Click “Enable”.
- Go to “APIs & Services” > “Credentials”.
- Click “+ CREATE CREDENTIALS” and select “API key”.
- Copy the generated API key.
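Once you have the key, you can quickly confirm that it works with a minimal request. Below is a small sketch; it assumes the google-api-python-client package is installed (pip install google-api-python-client) and that you paste your own key and any public video ID in place of the placeholders:
from googleapiclient.discovery import build

API_KEY = 'Your API Key'  # placeholder: paste the key you copied above
VIDEO_ID = 'VIDEO_ID'     # placeholder: any public YouTube video ID

# build the YouTube Data API client and request the snippet of a public video
youtube = build('youtube', 'v3', developerKey=API_KEY)
response = youtube.videos().list(part='snippet', id=VIDEO_ID).execute()

# if the key is valid, this prints the video's title
print(response['items'][0]['snippet']['title'])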
If you find any issues while generating the API key, feel free to reach out to me on Instagram or LinkedIn.
Video Chaptering using Python
Now, let’s get started with video chaptering by collecting the data from a YouTube video using Python. Below is how we can collect data from a YouTube video by using the YouTube Data API and save the transcribed data into a CSV file:
import re
import csv
import pandas as pd
from googleapiclient.discovery import build
from youtube_transcript_api import YouTubeTranscriptApi
API_KEY = 'Your API Key'
def get_video_id(url):
    # extract video id from the URL
    video_id_match = re.search(r'(?:v=|\/)([0-9A-Za-z_-]{11}).*', url)
    return video_id_match.group(1) if video_id_match else None
def get_video_title(video_id):
    # build the YouTube service
    youtube = build('youtube', 'v3', developerKey=API_KEY)
    # fetch the video details
    request = youtube.videos().list(
        part='snippet',
        id=video_id
    )
    response = request.execute()
    # extract the title
    title = response['items'][0]['snippet']['title'] if response['items'] else 'Unknown Title'
    return title
def get_video_transcript(video_id):
    # fetch the transcript
    try:
        transcript = YouTubeTranscriptApi.get_transcript(video_id)
        return transcript
    except Exception as e:
        print(f"An error occurred: {e}")
        return []
def save_to_csv(title, transcript, filename):
    # save the title and transcript to a CSV file
    transcript_data = [{'start': entry['start'], 'text': entry['text']} for entry in transcript]
    df = pd.DataFrame(transcript_data)
    df.to_csv(filename, index=False)
    # save the title separately
    with open(filename, 'a', newline='') as file:
        writer = csv.writer(file)
        writer.writerow(['Title:', title])
def main():
    url = input('Enter the YouTube video link: ')
    video_id = get_video_id(url)
    if not video_id:
        print('Invalid YouTube URL.')
        return
    title = get_video_title(video_id)
    transcript = get_video_transcript(video_id)
    if not transcript:
        print('No transcript available for this video.')
        return
    filename = f"{video_id}_transcript.csv"
    save_to_csv(title, transcript, filename)
    print(f'Transcript saved to {filename}')

if __name__ == '__main__':
    main()

Enter the YouTube video link: https://youtu.be/71op1DQ2gyo?si=tvMFyTqlQiDDjBj2
Transcript saved to 71op1DQ2gyo_transcript.csv
The above code extracts the transcript of a YouTube video along with its title and saves it to a CSV file. It starts by extracting the video ID from a provided YouTube URL and then uses the YouTube Data API to fetch the video’s title. Next, it retrieves the video’s transcript using the YouTube Transcript API. The title and transcript data are then saved to a CSV file, with the transcript entries listed alongside their start times. If the transcript retrieval is successful, the file is saved with a name derived from the video ID.
Now, let’s explore this collected dataset:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.decomposition import NMF, LatentDirichletAllocation
# load the dataset
transcript_df = pd.read_csv("/content/71op1DQ2gyo_transcript.csv")
print(transcript_df.head())

   start                                    text
0 0.04 in this video I'm going to explain how
1 1.439 to train for Pure muscle growth and I'm
2 3.36 going to lay out five crucial
3 4.96 bodybuilding principles that must be
4 6.68 followed to maximize your muscular
transcript_df['start'] = pd.to_numeric(transcript_df['start'], errors='coerce')
print("Dataset Overview:")
print(transcript_df.info())
print("\nBasic Statistics:")
print(transcript_df.describe())

Dataset Overview:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 461 entries, 0 to 460
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 start 460 non-null float64
1 text 461 non-null object
dtypes: float64(1), object(1)
memory usage: 7.3+ KB
None
Basic Statistics:
start
count 460.000000
mean 445.849439
std 252.363844
min 0.040000
25% 223.509250
50% 452.100000
75% 665.970000
max 868.920000
Let’s have a look at the distribution of the text lengths in each row:
# distribution of text lengths
transcript_df['text_length'] = transcript_df['text'].apply(len)
plt.figure(figsize=(10, 5))
plt.hist(transcript_df['text_length'], bins=50, color='blue', alpha=0.7)
plt.title('Distribution of Text Lengths')
plt.xlabel('Text Length')
plt.ylabel('Frequency')
plt.show()
Now, let’s have a look at the most common words used in the video:
# most common words
vectorizer = CountVectorizer(stop_words='english')
word_counts = vectorizer.fit_transform(transcript_df['text'])
word_counts_df = pd.DataFrame(word_counts.toarray(), columns=vectorizer.get_feature_names_out())
common_words = word_counts_df.sum().sort_values(ascending=False).head(20)
plt.figure(figsize=(10, 5))
common_words.plot(kind='bar', color='green', alpha=0.7)
plt.title('Top 20 Common Words')
plt.xlabel('Words')
plt.ylabel('Frequency')
plt.show()
The next step is to perform topic modelling on this dataset to identify key topics and transitions:
# topic Modeling using NMF
n_features = 1000
n_topics = 10
n_top_words = 10
tf_vectorizer = CountVectorizer(max_df=0.95, min_df=2, stop_words='english')
tf = tf_vectorizer.fit_transform(transcript_df['text'])
nmf = NMF(n_components=n_topics, random_state=42).fit(tf)
tf_feature_names = tf_vectorizer.get_feature_names_out()
def display_topics(model, feature_names, no_top_words):
    topics = []
    for topic_idx, topic in enumerate(model.components_):
        topic_words = [feature_names[i] for i in topic.argsort()[:-no_top_words - 1:-1]]
        topics.append(" ".join(topic_words))
    return topics
topics = display_topics(nmf, tf_feature_names, n_top_words)
print("\nIdentified Topics:")
for i, topic in enumerate(topics):
    print(f"Topic {i + 1}: {topic}")

Identified Topics:
Topic 1: muscle growth maximize pure good target train going better need
Topic 2: week month pre order add body day doing build 10
Topic 3: tension target causes king rope exercise muscles technique example muscle
Topic 4: failure way closer rep going shy really set better high
Topic 5: bodybuilding program want pure new pre technique order use principles
Topic 6: reps weight tank adding type add effective really case free
Topic 7: exercises squat like bench month barbell cable high think squats
Topic 8: sets hard push need maximize volume pull recovery body minute
Topic 9: range motion need means using example use try usually shown
Topic 10: training important ll hypertrophy strength resistance deep volume pure stretch
In the above code, we are performing topic modelling on the text data using Non-negative Matrix Factorization (NMF). It starts by defining the number of features and topics and then uses CountVectorizer to convert the text data into a matrix of token counts while filtering out common English stop words. The NMF model is then fitted to this term-document matrix to identify a specified number of topics. The display_topics function extracts and prints the top words associated with each topic, which helps to interpret the main themes in the transcript.
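Since LatentDirichletAllocation is already imported alongside NMF, you can swap it in to compare the topics it finds; here is a minimal sketch that reuses the tf matrix and the display_topics function defined above:
# fit an LDA model on the same term-document matrix for comparison
lda = LatentDirichletAllocation(n_components=n_topics, random_state=42)
lda.fit(tf)

# reuse display_topics, since LDA also exposes its topics via components_
lda_topics = display_topics(lda, tf_feature_names, n_top_words)
print("\nLDA Topics:")
for i, topic in enumerate(lda_topics):
    print(f"Topic {i + 1}: {topic}")
The rest of the chaptering pipeline stays the same; only the fitted model changes.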
Now, we will assign topics to each text segment:
# get topic distribution for each text segment
topic_distribution = nmf.transform(tf)

# align the lengths by trimming the extra row in topic_distribution
topic_distribution_trimmed = topic_distribution[:len(transcript_df)]

# compute the dominant topic for each text segment
transcript_df['dominant_topic'] = topic_distribution_trimmed.argmax(axis=1)
In the above code, we are assigning topics to each text segment in the transcript based on the previously fitted NMF model. It starts by transforming the term-document matrix into a topic distribution for each text segment using the NMF model. It then ensures that the lengths of the topic distribution and the transcript dataframe match by trimming any extra rows in the topic distribution. Finally, it computes the dominant topic for each text segment by finding the topic with the highest value in the topic distribution and stores the result in a new column called ‘dominant_topic‘.
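Before moving on, it is worth sanity-checking the assignment by glancing at a few rows and at how the segments are spread across topics; a quick usage example:
# inspect the first few segments with their assigned dominant topic
print(transcript_df[['start', 'text', 'dominant_topic']].head(10))

# count how many segments fall under each topic
print(transcript_df['dominant_topic'].value_counts())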
Now, we will analyze the content of each text segment to manually identify logical breaks:
# analyze the content of each text segment to manually identify logical breaks
logical_breaks = []
for i in range(1, len(transcript_df)):
    if transcript_df['dominant_topic'].iloc[i] != transcript_df['dominant_topic'].iloc[i - 1]:
        logical_breaks.append(transcript_df['start'].iloc[i])

Next, we will consolidate the logical breaks into broader chapters:
# consolidate the logical breaks into broader chapters
threshold = 60 # seconds
consolidated_breaks = []
last_break = None
for break_point in logical_breaks:
    if last_break is None or break_point - last_break >= threshold:
        consolidated_breaks.append(break_point)
        last_break = break_point

Next, we need to merge consecutive breaks with the same dominant topic:
# merge consecutive breaks with the same dominant topic
final_chapters = []
last_chapter = (consolidated_breaks[0], transcript_df['dominant_topic'][0])
for break_point in consolidated_breaks[1:]:
    current_topic = transcript_df[transcript_df['start'] == break_point]['dominant_topic'].values[0]
    if current_topic == last_chapter[1]:
        last_chapter = (last_chapter[0], current_topic)
    else:
        final_chapters.append(last_chapter)
        last_chapter = (break_point, current_topic)
final_chapters.append(last_chapter)  # append the last chapter

In the above code, we are consolidating the chapter breaks by merging consecutive breaks that share the same dominant topic. It initializes the first chapter with the first break point and its corresponding topic. For each subsequent break point, it checks if the dominant topic of the current segment is the same as the previous one. If they match, it continues the current chapter; if they differ, it finalizes the current chapter and starts a new one with the current break point and topic. After processing all break points, it ensures the final chapter is added to the list of chapters. This results in a list of chapters where each chapter consists of continuous segments with the same dominant topic.
Now, let’s see the final video chapters along with their timestamps:
# Convert the final chapters to a readable time format
chapter_points = []
chapter_names = []
for i, (break_point, topic_idx) in enumerate(final_chapters):
    chapter_time = pd.to_datetime(break_point, unit='s').strftime('%H:%M:%S')
    chapter_points.append(chapter_time)
    # get the context for the chapter name
    chapter_text = transcript_df[(transcript_df['start'] >= break_point) & (transcript_df['dominant_topic'] == topic_idx)]['text'].str.cat(sep=' ')
    # extract key phrases to create a chapter name
    vectorizer = TfidfVectorizer(stop_words='english', max_features=3)
    tfidf_matrix = vectorizer.fit_transform([chapter_text])
    feature_names = vectorizer.get_feature_names_out()
    chapter_name = " ".join(feature_names)
    chapter_names.append(f"Chapter {i+1}: {chapter_name}")

# display the final chapter points with names
print("\nFinal Chapter Points with Names:")
for time, name in zip(chapter_points, chapter_names):
    print(f"{time} - {name}")

Final Chapter Points with Names:
00:00:01 - Chapter 1: failure going way
00:01:02 - Chapter 2: bodybuilding program want
00:02:02 - Chapter 3: motion need range
00:03:02 - Chapter 4: exercises fatigue high
00:04:04 - Chapter 5: hard push sets
00:05:06 - Chapter 6: hypertrophy ll training
00:06:08 - Chapter 7: failure really way
00:07:09 - Chapter 8: growth muscle target
00:09:15 - Chapter 9: biceps exercise tension
00:11:16 - Chapter 10: exercises guys month
00:12:19 - Chapter 11: bodybuilding program want
00:13:21 - Chapter 12: arm dedicated included
00:14:21 - Chapter 13: guys thank
In the above code, we are converting the final chapter break points into a readable time format and generating meaningful names for each chapter. For each chapter break point, it converts the time from seconds into a formatted time string and adds it to the chapter_points list. It then concatenates the text of all segments within the chapter to form the chapter text. Then, using TF-IDF vectorization, it extracts the three highest-scoring terms from this text to create a concise chapter name, which is appended to the chapter_names list. Finally, it prints the chapter points along with their generated names to provide a clear and readable structure for the video chapters.
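If you want to use these chapters in a YouTube video description, they need to be listed as a timestamp followed by a title. Below is a small sketch that writes the chapters we just generated to a text file in that layout; it simply reuses the chapter_points and chapter_names lists from above:
# write the chapters to a text file in a "timestamp title" layout
with open('chapters.txt', 'w') as f:
    for time, name in zip(chapter_points, chapter_names):
        f.write(f"{time} {name}\n")

# note: YouTube expects the first chapter to start at 00:00, so you may need to
# adjust the first timestamp manually before pasting the list into a description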
Summary
So, this is how Video Chaptering works. Video chaptering is the process of dividing a video into distinct segments, each labelled with a specific title or chapter name, to enhance navigation and user experience.
I hope you liked this article on Video Chaptering using Python. Feel free to ask valuable questions in the comments section below. You can follow me on Instagram for many more resources.