Metro Operations Optimization refers to the systematic process of enhancing the efficiency, reliability, and effectiveness of Metro services through various data-driven techniques and operational adjustments. So, if you want to learn how to optimize the operations of a service like Metro using data-driven techniques, this article is for you. In this article, I’ll take you through the task of Metro Operations Optimization using Python.
Metro Operations Optimization: Getting Started
The problem I am working on in this article is my take on a problem statement I found at statso. The dataset contains several files, as follows:
- Agency: Information about the Delhi Metro Rail Corporation, including name, URL, and contact details.
- Calendar: Service schedules delineating the operation days (weekdays, weekends) and valid dates for these schedules.
- Routes: Details of metro routes, including short and long names, type of route, and descriptions.
- Shapes: Geographical coordinates of routes, providing the precise paths taken by the metro lines.
- Stop Times: Timetables for each trip indicating arrival and departure times at specific stops.
- Stops: Locations of metro stops, including latitude and longitude coordinates.
- Trips: Information linking trips to routes, including details like trip identifiers and associated route IDs.
You can download the dataset here.
For the task of Metro Operations Optimization, we will aim to identify the key areas where adjustments in train frequencies could significantly improve the service levels, reduce wait times, and alleviate overcrowding.
Metro Operations Optimization using Python
Now, let’s get started with the task of Metro Operations Optimization by importing the necessary Python libraries and the datasets:
import pandas as pd
# load all the data files
agency = pd.read_csv('agency.txt')
calendar = pd.read_csv('calendar.txt')
routes = pd.read_csv('routes.txt')
shapes = pd.read_csv('shapes.txt')
stop_times = pd.read_csv('stop_times.txt')
stops = pd.read_csv('stops.txt')
trips = pd.read_csv('trips.txt')
# show the first few rows and the structure of each dataframe
data_overviews = {
"agency": agency.head(),
"calendar": calendar.head(),
"routes": routes.head(),
"shapes": shapes.head(),
"stop_times": stop_times.head(),
"stops": stops.head(),
"trips": trips.head()
}
data_overviews{'agency': agency_id agency_name agency_url \
0 DMRC Delhi Metro Rail Corporation http://www.delhimetrorail.com/
agency_timezone agency_lang agency_phone agency_fare_url agency_email
0 Asia/Kolkata NaN NaN NaN NaN ,
'calendar': service_id monday tuesday wednesday thursday friday saturday sunday \
0 weekday 1 1 1 1 1 0 0
1 saturday 0 0 0 0 0 1 0
2 sunday 0 0 0 0 0 0 1
start_date end_date
0 20190101 20251231
1 20190101 20251231
2 20190101 20251231 ,
'routes': route_id agency_id route_short_name \
0 33 NaN R_SP_R
1 31 NaN G_DD_R
2 29 NaN P_MS_R
3 12 NaN M_JB
4 11 NaN P_MS
route_long_name route_desc route_type \
0 RAPID_Phase 3 (Rapid Metro) to Sector 55-56 (R... NaN 1
1 GRAY_Dhansa Bus Stand to Dwarka NaN 1
2 PINK_Shiv Vihar to Majlis Park NaN 1
3 MAGENTA_Janak Puri West to Botanical Garden NaN 1
4 PINK_Majlis Park to Shiv Vihar NaN 1
route_url route_color route_text_color route_sort_order \
0 NaN NaN NaN NaN
1 NaN NaN NaN NaN
2 NaN NaN NaN NaN
3 NaN NaN NaN NaN
4 NaN NaN NaN NaN
continuous_pickup continuous_drop_off
0 NaN NaN
1 NaN NaN
2 NaN NaN
3 NaN NaN
4 NaN NaN ,
'shapes': shape_id shape_pt_lat shape_pt_lon shape_pt_sequence shape_dist_traveled
0 shp_1_2 28.615887 77.022461 1 0.000
1 shp_1_2 28.616341 77.022499 2 50.510
2 shp_1_2 28.617985 77.022453 3 233.586
3 shp_1_2 28.618252 77.022453 4 263.487
4 shp_1_2 28.618425 77.022438 5 282.857,
'stop_times': trip_id arrival_time departure_time stop_id stop_sequence stop_headsign \
0 0 05:28:08 05:28:28 21 0 NaN
1 0 05:30:58 05:31:18 20 1 NaN
2 0 05:33:28 05:33:48 19 2 NaN
3 0 05:35:33 05:35:53 18 3 NaN
4 0 05:37:53 05:38:13 17 4 NaN
pickup_type drop_off_type shape_dist_traveled timepoint \
0 0 0 0.000 1
1 0 0 1202.405 1
2 0 0 2480.750 1
3 0 0 3314.936 1
4 0 0 4300.216 1
continuous_pickup continuous_drop_off
0 NaN NaN
1 NaN NaN
2 NaN NaN
3 NaN NaN
4 NaN NaN ,
'stops': stop_id stop_code stop_name stop_desc stop_lat stop_lon
0 1 NaN Dilshad Garden NaN 28.675991 77.321495
1 2 NaN Jhilmil NaN 28.675648 77.312393
2 3 NaN Mansrover park NaN 28.675352 77.301178
3 4 NaN Shahdara NaN 28.673531 77.287270
4 5 NaN Welcome NaN 28.671986 77.277931,
'trips': route_id service_id trip_id trip_headsign trip_short_name direction_id \
0 0 weekday 0 NaN NaN NaN
1 0 weekday 1 NaN NaN NaN
2 0 weekday 10 NaN NaN NaN
3 0 weekday 100 NaN NaN NaN
4 2 weekday 1000 NaN NaN NaN
block_id shape_id wheelchair_accessible bikes_allowed
0 NaN shp_1_30 0 0
1 NaN shp_1_30 0 0
2 NaN shp_1_30 0 0
3 NaN shp_1_30 0 0
4 NaN shp_1_13 0 0 }
Let’s start by analyzing the routes and operations of Delhi Metro to understand how it operates. We’ll start by:
- Plotting the geographical routes on a map to visualize the network.
- Examining the frequency and scheduling of trips across different days of the week.
- Analyzing coverage by examining the distribution of stops and their connectivity.
I’ll start by plotting the geographical paths of different routes on a map to visualize how the Delhi Metro covers the area:
import matplotlib.pyplot as plt
import seaborn as sns
plt.figure(figsize=(10, 8))
sns.scatterplot(x='shape_pt_lon', y='shape_pt_lat', hue='shape_id', data=shapes, palette='viridis', s=10, legend=None)
plt.title('Geographical Paths of Delhi Metro Routes')
plt.xlabel('Longitude')
plt.ylabel('Latitude')
plt.grid(True)
plt.show()
The map visualization shows the geographical paths of the Delhi Metro routes. Each line (identified by a unique colour) represents the path of a different route defined in the shapes dataset. This visual helps to understand how the Metro covers the geographical area of Delhi, demonstrating the connectivity and spread of the network.
Next, I will examine the frequency and scheduling of trips across different days of the week by analyzing the calendar and trip data:
# merge trips with calendar to include the day of operation information
trips_calendar = pd.merge(trips, calendar, on='service_id', how='left')
# count the number of trips per day of the week
trip_counts = trips_calendar[['monday', 'tuesday', 'wednesday', 'thursday', 'friday', 'saturday', 'sunday']].sum()
# Plotting
plt.figure(figsize=(10, 6))
trip_counts.plot(kind='bar', color='skyblue')
plt.title('Number of Trips per Day of the Week')
plt.xlabel('Day of the Week')
plt.ylabel('Number of Trips')
plt.xticks(rotation=45)
plt.grid(True)
plt.show()
The bar chart illustrates the number of trips scheduled for each day of the week for the Delhi Metro. As we can see, the number of trips from Monday to Friday is consistent, indicating a stable weekday schedule designed to accommodate regular commuter traffic. In contrast, the trips decrease slightly on Saturday and even more on Sunday, reflecting lower demand or a reduced service schedule on weekends.
This finding suggests that the Delhi Metro strategically scales its operations based on the expected daily ridership, which likely peaks during weekdays due to work and school commutes.
To further analyze the connectivity and effectiveness of the route strategy, I will analyze the distribution and connectivity of stops:
# plotting the locations of the stops
plt.figure(figsize=(10, 10))
sns.scatterplot(x='stop_lon', y='stop_lat', data=stops, color='red', s=50, marker='o')
plt.title('Geographical Distribution of Delhi Metro Stops')
plt.xlabel('Longitude')
plt.ylabel('Latitude')
plt.grid(True)
plt.show()
The scatter plot above shows the geographical distribution of Delhi Metro stops. Each red dot represents a metro stop, and their distribution across the map illustrates how the stops cover different areas of Delhi.
From the plot, we can see a widespread distribution, suggesting that the Delhi Metro provides good spatial coverage, allowing access to a broad area and facilitating efficient travel across the city.
The stops are fairly evenly spread, but there are denser clusters, which indicates areas with higher transit needs or central hubs that connect multiple lines.
Now, let’s analyze the route complexity. I will analyze how many routes pass through each stop, which can highlight key transfer points and central hubs within the Delhi Metro network:
# merge stops with stop_times to link each stop with trips, and then merge with trips to get route information
stops_with_routes = pd.merge(pd.merge(stop_times, trips, on='trip_id'), routes, on='route_id')
# count how many unique routes pass through each stop
stop_route_counts = stops_with_routes.groupby('stop_id')['route_id'].nunique().reset_index()
stop_route_counts = stop_route_counts.rename(columns={'route_id': 'number_of_routes'})
# merge this with stops to get the names and location for plotting
stop_route_counts = pd.merge(stop_route_counts, stops, on='stop_id')
# plot the number of routes per stop
plt.figure(figsize=(10, 10))
sns.scatterplot(x='stop_lon', y='stop_lat', size='number_of_routes', hue='number_of_routes',
sizes=(50, 500), alpha=0.5, palette='coolwarm', data=stop_route_counts)
plt.title('Number of Routes per Metro Stop in Delhi')
plt.xlabel('Longitude')
plt.ylabel('Latitude')
plt.legend(title='Number of Routes')
plt.grid(True)
plt.show()
The scatter plot above represents the number of routes that pass through each Delhi Metro stop. Stops are visualized in different colours and sizes based on the number of routes they connect, providing insights into the complexity of the network at various locations. Key observations are:
- Hubs and Transfer Points: Larger circles (in warmer colours) indicate stops where multiple routes intersect. These stops serve as major transfer points within the network, facilitating easier cross-city travel for passengers.
- Distribution: Stops with fewer routes, shown in cooler colours and smaller sizes, tend to be more peripheral or on less busy lines. The central areas and more populated zones have stops with greater connectivity.
Next, let’s analyze the service frequency by examining the timing intervals between trips during different parts of the day. It will help us understand peak and off-peak operational strategies, which are crucial for managing commuter flow efficiently. Let’s calculate and visualize these intervals:
# converting stop_times 'arrival_time' from string to datetime.time for easier manipulation
import datetime as dt
# function to convert time string to datetime.time
def convert_to_time(time_str):
try:
return dt.datetime.strptime(time_str, '%H:%M:%S').time()
except ValueError:
# Handle cases where the hour might be greater than 23 (e.g., 24:00:00 or 25:00:00)
hour, minute, second = map(int, time_str.split(':'))
return dt.time(hour % 24, minute, second)
stop_times['arrival_time_dt'] = stop_times['arrival_time'].apply(convert_to_time)
# calculate the difference in arrival times for subsequent trips at each stop
stop_times_sorted = stop_times.sort_values(by=['stop_id', 'arrival_time_dt'])
stop_times_sorted['next_arrival_time'] = stop_times_sorted.groupby('stop_id')['arrival_time_dt'].shift(-1)
# function to calculate the difference in minutes between two times
def time_difference(time1, time2):
if pd.isna(time1) or pd.isna(time2):
return None
full_date_time1 = dt.datetime.combine(dt.date.today(), time1)
full_date_time2 = dt.datetime.combine(dt.date.today(), time2)
return (full_date_time2 - full_date_time1).seconds / 60
stop_times_sorted['interval_minutes'] = stop_times_sorted.apply(lambda row: time_difference(row['arrival_time_dt'], row['next_arrival_time']), axis=1)
# drop NaN values from intervals (last trip of the day)
stop_times_intervals = stop_times_sorted.dropna(subset=['interval_minutes'])
# average intervals by time of day (morning, afternoon, evening)
def part_of_day(time):
if time < dt.time(12, 0):
return 'Morning'
elif time < dt.time(17, 0):
return 'Afternoon'
else:
return 'Evening'
stop_times_intervals['part_of_day'] = stop_times_intervals['arrival_time_dt'].apply(part_of_day)
average_intervals = stop_times_intervals.groupby('part_of_day')['interval_minutes'].mean().reset_index()
plt.figure(figsize=(8, 6))
sns.barplot(x='part_of_day', y='interval_minutes', data=average_intervals, order=['Morning', 'Afternoon', 'Evening'], palette='mako')
plt.title('Average Interval Between Trips by Part of Day')
plt.xlabel('Part of Day')
plt.ylabel('Average Interval (minutes)')
plt.grid(True)
plt.show()
The bar chart above displays the average interval between trips on the Delhi Metro during different parts of the day: morning, afternoon, and evening. This data provides insight into the service frequency and how it is adjusted based on the expected demand throughout the day. Key findings are:
- Morning: Shorter intervals between trips are observed in the morning hours, which likely corresponds to the morning rush hour when commuters are heading to work or school.
- Afternoon: The intervals increase slightly during the afternoon, which may indicate a reduction in demand after the morning peak.
- Evening: In the evening, intervals decrease again, likely to accommodate the evening rush hour as people return home.
Now, let’s calculate the number of trips and the available capacity per time interval. It will give us a basic understanding of how service levels vary throughout the day. We’ll classify the intervals as:
- Early Morning: Before 6 AM
- Morning Peak: 6 AM to 10 AM
- Midday: 10 AM to 4 PM
- Evening Peak: 4 PM to 8 PM
- Late Evening: After 8 PM
Let’s calculate the supply in terms of trips during these time intervals using the stop_times data:
# define time intervals for classification
def classify_time_interval(time):
if time < dt.time(6, 0):
return 'Early Morning'
elif time < dt.time(10, 0):
return 'Morning Peak'
elif time < dt.time(16, 0):
return 'Midday'
elif time < dt.time(20, 0):
return 'Evening Peak'
else:
return 'Late Evening'
# apply time interval classification
stop_times['time_interval'] = stop_times['arrival_time_dt'].apply(classify_time_interval)
# count the number of trips per time interval
trips_per_interval = stop_times.groupby('time_interval')['trip_id'].nunique().reset_index()
trips_per_interval = trips_per_interval.rename(columns={'trip_id': 'number_of_trips'})
# sorting the dataframe
ordered_intervals = ['Early Morning', 'Morning Peak', 'Midday', 'Evening Peak', 'Late Evening']
trips_per_interval['time_interval'] = pd.Categorical(trips_per_interval['time_interval'], categories=ordered_intervals, ordered=True)
trips_per_interval = trips_per_interval.sort_values('time_interval')
# plotting the number of trips per time interval
plt.figure(figsize=(10, 6))
sns.barplot(x='time_interval', y='number_of_trips', data=trips_per_interval, palette='Set2')
plt.title('Number of Trips per Time Interval')
plt.xlabel('Time Interval')
plt.ylabel('Number of Trips')
plt.grid(True)
plt.show()
The bar chart displays the number of trips scheduled for each time interval in the Delhi Metro system. From this visualization, we can observe the following:
- Early Morning: There is a relatively low number of trips, indicating less demand or fewer scheduled services during these hours.
- Morning Peak: There is a significant increase in the number of trips compared to the early morning, likely due to morning commute hours as people travel to work or school.
- Midday: The number of trips remains high, perhaps sustaining the morning rush or due to other midday travel needs.
- Evening Peak: This period sees a slight decrease compared to midday but remains one of the busier times, probably reflecting the evening commute.
- Late Evening: The number of trips drops again, but remains higher than in the early morning, likely catering to people returning home late or involved in evening activities.
Optimizing Operations to Reduce Overcrowding in Metro
Now, as we have analyzed the dataset, let’s build a data-driven strategy to reduce the overcrowding in Delhi Metro. To reduce the overcrowding in the Delhi Metro, we can adjust train frequencies based on time interval analysis.
We’ve already analyzed the number of trips during different time intervals, which provides a clear picture of the existing supply. By cross-referencing this data with inferred demand (e.g., crowd levels observed at platforms, as actual passenger data isn’t available), adjustments can be made. For instance, if certain time intervals like the morning or evening peaks show signs of overcrowding, increasing the number of trips or adjusting the timing of the trips could help alleviate this issue.
Let’s start with refining the frequencies of trains during peak and off-peak hours based on the trip frequency analysis we performed earlier. I’ll create a hypothetical scenario where we adjust the frequencies during these times based on assumed passenger loads. An assumption here is that morning and evening peaks need a 20% increase in service, while midday and late evening might handle a 10% reduction without impacting passenger service negatively.
Let’s calculate and visualize these adjustments:
# adjusting frequencies based on hypothetical scenario
adjusted_trips_per_interval = trips_per_interval.copy()
adjustment_factors = {'Morning Peak': 1.20, 'Evening Peak': 1.20, 'Midday': 0.90, 'Early Morning': 1.0, 'Late Evening': 0.90}
# apply the adjustments
adjusted_trips_per_interval['adjusted_number_of_trips'] = adjusted_trips_per_interval.apply(
lambda row: int(row['number_of_trips'] * adjustment_factors[row['time_interval']]), axis=1)
# plotting original vs adjusted number of trips per time interval
plt.figure(figsize=(12, 6))
bar_width = 0.35
r1 = range(len(adjusted_trips_per_interval))
r2 = [x + bar_width for x in r1]
plt.bar(r1, adjusted_trips_per_interval['number_of_trips'], color='blue', width=bar_width, edgecolor='grey', label='Original')
plt.bar(r2, adjusted_trips_per_interval['adjusted_number_of_trips'], color='cyan', width=bar_width, edgecolor='grey', label='Adjusted')
plt.xlabel('Time Interval', fontweight='bold')
plt.ylabel('Number of Trips', fontweight='bold')
plt.xticks([r + bar_width/2 for r in range(len(adjusted_trips_per_interval))], adjusted_trips_per_interval['time_interval'])
plt.title('Original vs Adjusted Number of Trips per Time Interval')
plt.legend()
plt.show()
The bar chart illustrates the original versus adjusted number of trips per time interval for the Delhi Metro, based on our hypothetical adjustments to better align service levels with inferred demand:
- Morning and Evening Peaks: We increased the number of trips by 20%, anticipating higher demand during these hours. This adjustment aims to reduce overcrowding and improve passenger comfort and service reliability.
- Midday and Late Evening: We decreased the trips by 10%, assuming that the demand drops during these times, allowing for more efficient use of resources without significantly impacting service quality.
By implementing these adjustments, Delhi Metro can potentially improve operational efficiency and customer satisfaction, especially during peak hours.
Summary
Metro Operations Optimization refers to the systematic process of enhancing the efficiency, reliability, and effectiveness of Metro services through various data-driven techniques and operational adjustments. Our approach to optimizing metro operations showcases how analyzing existing data can guide decisions in real-time scheduling adjustments, even in the absence of explicit passenger count data.
I hope you liked this article on Metro Operations Optimization using Python. Feel free to ask valuable questions in the comments section below. You can follow me on Instagram for many more resources.





