Stock Market Crash Analysis with Python

Stock market crashes are periods of rapid and significant declines in market values. Understanding these events and their precursors can provide valuable insights into market dynamics and help develop early warning systems. In this article, I’ll take you through stock market crash analysis with Python and how to build an early warning system to identify crashes in the stock market.

Stock Market Crash Analysis: An Overview

In this task, we will use 30 years of historical Indian stock market data. We will focus on the Sensex to analyze and understand market crashes.

We will start by cleaning and processing the financial data for analysis. Then, we will calculate daily returns, drawdowns, and rolling 10-day average returns and volatility. These metrics will help us identify and cluster periods of high market stress. We will highlight major crash events such as those in 1997, 2008–2009, and 2020. Using rolling metrics, we will build an early warning system to detect signs of an upcoming crash. This system will help data scientists and analysts manage risk and make informed investment decisions.

You can find the dataset we will use for this task here.

Getting Started with Stock Market Crash Analysis with Python

So, let’s get started with the task of stock market crash analysis by importing the necessary Python libraries and the dataset:

import pandas as pd
import numpy as np

df = pd.read_csv('cleaned_sensex.csv')
print(df.head())

         Date        Close         High          Low         Open  Volume
0  1997-07-01  4300.859863  4301.770020  4247.660156  4263.109863       0
1  1997-07-02  4333.899902  4395.310059  4295.399902  4302.959961       0
2  1997-07-03  4323.459961  4393.290039  4299.970215  4335.790039       0
3  1997-07-04  4323.819824  4347.589844  4300.580078  4332.700195       0
4  1997-07-07  4291.450195  4391.009766  4289.490234  4326.810059       0

Let’s convert the ‘Date’ column to datetime, sort the data by date, and set it as the DataFrame index:

if 'Date' in df.columns:
    df['Date'] = pd.to_datetime(df['Date'])
    df = df.sort_values('Date')
    df.set_index('Date', inplace=True)

print("\nDataFrame after processing the Date column:")
print(df.head())

DataFrame after processing the Date column:
                  Close         High          Low         Open  Volume
Date                                                                  
1997-07-01  4300.859863  4301.770020  4247.660156  4263.109863       0
1997-07-02  4333.899902  4395.310059  4295.399902  4302.959961       0
1997-07-03  4323.459961  4393.290039  4299.970215  4335.790039       0
1997-07-04  4323.819824  4347.589844  4300.580078  4332.700195       0
1997-07-07  4291.450195  4391.009766  4289.490234  4326.810059       0

The code first checks if the DataFrame contains a ‘Date’ column. It then converts the values in the ‘Date’ column to datetime objects. This step ensures the dates have the correct format for time series analysis. Next, the code sorts the DataFrame by the ‘Date’ column in chronological order. Finally, it sets the ‘Date’ column as the index to simplify time series operations.
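Before moving on, it can help to confirm that the index really is clean. The snippet below is a minimal sketch on a hypothetical toy frame (not the Sensex file). It assumes you want to coerce unparseable dates to NaT, drop them, and keep the last row when a date appears twice; those choices are illustrative, not part of the original workflow.

```python
import pandas as pd

# Hypothetical toy data: unsorted, with one duplicated date
toy = pd.DataFrame({
    "Date": ["1997-07-02", "1997-07-01", "1997-07-02"],
    "Close": [4333.90, 4300.86, 4333.90],
})

# Unparseable strings become NaT instead of raising an error
toy["Date"] = pd.to_datetime(toy["Date"], errors="coerce")

# Drop missing dates, sort chronologically, keep one row per date
toy = toy.dropna(subset=["Date"]).sort_values("Date")
toy = toy.drop_duplicates(subset="Date", keep="last").set_index("Date")

# Two quick checks that time series operations rely on
is_sorted = toy.index.is_monotonic_increasing
has_duplicates = toy.index.has_duplicates
print(f"sorted: {is_sorted}, duplicates: {has_duplicates}")
```

If either check fails on real data, rolling windows and date-based slicing can give misleading results, so it is worth running them once after loading.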

Analyzing Crashes

Now, let’s calculate the daily percentage change and identify crash days:

df['Daily_Return'] = df['Close'].pct_change() * 100

# Define a threshold for a daily crash (e.g., a drop of 5% or more)
crash_threshold_daily = -5
df['Crash_Daily'] = df['Daily_Return'] <= crash_threshold_daily

The code first calculates the daily percentage change in the ‘Close’ column using the pct_change() function. It then multiplies the result by 100 to convert the changes into percentages. The resulting values are stored in a new column called ‘Daily_Return’. The first row has no previous close to compare against, so its return is NaN. Next, the code defines a threshold of -5% to identify significant daily drops.

The code then creates a boolean column named ‘Crash_Daily’ to flag extreme daily declines. It marks a day as True if the daily return is less than or equal to -5%. Otherwise, it marks the day as False. This step helps quickly identify days with significant market drops.
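To see how the flag behaves, here is a small sketch on made-up closing prices (the numbers are hypothetical and chosen only so that two days cross the -5% threshold):

```python
import pandas as pd

# Hypothetical closing prices, not real Sensex values
close = pd.Series([100.0, 102.0, 96.0, 97.0, 90.0], name="Close")

# Same calculation as above: percentage change from the previous close
daily_return = close.pct_change() * 100

crash_threshold_daily = -5
crash_daily = daily_return <= crash_threshold_daily

result = pd.DataFrame({
    "Close": close,
    "Daily_Return": daily_return.round(2),
    "Crash_Daily": crash_daily,
})
print(result)
```

The first row is NaN, and a comparison against NaN evaluates to False, so the first day is never flagged. The drops from 102 to 96 and from 97 to 90 are both larger than 5%, so those two days are marked True.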

Now, let’s have a look at the Sensex closing prices and highlight daily crashes:

import plotly.graph_objects as go

fig = go.Figure()

fig.add_trace(go.Scatter(x=df.index, y=df['Close'], mode='lines', name='Closing Price'))

crash_days = df.index[df['Crash_Daily']]
crash_closes = df['Close'][df['Crash_Daily']]

fig.add_trace(go.Scatter(x=crash_days, y=crash_closes, mode='markers',
                          marker=dict(color='red'),
                          name=f'Daily Crash (<= {crash_threshold_daily}%)'))

fig.update_layout(
    title='Sensex Closing Price with Daily Crashes Highlighted',
    xaxis_title='Date',
    yaxis_title='Sensex Close',
    xaxis_rangeslider_visible=True,
    template="plotly_white"
)

fig.show()

Figure: Sensex Closing Price with Daily Crashes Highlighted

The data shows that major crashes often occur during global or domestic economic crises like the 2008 meltdown and COVID-19 in 2020. These crashes usually happen suddenly and sharply, breaking long-term upward trends. Despite several crashes over the years, the Sensex shows strong long-term growth. This pattern highlights the market’s resilience and ability to recover after a crisis.

Analyzing Drawdowns

Next, let’s calculate the cumulative maximum and drawdown (percentage drop from the cumulative maximum):

# Calculate cumulative max and drawdown (% drop from the cumulative max)
df['Cumulative_Max'] = df['Close'].cummax()
df['Drawdown'] = (df['Close'] - df['Cumulative_Max']) / df['Cumulative_Max'] * 100

The code first calculates the cumulative maximum of the ‘Close’ prices using the cummax() function. This tracks the highest closing price reached up to each point in time. It then computes the drawdown by subtracting the cumulative maximum from the current closing price. After that, it divides the result by the cumulative maximum and converts it into a percentage.

This gives a measure of how far the current price is below its historical peak, indicating the severity of any decline.
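A short worked example makes the calculation concrete. The prices below are hypothetical; the sketch applies the same cummax() and drawdown formula and then reads off the deepest drawdown:

```python
import pandas as pd

# Hypothetical closing prices with two peaks (120 and 130)
close = pd.Series([100.0, 120.0, 90.0, 110.0, 130.0, 104.0])

# Highest close seen so far: 100, 120, 120, 120, 130, 130
cumulative_max = close.cummax()

# Percentage distance below the running peak (0 at a new high)
drawdown = (close - cumulative_max) / cumulative_max * 100

max_drawdown = drawdown.min()      # deepest drop from a prior peak
trough_position = drawdown.idxmin()  # where that deepest drop occurs

print(drawdown.round(2).tolist())
print(f"Max drawdown: {max_drawdown:.1f}% at position {trough_position}")
```

Here the fall from 120 to 90 gives a drawdown of -25%, while the later fall from 130 to 104 gives -20%. Drawdown is always zero or negative, and it resets to zero only when the price sets a new high.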

Now, let’s have a look at the market drawdown over time and highlight a specific drawdown threshold:

drawdown_threshold = -20
fig = go.Figure()

fig.add_trace(go.Scatter(x=df.index, y=df['Drawdown'], mode='lines', name='Drawdown (%)'))

fig.add_hline(y=drawdown_threshold, line_dash='dash', line_color='red',
              annotation_text=f'Drawdown Threshold ({drawdown_threshold}%)',
              annotation_position='bottom right')

fig.update_layout(
    title='Market Drawdown Over Time',
    xaxis_title='Date',
    yaxis_title='Drawdown (%)',
    xaxis_rangeslider_visible=True,  
    template="plotly_white"
)

fig.show()

The major crashes are clearly visible around 2000, 2008–2009, and 2020, where drawdowns fell far below the -20% threshold and, in the deepest troughs, past -50%, indicating severe market stress. These deep troughs correspond to global financial crises and the pandemic. While smaller drawdowns occur frequently, the most severe crashes are rare but sharp, and the index often takes years to regain its previous peak.

Next, let’s identify the dates where the market drawdown exceeded the -20% threshold and print a sample of these events:

crash_drawdowns = df[df['Drawdown'] <= drawdown_threshold]
print("Dates where drawdown exceeded threshold:")
print(crash_drawdowns[['Close', 'Cumulative_Max', 'Drawdown']].dropna().head(10))

Dates where drawdown exceeded threshold:
                  Close  Cumulative_Max   Drawdown
Date                                              
1997-11-12  3633.179932      4548.02002 -20.115129
1997-11-13  3554.100098      4548.02002 -21.853904
1997-11-14  3569.770020      4548.02002 -21.509360
1997-11-17  3578.100098      4548.02002 -21.326202
1997-11-18  3518.899902      4548.02002 -22.627871
1997-11-19  3454.649902      4548.02002 -24.040574
1997-11-20  3466.860107      4548.02002 -23.772101
1997-11-21  3523.439941      4548.02002 -22.528047
1997-11-24  3403.070068      4548.02002 -25.174690
1997-11-25  3479.889893      4548.02002 -23.485607

Here, we filtered the DataFrame to select rows where the ‘Drawdown’ value is less than or equal to the defined threshold (-20%). We then printed a subset of columns (‘Close’, ‘Cumulative_Max’, and ‘Drawdown’) for these filtered dates, after dropping any missing values. This shows the first ten dates on which the drawdown condition was met, all of which fall in November 1997.

Analyzing Periods of Crashes

We will now further zoom into these periods of crashes for a more detailed analysis to compare patterns across different crash events.

So, let’s identify contiguous periods (clusters) where the drawdown is below the threshold:

crash_dates = df.index[df['Drawdown'] <= drawdown_threshold]
clusters = []
current_cluster = []

for date in crash_dates:
    if not current_cluster:
        current_cluster.append(date)
    else:
        if (date - current_cluster[-1]).days <= 3:
            current_cluster.append(date)
        else:
            clusters.append(current_cluster)
            current_cluster = [date]
if current_cluster:
    clusters.append(current_cluster)

print("Identified crash clusters based on drawdown threshold:")
for idx, cluster in enumerate(clusters):
    print(f"Cluster {idx+1}: {cluster[0].date()} to {cluster[-1].date()} (Total days: {len(cluster)})")

Identified crash clusters based on drawdown threshold:
Cluster 1: 1997-11-12 to 1997-12-26 (Total days: 31)
Cluster 2: 1997-12-30 to 1997-12-30 (Total days: 1)
Cluster 3: 1998-01-08 to 1998-01-23 (Total days: 12)
Cluster 4: 1998-01-27 to 1998-01-29 (Total days: 3)
Cluster 5: 1998-02-02 to 1998-02-27 (Total days: 19)
Cluster 6: 1998-06-02 to 1998-09-30 (Total days: 86)
...
Cluster 82: 2020-05-04 to 2020-05-22 (Total days: 15)
Cluster 83: 2020-05-26 to 2020-06-01 (Total days: 5)
Cluster 84: 2020-06-11 to 2020-06-11 (Total days: 1)
Cluster 85: 2020-06-15 to 2020-06-17 (Total days: 2)

This part starts by extracting the dates on which the drawdown breaches the threshold (i.e., the drawdown is less than or equal to -20%). It then iterates over these dates to group them into clusters. Consecutive flagged dates stay in the same cluster when they are at most three calendar days apart, which keeps a Friday and the following Monday together; a gap of more than three days starts a new cluster. Each cluster therefore represents a near-contiguous period of market stress. Finally, the code prints the start and end dates of each cluster along with the number of flagged trading days in it, which is why a cluster such as 2020-06-15 to 2020-06-17 can report only 2 days.
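The same grouping can also be written without an explicit loop. The following is one possible alternative sketch, not the method used above; the function name cluster_dates, the parameter max_gap_days, and the toy dates are all hypothetical.

```python
import pandas as pd

def cluster_dates(dates, max_gap_days=3):
    """Group dates into clusters, starting a new one after a gap > max_gap_days."""
    s = pd.Series(pd.DatetimeIndex(dates).sort_values())
    # Calendar-day gap to the previous flagged date (NaN for the first date)
    gaps = s.diff().dt.days
    # Each gap larger than the limit increments the cluster id
    cluster_id = (gaps > max_gap_days).cumsum()
    summary = s.groupby(cluster_id).agg(start="min", end="max", flagged_days="count")
    return summary.reset_index(drop=True)

# Toy example: a Friday-to-Monday gap (3 days) stays in one cluster,
# while a one-week gap starts a new cluster
toy_dates = ["1997-11-12", "1997-11-13", "1997-11-14", "1997-11-17",
             "1997-11-24", "1997-11-25"]
summary = cluster_dates(toy_dates)
print(summary)
```

With the real data, you could pass df.index[df['Drawdown'] <= drawdown_threshold] to the function. The result is a small DataFrame, which is easier to sort or filter (for example, by flagged_days) than a list of lists.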

Let’s zoom into the first crash cluster and visualize the closing price, daily returns, and drawdown within a 30-day window before and after the cluster:

if clusters:
    cluster_start = clusters[0][0]
    cluster_end = clusters[0][-1]
    print(f"\nZooming into crash cluster from {cluster_start.date()} to {cluster_end.date()}")

    zoom_start = cluster_start - pd.Timedelta(days=30)
    zoom_end = cluster_end + pd.Timedelta(days=30)
    zoom_df = df.loc[zoom_start:zoom_end]

    fig_close = go.Figure()
    fig_close.add_trace(go.Scatter(x=zoom_df.index, y=zoom_df['Close'], mode='lines', name='Closing Price'))
    fig_close.add_vrect(x0=cluster_start, x1=cluster_end, fillcolor='red', opacity=0.3, layer='below', line_width=0, name='Crash Period')
    fig_close.update_layout(
        title=f'Sensex Closing Price from {zoom_start.date()} to {zoom_end.date()}',
        xaxis_title='Date',
        yaxis_title='Sensex Close',
        xaxis_rangeslider_visible=True,
        template="plotly_white"
    )
    fig_close.show()

    fig_returns = go.Figure()
    fig_returns.add_trace(go.Scatter(x=zoom_df.index, y=zoom_df['Daily_Return'], mode='lines', name='Daily Return (%)'))
    fig_returns.add_hline(y=0, line_color='black', line_width=0.8)
    fig_returns.add_vrect(x0=cluster_start, x1=cluster_end, fillcolor='red', opacity=0.3, layer='below', line_width=0, name='Crash Period')
    fig_returns.update_layout(
        title=f'Sensex Daily Returns from {zoom_start.date()} to {zoom_end.date()}',
        xaxis_title='Date',
        yaxis_title='Daily Return (%)',
        xaxis_rangeslider_visible=True,
        template="plotly_white"
    )
    fig_returns.show()

    fig_drawdown = go.Figure()
    fig_drawdown.add_trace(go.Scatter(x=zoom_df.index, y=zoom_df['Drawdown'], mode='lines', name='Drawdown (%)'))
    fig_drawdown.add_hline(y=drawdown_threshold, line_dash='dash', line_color='red',
                          annotation_text=f'Drawdown Threshold ({drawdown_threshold}%)',
                          annotation_position='bottom right')
    fig_drawdown.add_vrect(x0=cluster_start, x1=cluster_end, fillcolor='red', opacity=0.3, layer='below', line_width=0, name='Crash Period')
    fig_drawdown.update_layout(
        title=f'Sensex Drawdown from {zoom_start.date()} to {zoom_end.date()}',
        xaxis_title='Date',
        yaxis_title='Drawdown (%)',
        xaxis_rangeslider_visible=True,
        template="plotly_white"
    )
    fig_drawdown.show()

else:
    print("No crash clusters identified based on the drawdown threshold.")

The closing price chart shows a clear downward trend as the market moves from above 4000 to the mid-3000 range. The daily returns chart captures short-term volatility, oscillating around zero and reflecting larger negative spikes during the crash period. Finally, the drawdown chart reveals how far the index has fallen from its previous peak, crossing below the -20% threshold in the red-shaded region and signalling a significant market decline.

Now, let’s define specific crash periods and create functions to visualize the closing prices and daily returns for these periods with a 30-day zoom window:

# Define the clusters
cluster1_start = pd.to_datetime("1997-11-12")
cluster1_end = pd.to_datetime("1997-12-26")

cluster49_start = pd.to_datetime("2008-08-18")
cluster49_end = pd.to_datetime("2009-01-23")

cluster79_start = pd.to_datetime("2020-03-16")
cluster79_end = pd.to_datetime("2020-04-03")

def plot_crash_period(cluster_start, cluster_end, title_suffix=""):
    zoom_start = cluster_start - pd.Timedelta(days=30)
    zoom_end = cluster_end + pd.Timedelta(days=30)
    zoom_df = df.loc[zoom_start:zoom_end]

    fig = go.Figure()
    fig.add_trace(go.Scatter(x=zoom_df.index, y=zoom_df['Close'], mode='lines', name='Closing Price'))
    fig.add_vrect(x0=cluster_start, x1=cluster_end, fillcolor='red', opacity=0.3, layer='below', line_width=0, name='Crash Period')
    fig.update_layout(
        title=f'Sensex Closing Price {title_suffix}<br>({zoom_start.date()} to {zoom_end.date()})',
        xaxis_title='Date',
        yaxis_title='Sensex Close',
        xaxis_rangeslider_visible=True,
        template="plotly_white"
    )
    fig.show()

def plot_daily_returns(cluster_start, cluster_end, title_suffix=""):
    zoom_start = cluster_start - pd.Timedelta(days=30)
    zoom_end = cluster_end + pd.Timedelta(days=30)
    zoom_df = df.loc[zoom_start:zoom_end]

    fig = go.Figure()
    fig.add_trace(go.Scatter(x=zoom_df.index, y=zoom_df['Daily_Return'], mode='lines', name='Daily Return (%)'))
    fig.add_hline(y=0, line_color='black', line_width=0.8)
    fig.add_vrect(x0=cluster_start, x1=cluster_end, fillcolor='red', opacity=0.3, layer='below', line_width=0, name='Crash Period')
    fig.update_layout(
        title=f'Sensex Daily Returns {title_suffix}<br>({zoom_start.date()} to {zoom_end.date()})',
        xaxis_title='Date',
        yaxis_title='Daily Return (%)',
        xaxis_rangeslider_visible=True,
        template="plotly_white"
    )
    fig.show()

# Plot for each crash period
plot_crash_period(cluster1_start, cluster1_end, "1997 Crash")
plot_crash_period(cluster49_start, cluster49_end, "2008-2009 Crash")
plot_crash_period(cluster79_start, cluster79_end, "2020 Crash")

# Plot daily returns for each crash period
plot_daily_returns(cluster1_start, cluster1_end, "1997 Crash")
plot_daily_returns(cluster49_start, cluster49_end, "2008-2009 Crash")
plot_daily_returns(cluster79_start, cluster79_end, "2020 Crash")

Figure: Sensex Closing Price

Figure: Sensex Daily Returns

Together, these visualizations illustrate how the market experienced sharp declines in closing prices and significant volatility in daily returns during the crash events, offering valuable insights into market behaviour surrounding these downturns.
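Charts are easier to compare when they are paired with a few numbers. As a rough sketch, the helper below summarizes a date range of closing prices; summarize_period and the toy prices are hypothetical and only illustrate the idea.

```python
import pandas as pd

def summarize_period(close, start, end):
    """Return simple statistics for closing prices between start and end (inclusive)."""
    window = close.loc[start:end]
    returns = window.pct_change() * 100
    return {
        "trading_days": len(window),
        "start_close": window.iloc[0],
        "lowest_close": window.min(),
        # Decline from the first close in the window to its lowest close
        "decline_from_start_pct": (window.min() / window.iloc[0] - 1) * 100,
        "worst_daily_return_pct": returns.min(),
    }

# Hypothetical prices over five business days (not real Sensex values)
toy_close = pd.Series(
    [100.0, 95.0, 90.0, 80.0, 88.0],
    index=pd.bdate_range("2020-03-16", periods=5),
)

stats = summarize_period(toy_close, "2020-03-16", "2020-03-20")
for key, value in stats.items():
    print(f"{key}: {value}")
```

On the real data, calling this with df['Close'] and each pair of cluster start and end dates would give a compact table for comparing the 1997, 2008–2009, and 2020 periods side by side.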

Developing Early Warning Signals

Let’s build an early warning system (EWS) based on some common pre-crash indicators. In our historical analysis, we observed that before major downturns:

The daily returns tend to show persistent declines.
The volatility (i.e., the rolling standard deviation of returns) tends to increase.

From historical observations, we can set thresholds (these thresholds are heuristic and can be fine-tuned):

If the 10-day moving average return falls below -0.5%, and
The 10-day volatility exceeds, say, 2%,

Then the system will trigger an early warning signal.

We don’t have an actual dataset for 2025, so we’ll simulate a synthetic one whose return and volatility levels are loosely based on the historical behaviour above. We’ll then apply our EWS to this data to see if any warning conditions are met.

So, let’s simulate synthetic 2025 data for approximately 250 trading days:

np.random.seed(42)

dates_2025 = pd.bdate_range(start="2025-01-01", periods=250)

daily_returns = np.zeros(250)
daily_returns[:150] = np.random.normal(loc=0.0005, scale=0.01, size=150)
daily_returns[150:200] = np.random.normal(loc=-0.008, scale=0.025, size=50)
daily_returns[200:] = np.random.normal(loc=0.0005, scale=0.01, size=50)

prices = [30000]
for ret in daily_returns:
    prices.append(prices[-1]*(1+ret))
prices = prices[1:]

df_2025 = pd.DataFrame({
    'Date': dates_2025,
    'Close': prices,
    'Daily_Return': daily_returns * 100 
})
df_2025.set_index('Date', inplace=True)

This section begins by setting a random seed to ensure consistent synthetic data across runs. It then generates 250 consecutive business days starting from January 1, 2025, as an approximation of one trading year. The code creates synthetic daily returns (as fractions, not percentages) to reflect different market conditions. The first 150 days simulate normal behavior with a slight positive drift (0.05% per day) and low volatility (1%). Days 151–200 simulate a stressed, crash-like regime with a negative drift (-0.8% per day) and higher volatility (2.5%). Days 201–250 shift back to normal market behavior.

Starting with an initial closing price of 30,000, the code calculates the synthetic closing prices by compounding these returns over time. Finally, the data is assembled into a Pandas DataFrame with the ‘Date’ column set as the index, ready for further analysis.
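The compounding loop can also be replaced by a single vectorized call. The sketch below compares both approaches on randomly generated returns; it uses NumPy's default_rng with an arbitrary seed, which is an assumption for this illustration and differs from the np.random.seed call above.

```python
import numpy as np

# Illustrative returns (fractions); seed and parameters are arbitrary
rng = np.random.default_rng(0)
daily_returns = rng.normal(loc=0.0005, scale=0.01, size=250)
start_price = 30000

# Loop version: compound one day at a time
prices_loop = []
last_price = start_price
for ret in daily_returns:
    last_price = last_price * (1 + ret)
    prices_loop.append(last_price)

# Vectorized version: cumulative product of the daily growth factors
prices_vec = start_price * np.cumprod(1 + daily_returns)

same = np.allclose(prices_loop, prices_vec)
print(f"Both approaches agree: {same}")
```

Both versions give the price at the end of each day, so the starting value of 30,000 itself does not appear in the output, matching the prices[1:] slice in the code above.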

Next, we will compute the 10-day rolling average return and rolling volatility for the synthetic 2025 data, and then define and identify early warning signals based on these metrics:

df_2025['Rolling_Mean_Return'] = df_2025['Daily_Return'].rolling(window=10).mean()
df_2025['Rolling_Volatility'] = df_2025['Daily_Return'].rolling(window=10).std()

warning_condition = (df_2025['Rolling_Mean_Return'] < -0.5) & (df_2025['Rolling_Volatility'] > 2)
df_2025['Warning'] = warning_condition

warnings_df = df_2025[df_2025['Warning']]
print("Early Warning Signals for 2025:")
print(warnings_df[['Close', 'Daily_Return', 'Rolling_Mean_Return', 'Rolling_Volatility']].head(15))

We calculated two key rolling metrics over a 10-day window for the synthetic 2025 dataset: the average daily return (Rolling_Mean_Return) and the volatility (standard deviation of daily returns, Rolling_Volatility). We then defined an early warning condition to flag periods when the 10-day rolling average return falls below -0.5% and volatility rises above 2%. These conditions create a new boolean column, ‘Warning’, which indicates when the warning criteria are met.

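Finally, we filtered the DataFrame to the rows where an early warning signal occurs and printed the first 15 of them, showing the relevant metrics (closing price, daily return, rolling mean return, and rolling volatility). The first nine rows of each rolling metric are NaN because a full 10-day window is not yet available, so they can never trigger a warning.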
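If you want to try other windows or thresholds, the rule can be wrapped in a small function. This is a sketch under the same heuristic thresholds; early_warning_flags and its parameter names are hypothetical, and the toy returns are deterministic so the behaviour is easy to follow.

```python
import pandas as pd

def early_warning_flags(daily_return_pct, window=10,
                        mean_threshold=-0.5, vol_threshold=2.0):
    """Flag days where the rolling mean return is low and rolling volatility is high.

    daily_return_pct is expected in percent (e.g., -1.5 for -1.5%).
    """
    rolling_mean = daily_return_pct.rolling(window=window).mean()
    rolling_vol = daily_return_pct.rolling(window=window).std()
    return (rolling_mean < mean_threshold) & (rolling_vol > vol_threshold)

# Toy returns in percent: 20 calm days, then 10 stressed days
calm = [0.1, -0.1] * 10       # tiny moves around zero
stressed = [-4.0, 2.0] * 5    # large swings with a negative average
returns = pd.Series(calm + stressed,
                    index=pd.bdate_range("2025-01-01", periods=30))

flags = early_warning_flags(returns)
print(f"Warnings in calm period: {int(flags.iloc[:20].sum())}")
print(f"Warning on last day: {bool(flags.iloc[-1])}")
```

No warning appears during the calm stretch, and the rule is active by the end of the stressed stretch. Because both statistics use a trailing window, the first warning can only appear some days after the stressed returns begin.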

Now, let’s have a look at the synthetic 2025 Sensex closing price and mark the early warning signals:

fig = go.Figure()

fig.add_trace(go.Scatter(x=df_2025.index, y=df_2025['Close'], mode='lines', name='Closing Price'))

warning_dates = df_2025.index[df_2025['Warning']]
warning_closes = df_2025['Close'][df_2025['Warning']]

fig.add_trace(go.Scatter(x=warning_dates, y=warning_closes, mode='markers',
                          marker=dict(color='red'),
                          name='Early Warning Signal'))

fig.update_layout(
    title='Synthetic 2025 Sensex Closing Price with Early Warning Signals',
    xaxis_title='Date',
    yaxis_title='Sensex Close',
    xaxis_rangeslider_visible=True, 
    template="plotly_white"
)

fig.show()

Figure: Synthetic 2025 Sensex Closing Price with Early Warning Signals

The early warning signals, marked in red, cluster around the simulated downturn. Because both metrics are computed over a trailing 10-day window, a signal fires only once recent data already shows falling returns and rising volatility, so it flags a crash in its early stages rather than forecasting it in advance. The market starts the year strong but gradually weakens, with the decline deepening as warning signals accumulate. This simulated scenario demonstrates the value of early detection tools in identifying the onset of a crash, enabling investors or systems to potentially act before the most severe losses occur.

Summary

So, through this analysis, we learned about:

Identifying market crashes using daily percentage changes and drawdown methods.
Grouping consecutive days of market stress into distinct crash clusters.
Visualizing and comparing different crash periods to uncover unique market behaviours.
Developing an early warning system using rolling statistics of returns and volatility.

I hope you liked this article on stock market crash analysis with Python.