Stock market anomaly detection refers to the process of identifying unusual patterns or behaviours in stock market data that deviate significantly from the expected norm. The key is that these events are unexpected and can lead to significant price movements or unusual trading volumes. So, if you want to learn how to detect, analyze, and interpret anomalies in the stock market, this article is for you. In this article, I’ll take you through the task of Stock Market Anomaly Detection using Python.
Stock Market Anomaly Detection: Process We Can Follow
Anomalies in the stock market are important because they can indicate opportunities or risks. For example, a sudden spike in a stock’s price could be due to positive news about the company or its industry, which signals a potential investment opportunity. Conversely, an unexpected price drop could warn of underlying issues or market sentiment changes, which signals a risk that investors may need to manage.
Below is the process we can follow for the task of stock market anomaly detection:
- Gather historical stock market data, including prices (open, high, low, close, adjusted close) and trading volumes.
- Develop additional features that may help in detecting anomalies, such as moving averages, the relative strength index (RSI), or percentage changes over specific periods (see the sketch after this list).
- Visualize the data to identify potential outliers or unusual patterns across time.
- Employ statistical methods like Z-score analysis, where data points that are a certain number of standard deviations away from the mean are flagged as anomalies.
- Use the insights gained from anomaly detection to inform investment decisions, risk management, and strategic planning.
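As a quick illustration of the feature-engineering step above, here is a minimal sketch that adds a moving average, a daily percentage change, and a simple moving-average RSI to a single-ticker price series. The helper name, the price column, and the 14-day window are illustrative choices of mine; the rest of this article applies the Z-score method directly to raw prices and volumes.

import pandas as pd

def add_basic_features(df, price_col='Adj Close', window=14):
    # rolling moving average of the price
    df['MA'] = df[price_col].rolling(window).mean()
    # day-over-day percentage change
    df['Pct Change'] = df[price_col].pct_change()
    # simple RSI based on average gains and losses over the window
    delta = df[price_col].diff()
    gain = delta.clip(lower=0).rolling(window).mean()
    loss = -delta.clip(upper=0).rolling(window).mean()
    df['RSI'] = 100 - 100 / (1 + gain / loss)
    return df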
I will collect real-time stock market data for this task using the yfinance API. Still, if you need a dataset for this task, download it from here.
Collecting Real-time Stock Market Data using Python
Before getting started with Stock Market Anomaly Detection, I’ll collect real-time stock market data for several companies. I’ll use the yfinance API for this task. If you haven’t used it before, you can install it in your Python virtual environment by running the command below in your terminal or command prompt:
- pip install yfinance
Below is how we can collect real-time stock market data using Python:
import pandas as pd
import yfinance as yf
from datetime import date, timedelta
# define the time period for the data
end_date = date.today().strftime("%Y-%m-%d")
start_date = (date.today() - timedelta(days=365)).strftime("%Y-%m-%d")
# list of stock tickers to download
tickers = ['AAPL', 'MSFT', 'NFLX', 'GOOG', 'TSLA']
data = yf.download(tickers, start=start_date, end=end_date, progress=False)
# reset index to bring Date into the columns for the melt function
data = data.reset_index()
# melt the DataFrame to make it long format where each row is a unique combination of Date, Ticker, and attributes
data_melted = data.melt(id_vars=['Date'], var_name=['Attribute', 'Ticker'])
# pivot the melted DataFrame to have the attributes (Open, High, Low, etc.) as columns
data_pivoted = data_melted.pivot_table(index=['Date', 'Ticker'], columns='Attribute', values='value', aggfunc='first')
# reset index to turn multi-index into columns
stock_data = data_pivoted.reset_index()
print(stock_data.head())

Attribute       Date Ticker   Adj Close       Close        High         Low        Open       Volume
0         2023-04-10   AAPL  161.169724  162.029999  162.029999  160.080002  161.419998   47716900.0
1         2023-04-10   GOOG  106.949997  106.949997  107.970001  105.599998  107.389999   19741500.0
2         2023-04-10   MSFT  287.034241  289.390015  289.600006  284.709991  289.209991   23103000.0
3         2023-04-10   NFLX  338.989990  338.989990  339.880005  333.359985  335.269989    2657900.0
4         2023-04-10   TSLA  184.509995  184.509995  185.100006  176.110001  179.940002  142154600.0
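One version note: newer yfinance releases adjust prices automatically by default, so the downloaded frame may lack a separate 'Adj Close' column. If that happens in your environment, passing auto_adjust=False to yf.download restores the layout this article assumes:

data = yf.download(tickers, start=start_date, end=end_date, progress=False, auto_adjust=False)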
The data we collected contains the following columns:
- Date: The date of the stock data entry.
- Ticker: The stock ticker symbol.
- Adj Close: The adjusted closing price of the stock, which accounts for any corporate actions like splits or dividends.
- Close: The closing price of the stock.
- High: The highest price of the stock during the trading day.
- Low: The lowest price of the stock during the trading day.
- Open: The opening price of the stock.
- Volume: The number of shares traded during the day.
Now, let’s make some necessary transformations in the dataset before moving forward:
# convert the 'Date' column to datetime format
stock_data['Date'] = pd.to_datetime(stock_data['Date'])
# set the 'Date' column as the index of the dataframe
stock_data.set_index('Date', inplace=True)
print(stock_data.head())

Attribute  Ticker   Adj Close       Close        High         Low        Open       Volume
Date
2023-04-10   AAPL  161.169724  162.029999  162.029999  160.080002  161.419998   47716900.0
2023-04-10   GOOG  106.949997  106.949997  107.970001  105.599998  107.389999   19741500.0
2023-04-10   MSFT  287.034241  289.390015  289.600006  284.709991  289.209991   23103000.0
2023-04-10   NFLX  338.989990  338.989990  339.880005  333.359985  335.269989    2657900.0
2023-04-10   TSLA  184.509995  184.509995  185.100006  176.110001  179.940002  142154600.0
Stock Market Anomaly Detection using Python
Now, let’s get started with stock market anomaly detection by visualizing the adjusted close prices and trading volumes for each ticker over time, to get an overview of the data:
import matplotlib.pyplot as plt
import seaborn as sns

sns.set(style="whitegrid")

# plotting the adjusted close prices for each ticker over time
plt.figure(figsize=(15, 6))
for ticker in stock_data['Ticker'].unique():
    subset = stock_data[stock_data['Ticker'] == ticker]
    plt.plot(subset.index, subset['Adj Close'], label=ticker)
plt.title('Adjusted Close Prices Over Time')
plt.xlabel('Date')
plt.ylabel('Adjusted Close Price')
plt.legend()
plt.show()

# plotting the trading volume for each ticker over time
plt.figure(figsize=(15, 6))
for ticker in stock_data['Ticker'].unique():
    subset = stock_data[stock_data['Ticker'] == ticker]
    plt.plot(subset.index, subset['Volume'], label=ticker)
plt.title('Trading Volume Over Time')
plt.xlabel('Date')
plt.ylabel('Volume')
plt.legend()
plt.show()

The first graph shows the adjusted closing prices of five different stocks:
- AAPL (Apple Inc.)
- GOOG (Alphabet Inc.)
- MSFT (Microsoft Corporation)
- NFLX (Netflix, Inc.)
- and TSLA (Tesla, Inc.)
NFLX trades at the highest price level of the five and, despite considerable volatility, climbs strongly over the period. MSFT also trades at a high price level and trends upward. AAPL and GOOG sit at lower absolute price levels; GOOG shows a general uptrend, while AAPL moves in a comparatively narrow band. TSLA is the most volatile of the group, with sharp swings rather than a clear sustained trend.
From the second graph, it is evident that TSLA and AAPL have the highest and most volatile trading volumes, with TSLA showing particularly large spikes. This suggests significant investor interest or reactions to events during those times. GOOG shows moderate and relatively stable trading volume, while MSFT and NFLX trade in lower and less volatile volumes by comparison. The spikes and dips in trading volume could correspond to earnings reports, product announcements, or other market-moving events for these companies.
Detecting Anomalies in the Stock Market
Given the variability and trends observed in both adjusted close prices and trading volumes, anomaly detection can focus on identifying:
- Significant price movements that deviate from the stock’s typical price range or trend.
- Unusual trading volumes that stand out from the normal trading activity.
For the task of stock market anomaly detection, we can use the Z-score method, which identifies anomalies based on how many standard deviations away a data point is from the mean. A common threshold for identifying an anomaly is a Z-score greater than 2 or less than -2, which corresponds to data points that are more than 2 standard deviations away from the mean.
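In formula terms, the Z-score of an observation x from a series with mean μ and standard deviation σ is:

z = (x − μ) / σ

so |z| > 2 flags observations lying more than two standard deviations from the mean.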
We will compute the Z-scores for both the adjusted close prices and trading volumes for each stock and then identify any data points that exceed this threshold:
from scipy.stats import zscore

def detect_anomalies(df, column):
    df_copy = df.copy()
    # calculate Z-scores and add them as a new column
    df_copy['Z-score'] = zscore(df_copy[column])
    # find where the absolute Z-score is greater than 2 (common threshold for anomalies)
    anomalies = df_copy[abs(df_copy['Z-score']) > 2]
    return anomalies

anomalies_adj_close = pd.DataFrame()
anomalies_volume = pd.DataFrame()

for ticker in stock_data['Ticker'].unique():
    data_ticker = stock_data[stock_data['Ticker'] == ticker]
    adj_close_anomalies = detect_anomalies(data_ticker, 'Adj Close')
    volume_anomalies = detect_anomalies(data_ticker, 'Volume')
    # use concat instead of append (append was removed in pandas 2.0)
    anomalies_adj_close = pd.concat([anomalies_adj_close, adj_close_anomalies])
    anomalies_volume = pd.concat([anomalies_volume, volume_anomalies])
print(anomalies_adj_close.head())

Attribute  Ticker   Adj Close       Close        High         Low        Open      Volume   Z-score
Date
2023-04-10   AAPL  161.169724  162.029999  162.029999  160.080002  161.419998  47716900.0 -2.140730
2023-04-11   AAPL  159.946259  160.800003  162.360001  160.509995  162.350006  47644200.0 -2.275332
2023-04-12   AAPL  159.249985  160.100006  162.059998  159.779999  161.220001  50133100.0 -2.351934
2023-04-10   GOOG  106.949997  106.949997  107.970001  105.599998  107.389999  19741500.0 -2.104352
2023-04-11   GOOG  106.120003  106.120003  107.220001  105.279999  106.919998  18721300.0 -2.173283
print(anomalies_volume.head())

Attribute  Ticker   Adj Close       Close        High         Low        Open       Volume   Z-score
Date
2023-05-05   AAPL  172.648468  173.570007  174.300003  170.759995  170.979996  113316400.0  3.243334
2023-05-31   AAPL  176.552780  177.250000  179.350006  176.759995  177.330002   99625300.0  2.448999
2023-06-05   AAPL  178.873611  179.580002  184.949997  178.039993  182.630005  121946500.0  3.744038
2023-06-16   AAPL  184.192612  184.919998  186.990005  184.270004  186.729996  101235600.0  2.542426
2023-08-04   AAPL  181.274155  181.990005  187.380005  181.919998  185.520004  115799700.0  3.387411
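One caveat: the Z-score here is computed over the entire year, so for a strongly trending series the earliest and latest prices tend to get flagged simply for being far from the full-period mean (which is why the first few rows above are early-April prices). A common refinement is a rolling Z-score that compares each point to a moving baseline. Here is a minimal sketch of that variant; the function name, the 30-day window, and the threshold of 2 are illustrative choices, not part of the method used in this article:

def detect_anomalies_rolling(df, column, window=30, threshold=2):
    # compare each point to a rolling mean/std so the baseline follows the trend
    df_copy = df.copy()
    rolling_mean = df_copy[column].rolling(window).mean()
    rolling_std = df_copy[column].rolling(window).std()
    df_copy['Z-score'] = (df_copy[column] - rolling_mean) / rolling_std
    return df_copy[df_copy['Z-score'].abs() > threshold]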
Now, let’s plot the adjusted close prices and trading volumes again for each company, highlighting the anomalies we detected:
def plot_anomalies(ticker, anomalies_adj_close, anomalies_volume):
    # filter the main and anomalies data for the given ticker
    data_ticker = stock_data[stock_data['Ticker'] == ticker]
    adj_close_anomalies = anomalies_adj_close[anomalies_adj_close['Ticker'] == ticker]
    volume_anomalies = anomalies_volume[anomalies_volume['Ticker'] == ticker]

    # plotting
    fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(15, 12))

    # adjusted close price
    ax1.plot(data_ticker.index, data_ticker['Adj Close'], label='Adj Close', color='blue')
    ax1.scatter(adj_close_anomalies.index, adj_close_anomalies['Adj Close'], color='red', label='Anomalies')
    ax1.set_title(f'{ticker} Adjusted Close Price and Anomalies')
    ax1.set_xlabel('Date')
    ax1.set_ylabel('Adjusted Close Price')
    ax1.legend()

    # volume
    ax2.plot(data_ticker.index, data_ticker['Volume'], label='Volume', color='green')
    ax2.scatter(volume_anomalies.index, volume_anomalies['Volume'], color='orange', label='Anomalies')
    ax2.set_title(f'{ticker} Trading Volume and Anomalies')
    ax2.set_xlabel('Date')
    ax2.set_ylabel('Volume')
    ax2.legend()

    plt.tight_layout()
    plt.show()

# plot anomalies for each ticker
for ticker in stock_data['Ticker'].unique():
    plot_anomalies(ticker, anomalies_adj_close, anomalies_volume)
The above charts for each company display the adjusted close prices and trading volumes over time, with anomalies highlighted:
- Anomalies in the adjusted close price are marked in red and represent significant deviations from the typical price range. These could correspond to days with unexpected news, earnings reports, or market shifts.
- Anomalies in trading volume are marked in orange and indicate days with exceptionally high or low trading activity compared to the norm. Such spikes could be due to market events, announcements, or other factors influencing trader behaviour.
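To cross-reference these flagged days with news or earnings dates, it can help to print the anomaly dates per ticker. This is a small sketch of one way to do that (the loop and formatting are my own additions):

# list the flagged price-anomaly dates per ticker for closer inspection
for ticker in stock_data['Ticker'].unique():
    price_dates = anomalies_adj_close[anomalies_adj_close['Ticker'] == ticker].index
    print(ticker, 'price anomalies:', [d.strftime('%Y-%m-%d') for d in price_dates])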
Now, let’s analyze the correlation in the anomalies of all the companies:
# consolidate anomalies for adjusted close prices and volumes
all_anomalies_adj_close = anomalies_adj_close[['Ticker']].copy()
all_anomalies_adj_close['Adj Close Anomaly'] = 1  # indicator variable for anomalies

all_anomalies_volume = anomalies_volume[['Ticker']].copy()
all_anomalies_volume['Volume Anomaly'] = 1  # indicator variable for anomalies

# pivot these dataframes to have one row per date and columns for each ticker, filling non-anomalies with 0
adj_close_pivot = all_anomalies_adj_close.pivot_table(index=all_anomalies_adj_close.index, columns='Ticker',
                                                      fill_value=0, aggfunc='sum')
volume_pivot = all_anomalies_volume.pivot_table(index=all_anomalies_volume.index, columns='Ticker',
                                                fill_value=0, aggfunc='sum')

# flatten the multi-level column index
adj_close_pivot.columns = adj_close_pivot.columns.get_level_values(1)
volume_pivot.columns = volume_pivot.columns.get_level_values(1)

# combine the two pivoted dataframes
combined_anomalies = pd.concat([adj_close_pivot, volume_pivot], axis=1, keys=['Adj Close Anomaly', 'Volume Anomaly'])

# calculate the correlation matrix for the anomalies
correlation_matrix = combined_anomalies.corr()
print(correlation_matrix)

                        Adj Close Anomaly                               Volume Anomaly
Ticker                       AAPL      GOOG      MSFT      NFLX        AAPL      GOOG      MSFT      NFLX      TSLA
Adj Close Anomaly AAPL   1.000000  0.219214 -0.072232 -0.219214         NaN       NaN       NaN       NaN       NaN
                  GOOG   0.219214  1.000000  0.121395 -1.000000   -0.645497       NaN  0.258199  0.258199  0.166667
                  MSFT  -0.072232  0.121395  1.000000 -0.121395   -0.258199       NaN  0.645497 -0.258199 -0.166667
                  NFLX  -0.219214 -1.000000 -0.121395  1.000000    0.645497       NaN -0.258199 -0.258199 -0.166667
Volume Anomaly    AAPL        NaN -0.645497 -0.258199  0.645497    1.000000  0.170507 -0.004707 -0.336011 -0.405244
                  GOOG        NaN       NaN       NaN       NaN    0.170507  1.000000  0.418917 -0.216007 -0.405244
                  MSFT        NaN  0.258199  0.645497 -0.258199   -0.004707  0.418917  1.000000 -0.196116 -0.384353
                  NFLX        NaN  0.258199 -0.258199 -0.258199   -0.336011 -0.216007 -0.196116  1.000000 -0.050252
                  TSLA        NaN  0.166667 -0.166667 -0.166667   -0.405244 -0.405244 -0.384353 -0.050252  1.000000
In the adjusted close price anomalies, AAPL has a low positive correlation with GOOG and a negative correlation with NFLX, suggesting that AAPL’s anomalous price moves sometimes coincide with GOOG’s but tend to move inversely to NFLX’s. GOOG and NFLX show a strong negative correlation: when one experiences an anomalous price move, the other tends to move in the opposite direction (though with this few flagged days, a correlation of exactly −1 should be read with caution).

In trading volume anomalies, GOOG correlates positively with MSFT, suggesting these two companies sometimes see unusual trading activity on the same days. AAPL’s volume anomalies correlate negatively with companies like NFLX and TSLA, indicating that when AAPL experiences unusual trading volume, those stocks tend not to be flagged at the same time.
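Since we are already using seaborn, a heatmap is a convenient way to read this matrix at a glance. This is a small optional sketch; the figure size and colour map are arbitrary choices:

# visualize the anomaly correlation matrix as a heatmap
plt.figure(figsize=(10, 8))
sns.heatmap(correlation_matrix, annot=True, fmt='.2f', cmap='coolwarm', center=0)
plt.title('Correlation of Anomaly Indicators Across Tickers')
plt.tight_layout()
plt.show()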
Analyzing the Risk of Anomalies
Now, let’s rate each stock based on the risk inferred from the anomalies detected. For this task, we can consider the frequency and magnitude of these anomalies. A stock could be considered more risky if it has frequent and large anomalies in its price or volume. Here’s how we can approach this:
- Frequency of Anomalies: A higher number of anomalies may indicate a higher risk.
- Magnitude of Anomalies: Larger deviations from the mean (higher absolute Z-scores) suggest higher risk.
We can compute a risk score for each stock by combining these factors. For simplicity, we will average the absolute Z-scores of anomalies for each stock and then normalize these scores across all stocks to get a risk rating (a sketch that also folds in anomaly frequency follows the interpretation below):
# calculate the mean absolute Z-score for each stock as a risk indicator
adj_close_risk = anomalies_adj_close.groupby('Ticker')['Z-score'].apply(lambda x: abs(x).mean())
volume_risk = anomalies_volume.groupby('Ticker')['Z-score'].apply(lambda x: abs(x).mean())
# combine the risk scores from both price and volume anomalies
total_risk = adj_close_risk + volume_risk
# normalize the risk scores to get a relative risk rating from 0 to 1
risk_rating = (total_risk - total_risk.min()) / (total_risk.max() - total_risk.min())
print(risk_rating)

Ticker
AAPL    0.173652
GOOG    0.063253
MSFT    0.000000
NFLX    1.000000
TSLA         NaN
Name: Z-score, dtype: float64
Here’s the interpretation of each rating:
- AAPL: Has a risk rating of approximately 0.17. It suggests that Apple’s stock shows some level of risk due to anomalies, but it is relatively moderate compared to others like NFLX.
- GOOG: With a risk rating of around 0.06, GOOG appears to be less risky compared to AAPL. It indicates fewer or less significant anomalies in its trading data.
- MSFT: Shows a risk rating of 0.00, indicating the least risk among the stocks listed. It suggests that Microsoft had the fewest and smallest anomalies in its price and volume data.
- NFLX: Has the highest risk rating at 1.00. It indicates that Netflix is the most risky among these stocks, with the most frequent and largest anomalies detected.
- TSLA: The NaN value does not mean Tesla is risk-free. TSLA had volume anomalies but no price anomalies flagged, so the sum of its price and volume risk scores could not be computed. A rating that handles this case explicitly is sketched below.
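As a rough illustration of how the frequency component mentioned earlier could be folded in, and how the TSLA NaN could be handled, here is a sketch. The equal weighting of frequency and magnitude, and treating a missing risk component as zero via fill_value, are my own assumptions, not part of the original method:

# frequency: number of flagged days per ticker across price and volume anomalies
freq = (anomalies_adj_close['Ticker'].value_counts()
        .add(anomalies_volume['Ticker'].value_counts(), fill_value=0))
freq_norm = (freq - freq.min()) / (freq.max() - freq.min())

# magnitude: sum the two risk components, keeping tickers that only have one
magnitude = adj_close_risk.add(volume_risk, fill_value=0)
magnitude_norm = (magnitude - magnitude.min()) / (magnitude.max() - magnitude.min())

# combine frequency and magnitude with equal weights (an illustrative choice)
combined_rating = 0.5 * freq_norm + 0.5 * magnitude_norm
print(combined_rating.sort_values(ascending=False))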
Summary
So, this is how you can detect, analyze, and interpret anomalies in the stock market. Anomalies matter because they can indicate opportunities or risks: a sudden spike in a stock’s price can signal a potential investment opportunity, while an unexpected drop can warn of underlying issues or shifts in market sentiment that investors may need to manage.
I hope you liked this article on Stock Market Anomaly Detection using Python. Feel free to ask your questions in the comments section below. You can follow me on Instagram for many more resources.
Note: recent pandas versions have a regression that can break the melt step above on yfinance’s multi-level columns (https://github.com/pandas-dev/pandas/issues/57663). Pinning pandas, i.e. pip install pandas==2.1.4, works around it.
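Alternatively, here is a sketch that sidesteps melt entirely: pandas’ stack can reshape yfinance’s multi-level columns (attribute on level 0, ticker on level 1) straight into the long format used above. This assumes the default yfinance column layout; on pandas 2.1+ you may see a FutureWarning about stack behaviour, but the result is the same:

# reshape without melt: stack the ticker level of the columns into the index
data = yf.download(tickers, start=start_date, end=end_date, progress=False)
stock_data = (data.stack(level=1)
                  .rename_axis(['Date', 'Ticker'])
                  .reset_index())
print(stock_data.head())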