Practical Time Series Concepts for Data Science Interviews

In Data Science interviews, interviewers often ask about practical time series concepts to check your understanding of, and your approach to, solving complex time series problems. So, if you are preparing for Data Science interviews and looking for practice problems based on time series, this article is for you. In this article, I’ll take you through a guide to essential practical time series concepts for Data Science interviews, with example questions.


Below are must-know practical time series concepts for Data Science interviews, each explained in detail with an example question.

Handling Irregular Time Intervals

In real-world scenarios, time series data is not always collected at regular intervals. Handling irregular intervals usually means resampling the data onto a regular time grid and then filling the resulting gaps with interpolation or forward/backward filling.
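To make the filling options concrete, here is a minimal sketch comparing forward fill, backward fill, and linear interpolation on a small hourly series with a two-hour gap. The values are made up for illustration.

```python
import numpy as np
import pandas as pd

# illustrative hourly series with a two-hour gap (made-up values)
s = pd.Series(
    [22.5, np.nan, np.nan, 24.0],
    index=pd.date_range("2024-12-01 08:00", periods=4, freq="h"),
)

filled = pd.DataFrame({
    "ffill": s.ffill(),                        # carry the last observation forward
    "bfill": s.bfill(),                        # pull the next observation backward
    "linear": s.interpolate(method="linear"),  # straight line between neighbors
})
print(filled)
```

As a rule of thumb, forward fill suits values that stay constant until the next reading (a price or a status flag), while interpolation suits smoothly varying measurements such as temperature.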

Example Question: A sensor records temperature readings at irregular intervals. You are required to resample this data into hourly intervals and fill in the missing values using linear interpolation.

Here’s how to solve this problem using Python:

import pandas as pd
import numpy as np

# sample irregular time series data
data = {'timestamp': ['2024-12-01 08:15:00', '2024-12-01 09:45:00', '2024-12-01 11:30:00'],
        'temperature': [22.5, 23.0, 24.2]}
df = pd.DataFrame(data)
df['timestamp'] = pd.to_datetime(df['timestamp'])
df.set_index('timestamp', inplace=True)

# resample to hourly intervals ('h'; the uppercase 'H' alias is deprecated in recent pandas)
df_resampled = df.resample('h').mean()

# interpolate missing values
df_resampled['temperature'] = df_resampled['temperature'].interpolate(method='linear')

print(df_resampled)
                     temperature
timestamp
2024-12-01 08:00:00         22.5
2024-12-01 09:00:00         23.0
2024-12-01 10:00:00         23.6
2024-12-01 11:00:00         24.2

Our solution handles irregular time intervals by resampling and interpolating. It creates a DataFrame of temperature readings recorded at irregular timestamps, converts the timestamps to a datetime index, and resamples the data into hourly bins, averaging any readings that fall into the same hour. The 10:00 bin contains no reading, so it becomes NaN, and linear interpolation fills it with 23.6, halfway between its neighbors 23.0 and 24.2. Note that resample().mean() labels each bin by its start, so the 08:15 reading appears at 08:00.
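Binning shifts each reading to the start of its hour. If you would rather estimate the temperature exactly on each whole hour, one alternative (a sketch, not the method used above) is to interpolate by elapsed time: add the hourly timestamps to the original index, interpolate with pandas' `method="time"`, and keep only the hourly rows. The target grid, whole hours inside the observed range, is an assumption of this sketch.

```python
import pandas as pd

# the same irregular readings as in the example above
readings = pd.Series(
    [22.5, 23.0, 24.2],
    index=pd.to_datetime(["2024-12-01 08:15:00", "2024-12-01 09:45:00", "2024-12-01 11:30:00"]),
    name="temperature",
)

# assumed target grid: whole hours that fall inside the observed time range
hourly_index = pd.date_range(
    readings.index.min().ceil("h"), readings.index.max().floor("h"), freq="h"
)

# add the grid timestamps, interpolate weighted by elapsed time, then keep only the grid
combined = readings.reindex(readings.index.union(hourly_index))
hourly = combined.interpolate(method="time").loc[hourly_index]
print(hourly)
```

Here 09:00 gets 22.75, exactly halfway in time between the 08:15 and 09:45 readings, and 08:00 is left out because it lies before the first reading, where interpolation would turn into extrapolation.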

Rolling Window and Moving Statistics

Rolling window calculations (e.g., moving average, moving sum) smooth out short-term fluctuations and highlight long-term trends.

Example Question: Calculate the rolling 7-day average of stock prices and highlight days when the price deviates by more than 10% from the rolling average.

Here’s how to solve this problem using Python:

import pandas as pd

# sample stock price data
data = {'date': pd.date_range(start='2024-12-01', periods=10, freq='D'),
        'price': [100, 102, 101, 98, 97, 96, 95, 94, 93, 92]}
df = pd.DataFrame(data)
df.set_index('date', inplace=True)

# calculate rolling 7-day average
df['rolling_avg'] = df['price'].rolling(window=7).mean()

# identify deviations > 10%
df['deviation'] = (df['price'] - df['rolling_avg']).abs() / df['rolling_avg'] * 100
df['outlier'] = df['deviation'] > 10

print(df)
            price  rolling_avg  deviation  outlier
date
2024-12-01    100          NaN        NaN    False
2024-12-02    102          NaN        NaN    False
2024-12-03    101          NaN        NaN    False
2024-12-04     98          NaN        NaN    False
2024-12-05     97          NaN        NaN    False
2024-12-06     96          NaN        NaN    False
2024-12-07     95    98.428571   3.483309    False
2024-12-08     94    97.571429   3.660322    False
2024-12-09     93    96.285714   3.412463    False
2024-12-10     92    95.000000   3.157895    False

Our solution calculates a rolling 7-day average of stock prices to smooth out short-term fluctuations, then computes each day’s percentage deviation from that average. Days whose deviation exceeds 10% are flagged as outliers. The first six days have no complete 7-day window, so their average is NaN and they are never flagged; in this sample the largest deviation is about 3.7%, so no day is flagged.
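One refinement worth raising in an interview: the rolling average above includes the current day, so an extreme price partly inflates its own baseline. A minimal sketch, assuming a trailing baseline of the previous seven days is acceptable, shifts the series by one day before averaging. The prices are hypothetical: the sample above with a spike of 110 placed on 2024-12-09 so that something gets flagged.

```python
import pandas as pd

# hypothetical prices: the article's sample with 2024-12-09 changed to a spike of 110
prices = pd.Series(
    [100, 102, 101, 98, 97, 96, 95, 94, 110, 92],
    index=pd.date_range(start="2024-12-01", periods=10, freq="D"),
    name="price",
)

# trailing baseline: mean of the previous 7 days, excluding the current day
baseline = prices.shift(1).rolling(window=7).mean()

# percentage deviation from the baseline; comparisons against NaN evaluate to False
deviation_pct = (prices - baseline).abs() / baseline * 100
outliers = deviation_pct > 10
print(deviation_pct.round(2))
print("Flagged days:", outliers[outliers].index.strftime("%Y-%m-%d").tolist())
```

Only 2024-12-09 is flagged, at roughly 12.7% above its baseline of about 97.57. The trade-off is one extra day of warm-up: the first seven days have no baseline.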

Time Series Decomposition

Decomposition breaks a time series into trend, seasonal, and residual components. It’s vital for understanding patterns and anomalies.

Example Question: Decompose a monthly sales dataset into trend, seasonal, and residual components, and plot them to analyze sales patterns.

Here’s how to solve this problem using Python:

import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose
import matplotlib.pyplot as plt

# sample monthly sales data ('MS' = month-start dates; the 'M' alias is deprecated in recent pandas)
data = {'date': pd.date_range(start='2022-01-01', periods=24, freq='MS'),
        'sales': [200, 210, 250, 270, 290, 300, 310, 320, 400, 410, 420, 450,
                  480, 490, 510, 530, 550, 570, 590, 600, 610, 620, 630, 650]}
df = pd.DataFrame(data)
df.set_index('date', inplace=True)

# decompose time series (period=12: one seasonal cycle per year of monthly data)
decomposition = seasonal_decompose(df['sales'], model='additive', period=12)
decomposition.plot()
plt.show()
[Figure: Time Series Decomposition plot]

Our solution uses time series decomposition to break monthly sales data into trend, seasonal, and residual components. The additive model from the statsmodels library treats each month’s sales as the sum of a long-term trend, a repeating 12-month seasonal pattern, and random residual variation.

Working with Time Zones

Time zone conversion is crucial when dealing with global data, to ensure consistency and accurate comparisons.

Example Question: Convert timestamps in a dataset from UTC to IST (Indian Standard Time) and compute the difference between consecutive entries in hours.

Here’s how to solve this problem using Python:

import pandas as pd

# sample UTC timestamps
data = {'timestamp': ['2024-12-01 00:00:00', '2024-12-01 06:00:00', '2024-12-01 12:00:00']}
df = pd.DataFrame(data)
df['timestamp'] = pd.to_datetime(df['timestamp'], utc=True)

# convert to IST
df['IST'] = df['timestamp'].dt.tz_convert('Asia/Kolkata')

# time between consecutive entries, in hours
df['hour_diff'] = df['IST'].diff().dt.total_seconds() / 3600

print(df)
                  timestamp                       IST  hour_diff
0 2024-12-01 00:00:00+00:00 2024-12-01 05:30:00+05:30        NaN
1 2024-12-01 06:00:00+00:00 2024-12-01 11:30:00+05:30        6.0
2 2024-12-01 12:00:00+00:00 2024-12-01 17:30:00+05:30        6.0

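Our solution demonstrates handling time zones in time series data by converting timestamps from UTC to IST (Indian Standard Time) using pandas. It first parses the timestamps as UTC, converts them to IST, and then calculates the time difference between consecutive entries in hours. The differences are 6.0 hours whether computed on the UTC or the IST column, because a time zone conversion changes how an instant is displayed, not the instant itself.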
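Asia/Kolkata does not observe daylight saving time, so local clock differences always match real elapsed time. In zones that do observe DST this is not guaranteed. The sketch below uses America/New_York and the US spring-forward date of 2024-03-10 as an illustrative choice, and contrasts the true elapsed time (a diff on tz-aware values) with the naive wall-clock difference.

```python
import pandas as pd

# two instants two hours apart, straddling the US spring-forward change on 2024-03-10
utc = pd.to_datetime(pd.Series(["2024-03-10 06:00:00", "2024-03-10 08:00:00"]), utc=True)
local = utc.dt.tz_convert("America/New_York")
print(local)

# tz-aware subtraction measures real elapsed time
elapsed_hours = local.diff().dt.total_seconds() / 3600

# dropping the time zone keeps the local wall-clock readings, which skip 02:00-03:00
wall_clock_hours = local.dt.tz_localize(None).diff().dt.total_seconds() / 3600

print("Elapsed:", elapsed_hours.iloc[1], "Wall clock:", wall_clock_hours.iloc[1])
```

The practical takeaway: compute durations on tz-aware (or UTC) timestamps, and convert to local time only for display or for features such as local hour of day.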

Fourier Transform for Seasonality

The Fourier Transform expresses a time series as a sum of sinusoids at different frequencies, so strong peaks in its spectrum reveal periodic patterns, which is helpful for seasonal trend analysis.

Example Question: Use the Fourier Transform to detect the primary frequency in a dataset of daily web traffic and reconstruct the signal using the top 2 frequencies.

Here’s how to solve this problem using Python:

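import numpy as np
import matplotlib.pyplot as plt

# sample web traffic data: a 7-day and a 15-day cycle around a mean of 100
days = np.arange(0, 30)
traffic = 100 + 20 * np.sin(2 * np.pi * days / 7) + 10 * np.sin(2 * np.pi * days / 15)

# perform fourier transform (rfft keeps only the non-negative frequencies of a real signal)
fft = np.fft.rfft(traffic)
frequencies = np.fft.rfftfreq(len(traffic), d=1)  # cycles per day

# find the dominant frequencies, ignoring index 0 (the mean level, which would always win)
magnitudes = np.abs(fft)
magnitudes[0] = 0
top_frequencies = np.argsort(magnitudes)[-2:]

# 30 days is not a multiple of 7, so this is the nearest FFT bin's period (about 7.5 days)
print('Primary period (days):', 1 / frequencies[top_frequencies[-1]])

# reconstruct signal from the mean plus the top 2 frequencies; irfft applies the 1/N scaling
filtered = np.zeros_like(fft)
filtered[0] = fft[0]
filtered[top_frequencies] = fft[top_frequencies]
reconstructed = np.fft.irfft(filtered, n=len(traffic))

# plot original and reconstructed signals
plt.plot(days, traffic, label='Original Signal')
plt.plot(days, reconstructed, label='Reconstructed Signal', linestyle='--')
plt.legend()
plt.show()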
[Figure: Fourier Transform for Seasonality, original vs. reconstructed signal]

Our solution uses the Fourier Transform to analyze periodic patterns in a synthetic web traffic dataset. The transform identifies the dominant frequencies in the data, which correspond to the strongest periodic cycles. The solution then reconstructs an approximation of the signal from just its two strongest frequency components, simplifying the data while retaining its main seasonal patterns.
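The 30-day sample is not a multiple of 7, so the weekly cycle lands between FFT bins and its energy leaks into neighboring frequencies. A minimal sketch, assuming you can choose the window length, uses 105 days (a common multiple of 7 and 15) so that both cycles fall exactly on FFT bins. The detected periods are then exactly 7 and 15 days, and the mean plus those two components reproduces the signal up to floating-point error.

```python
import numpy as np

# assumed window of 105 days: a common multiple of 7 and 15, so there is no spectral leakage
n_days = 105
days = np.arange(n_days)
traffic = 100 + 20 * np.sin(2 * np.pi * days / 7) + 10 * np.sin(2 * np.pi * days / 15)

spectrum = np.fft.rfft(traffic)
freqs = np.fft.rfftfreq(n_days, d=1.0)  # cycles per day

# rank frequencies by magnitude, ignoring the zero-frequency (mean) term
magnitudes = np.abs(spectrum)
magnitudes[0] = 0.0
top_two = np.argsort(magnitudes)[-2:][::-1]  # strongest first
periods = 1.0 / freqs[top_two]
print("Dominant periods (days):", np.round(periods, 2))

# keep the mean and the two strongest components, then invert the transform
kept = np.zeros_like(spectrum)
kept[0] = spectrum[0]
kept[top_two] = spectrum[top_two]
reconstructed = np.fft.irfft(kept, n=n_days)
print("Max reconstruction error:", np.max(np.abs(reconstructed - traffic)))
```

In real data you rarely control the window this precisely, which is why windowing functions or periodogram-based methods are often used when leakage matters.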

Summary

In this article, we covered essential practical time series concepts for Data Science interviews by focusing on handling irregular intervals, rolling window statistics, time series decomposition, working with time zones, and Fourier Transform for seasonality. I hope you liked this article on practical time series concepts for Data Science interviews.

Aman Kharwal

AI/ML Engineer | Published Author. My aim is to decode data science for the real world in the most simple words.


Discover more from AmanXai by Aman Kharwal
