NumPy Concepts for Data Science Interviews

Aman Kharwal
November 5, 2024

Questions on NumPy are common in Data Science interviews because they test a candidate’s ability to manipulate data and perform numerical computations efficiently. So, if you are preparing for Data Science interviews and looking for questions based on NumPy operations, this article is for you. In this article, I’ll take you through NumPy concepts for Data Science interviews, including example questions and how to solve them.

NumPy Concepts for Data Science Interviews

Here are must-know NumPy concepts for Data Science interviews, each explained in detail with an example question and its solution in Python.

Broadcasting

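Broadcasting allows NumPy to perform arithmetic operations on arrays of different shapes by virtually stretching the smaller array across the larger array’s shape, without actually copying the data. NumPy compares the shapes starting from the last dimension, and two dimensions are compatible when they are equal or when one of them is 1. This makes computations more efficient and reduces the need for explicit looping.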

Example Question: You have a 2D array of daily temperatures for multiple cities (rows: cities, columns: days) and a 1D array with a daily offset for each day of the week. Use broadcasting to adjust each city’s temperatures by the daily offset.

Here’s how to solve this problem using NumPy and Python:

import numpy as np

# 2D array of temperatures (5 cities, 7 days)
temperatures = np.array([
    [20, 21, 19, 22, 18, 20, 19],
    [25, 26, 24, 23, 22, 27, 26],
    [15, 17, 16, 18, 15, 16, 17],
    [30, 31, 29, 32, 30, 28, 27],
    [22, 23, 21, 25, 24, 26, 25]
])

# 1D array with daily offsets
offsets = np.array([1, -1, 0, 2, -2, 1, 0])

# adjust temperatures using broadcasting
adjusted_temperatures = temperatures + offsets
print(adjusted_temperatures)

[[21 20 19 24 16 21 19]
 [26 25 24 25 20 28 26]
 [16 16 16 20 13 17 17]
 [31 30 29 34 28 29 27]
 [23 22 21 27 22 27 25]]
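
The example above works because the offsets array has one value per column. As a minimal sketch of the other direction, suppose you had one correction per city instead (the `city_offsets` values below are hypothetical and not part of the original question). The shapes no longer line up on the last dimension, so the 1D array needs an extra axis before it can broadcast across the days:

```python
import numpy as np

# same temperatures as above: 5 cities (rows) x 7 days (columns)
temperatures = np.array([
    [20, 21, 19, 22, 18, 20, 19],
    [25, 26, 24, 23, 22, 27, 26],
    [15, 17, 16, 18, 15, 16, 17],
    [30, 31, 29, 32, 30, 28, 27],
    [22, 23, 21, 25, 24, 26, 25]
])

# hypothetical per-city corrections (one value per row), for illustration only
city_offsets = np.array([0, 1, -1, 2, 0])

# shapes (5, 7) and (5,) clash on the last dimension (7 vs 5)
try:
    temperatures + city_offsets
    shapes_compatible = True
except ValueError:
    shapes_compatible = False
print(shapes_compatible)

# adding a trailing axis gives shape (5, 1), which broadcasts across the 7 days
adjusted_by_city = temperatures + city_offsets[:, np.newaxis]
print(adjusted_by_city.shape)
```

Here `city_offsets[:, np.newaxis]` is equivalent to `city_offsets.reshape(5, 1)`; either form tells NumPy to repeat each city’s value along the day axis.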

Vectorization

Vectorization means expressing a computation as an operation on whole arrays instead of writing a Python loop over individual elements. The loop still happens, but it runs inside NumPy’s compiled code, which makes computations faster and more concise. It’s essential for data science tasks where performance is critical.

Example Question: You have an array of distances in kilometres and need to convert them to miles without using a loop. Use vectorized operations in NumPy for the conversion (1 km = 0.621371 miles).

Here’s how to solve this problem using NumPy and Python:

# array of distances in kilometers
distances_km = np.array([5, 10, 15, 20, 25])

# convert to miles using vectorized operation
distances_miles = distances_km * 0.621371
print(distances_miles)

[ 3.106855  6.21371   9.320565 12.42742  15.534275]
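
To see what vectorization replaces, here is a small sketch that computes the same conversion with a Python-level loop and checks that both approaches agree. The names `KM_TO_MILES`, `miles_loop`, and `miles_vectorized` are introduced here for illustration only:

```python
import numpy as np

KM_TO_MILES = 0.621371  # conversion factor from the question
distances_km = np.array([5, 10, 15, 20, 25])

# loop-based version: one Python-level multiplication per element
miles_loop = np.array([d * KM_TO_MILES for d in distances_km])

# vectorized version: a single array expression
miles_vectorized = distances_km * KM_TO_MILES

# both approaches give the same values
print(np.allclose(miles_loop, miles_vectorized))

# multiplying an integer array by a float produces a float array
print(miles_vectorized.dtype)
```

On an array this small the speed difference is negligible; the gap tends to grow with array size, which you can check yourself with `timeit` on a larger input.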

Boolean Indexing

Boolean indexing is a way to filter arrays based on conditions. This concept allows for the efficient selection of elements that meet specific criteria, which is useful for data cleaning and processing.

Example Question: You have an array of ages, and you want to find all ages that are above 18 and increase them by 1 to simulate a birthday. Use Boolean indexing to achieve this.

Here’s how to solve this problem using NumPy and Python:

# array of ages
ages = np.array([15, 20, 17, 19, 21, 18, 23, 16])

# increase ages above 18 by 1 using boolean indexing
ages[ages > 18] += 1
print(ages)

[15 21 17 20 22 18 24 16]
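
Two follow-ups often come up with this question: how to combine several conditions, and how to apply the update without modifying the original array. The sketch below shows one way to do each, assuming the same ages as above; `mask` and `after_birthday` are names chosen for this example:

```python
import numpy as np

ages = np.array([15, 20, 17, 19, 21, 18, 23, 16])

# combine conditions with & (and) or | (or); each condition needs parentheses
mask = (ages > 18) & (ages < 22)
print(ages[mask])

# np.where builds a new array and leaves the original untouched
after_birthday = np.where(ages > 18, ages + 1, ages)
print(after_birthday)
print(ages)
```

Python’s `and` and `or` keywords do not work element-wise on arrays, which is why the bitwise operators are used for masks.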

Advanced Aggregation Functions

Advanced aggregation functions in NumPy allow you to perform complex data analysis tasks. Functions like np.mean, np.sum, and np.median accept an axis argument, so you can aggregate along rows, along columns, or along any other dimension of a multidimensional array. The axis you pass is the one that gets collapsed in the result.

Example Question: Given a 3D array representing monthly sales data (axis 0: stores, axis 1: months, axis 2: products), calculate the total sales for each product across all stores for each month.

Here’s how to solve this problem using NumPy and Python:

# sample 3D array: 2 stores, 3 months, 4 products
sales_data = np.array([
    [[200, 150, 100, 80], [210, 160, 110, 90], [220, 170, 120, 95]],
    [[180, 140, 90, 70], [190, 150, 100, 85], [200, 160, 110, 80]]
])

# sum sales for each product across all stores for each month
product_sales_monthly = np.sum(sales_data, axis=0)
print(product_sales_monthly)

[[380 290 190 150]
 [400 310 210 175]
 [420 330 230 175]]
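
Since the axis you choose decides which dimension disappears, it can help to try a few variations on the same data. The following is a rough sketch, not part of the original question, showing a tuple of axes, a different single axis, and the `keepdims` option:

```python
import numpy as np

# same sample data: 2 stores, 3 months, 4 products
sales_data = np.array([
    [[200, 150, 100, 80], [210, 160, 110, 90], [220, 170, 120, 95]],
    [[180, 140, 90, 70], [190, 150, 100, 85], [200, 160, 110, 80]]
])

# collapse stores and months together: one total per product
product_totals = sales_data.sum(axis=(0, 1))
print(product_totals)

# collapse products: one total per store per month
store_month_totals = sales_data.sum(axis=2)
print(store_month_totals)

# keepdims=True keeps the collapsed axis with length 1,
# so the result can still broadcast against the original array
monthly_mean = sales_data.mean(axis=1, keepdims=True)
print(monthly_mean.shape)
```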

Advanced Array Manipulation

Advanced array manipulation techniques like reshaping, transposing, and flattening help in organizing and structuring data. Reshaping changes the array’s shape without changing its elements, transposing reorders the axes, and flattening turns a multi-dimensional array into 1D. These are essential for preparing data for machine learning models and other analyses.

Example Question: You have a 2D array representing grayscale pixel values (8×8 matrix). Reshape it into a 4D array where each 4×4 sub-array represents a quadrant of the image.

Here’s how to solve this problem using NumPy and Python:

# sample 8x8 array representing grayscale pixel values
pixels = np.arange(64).reshape(8, 8)

# split rows and columns into (quadrant, offset) pairs, then reorder the axes
# so the result has shape (2, 2, 4, 4): quadrant row, quadrant column, row, column
quadrants = pixels.reshape(2, 4, 2, 4).transpose(0, 2, 1, 3)
print(quadrants)

[[[[ 0  1  2  3]
   [ 8  9 10 11]
   [16 17 18 19]
   [24 25 26 27]]

  [[ 4  5  6  7]
   [12 13 14 15]
   [20 21 22 23]
   [28 29 30 31]]]


 [[[32 33 34 35]
   [40 41 42 43]
   [48 49 50 51]
   [56 57 58 59]]

  [[36 37 38 39]
   [44 45 46 47]
   [52 53 54 55]
   [60 61 62 63]]]]
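
A reshape followed by a transpose is easy to get subtly wrong, so it is worth checking the result against plain slicing. The sketch below compares two quadrants with the corresponding slices of the original image and then reverses the steps to recover the 8×8 array; the names `top_right`, `bottom_left`, and `restored` are introduced here for illustration:

```python
import numpy as np

pixels = np.arange(64).reshape(8, 8)
quadrants = pixels.reshape(2, 4, 2, 4).transpose(0, 2, 1, 3)

# quadrants[i, j] should match the slice for quadrant row i, quadrant column j
top_right = pixels[:4, 4:]
bottom_left = pixels[4:, :4]
print(np.array_equal(quadrants[0, 1], top_right))
print(np.array_equal(quadrants[1, 0], bottom_left))

# undo the steps in reverse order: swap the middle axes back, then reshape
restored = quadrants.transpose(0, 2, 1, 3).reshape(8, 8)
print(np.array_equal(restored, pixels))

# flatten always returns a 1D copy of the data
flat = pixels.flatten()
print(flat.shape)
```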

Summary

These NumPy concepts are foundational for efficient data manipulation in Data Science. Mastering them prepares you to handle the complex data transformations and calculations that real-world projects often require. I hope you liked this article on NumPy concepts for Data Science interviews. Feel free to ask questions in the comments section below. You can follow me on Instagram for many more resources.

Aman Kharwal

AI/ML Engineer | Published Author. My aim is to decode data science for the real world in the most simple words.
