Customer Satisfaction Analysis with Python

Customer Satisfaction Analysis is the process of collecting, analyzing, and interpreting data regarding how satisfied customers are with a company’s products, services, and overall experience. If you want to learn how to analyze the satisfaction of customers with a business and how to make further decisions based on satisfaction levels, this article is for you. In this article, I’ll take you through the task of Customer Satisfaction Analysis with Python.

Customer Satisfaction Analysis: Overview

Customer Satisfaction Analysis involves collecting, analyzing, and interpreting data on customer experiences and perceptions through surveys, feedback forms, ratings, and reviews. By identifying key drivers of satisfaction and dissatisfaction, businesses can make informed decisions to improve products, services, and customer interactions.

It helps retain customers, boost loyalty and advocacy, drive sales growth, and gain a competitive edge, which ultimately enhances overall business performance and customer experience.

To get started with the task of Customer Satisfaction Analysis, we need a dataset based on customer satisfaction and feedback. I found an ideal dataset for this task, which contains features like:

CustomerID: Unique identifier for each customer.
Age: Age of the customer.
Gender: Gender of the customer (Male/Female).
PurchaseAmount: Total amount spent by the customer.
PurchaseFrequency: Number of purchases made by the customer.
ProductQualityRating: Customer rating for product quality (1-5).
DeliveryTimeRating: Customer rating for delivery time (1-5).
CustomerServiceRating: Customer rating for customer service (1-5).
WebsiteEaseOfUseRating: Customer rating for website ease of use (1-5).
ReturnRate: Proportion of products returned by the customer.
DiscountUsage: Amount of discount used by the customer.
LoyaltyProgramMember: Whether the customer is a loyalty program member (Yes/No).

You can download the dataset from here.

Customer Satisfaction Analysis with Python

Now, let’s get started with the task of Customer Satisfaction Analysis by importing the necessary Python libraries and the dataset:

import pandas as pd

data = pd.read_csv("/content/E-commerce_NPA_Dataset.csv")

print(data.head())

   CustomerID  Age  Gender  PurchaseAmount  PurchaseFrequency  \
0           1   38  Female      749.097626                 24   
1           2   30    Male      735.224916                 18   
2           3   55    Male     1145.520305                 22   
3           4   39  Female      925.460535                 14   
4           5   51    Male      108.359916                  9   

   ProductQualityRating  DeliveryTimeRating  CustomerServiceRating  \
0                     2                   1                      3   
1                     5                   4                      4   
2                     3                   5                      4   
3                     4                   3                      2   
4                     3                   3                      2   

   WebsiteEaseOfUseRating  ReturnRate  DiscountUsage LoyaltyProgramMember  
0                       5        0.12     135.392573                   No  
1                       5        0.37     193.450663                  Yes  
2                       1        0.10     147.246263                  Yes  
3                       4        0.38      56.362894                  Yes  
4                       5        0.42     338.731055                   No
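
Before going further, it can help to confirm that the data matches the feature descriptions above. The following is a minimal sketch of such checks, assuming the column names shown in the output; the `sample` frame simply copies a few columns from the first three rows printed above, and on the full dataset the same checks would be applied to `data`:

```python
import pandas as pd

# first three rows (selected columns) copied from the head() output above
sample = pd.DataFrame({
    "CustomerID": [1, 2, 3],
    "ProductQualityRating": [2, 5, 3],
    "DeliveryTimeRating": [1, 4, 5],
    "CustomerServiceRating": [3, 4, 4],
    "WebsiteEaseOfUseRating": [5, 5, 1],
    "ReturnRate": [0.12, 0.37, 0.10],
    "LoyaltyProgramMember": ["No", "Yes", "Yes"],
})

rating_cols = [c for c in sample.columns if c.endswith("Rating")]

# count missing values per column
missing = sample.isna().sum()

# check that every rating lies on the documented 1-5 scale
ratings_in_range = sample[rating_cols].apply(lambda s: s.between(1, 5).all())

# check that return rates are valid proportions
return_rate_ok = sample["ReturnRate"].between(0, 1).all()

# check that customer IDs are unique
ids_unique = sample["CustomerID"].is_unique

print(missing)
print(ratings_in_range)
print(return_rate_ok, ids_unique)
```

Knowing the actual range of each rating column matters later, because any metric that assumes a different scale will give misleading results.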

Let’s have a look at the summary statistics of the data:

print(data.describe())

       CustomerID         Age  PurchaseAmount  PurchaseFrequency  \
count  500.000000  500.000000      500.000000         500.000000   
mean   250.500000   44.170000     1065.050731          14.308000   
std    144.481833   14.813777      583.199658           8.151197   
min      1.000000   18.000000       51.799790           1.000000   
25%    125.750000   32.000000      535.083407           7.000000   
50%    250.500000   44.000000     1100.884065          14.000000   
75%    375.250000   58.000000     1584.348124          22.000000   
max    500.000000   69.000000     1999.655968          29.000000   

       ProductQualityRating  DeliveryTimeRating  CustomerServiceRating  \
count             500.00000          500.000000               500.0000   
mean                2.93400            3.008000                 3.0780   
std                 1.41054            1.372481                 1.4156   
min                 1.00000            1.000000                 1.0000   
25%                 2.00000            2.000000                 2.0000   
50%                 3.00000            3.000000                 3.0000   
75%                 4.00000            4.000000                 4.0000   
max                 5.00000            5.000000                 5.0000   

       WebsiteEaseOfUseRating  ReturnRate  DiscountUsage  
count              500.000000  500.000000     500.000000  
mean                 3.082000    0.252280     251.181010  
std                  1.415374    0.149674     141.531993  
min                  1.000000    0.000000       0.772696  
25%                  2.000000    0.110000     133.672231  
50%                  3.000000    0.260000     251.940355  
75%                  4.000000    0.380000     371.692341  
max                  5.000000    0.500000     499.813315

The summary statistics provide insights into the central tendency, dispersion, and range of the numeric data. Here are some key observations:

Age: The average customer age is around 44 years, with a range from 18 to 69 years.
Purchase Amount: The average total spend per customer is about $1,065, with a standard deviation of about $583, which indicates wide variability in spending.
Purchase Frequency: Customers purchase on average about 14 times, with some making up to 29 purchases.
Ratings: The average ratings for product quality, delivery time, customer service, and website ease of use are around 3, indicating moderate satisfaction levels. These ratings range from 1 (poor) to 5 (excellent).
Return Rate: The average return rate is 25%, with some customers having a return rate as high as 50%.
Discount Usage: The average discount usage is around $251, with a standard deviation of about $142.
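
To compare variability across columns with different units, one option is the coefficient of variation (standard deviation divided by mean). The sketch below uses the means and standard deviations from the summary table above; the helper name `coefficient_of_variation` is a hypothetical choice for illustration:

```python
# mean and standard deviation copied from the data.describe() output
stats = {
    "Age": (44.170000, 14.813777),
    "PurchaseAmount": (1065.050731, 583.199658),
    "PurchaseFrequency": (14.308000, 8.151197),
    "ReturnRate": (0.252280, 0.149674),
    "DiscountUsage": (251.181010, 141.531993),
}

def coefficient_of_variation(mean, std):
    """Return std / mean, a unit-free measure of relative spread."""
    return std / mean

cv = {name: coefficient_of_variation(mean, std) for name, (mean, std) in stats.items()}

# print the columns from least to most variable relative to their mean
for name, value in sorted(cv.items(), key=lambda item: item[1]):
    print(f"{name}: {value:.2f}")
```

A larger value means the column varies more relative to its own average, which makes columns such as Age and PurchaseAmount directly comparable.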

Now, let’s visualize the distributions of these variables:

import matplotlib.pyplot as plt

numeric_cols = ['Age', 'PurchaseAmount', 'PurchaseFrequency', 'ProductQualityRating', 'DeliveryTimeRating', 'CustomerServiceRating', 'WebsiteEaseOfUseRating', 'ReturnRate', 'DiscountUsage']

plt.figure(figsize=(15, 20))

for i, col in enumerate(numeric_cols, 1):
    plt.subplot(5, 2, i)
    plt.hist(data[col], bins=20, edgecolor='k', alpha=0.7)
    plt.title(f'Distribution of {col}')
    plt.xlabel(col)
    plt.ylabel('Frequency')

plt.tight_layout()
plt.show()

[Figure: distributions of the numeric variables]

The histograms reveal several insights into customer demographics and satisfaction metrics:

the age distribution is relatively even with slight peaks in the 30s and 60s;
purchase amounts are spread across the whole range from about $50 to $2,000 rather than being right-skewed: the median is about $1,101, so more than half of the customers spend over $1,000;
purchase frequency is varied, with notable peaks around 10 and 20 purchases;
satisfaction ratings for product quality, delivery time, customer service, and website ease of use are spread widely across all five values, with means close to 3 and standard deviations of about 1.4;
return rates are varied with peaks around 0.1 and 0.4;
and discount usage is evenly spread, showing no significant trend.

These findings suggest that customer experiences and behaviours are diverse, with varying levels of satisfaction across different service aspects.
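
One way to sanity-check a visual reading of the histograms is to compare it with the summary statistics. The sketch below is a rough heuristic, not a formal test: it compares each column's standard deviation with that of a flat (uniform) distribution over the same range, and uses the sign of mean minus median as a hint about skew. The helper name `uniform_std` is hypothetical, and the numbers are copied from the `data.describe()` output:

```python
import math

# min, max, mean, median (50%) and std copied from the data.describe() output
summary = {
    "PurchaseAmount": (51.799790, 1999.655968, 1065.050731, 1100.884065, 583.199658),
    "ReturnRate": (0.0, 0.5, 0.252280, 0.26, 0.149674),
    "DiscountUsage": (0.772696, 499.813315, 251.181010, 251.940355, 141.531993),
}

def uniform_std(low, high):
    """Standard deviation of a continuous uniform distribution on [low, high]."""
    return (high - low) / math.sqrt(12)

report = {}
for name, (low, high, mean, median, std) in summary.items():
    report[name] = {
        # a ratio close to 1 is consistent with a roughly flat distribution
        "std_ratio": std / uniform_std(low, high),
        # positive hints at right skew, negative at left skew
        "mean_minus_median": mean - median,
    }

for name, values in report.items():
    print(f"{name}: std ratio={values['std_ratio']:.2f}, "
          f"mean - median={values['mean_minus_median']:+.2f}")
```

For these three columns the ratios are close to 1 and the mean sits at or slightly below the median, which is more consistent with evenly spread values than with a strong right skew.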

Now, let’s segment the customers based on demographic and behavioral factors, and analyze their satisfaction ratings. We’ll create segments based on age, gender, and loyalty program membership. First, let’s analyze satisfaction ratings across different age groups and genders:

# create age groups
bins = [18, 30, 40, 50, 60, 70]
labels = ['18-29', '30-39', '40-49', '50-59', '60-69']
data['AgeGroup'] = pd.cut(data['Age'], bins=bins, labels=labels, right=False)

# select only the numeric columns for calculation
numeric_columns = ['ProductQualityRating', 'DeliveryTimeRating', 'CustomerServiceRating', 'WebsiteEaseOfUseRating']

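# calculate mean ratings by age group and gender
# (observed=False keeps the current default for the categorical AgeGroup column and avoids a FutureWarning in recent pandas versions)
mean_ratings_age_gender = data.groupby(['AgeGroup', 'Gender'], observed=False)[numeric_columns].mean()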

# reset the index to display the dataframe
mean_ratings_age_gender.reset_index(inplace=True)
print(mean_ratings_age_gender)

  AgeGroup  Gender  ProductQualityRating  DeliveryTimeRating  \
0    18-29  Female              3.052632            3.210526   
1    18-29    Male              2.933333            3.000000   
2    30-39  Female              2.929825            2.859649   
3    30-39    Male              3.080000            2.820000   
4    40-49  Female              3.090909            2.890909   
5    40-49    Male              2.857143            3.166667   
6    50-59  Female              2.945946            2.945946   
7    50-59    Male              2.833333            2.895833   
8    60-69  Female              2.900000            3.300000   
9    60-69    Male              2.673469            2.938776   

   CustomerServiceRating  WebsiteEaseOfUseRating  
0               3.175439                2.912281  
1               3.333333                3.355556  
2               2.912281                3.070175  
3               2.980000                2.880000  
4               3.036364                3.109091  
5               3.142857                3.142857  
6               3.027027                3.162162  
7               3.312500                3.062500  
8               3.066667                2.950000  
9               2.836735                3.285714

The data shows mean satisfaction ratings by age group and gender for different aspects of service. Here are some insights:

Younger customers (18-29) generally rate product quality slightly higher.
Females in the 40-49 age group give the highest product quality ratings (3.09), while males in the 60-69 age group give the lowest (2.67).
Delivery time satisfaction is relatively consistent across age groups, with minor variations.
The highest delivery time ratings come from females aged 60-69 (3.30), while the lowest come from males aged 30-39 (2.82).
Customer service ratings are fairly consistent, with a slight peak among younger males aged 18-29 (3.33).
Males in the 60-69 age group rate customer service the lowest (2.84).
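
When comparing segment means like these, it is useful to know how many customers sit behind each mean, since small segments produce noisier averages. The sketch below shows how the age bins behave with `right=False` and reports the mean together with the count. The `toy` data is hypothetical; only the column names match the dataset:

```python
import pandas as pd

# hypothetical mini-dataset; only the column names match the article's data
toy = pd.DataFrame({
    "Age": [18, 29, 30, 39, 45, 52, 61, 69],
    "ProductQualityRating": [4, 3, 2, 5, 3, 1, 4, 2],
})

bins = [18, 30, 40, 50, 60, 70]
labels = ["18-29", "30-39", "40-49", "50-59", "60-69"]

# right=False makes each bin closed on the left: [18, 30), [30, 40), ...
# so age 29 lands in '18-29' and age 30 lands in '30-39'
toy["AgeGroup"] = pd.cut(toy["Age"], bins=bins, labels=labels, right=False)

# report the mean together with the number of customers behind it
segment_summary = (
    toy.groupby("AgeGroup", observed=False)["ProductQualityRating"]
    .agg(["mean", "count"])
    .reset_index()
)
print(segment_summary)
```

The same `.agg(["mean", "count"])` pattern can be applied to the grouping by age group and gender to see the size of each of the ten segments.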

Next, let’s analyze the impact of loyalty program membership on customer satisfaction:

# select only the numeric columns for calculation
numeric_columns = ['ProductQualityRating', 'DeliveryTimeRating', 'CustomerServiceRating', 'WebsiteEaseOfUseRating', 'ReturnRate', 'DiscountUsage']

# calculate mean ratings by loyalty program membership
mean_ratings_loyalty = data.groupby('LoyaltyProgramMember')[numeric_columns].mean()

# reset the index to display the dataframe
mean_ratings_loyalty.reset_index(inplace=True)
print(mean_ratings_loyalty)

  LoyaltyProgramMember  ProductQualityRating  DeliveryTimeRating  \
0                   No              2.920502            2.916318   
1                  Yes              2.946360            3.091954   

   CustomerServiceRating  WebsiteEaseOfUseRating  ReturnRate  DiscountUsage  
0               2.987448                3.108787    0.251883     241.426710  
1               3.160920                3.057471    0.252644     260.113108

The data shows mean satisfaction ratings, return rates, and discount usage for loyalty program members versus non-members. Here are the insights:

Product Quality Rating: Loyalty program members rate product quality slightly higher (2.95) compared to non-members (2.92).
Delivery Time Rating: Loyalty program members are more satisfied with delivery time (3.09) than non-members (2.92).
Customer Service Rating: Members rate customer service higher (3.16) compared to non-members (2.99).
Website Ease of Use Rating: Non-members rate the website slightly higher (3.11) than members (3.06).
Return Rate: The return rates are almost identical between members (0.25) and non-members (0.25).
Discount Usage: Members use slightly more discounts ($260) compared to non-members ($241).
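
Since these differences are small, a quick check of whether they exceed sampling noise can be useful. The sketch below computes an approximate z-score for the difference between two group means. It rests on several assumptions that are not in the original analysis: group sizes of 250 each (the output does not report them), a shared standard deviation taken from the overall `data.describe()` output, and a hypothetical helper name `difference_z_score`:

```python
import math

def difference_z_score(mean_a, mean_b, std, n_a, n_b):
    """Approximate z-score for the difference between two group means,
    assuming both groups share the same standard deviation."""
    standard_error = std * math.sqrt(1 / n_a + 1 / n_b)
    return (mean_a - mean_b) / standard_error

# means copied from the loyalty table; std copied from data.describe()
# ASSUMPTION: 250 customers per group, since actual group sizes are not shown
n_members = 250
n_non_members = 250

z_scores = {
    "CustomerServiceRating": difference_z_score(3.160920, 2.987448, 1.4156, n_members, n_non_members),
    "DeliveryTimeRating": difference_z_score(3.091954, 2.916318, 1.372481, n_members, n_non_members),
    "ProductQualityRating": difference_z_score(2.946360, 2.920502, 1.41054, n_members, n_non_members),
}

for name, z in z_scores.items():
    print(f"{name}: z = {z:.2f}")
```

Under these assumptions, all three z-scores fall below 1.96 in absolute value, the conventional threshold at the 5% level, so the gaps between members and non-members may not be distinguishable from chance. A proper test on the raw data would be needed to confirm this.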

Net Promoter Score

Now, let’s calculate the Net Promoter Score. NPS is a metric used to gauge customer loyalty and satisfaction by asking customers how likely they are to recommend a company’s product or service to others on a scale of 0 to 10. Respondents are classified into three categories:

Promoters (9-10)
Passives (7-8)
Detractors (0-6)

The NPS is calculated by subtracting the percentage of Detractors from the percentage of Promoters. A higher NPS indicates more customer loyalty and positive word-of-mouth, which are critical for business growth.
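
As a minimal sketch of this formula, the function below computes NPS from raw 0-10 scores. The function name `net_promoter_score` and the sample responses are hypothetical and used only for illustration:

```python
def net_promoter_score(scores):
    """Return the NPS for a list of 0-10 'likelihood to recommend' scores."""
    if not scores:
        raise ValueError("scores must not be empty")
    promoters = sum(1 for s in scores if s >= 9)    # scores of 9 or 10
    detractors = sum(1 for s in scores if s <= 6)   # scores of 0 to 6
    # passives (7 or 8) count toward the total but not the difference
    return 100 * (promoters - detractors) / len(scores)

# hypothetical survey responses, for illustration only
responses = [10, 9, 9, 8, 7, 6, 4, 10, 2, 9]

# 5 promoters and 3 detractors out of 10 responses -> 50% - 30% = 20.0
print(net_promoter_score(responses))
```

The score ranges from -100 (all Detractors) to 100 (all Promoters).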

To calculate the NPS, we will use customer service ratings as a proxy for overall satisfaction. Here’s how to calculate NPS:

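# define NPS categories based on customer service rating
# note: these cut points assume a 0-10 score; CustomerServiceRating only takes values 1-5,
# so every rating falls into the first bin [0, 6)
data['NPS_Category'] = pd.cut(data['CustomerServiceRating'], bins=[0, 6, 8, 10], labels=['Detractors', 'Passives', 'Promoters'], right=False)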

# calculate NPS
nps_counts = data['NPS_Category'].value_counts(normalize=True) * 100
nps_score = nps_counts['Promoters'] - nps_counts['Detractors']

nps_counts

NPS_Category
Detractors    100.0
Passives        0.0
Promoters       0.0
Name: proportion, dtype: float64

nps_score

-100.0

The NPS calculation shows:

Detractors: 100% of customers fall into the Detractors category.
Passives: 0%
Promoters: 0%

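This results in an NPS score of -100. However, this score is an artifact of the rating scale rather than a measure of satisfaction: CustomerServiceRating ranges from 1 to 5, while the cut points of 6 and 8 are designed for a 0-10 score, so every rating, including the top rating of 5, falls below 6 and is classified as a Detractor. To get a meaningful NPS from this data, the ratings would first need to be mapped onto the NPS categories in a way that fits a 5-point scale.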
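
Because CustomerServiceRating only takes values from 1 to 5, one option is to adapt the NPS categories to a 5-point scale. The mapping below (5 as Promoters, 4 as Passives, 1-3 as Detractors) is an assumption for illustration, not part of the original analysis, and the ratings are hypothetical sample values:

```python
import pandas as pd

# hypothetical 1-5 ratings, for illustration only
ratings = pd.Series([5, 4, 3, 1, 5, 2, 4, 5, 5, 4], name="CustomerServiceRating")

# ASSUMPTION: a 5-point adaptation of the NPS categories
category_map = {
    1: "Detractors",
    2: "Detractors",
    3: "Detractors",
    4: "Passives",
    5: "Promoters",
}
categories = ratings.map(category_map)

# percentage of respondents in each category (0 if a category is absent)
shares = (
    categories.value_counts(normalize=True)
    .reindex(["Detractors", "Passives", "Promoters"], fill_value=0)
    * 100
)

nps_score = shares["Promoters"] - shares["Detractors"]
print(shares)
print(nps_score)
```

With the real data, `ratings` would be replaced by `data['CustomerServiceRating']`. A score derived this way is still only a proxy, since the survey did not ask the standard recommendation question.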

Root Cause Analysis for Low Satisfaction

Now, we will perform a root cause analysis on customer dissatisfaction by identifying the key factors contributing to low ratings in specific areas such as product quality, delivery time, customer service, and website ease of use. We’ll analyze the characteristics of customers who provide low ratings and look for patterns that can help us understand the root causes of dissatisfaction.

We can perform root cause analysis for low ratings by identifying the customers with low ratings and analyzing the characteristics of these customers. We’ll create subsets of the data where ratings are low (1 or 2) for product quality, delivery time, customer service, and website ease of use:

# define low rating threshold
low_rating_threshold = 2

# create subsets for low ratings in different aspects
low_product_quality = data[data['ProductQualityRating'] <= low_rating_threshold]
low_delivery_time = data[data['DeliveryTimeRating'] <= low_rating_threshold]
low_customer_service = data[data['CustomerServiceRating'] <= low_rating_threshold]
low_website_ease_of_use = data[data['WebsiteEaseOfUseRating'] <= low_rating_threshold]

# plot the characteristics for each low rating subset
subsets = [low_product_quality, low_delivery_time, low_customer_service, low_website_ease_of_use]
subset_labels = ['Product Quality', 'Delivery Time', 'Customer Service', 'Website Ease of Use']

# (column name, display name) for each characteristic to plot
features = [('Age', 'Age'), ('PurchaseAmount', 'Purchase Amount'), ('PurchaseFrequency', 'Purchase Frequency'), ('ReturnRate', 'Return Rate')]

plt.figure(figsize=(20, 15))

# one panel per characteristic, with the four low rating subsets side by side
for i, (col, name) in enumerate(features, 1):
    plt.subplot(2, 2, i)
    plt.hist([subset[col] for subset in subsets], bins=10, label=subset_labels)
    plt.title(f'{name} Distribution for Low Ratings')
    plt.xlabel(name)
    plt.ylabel('Frequency')
    plt.legend()

plt.tight_layout()
plt.show()

The histograms offer several starting points for a root cause analysis of low ratings across different aspects of customer satisfaction:

Customers giving low ratings span a wide age range, with notable peaks around ages 30-40 and 50-60, which may point to age-related dissatisfaction trends.
Purchase amount and frequency distributions show that low ratings are not limited to low spenders or infrequent buyers; even high spenders and frequent buyers express dissatisfaction, which suggests the issues are not confined to one customer tier.
The return rate distribution suggests that low ratings, particularly for product quality and website ease of use, are more common among customers with higher return rates.

Because these plots include only customers who gave low ratings, each pattern should be compared with the same distribution for all customers before it is treated as a cause.
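
To judge whether low raters really differ from other customers, their averages can be compared with a baseline. The sketch below does this for product quality. The `toy` data and the `RatingGroup` column are hypothetical; only the other column names and the threshold of 2 come from the analysis above:

```python
import pandas as pd

# hypothetical mini-dataset; only the column names match the article's data
toy = pd.DataFrame({
    "ProductQualityRating": [1, 2, 2, 3, 4, 5, 5, 1],
    "ReturnRate": [0.40, 0.35, 0.10, 0.20, 0.15, 0.05, 0.25, 0.45],
    "PurchaseAmount": [300.0, 1500.0, 800.0, 1200.0, 400.0, 1900.0, 700.0, 1000.0],
})

low_rating_threshold = 2

# label each customer as a low rater or not
is_low = toy["ProductQualityRating"] <= low_rating_threshold
toy["RatingGroup"] = is_low.map({True: "low", False: "other"})

# compare average characteristics of low raters with everyone else
comparison = toy.groupby("RatingGroup")[["ReturnRate", "PurchaseAmount"]].mean()
print(comparison)
```

If the two rows look similar for a characteristic, that characteristic is unlikely to explain the low ratings on its own.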

Summary

So, this is how we can perform Customer Satisfaction Analysis with Python. Customer Satisfaction Analysis is the process of collecting, analyzing, and interpreting data regarding how satisfied customers are with a company’s products, services, and overall experience.

I hope you liked this article on Customer Satisfaction Analysis with Python.