Customer Satisfaction Analysis is the process of collecting, analyzing, and interpreting data regarding how satisfied customers are with a company’s products, services, and overall experience. If you want to learn how to analyze the satisfaction of customers with a business and how to make further decisions based on satisfaction levels, this article is for you. In this article, I’ll take you through the task of Customer Satisfaction Analysis with Python.
Customer Satisfaction Analysis: Overview
Customer Satisfaction Analysis involves collecting, analyzing, and interpreting data on customer experiences and perceptions through surveys, feedback forms, ratings, and reviews. By identifying key drivers of satisfaction and dissatisfaction, businesses can make informed decisions to improve products, services, and customer interactions.
It helps retain customers, boost loyalty and advocacy, drive sales growth, and gain a competitive edge, which ultimately enhances overall business performance and customer experience.
To get started with the task of Customer Satisfaction Analysis, we need a dataset based on customer satisfaction and feedback. I found an ideal dataset for this task, which contains features like:
- CustomerID: Unique identifier for each customer.
- Age: Age of the customer.
- Gender: Gender of the customer (Male/Female).
- PurchaseAmount: Total amount spent by the customer.
- PurchaseFrequency: Number of purchases made by the customer.
- ProductQualityRating: Customer rating for product quality (1-5).
- DeliveryTimeRating: Customer rating for delivery time (1-5).
- CustomerServiceRating: Customer rating for customer service (1-5).
- WebsiteEaseOfUseRating: Customer rating for website ease of use (1-5).
- ReturnRate: Proportion of products returned by the customer.
- DiscountUsage: Amount of discount used by the customer.
- LoyaltyProgramMember: Whether the customer is a loyalty program member (Yes/No).
You can download the dataset from here.
Customer Satisfaction Analysis with Python
Now, let’s get started with the task of Customer Satisfaction Analysis by importing the necessary Python libraries and the dataset:
import pandas as pd
data = pd.read_csv("/content/E-commerce_NPA_Dataset.csv")
print(data.head())
CustomerID Age Gender PurchaseAmount PurchaseFrequency \
0 1 38 Female 749.097626 24
1 2 30 Male 735.224916 18
2 3 55 Male 1145.520305 22
3 4 39 Female 925.460535 14
4 5 51 Male 108.359916 9
ProductQualityRating DeliveryTimeRating CustomerServiceRating \
0 2 1 3
1 5 4 4
2 3 5 4
3 4 3 2
4 3 3 2
WebsiteEaseOfUseRating ReturnRate DiscountUsage LoyaltyProgramMember
0 5 0.12 135.392573 No
1 5 0.37 193.450663 Yes
2 1 0.10 147.246263 Yes
3 4 0.38 56.362894 Yes
4 5 0.42 338.731055 No
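Before going further, the column descriptions given earlier can be turned into a few quick checks. The snippet below is only a minimal sketch: `check_schema` and the `toy` DataFrame are hypothetical names introduced here for illustration, and the toy rows are made up so the example runs without the CSV file.

```python
import pandas as pd

RATING_COLS = [
    "ProductQualityRating",
    "DeliveryTimeRating",
    "CustomerServiceRating",
    "WebsiteEaseOfUseRating",
]

def check_schema(df):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    for col in RATING_COLS:
        if col not in df.columns:
            problems.append(f"missing column: {col}")
        elif not df[col].between(1, 5).all():
            # ratings are described as 1-5, so anything else (or NaN) is flagged
            problems.append(f"{col} has values outside 1-5")
    if "ReturnRate" in df.columns and not df["ReturnRate"].between(0, 1).all():
        # ReturnRate is described as a proportion
        problems.append("ReturnRate has values outside 0-1")
    return problems

# hypothetical rows (not taken from the real dataset); the second row has an invalid rating
toy = pd.DataFrame({
    "ProductQualityRating": [2, 7],
    "DeliveryTimeRating": [1, 4],
    "CustomerServiceRating": [3, 4],
    "WebsiteEaseOfUseRating": [5, 5],
    "ReturnRate": [0.12, 0.37],
})
problems = check_schema(toy)
print(problems)
```

On the real data, the same function could be called as `check_schema(data)` right after loading the file.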
Let’s have a look at the summary statistics of the data:
print(data.describe())
CustomerID Age PurchaseAmount PurchaseFrequency \
count 500.000000 500.000000 500.000000 500.000000
mean 250.500000 44.170000 1065.050731 14.308000
std 144.481833 14.813777 583.199658 8.151197
min 1.000000 18.000000 51.799790 1.000000
25% 125.750000 32.000000 535.083407 7.000000
50% 250.500000 44.000000 1100.884065 14.000000
75% 375.250000 58.000000 1584.348124 22.000000
max 500.000000 69.000000 1999.655968 29.000000
ProductQualityRating DeliveryTimeRating CustomerServiceRating \
count 500.00000 500.000000 500.0000
mean 2.93400 3.008000 3.0780
std 1.41054 1.372481 1.4156
min 1.00000 1.000000 1.0000
25% 2.00000 2.000000 2.0000
50% 3.00000 3.000000 3.0000
75% 4.00000 4.000000 4.0000
max 5.00000 5.000000 5.0000
WebsiteEaseOfUseRating ReturnRate DiscountUsage
count 500.000000 500.000000 500.000000
mean 3.082000 0.252280 251.181010
std 1.415374 0.149674 141.531993
min 1.000000 0.000000 0.772696
25% 2.000000 0.110000 133.672231
50% 3.000000 0.260000 251.940355
75% 4.000000 0.380000 371.692341
max 5.000000 0.500000 499.813315
The summary statistics provide insights into the central tendency, dispersion, and range of the numeric data. Here are some key observations:
- Age: The average customer age is around 44 years, with a range from 18 to 69 years.
- Purchase Amount: The average purchase amount is about $1,065, with a standard deviation of about $583, which indicates wide variability in spending.
- Purchase Frequency: Customers purchase on average about 14 times, with some making up to 29 purchases.
- Ratings: The average ratings for product quality, delivery time, customer service, and website ease of use are around 3, indicating moderate satisfaction levels. These ratings range from 1 (poor) to 5 (excellent).
- Return Rate: The average return rate is 25%, with some customers having a return rate as high as 50%.
- Discount Usage: The average discount usage is around $251, with high variability.
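To judge the shape of a column from numbers rather than by eye, one option is to put the mean, the median, and the sample skewness side by side. This is a rough sketch under assumptions: `shape_summary` is a hypothetical helper, and the `sample` values are invented to contrast a roughly symmetric column with a right-skewed one.

```python
import pandas as pd

# hypothetical values, invented for illustration only
sample = pd.DataFrame({
    "PurchaseAmount": [120.0, 480.0, 950.0, 1100.0, 1500.0, 1990.0],
    "DiscountUsage": [10.0, 20.0, 30.0, 40.0, 50.0, 400.0],
})

def shape_summary(df, columns):
    """Mean, median and sample skewness for each of the given columns."""
    return pd.DataFrame({
        "mean": df[columns].mean(),
        "median": df[columns].median(),
        "skew": df[columns].skew(),
    })

summary = shape_summary(sample, ["PurchaseAmount", "DiscountUsage"])
print(summary)
```

A mean well above the median together with a clearly positive skew points to a right-skewed column; a mean close to the median and a skew near zero points to a roughly symmetric one.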
Now, let’s visualize the distributions of these variables:
import matplotlib.pyplot as plt
numeric_cols = ['Age', 'PurchaseAmount', 'PurchaseFrequency', 'ProductQualityRating', 'DeliveryTimeRating', 'CustomerServiceRating', 'WebsiteEaseOfUseRating', 'ReturnRate', 'DiscountUsage']
plt.figure(figsize=(15, 20))
for i, col in enumerate(numeric_cols, 1):
    plt.subplot(5, 2, i)
    plt.hist(data[col], bins=20, edgecolor='k', alpha=0.7)
    plt.title(f'Distribution of {col}')
    plt.xlabel(col)
    plt.ylabel('Frequency')
plt.tight_layout()
plt.show()
The histograms reveal several insights into customer demographics and satisfaction metrics:
- the age distribution is relatively even with slight peaks in the 30s and 60s;
- purchase amounts are spread widely between about $52 and $2,000; the median (about $1,101) sits slightly above the mean (about $1,065), so spending is not concentrated below $1,000;
- purchase frequency is varied, with notable peaks around 10 and 20 purchases;
- satisfaction ratings for product quality, delivery time, customer service, and website ease of use are spread widely across the 1-5 scale, with means close to 3 and standard deviations close to 1.4;
- return rates are varied with peaks around 0.1 and 0.4;
- and discount usage is evenly spread, showing no significant trend.
These findings suggest that customer experiences and behaviours are diverse, with varying levels of satisfaction across different service aspects.
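For the 1-5 ratings, a 20-bin histogram leaves most bins empty, so it can help to look at the share of responses at each rating value instead. The sketch below assumes a hypothetical helper `rating_shares` and a made-up `ratings` series; it is meant as an illustration, not as the exact method used above.

```python
import pandas as pd

# hypothetical ratings, invented for illustration only
ratings = pd.Series([1, 2, 2, 3, 3, 3, 4, 4, 5, 5], name="ProductQualityRating")

def rating_shares(series, scale=range(1, 6)):
    """Percentage of responses at each point of the rating scale."""
    shares = series.value_counts(normalize=True) * 100
    # reindex so that scale points with no responses show up as 0
    return shares.reindex(list(scale), fill_value=0.0).round(1)

shares = rating_shares(ratings)
print(shares)
```

On the real data, this could be applied to each rating column, for example `rating_shares(data['ProductQualityRating'])`.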
Now, let’s segment the customers based on demographic and behavioral factors, and analyze their satisfaction ratings. We’ll create segments based on age, gender, and loyalty program membership. First, let’s analyze satisfaction ratings across different age groups and genders:
# create age groups
bins = [18, 30, 40, 50, 60, 70]
labels = ['18-29', '30-39', '40-49', '50-59', '60-69']
data['AgeGroup'] = pd.cut(data['Age'], bins=bins, labels=labels, right=False)
# select only the numeric columns for calculation
numeric_columns = ['ProductQualityRating', 'DeliveryTimeRating', 'CustomerServiceRating', 'WebsiteEaseOfUseRating']
# calculate mean ratings by age group and gender
mean_ratings_age_gender = data.groupby(['AgeGroup', 'Gender'])[numeric_columns].mean()
# reset the index to display the dataframe
mean_ratings_age_gender.reset_index(inplace=True)
print(mean_ratings_age_gender)
AgeGroup Gender ProductQualityRating DeliveryTimeRating \
0 18-29 Female 3.052632 3.210526
1 18-29 Male 2.933333 3.000000
2 30-39 Female 2.929825 2.859649
3 30-39 Male 3.080000 2.820000
4 40-49 Female 3.090909 2.890909
5 40-49 Male 2.857143 3.166667
6 50-59 Female 2.945946 2.945946
7 50-59 Male 2.833333 2.895833
8 60-69 Female 2.900000 3.300000
9 60-69 Male 2.673469 2.938776
CustomerServiceRating WebsiteEaseOfUseRating
0 3.175439 2.912281
1 3.333333 3.355556
2 2.912281 3.070175
3 2.980000 2.880000
4 3.036364 3.109091
5 3.142857 3.142857
6 3.027027 3.162162
7 3.312500 3.062500
8 3.066667 2.950000
9 2.836735 3.285714
The data shows mean satisfaction ratings by age group and gender for different aspects of service. Here are some insights:
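- Customers aged 18-29 rate product quality slightly higher (2.93-3.05) than customers aged 60-69 (2.67-2.90), although the differences between groups are small.
- For product quality, females in the 40-49 age group give the highest ratings (3.09), while males in the 60-69 age group give the lowest (2.67).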
- Delivery time satisfaction is relatively consistent across age groups, with minor variations.
- The highest ratings for delivery time satisfaction are from females aged 60-69, while the lowest are from males aged 30-39.
- Customer service ratings are fairly consistent, with a slight peak among younger males (18-29).
- Males in the 60-69 age group rate customer service the lowest.
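Since the segment means differ by only a few tenths of a point, it is worth checking how many customers stand behind each mean. The following is a small sketch, with a made-up `sample` DataFrame standing in for the real data; `observed=True` is used here so that empty age groups are dropped from the result.

```python
import pandas as pd

# hypothetical rows; column names follow the dataset described above
sample = pd.DataFrame({
    "Age": [22, 25, 27, 34, 37, 45, 63, 66],
    "Gender": ["Female", "Female", "Male", "Female", "Male", "Male", "Female", "Female"],
    "CustomerServiceRating": [4, 2, 3, 2, 5, 3, 1, 5],
})

bins = [18, 30, 40, 50, 60, 70]
labels = ["18-29", "30-39", "40-49", "50-59", "60-69"]
sample["AgeGroup"] = pd.cut(sample["Age"], bins=bins, labels=labels, right=False)

# mean rating and number of customers per segment
segment_stats = (
    sample.groupby(["AgeGroup", "Gender"], observed=True)["CustomerServiceRating"]
    .agg(["mean", "count"])
    .reset_index()
)
print(segment_stats)
```

A segment mean based on a handful of customers deserves much less weight than one based on dozens.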
Next, let’s analyze the impact of loyalty program membership on customer satisfaction:
# select only the numeric columns for calculation
numeric_columns = ['ProductQualityRating', 'DeliveryTimeRating', 'CustomerServiceRating', 'WebsiteEaseOfUseRating', 'ReturnRate', 'DiscountUsage']
# calculate mean ratings by loyalty program membership
mean_ratings_loyalty = data.groupby('LoyaltyProgramMember')[numeric_columns].mean()
# reset the index to display the dataframe
mean_ratings_loyalty.reset_index(inplace=True)
print(mean_ratings_loyalty)
LoyaltyProgramMember ProductQualityRating DeliveryTimeRating \
0 No 2.920502 2.916318
1 Yes 2.946360 3.091954
CustomerServiceRating WebsiteEaseOfUseRating ReturnRate DiscountUsage
0 2.987448 3.108787 0.251883 241.426710
1 3.160920 3.057471 0.252644 260.113108
The data shows mean satisfaction ratings, return rates, and discount usage for loyalty program members versus non-members. Here are the insights:
- Product Quality Rating: Loyalty program members rate product quality slightly higher (2.95) compared to non-members (2.92).
- Delivery Time Rating: Loyalty program members are more satisfied with delivery time (3.09) than non-members (2.92).
- Customer Service Rating: Members rate customer service higher (3.16) compared to non-members (2.99).
- Website Ease of Use Rating: Non-members rate the website slightly higher (3.11) than members (3.06).
- Return Rate: The return rates are almost identical between members (0.25) and non-members (0.25).
- Discount Usage: Members use slightly more discounts ($260) compared to non-members ($241).
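Gaps this small are easier to interpret next to a measure of uncertainty. As a rough sketch, the difference between two group means can be reported together with its approximate standard error; `mean_gap` is a hypothetical helper and the `sample` rows are invented for illustration.

```python
import math
import pandas as pd

# hypothetical rows, invented for illustration only
sample = pd.DataFrame({
    "LoyaltyProgramMember": ["Yes", "Yes", "Yes", "Yes", "No", "No", "No", "No"],
    "DeliveryTimeRating": [4, 3, 5, 2, 3, 2, 4, 1],
})

def mean_gap(df, group_col, value_col, a="Yes", b="No"):
    """Difference in means (group a minus group b) and its approximate standard error."""
    grouped = df.groupby(group_col)[value_col]
    means, variances, sizes = grouped.mean(), grouped.var(), grouped.size()
    diff = means[a] - means[b]
    # standard error of a difference between two independent sample means
    se = math.sqrt(variances[a] / sizes[a] + variances[b] / sizes[b])
    return diff, se

diff, se = mean_gap(sample, "LoyaltyProgramMember", "DeliveryTimeRating")
print(round(diff, 2), round(se, 2))
```

As a rule of thumb, a gap smaller than about two standard errors is hard to distinguish from sampling noise, so it should not be read as a real difference between members and non-members without further testing.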
Net Promoter Score
Now, let’s calculate the Net Promoter Score. NPS is a metric used to gauge customer loyalty and satisfaction by asking customers how likely they are to recommend a company’s product or service to others on a scale of 0 to 10. Respondents are classified into three categories:
- Promoters (9-10)
- Passives (7-8)
- Detractors (0-6)
The NPS is calculated by subtracting the percentage of Detractors from the percentage of Promoters. A higher NPS indicates more customer loyalty and positive word-of-mouth, which are critical for business growth.
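The formula can be sketched in plain Python as follows. The function name `net_promoter_score` and the `responses` list are hypothetical, and the answers are assumed to be on the 0-10 scale described above.

```python
def net_promoter_score(scores):
    """NPS from 0-10 likelihood-to-recommend answers."""
    if not scores:
        raise ValueError("scores must not be empty")
    promoters = sum(1 for s in scores if s >= 9)   # answers of 9 or 10
    detractors = sum(1 for s in scores if s <= 6)  # answers of 0 to 6
    # passives (7-8) count towards the total but not towards either group
    return 100.0 * (promoters - detractors) / len(scores)

# hypothetical survey answers
responses = [10, 9, 9, 8, 7, 6, 3, 10, 2, 9]
print(net_promoter_score(responses))
```

With these ten answers there are 5 Promoters and 3 Detractors, so the score is 50% minus 30%, which is 20.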
To calculate the NPS, we will use customer service ratings as a proxy for overall satisfaction. Note that these ratings are on a 1-5 scale, while the NPS categories above are defined on a 0-10 scale. Here’s how to calculate NPS:
# define NPS categories based on customer service rating
# bins are right-inclusive: 0-6 Detractors, 7-8 Passives, 9-10 Promoters
data['NPS_Category'] = pd.cut(data['CustomerServiceRating'], bins=[0, 6, 8, 10], labels=['Detractors', 'Passives', 'Promoters'], include_lowest=True)
# calculate NPS
nps_counts = data['NPS_Category'].value_counts(normalize=True) * 100
nps_score = nps_counts['Promoters'] - nps_counts['Detractors']
nps_counts
NPS_Category
Detractors 100.0
Passives 0.0
Promoters 0.0
Name: proportion, dtype: float64
nps_score
-100.0
The NPS calculation shows:
- Detractors: 100% of customers fall into the Detractors category.
- Passives: 0%
- Promoters: 0%
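This results in an NPS of -100. However, this figure is an artifact of the method rather than a measurement of customer loyalty: CustomerServiceRating only takes values from 1 to 5, so every customer falls below the Passive and Promoter thresholds of the 0-10 scale and is classified as a Detractor by construction. A meaningful NPS would need answers to a 0-10 likelihood-to-recommend question, or category thresholds adapted to the 1-5 scale.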
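Because CustomerServiceRating is on a 1-5 scale, the 0-10 thresholds cannot produce any Passives or Promoters. One possible workaround is to adapt the thresholds to the 5-point scale. The mapping below (5 as Promoters, 4 as Passives, 1-3 as Detractors) is an assumption made for illustration, not an official NPS definition, and the `ratings` values are made up.

```python
import pandas as pd

# hypothetical 1-5 ratings, invented for illustration only
ratings = pd.Series([5, 4, 3, 5, 2, 5, 4, 5, 3, 5], name="CustomerServiceRating")

# assumed mapping for a 1-5 scale: 1-3 Detractors, 4 Passives, 5 Promoters
# (bins are right-inclusive, so the intervals are (0, 3], (3, 4], (4, 5])
category = pd.cut(
    ratings,
    bins=[0, 3, 4, 5],
    labels=["Detractors", "Passives", "Promoters"],
)

shares = category.value_counts(normalize=True) * 100
nps_like = shares["Promoters"] - shares["Detractors"]
print(shares)
print(nps_like)
```

The result is better described as an NPS-style score: it is not comparable with published NPS benchmarks, which rely on the 0-10 recommendation question.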
Root Cause Analysis for Low Satisfaction
Now, we will perform a root cause analysis on customer dissatisfaction by identifying the key factors contributing to low ratings in specific areas such as product quality, delivery time, customer service, and website ease of use. We’ll analyze the characteristics of customers who provide low ratings and look for patterns that can help us understand the root causes of dissatisfaction.
To do this, we’ll create subsets of the data where ratings are low (1 or 2) for product quality, delivery time, customer service, and website ease of use, and then plot the age, purchase amount, purchase frequency, and return rate of the customers in each subset:
# define low rating threshold
low_rating_threshold = 2
# create subsets for low ratings in different aspects
low_product_quality = data[data['ProductQualityRating'] <= low_rating_threshold]
low_delivery_time = data[data['DeliveryTimeRating'] <= low_rating_threshold]
low_customer_service = data[data['CustomerServiceRating'] <= low_rating_threshold]
low_website_ease_of_use = data[data['WebsiteEaseOfUseRating'] <= low_rating_threshold]
# plot the characteristics for each low rating subset
plt.figure(figsize=(20, 15))
# age distribution for low ratings
plt.subplot(2, 2, 1)
plt.hist([low_product_quality['Age'], low_delivery_time['Age'], low_customer_service['Age'], low_website_ease_of_use['Age']], bins=10, label=['Product Quality', 'Delivery Time', 'Customer Service', 'Website Ease of Use'])
plt.title('Age Distribution for Low Ratings')
plt.xlabel('Age')
plt.ylabel('Frequency')
plt.legend()
# purchase amount distribution for low ratings
plt.subplot(2, 2, 2)
plt.hist([low_product_quality['PurchaseAmount'], low_delivery_time['PurchaseAmount'], low_customer_service['PurchaseAmount'], low_website_ease_of_use['PurchaseAmount']], bins=10, label=['Product Quality', 'Delivery Time', 'Customer Service', 'Website Ease of Use'])
plt.title('Purchase Amount Distribution for Low Ratings')
plt.xlabel('Purchase Amount')
plt.ylabel('Frequency')
plt.legend()
# purchase frequency distribution for low ratings
plt.subplot(2, 2, 3)
plt.hist([low_product_quality['PurchaseFrequency'], low_delivery_time['PurchaseFrequency'], low_customer_service['PurchaseFrequency'], low_website_ease_of_use['PurchaseFrequency']], bins=10, label=['Product Quality', 'Delivery Time', 'Customer Service', 'Website Ease of Use'])
plt.title('Purchase Frequency Distribution for Low Ratings')
plt.xlabel('Purchase Frequency')
plt.ylabel('Frequency')
plt.legend()
# return rate distribution for low ratings
plt.subplot(2, 2, 4)
plt.hist([low_product_quality['ReturnRate'], low_delivery_time['ReturnRate'], low_customer_service['ReturnRate'], low_website_ease_of_use['ReturnRate']], bins=10, label=['Product Quality', 'Delivery Time', 'Customer Service', 'Website Ease of Use'])
plt.title('Return Rate Distribution for Low Ratings')
plt.xlabel('Return Rate')
plt.ylabel('Frequency')
plt.legend()
plt.tight_layout()
plt.show()
The histograms offer several starting points for root cause analysis of low ratings across different aspects of customer satisfaction. Customers giving low ratings span a wide age range, with notable peaks around ages 30-40 and 50-60, which may point to age-related patterns in dissatisfaction. The purchase amount and frequency distributions show that low ratings are not limited to low spenders or infrequent buyers; high spenders and frequent buyers also express dissatisfaction, which points towards service quality issues rather than a problem with one customer segment. The return rate distribution suggests that low ratings are more common among customers with higher return rates, particularly for product quality and website ease of use. Because these plots only include customers who gave low ratings, each pattern should be compared with the same distribution for all customers before it is treated as a cause.
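One way to check such patterns numerically is to compare customers who gave a low rating with everyone else on the same characteristics. This is only a sketch: `compare_low_vs_rest` is a hypothetical helper, and the `sample` rows are invented so the example runs on its own.

```python
import pandas as pd

# hypothetical rows, invented for illustration only
sample = pd.DataFrame({
    "ProductQualityRating": [1, 2, 2, 3, 4, 5, 5, 4],
    "ReturnRate": [0.40, 0.35, 0.45, 0.20, 0.10, 0.15, 0.05, 0.30],
    "PurchaseFrequency": [5, 12, 20, 8, 15, 22, 9, 17],
})

def compare_low_vs_rest(df, rating_col, feature_cols, threshold=2):
    """Mean of each feature for low raters (rating <= threshold) versus the rest."""
    is_low = (
        (df[rating_col] <= threshold)
        .map({True: "low", False: "rest"})
        .rename("RatingGroup")
    )
    return df.groupby(is_low)[feature_cols].mean()

comparison = compare_low_vs_rest(
    sample, "ProductQualityRating", ["ReturnRate", "PurchaseFrequency"]
)
print(comparison)
```

If the "low" and "rest" rows look similar for a feature, that feature is unlikely to explain the low ratings; a clear gap marks it as a candidate for closer analysis.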
Summary
So, this is how we can perform Customer Satisfaction Analysis with Python. We explored the summary statistics and distributions of the data, compared satisfaction ratings across age groups, genders, and loyalty program membership, calculated a Net Promoter Score, and looked at the characteristics of customers who give low ratings.
I hope you liked this article on Customer Satisfaction Analysis with Python.