In the marketing industry, data-driven decision-making is crucial for optimizing campaigns, understanding consumer behaviour, and maximizing return on investment (ROI). If you are aiming to work as a Data Science professional in the marketing domain, this article is for you. In this article, I’ll take you through five essential formulas for Data Science in marketing that every Data Scientist or Analyst working in the marketing domain should be familiar with and how to implement them using Python.
Essential Marketing Formulas for Data Science
Below are some essential formulas for Data Science in marketing:
- Customer Lifetime Value (CLV)
- Conversion Rate
- Return on Advertising Spend (ROAS)
- Customer Acquisition Cost (CAC)
- Net Promoter Score (NPS)
Let’s go through all these essential formulas for Data Science in marketing in detail with implementation using Python.
Customer Lifetime Value (CLV)
Customer Lifetime Value helps understand the total value a customer is expected to contribute to a business throughout their relationship. Below is the formula used for calculating CLV:
CLV = Average Order Value × Purchase Frequency × Customer Lifespan
Now, let’s see how to calculate CLV using Python. I’ll first create a sample data for this task:
import pandas as pd
# sample data
data = {
'CustomerID': [101, 102, 103, 104, 105],
'Total Revenue': [1200, 5000, 1500, 3200, 2800],
'Total Orders': [10, 25, 15, 32, 20],
'Customer Acquisition Year': [2015, 2018, 2016, 2019, 2017],
'Last Purchase Year': [2020, 2022, 2023, 2023, 2021]
}
customer_df = pd.DataFrame(data)
print(customer_df)CustomerID Total Revenue Total Orders Customer Acquisition Year \
0 101 1200 10 2015
1 102 5000 25 2018
2 103 1500 15 2016
3 104 3200 32 2019
4 105 2800 20 2017
Last Purchase Year
0 2020
1 2022
2 2023
3 2023
4 2021
From this data, the Average Order Value (AOV) can be calculated as Total Revenue divided by Total Orders. Purchase Frequency can be calculated as Total Orders divided by the number of years the customer has been active (Last Purchase Year – Customer Acquisition Year + 1). Customer Lifespan is calculated as the difference between the Last Purchase Year and the Customer Acquisition Year plus one (to include both years in the count). Let’s calculate these values:
# calculating AOV, Purchase Frequency, and Customer Lifespan customer_df['AOV'] = customer_df['Total Revenue'] / customer_df['Total Orders'] customer_df['Customer Lifespan'] = customer_df['Last Purchase Year'] - customer_df['Customer Acquisition Year'] + 1 customer_df['Purchase Frequency'] = customer_df['Total Orders'] / customer_df['Customer Lifespan'] print(customer_df)
CustomerID Total Revenue Total Orders Customer Acquisition Year \
0 101 1200 10 2015
1 102 5000 25 2018
2 103 1500 15 2016
3 104 3200 32 2019
4 105 2800 20 2017
Last Purchase Year AOV Customer Lifespan Purchase Frequency
0 2020 120.0 6 1.666667
1 2022 200.0 5 5.000000
2 2023 100.0 8 1.875000
3 2023 100.0 5 6.400000
4 2021 140.0 5 4.000000
Now, here’s how to calculate the CLV:
customer_df['CLV'] = customer_df['AOV'] * customer_df['Purchase Frequency'] * customer_df['Customer Lifespan'] print(customer_df)
CustomerID Total Revenue Total Orders Customer Acquisition Year \
0 101 1200 10 2015
1 102 5000 25 2018
2 103 1500 15 2016
3 104 3200 32 2019
4 105 2800 20 2017
Last Purchase Year AOV Customer Lifespan Purchase Frequency CLV
0 2020 120.0 6 1.666667 1200.0
1 2022 200.0 5 5.000000 5000.0
2 2023 100.0 8 1.875000 1500.0
3 2023 100.0 5 6.400000 3200.0
4 2021 140.0 5 4.000000 2800.0
CLV gives a measure of the total value each customer is expected to bring over their lifetime based on their current purchasing behaviour.
Conversion Rate
Conversion Rate means measuring the effectiveness of marketing campaigns in converting leads into customers. Below is the formula used for calculating the conversion rate:

To demonstrate how to calculate the conversion rate effectively, let’s create a small sample dataset that includes the necessary information such as the number of users visiting a website, the number who engaged in some significant action (like adding items to a cart), and the number who completed a purchase:
# creating sample data
data = {
'Date': ['2024-04-10', '2024-04-11', '2024-04-12', '2024-04-13', '2024-04-14'],
'Visitors': [200, 180, 220, 210, 190],
'Engaged_Users': [150, 140, 160, 155, 145],
'Purchases': [30, 28, 35, 33, 29]
}
conversion_data = pd.DataFrame(data)
print(conversion_data)Date Visitors Engaged_Users Purchases
0 2024-04-10 200 150 30
1 2024-04-11 180 140 28
2 2024-04-12 220 160 35
3 2024-04-13 210 155 33
4 2024-04-14 190 145 29
Now, let’s calculate the conversion rate in two different ways:
- Conversion Rate based on Total Visitors: This measures the percentage of visitors that resulted in conversions.
- Conversion Rate based on Engaged Users: This focuses on users who showed significant engagement (like adding items to the cart) and then converted.
# calculate conversion rate based on total visitors conversion_data['Conversion_Rate_Visitors'] = (conversion_data['Purchases'] / conversion_data['Visitors']) * 100 # calculate conversion rate based on engaged users conversion_data['Conversion_Rate_Engaged'] = (conversion_data['Purchases'] / conversion_data['Engaged_Users']) * 100 print(conversion_data)
Date Visitors Engaged_Users Purchases Conversion_Rate_Visitors \
0 2024-04-10 200 150 30 15.000000
1 2024-04-11 180 140 28 15.555556
2 2024-04-12 220 160 35 15.909091
3 2024-04-13 210 155 33 15.714286
4 2024-04-14 190 145 29 15.263158
Conversion_Rate_Engaged
0 20.000000
1 20.000000
2 21.875000
3 21.290323
4 20.000000
These metrics provide insights into how well the site is converting visitors into buyers and the effectiveness of engaging users in the purchasing process.
Return on Advertising Spend (ROAS)
Return on Advertising Spend helps assess an advertising campaign’s efficiency, i.e., how much revenue each dollar spent on advertising brings back. Below is the formula to calculate the ROAS:
ROAS = Revenue from Ad Campaign / Cost of Ad Campaign
To calculate the Return on Advertising Spend (ROAS), we need information about the revenue generated from the advertising and the cost associated with it. Let’s import a dataset based on an advertising campaign for calculating ROAS. You can download the dataset from here:
control_group_data = pd.read_csv("control_group.csv", delimiter=';')
print(control_group_data.head())Campaign Name Date Spend [USD] # of Impressions Reach \
0 Control Campaign 1.08.2019 2280 82702.0 56930.0
1 Control Campaign 2.08.2019 1757 121040.0 102513.0
2 Control Campaign 3.08.2019 2343 131711.0 110862.0
3 Control Campaign 4.08.2019 1940 72878.0 61235.0
4 Control Campaign 5.08.2019 1835 NaN NaN
# of Website Clicks # of Searches # of View Content # of Add to Cart \
0 7016.0 2290.0 2159.0 1819.0
1 8110.0 2033.0 1841.0 1219.0
2 6508.0 1737.0 1549.0 1134.0
3 3065.0 1042.0 982.0 1183.0
4 NaN NaN NaN NaN
# of Purchase
0 618.0
1 511.0
2 372.0
3 340.0
4 NaN
To calculate ROAS, we need an estimate of the revenue generated. If the dataset doesn’t directly include revenue, we can use the number of purchases to estimate it by assuming an average revenue per purchase. For this example, let’s assume each purchase generates an average revenue of $50:
average_revenue_per_purchase = 50 #an assumption # calculate total revenue control_group_data['Estimated Revenue'] = control_group_data['# of Purchase'] * average_revenue_per_purchase # calculate total spend and total estimated revenue total_spend = control_group_data['Spend [USD]'].sum() total_revenue = control_group_data['Estimated Revenue'].sum() # calculate ROAS roas = total_revenue / total_spend if total_spend else 0 total_spend, total_revenue, roas
Here are the results:
- Total Spend: USD 68,653
- Total Estimated Revenue: USD 758,050 (assuming $50 average revenue per purchase)
- Return on Advertising Spend (ROAS): Approximately 11.04
This ROAS value indicates that for every dollar spent on the advertising campaign, approximately $11.04 was returned in revenue, based on the assumptions made. It is a very healthy ROAS, suggesting that the advertising campaign was highly effective in generating revenue relative to its cost.
Customer Acquisition Cost (CAC)
Customer Acquisition Cost helps understand the cost associated with acquiring a new customer, which helps in budgeting and marketing efficiency analysis. Below is the formula for calculating CAC:
CAC = Total Marketing Expenses / Number of New Customers Acquired
To calculate the Customer Acquisition Cost (CAC), we need data on the total marketing spend and the number of new customers acquired as a result of that spending. Let’s import an ideal dataset for this. You can download the dataset from here:
cac_data = pd.read_csv("customer_acquisition_cost_dataset.csv")
print(cac_data.head())Customer_ID Marketing_Channel Marketing_Spend New_Customers
0 CUST0001 Email Marketing 3489.027844 16
1 CUST0002 Online Ads 1107.865808 33
2 CUST0003 Social Media 2576.081025 44
3 CUST0004 Online Ads 3257.567932 32
4 CUST0005 Email Marketing 1108.408185 13
The dataset contains information about marketing spend and the number of new customers acquired through various marketing channels for different customers. We can use this data to calculate the Customer Acquisition Cost (CAC) for each customer, and also aggregate the data to find the overall CAC. Let’s calculate the CAC for each customer and also provide an overall CAC across all customers:
# calculate CAC for each customer cac_data['CAC'] = cac_data['Marketing_Spend'] / cac_data['New_Customers'] # calculate overall CAC total_marketing_spend = cac_data['Marketing_Spend'].sum() total_new_customers = cac_data['New_Customers'].sum() overall_cac = total_marketing_spend / total_new_customers print(cac_data.head())
Customer_ID Marketing_Channel Marketing_Spend New_Customers CAC
0 CUST0001 Email Marketing 3489.027844 16 218.064240
1 CUST0002 Online Ads 1107.865808 33 33.571691
2 CUST0003 Social Media 2576.081025 44 58.547296
3 CUST0004 Online Ads 3257.567932 32 101.798998
4 CUST0005 Email Marketing 1108.408185 13 85.262168
overall_cac
102.91741100024466
The CAC for individual customers varies significantly based on the marketing channel used and the effectiveness of those campaigns. For example, CAC ranges from about $33.57 for some customers up to $218.06 for others, indicating a diverse efficiency of marketing efforts. The Overall CAC across all customers is approximately $102.92. This value represents the average cost to acquire a new customer across all marketing channels and campaigns included in the dataset.
Net Promoter Score (NPS)
Net Promoter Score helps businesses understand customer satisfaction and loyalty by categorizing customers into Promoters, Passives, and Detractors based on their likelihood of recommending the company. Below is the formula used for calculating NPS:

To calculate NPS, data typically includes customer responses rated on a scale of 0-10, where respondents are categorized as Promoters (9-10), Passives (7-8), and Detractors (0-6). Let’s load an ideal dataset for this task. You can download the dataset from here:
user_behaviour_data = pd.read_csv("userbehaviour.csv")
print(user_behaviour_data.head())userid Average Screen Time Average Spent on App (INR) Left Review \
0 1001 17.0 634.0 1
1 1002 0.0 54.0 0
2 1003 37.0 207.0 0
3 1004 32.0 445.0 1
4 1005 45.0 427.0 1
Ratings New Password Request Last Visited Minutes Status
0 9 7 2990 Installed
1 4 8 24008 Uninstalled
2 8 5 971 Installed
3 6 2 799 Installed
4 5 6 3668 Installed
The dataset includes a “Ratings” column, which consists of customer ratings that can be used to calculate the Net Promoter Score (NPS). Ratings are typically on a scale from 0 to 10, where:
- Promoters are rated 9-10: They are loyal enthusiasts who will keep buying and refer others.
- Passives are rated 7-8: They are satisfied but unenthusiastic customers who are vulnerable to competitive offerings.
- Detractors are rated 0-6: They are unhappy customers who can damage the brand through negative word-of-mouth.
Let’s proceed to calculate the NPS from the ratings provided in this dataset:
# classify respondents into Promoters, Passives, and Detractors promoters_count = (user_behaviour_data['Ratings'] >= 9).sum() passives_count = ((user_behaviour_data['Ratings'] >= 7) & (user_behaviour_data['Ratings'] < 9)).sum() detractors_count = (user_behaviour_data['Ratings'] < 7).sum() # calculate total respondents total_respondents = len(user_behaviour_data) # calculate NPS nps = ((promoters_count - detractors_count) / total_respondents) * 100 promoters_count, passives_count, detractors_count, nps
(279, 269, 451, -17.217217217217218)
This NPS indicates that there are more detractors than promoters among the users rated in this dataset, leading to a negative score. An NPS of -17.22 suggests challenges in customer satisfaction and loyalty, as a significant number of users are not recommending the service or product.
Summary
So these are some essential formulas for Data Science in marketing:
- Customer Lifetime Value (CLV)
- Conversion Rate
- Return on Advertising Spend (ROAS)
- Customer Acquisition Cost (CAC)
- Net Promoter Score (NPS)
I hope you liked this article on some essential formulas for Data Science in marketing with implementation using Python. Feel free to ask valuable questions in the comments section below. You can follow me on Instagram for many more resources.






Helpful…