Business Simulations With Python

Using Monte Carlo to Explore Customer Lifetime Value and Customer Acquisition Cost

Nov 3, 2019

As I wrote this post on business strategy for data scientists, I realized it might be fun to run a simulation for a marketing campaign and the cohort of new customers that it produces.

A lot of my professional life has been spent in the world of finance and investing. I’m more used to thinking about companies as a whole (analyzing them at the aggregate level) than on a day to day level. But I do find the more granular details really interesting. After all, what’s a whole if it’s not the sum of its parts?

A great way to learn more about something is to simulate it (with reasonable inputs and assumptions) and then study the results. You will often be surprised by what seemingly simple interactions among a few key variables can produce. So let’s use Python to simulate a marketing campaign and see what happens.

We will focus our simulation on customer lifetime value (CLTV) and customer acquisition cost (CAC), the key metrics of many a startup (and public companies as well). In my previous business strategy post, I explored these metrics in depth conceptually, so please refer to it if you want more background.

Coding Up A Marketing Campaign

The detailed code for this simulation can be found here on my GitHub. But here is the high level overview of what we are trying to do:

Define a function that runs a marketing campaign – given some inputs like conversion rate, cost, budget, etc., I want to return a list of newly signed up customers.
Define a function that simulates a cohort of customers – given some assumptions about churn, spending frequency, and spending amount, I want to simulate the performance of this cohort of customers and ultimately calculate their average lifetime value to my business.
Run a bunch of simulations and look at the distribution of results.

Let’s write the function for running a marketing campaign first. We can do this one of two ways – we use a randomly generated number to represent the outcome of each impression (each impression has a tiny probability of turning into a paying customer) and then tally up our results (using a for loop). Or we can take advantage of a statistical model. Let’s go with the latter as looping millions of times is quite slow. Instead we can take advantage of our old friend, the binomial distribution (and it gives me an excuse to link to my old blog post – please read if you are unfamiliar with the binomial distribution).

Quoting my previous self:

The binomial distribution is the probability distribution of a sequence of experiments where each experiment produces a binary outcome and where each of the outcomes is independent of all the others.

Whether the outcome of each impression is truly independent is debatable, but for our purposes we can safely make that assumption. That means we can use a random variable with a binomial distribution to model our marketing campaign. Our budget and the CPM (cost per 1,000 impressions from an ad vendor like Google) determine the number of trials. And the conversion rate of each impression is the probability of success. The following function does what I just described:

import numpy as np

# Function for calculating the results of a marketing campaign
def run_campaign(spend, cpm, conversion_rate):
    impressions = int(spend / cpm * 1000)  # number of binomial trials
    return np.random.binomial(impressions, conversion_rate)
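
As a rough sanity check on that shortcut, here is a minimal sketch that compares the impression-by-impression approach with a single binomial draw. The function names, the seed, and the deliberately small campaign are illustrative assumptions rather than part of the original code, and the per-impression version is vectorized with NumPy so that it finishes quickly.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed, only for repeatability

def simulate_by_impression(n_impressions, conversion_rate):
    """Slow approach: one uniform random draw per impression."""
    draws = rng.random(n_impressions)
    return int((draws < conversion_rate).sum())

def simulate_by_binomial(n_impressions, conversion_rate):
    """Fast approach: a single draw from the binomial distribution."""
    return int(rng.binomial(n_impressions, conversion_rate))

# Hypothetical small campaign so both approaches run in well under a second
n_impressions = 200_000
conversion_rate = 0.005
n_repeats = 100

per_impression = [simulate_by_impression(n_impressions, conversion_rate)
                  for _ in range(n_repeats)]
binomial = [simulate_by_binomial(n_impressions, conversion_rate)
            for _ in range(n_repeats)]

print('Mean customers (per impression):', np.mean(per_impression))
print('Mean customers (binomial):      ', np.mean(binomial))
print('Theoretical mean:               ', n_impressions * conversion_rate)
```

Both averages should land close to the theoretical mean, which is what we would expect if the two approaches describe the same random process.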

The conversion rate itself is just an estimate so we should inject some uncertainty into it as well. For a first pass, it’s reasonable to assume that our conversion rate is normally distributed. Note that I set a floor for the conversion rate as negative conversion rates don’t make sense.

# From the expected conversion rate and stdev of it, get the
# realized conversion rate
def get_conversion_rate(expected, stdev):
    conversion_rate = max(expected + np.random.normal()*stdev,
                          0.000001)
    return conversion_rate

Now let’s give it some inputs – spend is the amount that we plan to spend on our marketing campaign, cpm is the cost for 1,000 impressions, and the two conversion rate related variables are our estimates for the rate’s expected value and standard deviation. The expected conversion rate looks small, but it’s actually not that small. Another way to see it is that for every 20,000 impressions served, we expect to get one customer. That sounds like a lot before you realize that thanks to Google, the cost per thousand impressions is $2 – so 20,000 impressions cost us just $40.

# Budget
spend = 50000
# Cost per thousand
cpm = 2
# Conversion rate
conversion_rate_expected = 0.00005
conversion_rate_stdev = 0.00002
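
Before running anything random, it helps to work out what these inputs imply on average. The sketch below is plain arithmetic on the values above; the variable names `impressions`, `expected_customers`, and `cac_at_expected_rate` are illustrative assumptions.

```python
# Inputs copied from the post
spend = 50000
cpm = 2
conversion_rate_expected = 0.00005

# Impressions bought with the budget: 50,000 / 2 * 1,000 = 25,000,000
impressions = spend / cpm * 1000

# Customers we expect if the conversion rate comes in exactly as estimated
expected_customers = impressions * conversion_rate_expected

# CAC at the expected conversion rate (not the same thing as the average CAC,
# because CAC is a nonlinear function of the number of customers)
cac_at_expected_rate = spend / expected_customers

print('Impressions:', int(impressions))
print('Expected customers:', round(expected_customers))
print('CAC at expected rate:', round(cac_at_expected_rate, 2))
```

So with these assumptions we should see roughly 1,250 new customers and a CAC of about $40 in a typical run.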

Now let’s give our functions a try. The variable cohort_size is the one we are after – it is the number of new customers that result from our marketing campaign. And with it, we can calculate our customer acquisition cost (CAC). CAC is the total cost of our marketing effort divided by the number of customers we gained from that effort.

# Let's call it to get the number of new customers from our campaign
conversion_rate = get_conversion_rate(conversion_rate_expected, 
 conversion_rate_stdev)
cohort_size = run_campaign(spend, cpm, conversion_rate)

# And calculate our Customer Acquisition cost
CAC = spend/cohort_size

print('Customers Gained: ', cohort_size)
print('CAC: ', round(CAC, 2))

Running the above lines produces the following results:

Customers Gained: 1,309

CAC (Customer Acquisition Cost): $50,000 / 1,309 = $38

Of course if we run it again, we would expect to see different values as we’ve injected two layers of randomness (the conversion rate as a normally distributed random variable and the number of people that ultimately become customers as a binomially distributed random variable).
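
To get a feel for how much each layer contributes, here is a back-of-the-envelope sketch. It ignores the floor on the conversion rate and treats the two sources of noise separately, so the numbers are approximations; the variable names are illustrative assumptions.

```python
from math import sqrt

# Inputs copied from the post
impressions = 50000 / 2 * 1000
p_mean = 0.00005
p_sd = 0.00002

# Layer 1: uncertainty in the conversion rate itself, scaled up by the
# number of impressions (about 500 customers)
sd_from_rate = impressions * p_sd

# Layer 2: binomial noise around the expected rate (about 35 customers)
sd_from_binomial = sqrt(impressions * p_mean * (1 - p_mean))

# Approximate combined standard deviation (variances of the layers add)
sd_total = sqrt(sd_from_rate**2 + sd_from_binomial**2)

print('Stdev from conversion rate uncertainty:', round(sd_from_rate, 1))
print('Stdev from binomial noise:', round(sd_from_binomial, 1))
print('Approximate total stdev:', round(sd_total, 1))
```

Under these assumptions, nearly all of the variation in the number of customers comes from not knowing the true conversion rate, not from the luck of individual impressions.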

Simulating the Customer Cohort Over Time

Our next step is to write some functions that simulate how our customer cohort changes over time. Each year, some members of the cohort churn (and are no longer customers) while the remaining ones buy goods from us.

We can use a random number generator (that generates a float between 0 and 1, uniformly distributed) to simulate whether a specific customer churns each year. The logic goes like this – for each customer in the cohort in a given year, we generate a random number. If the number we generated is at or below our churn rate, then that customer has canceled. For simplicity, we are assuming that all cancels happen at the beginning of the year, so the churned customers make zero purchases from us in the year that they leave.

# Function that models the progression of a cohort over time
def simulate_cohort(cohort_size, churn_rate, transactions, price,
                    retention_cost, yrs=5):
    customers_left = []
    spending = []
    profit = []
    for i in range(yrs):
        # range() is evaluated once at the start of each year, so
        # shrinking cohort_size below does not change the number of draws
        for customer in range(cohort_size):
            # Assume cancels happen at the start of the year
            # (for simplicity). Generate a random number between
            # 0 and 1; if it is at most churn_rate then the customer
            # has churned and we subtract 1 from cohort_size
            churn_random_num = np.random.random()
            if churn_random_num <= churn_rate:
                cohort_size -= 1
        # Calculate and record cohort's data
        customers_left.append(cohort_size)
        spending.append(cohort_size*transactions*price)
        profit.append(cohort_size*(transactions*price -
                                   retention_cost))
    return customers_left, spending, profit
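
The per-customer loop is easy to follow but slow for large cohorts. As a possible alternative (not the approach used in the original code), the number of customers who survive each year can be drawn in one step from a binomial distribution, since each customer independently stays with probability 1 - churn_rate. The function name `simulate_cohort_binomial` and the `rng` argument are assumptions made for this sketch.

```python
import numpy as np

def simulate_cohort_binomial(cohort_size, churn_rate, transactions, price,
                             retention_cost, yrs=5, rng=None):
    """Same outputs as simulate_cohort, with one binomial draw per year."""
    rng = np.random.default_rng() if rng is None else rng
    customers_left, spending, profit = [], [], []
    for _ in range(yrs):
        # Survivors this year: each customer stays with prob (1 - churn_rate)
        cohort_size = int(rng.binomial(cohort_size, 1 - churn_rate))
        customers_left.append(cohort_size)
        spending.append(cohort_size * transactions * price)
        profit.append(cohort_size * (transactions * price - retention_cost))
    return customers_left, spending, profit

# Example call using the cohort of 1,309 customers from the run above
customers_left, spending, profit = simulate_cohort_binomial(
    1309, 0.20, 6, 10, 20, yrs=5, rng=np.random.default_rng(0))

print('Customers left each year:', customers_left)
print('Profit each year:', profit)
```

The results should follow the same distribution as the loop version, because counting independent yes/no outcomes is exactly what the binomial distribution describes.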

Next, let’s collect our inputs and run our function. To be more realistic, we should assume a per customer annual retention cost. This is the amount we need to spend each year to keep a particular customer happy and loyal to our service. Note that I only forecast 5 years of profits – our remaining customers might still buy from us in year 6 but to be conservative, I will assume that a given cohort of customers has at most a 5 year life.

churn_rate = 0.20
# Number of annual transactions per average cohort member
transactions = 6
# Price of goods sold per average transaction
price = 10
# Annual cost of retaining/servicing customer
retention_cost = 20

# Run the function
customers_left, spending, profit = simulate_cohort(
    cohort_size, churn_rate, transactions,
    price, retention_cost, yrs=5)

The output we are most interested in is profit, the income to our business produced by our cohort of customers, as that is the primary driver of customer lifetime value. There are other indirect drivers as well such as referrals from satisfied customers but we will not model those here today.

Before we can calculate customer lifetime value (CLTV), we need to write a present value function. Customers buy from us over time (by spending a bit on our products each year for as long as they are customers). This means that a lot of the payoff occurs in the future. And future dollars are worth less than today’s dollars because of inflation, opportunity cost, and uncertainty. We can approximately adjust for these effects by applying a haircut to any dollars that we receive in the future (the further out we receive the money, the more it needs to be discounted). The following function does just that:

def present_value(cashflows, rate):
    pvs = []
    for i, val in enumerate(cashflows):
        pvs.append(val/(1 + rate)**(i+1))
    return pvs
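
To make the discounting concrete, here is a small worked example using a vectorized variant of the same calculation. The name `present_value_np` and the three cash flows of $100 are illustrative assumptions.

```python
import numpy as np

def present_value_np(cashflows, rate):
    """Discount each cash flow by (1 + rate) ** year, with years starting at 1."""
    cashflows = np.asarray(cashflows, dtype=float)
    years = np.arange(1, len(cashflows) + 1)
    return cashflows / (1 + rate) ** years

# $100 received at the end of each of the next three years, discounted at 10%
pvs = present_value_np([100, 100, 100], 0.10)

print('Present values:', np.round(pvs, 2))
print('Total present value:', round(float(pvs.sum()), 2))
```

The three payments are worth roughly $90.91, $82.64, and $75.13 today, so the further out a dollar arrives, the bigger the haircut.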

We have what we need to calculate CLTV now. We will apply a discount rate of 10% to transform profits received in the future into present values (in today’s dollars). For those that are curious, discount rates are complicated because we are trying to express numerous factors (inflation, risk, our hopes and dreams, etc) all through one single number – there is a ton of literature out there on how to properly estimate a discount rate so I will not go into it here.

# Calculate CLTV
rate = 0.10

# Get the PV of the profits
pvs = present_value(profit, rate)
# Value of the cohort in today's dollars is sum of PVs
cohort_value = sum(pvs)

print('Total Cohort Value: ', int(cohort_value))
print('CLTV: ', int(cohort_value/cohort_size))
print('CLTV-CAC Spread: ', int(cohort_value/cohort_size - CAC))

Let’s take a look at how profits and their present values compare:

Profit and Its Present Value

Profits from the cohort (in blue) decline over time as more and more customers churn. The present values of profits (in orange) decline at an even faster rate because the ones that are further out in the future are discounted more severely.

The sum of the present values is the value of our cohort, and our CLTV is the cohort value divided by the initial size of the cohort:

Total Cohort Value: $113,285

CLTV (Customer Lifetime Value): $113,285 / 1,309 = $86

CLTV-CAC Spread: $86 – $38 = $48
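
As a cross-check on the simulated CLTV, we can compute the expected value directly. Each surviving customer earns us transactions times price minus retention cost per year, the chance of surviving to year t is (1 - churn_rate) to the power t, and each year's profit is discounted at the same 10%. The variable names in this sketch are illustrative assumptions.

```python
# Inputs copied from the post
churn_rate = 0.20
transactions = 6
price = 10
retention_cost = 20
rate = 0.10
yrs = 5

# Annual profit from one customer who is still around: 6 * 10 - 20 = 40
annual_profit = transactions * price - retention_cost

# Expected discounted profit per initial customer, summed over 5 years
expected_cltv = sum(
    annual_profit * (1 - churn_rate) ** t / (1 + rate) ** t
    for t in range(1, yrs + 1)
)

print('Expected CLTV:', round(expected_cltv, 2))
```

This comes out to roughly $85, which is close to the $86 from the single simulated cohort above.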

The value we really care about is the spread between CLTV and CAC – if it’s negative, then we are not going to be in business for long. Granted, there are certain cases when a negative spread is OK. For example, early stage startups that are desperate to scale will spend a ton to attract even low value customers. That’s because their outcome distribution is pretty binary (and their risk/reward is pretty asymmetric). If they scale fast enough, they build a successful business and become rich. If they don’t reach the required scale and fizzle out, well it was other people’s money anyway (this attitude is a byproduct of how cheap and available money has been, especially from venture capitalists, for a number of years now).

Of course, the expectation is that the spread for future customer cohorts will be positive – as the company and brand gain steam, CLTV should increase while CAC declines.

Running 1,000 Scenarios

Finally, we have all the requisite pieces and can run our Monte Carlo simulation now. We will simulate 1,000 marketing campaigns so that we can look at the distribution of outcomes.

# Simulate 1000 times and look at the distributions
import pandas as pd

cohort_size_list = []
CAC_list = []
CLTV_list = []

for i in range(1000):

    # Run marketing campaign sim
    conversion_rate = get_conversion_rate(conversion_rate_expected,
                                          conversion_rate_stdev)
    cohort_size = run_campaign(spend, cpm, conversion_rate)
    CAC = spend/cohort_size

    # Simulate the resulting cohort
    customers_left, spending, profit = simulate_cohort(
        cohort_size, churn_rate, transactions,
        price, retention_cost, yrs=5)

    cohort_value = sum(present_value(profit, rate))

    cohort_size_list.append(cohort_size)
    CAC_list.append(CAC)
    CLTV_list.append(cohort_value/cohort_size)

# Store simulation results in a dataframe
results_df = pd.DataFrame()
results_df['initial_cohort_size'] = cohort_size_list
results_df['CLTV'] = CLTV_list
results_df['CAC'] = CAC_list
results_df['Spread'] = results_df['CLTV'] - results_df['CAC']
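
For readers who want a quicker way to reproduce the experiment, here is a self-contained sketch that runs all 1,000 scenarios at once with NumPy arrays. It is an alternative to the loop above, not the original code: churn is drawn from a binomial distribution each year instead of looping over customers, and the seed and the names `rng`, `conv`, `cohort`, and `results` are assumptions made for this example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, only for repeatability
n_sims = 1000

# Inputs copied from the post
spend, cpm = 50000, 2
conversion_rate_expected, conversion_rate_stdev = 0.00005, 0.00002
churn_rate, transactions, price, retention_cost = 0.20, 6, 10, 20
rate, yrs = 0.10, 5

# Layer 1: realized conversion rate per scenario, floored just above zero
conv = np.maximum(
    rng.normal(conversion_rate_expected, conversion_rate_stdev, n_sims),
    0.000001)

# Layer 2: customers gained per scenario
impressions = int(spend / cpm * 1000)
cohort = rng.binomial(impressions, conv)

# Simulate churn year by year and accumulate discounted profit
remaining = cohort.copy()
cohort_value = np.zeros(n_sims)
for year in range(1, yrs + 1):
    remaining = rng.binomial(remaining, 1 - churn_rate)
    annual_profit = remaining * (transactions * price - retention_cost)
    cohort_value += annual_profit / (1 + rate) ** year

results = pd.DataFrame({
    'initial_cohort_size': cohort,
    'CLTV': cohort_value / cohort,
    'CAC': spend / cohort,
})
results['Spread'] = results['CLTV'] - results['CAC']

print(results.describe().round(2))
```

The summary table from `describe()` gives the mean, quartiles, and extremes of each metric, which is a handy companion to the histograms that follow.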

The starting size of our cohort of customers can vary widely because we have 2 layers of randomness (we allow the "true" conversion rate of each campaign to vary and each impression that goes out is itself a trial with a random outcome):

Initial Cohort Size Distribution

And if the cohort size can vary widely, so can CAC (we expect it to have a median value of around $40 because of our input assumptions). And wow, it is pretty wide. The median CAC is $40, but it is possible to experience terrible campaign returns (where very few customers sign up) and get CACs in the triple digits – more than 6% of the simulations resulted in a CAC of $100 or more. So don’t expect the worst but do be prepared for it.
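
The share of triple-digit CACs can be roughly reproduced by hand. A CAC of $100 or more means 500 or fewer customers, which in turn means a realized conversion rate of about 0.00002 or lower. The sketch below computes the probability of that under the normal assumption, ignoring the comparatively small binomial noise; the variable names are illustrative assumptions.

```python
from math import erf, sqrt

# Inputs copied from the post
spend = 50000
cpm = 2
conversion_rate_expected = 0.00005
conversion_rate_stdev = 0.00002

impressions = spend / cpm * 1000
cac_threshold = 100

# CAC >= 100 requires at most spend / 100 = 500 customers
max_customers = spend / cac_threshold
rate_threshold = max_customers / impressions

# How many standard deviations below the expected rate is that?
z = (rate_threshold - conversion_rate_expected) / conversion_rate_stdev

# Standard normal CDF evaluated at z
prob = 0.5 * (1 + erf(z / sqrt(2)))

print('z-score:', round(z, 2))
print('P(CAC >= 100):', round(prob, 4))
```

The threshold sits 1.5 standard deviations below the expected conversion rate, which gives a probability of about 6.7%, in line with what the simulations produced.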

The shape of the distribution is interesting. From a CAC perspective, it’s asymmetrical – if we sign up a lot of customers, we get a low CAC, but there is a realistic limit to how low it can go (I call it a soft cap to our upside). But if we don’t sign up many customers, we could potentially see very high CACs that could be disastrous for our company.

CAC Distribution

Now let’s look at the revenue side of the picture, CLTV. There is noticeably less variance here because of the way I modeled it. I could have allowed for more variance by injecting randomness into the annual churn rate or the customer transactions (by varying the frequency and value of each transaction). If I had, the bell curve would be wider (due to more sources of variation and uncertainty), but would still have the same mean value and general shape.

CLTV Distribution

Finally, let’s take a look at the spread between CLTV and CAC. Here, again we see a long tail and the soft cap, but this time it’s flipped (because we subtract CAC from CLTV when calculating the spread). What it means is twofold:

Even if we have a massively successful campaign, there is a limit to the per customer upside (as measured by the spread) we can achieve. This limit exists because of the fact that CAC can only go so low (for example, the lowest CAC encountered in the 1,000 scenarios that I ran was $16) and each customer can only buy so much.
On the other hand, if we have a disastrous campaign, the spread could easily be negative, meaning that the resulting customers will likely never make back their cost of acquisition. Depending on the phase of the company and the scale of the marketing campaign, this may not be a deal breaker (it could just be a temporary and one-off hiccup). But this should dissuade us from betting a significant chunk of our firm’s assets or budget on one massive campaign – a failure there could spell doom for the company.

CLTV-CAC Spread Distribution

Hope this was helpful, cheers!

More Data Science and Analytics Related Posts By Me:

Business Strategy For Data Scientists

What Do Data Scientists Do?

Understanding Bayes’ Theorem

Understanding The Naive Bayes Classifier

The Binomial Distribution

Understanding PCA

Written By

Tony Yiu