Measuring the Intrinsic Causal Impact of Marketing Campaigns | Ryan O'Sullivan | June 2024


Causal AI explores the integration of causal inference into machine learning

Ryan O'Sullivan
Towards Data Science
Photo by Melanie Deziel on Unsplash

Welcome to our series on Causal AI, where we discuss integrating causal inference into machine learning models. Expect numerous practical applications across a variety of business contexts.

The previous article covered optimizing nonlinear treatment effects in pricing and promotions. This time we measure the intrinsic causal influence of your marketing campaigns.

If you missed our previous article on nonlinear treatment effects in pricing and promotions, you can find it here.

In this article, we will help you understand how to measure the intrinsic causal impact of your marketing campaigns.

The following points are covered:

  • What are the challenges in marketing measurement?
  • What is intrinsic causal influence and how does it work?
  • A practical Python case study that shows how to use intrinsic causal influence to give your marketing campaigns the credit they deserve.

You can find the complete notebook here:

What types of marketing campaigns are there?

Organizations use marketing to grow their business by acquiring new customers and retaining existing ones. Marketing campaigns are typically divided into three main categories:

  • Brand
  • Performance
  • Retention

Each presents its own challenges when it comes to measurement, and understanding these challenges is important.

Brand Campaigns

The goal of a brand campaign is to raise awareness of your brand among new audiences. Brand campaigns often take place on television and/or social media, often in video format, and typically do not include a direct call to action (e.g., “Our product lasts a lifetime”).

The challenge of measuring TV is immediately apparent: you can't track who saw your TV ad. But there's a similar challenge with social media: if someone watches a video on Facebook and then organically visits your website the next day and buys your product, it's nearly impossible to connect the two events.

There's also a secondary challenge of delayed impact: when you're building awareness among a new audience, it may take days, weeks, or even months before they consider purchasing your product.

While there is an argument to be made that brand campaigns do all the heavy lifting, when it comes to marketing measurement, they are often undervalued due to some of the challenges highlighted above.

Performance Campaigns

Performance campaigns are typically targeted to customers who are looking for your product. They run across paid search, social, and affiliate channels. They usually have a call to action, such as “Click now to get 5% off your first purchase.”

When it comes to performance campaigns, it's not immediately obvious why they're difficult to measure: You're much more likely to be able to tie the event of a customer clicking on a performance campaign to that customer making a purchase that day.

But would they have clicked if they didn't know the brand yet? How did they discover the brand? If you hadn't shown them the campaign, would they have purchased organically anyway? These are hard questions to answer from a data science perspective.

Retention Campaigns

Another category of campaigns is retention: marketing aimed at keeping existing customers. Typically, you measure these campaigns by running A/B tests.

Acquisition Marketing Graph

Brand and performance campaigns are commonly referred to as acquisition marketing. As we mentioned before, brand and performance campaigns are difficult to measure. Brand campaigns are often undervalued and performance campaigns are often overvalued.

The chart below is a clear (but simplified) example of how acquisition marketing works.

User-generated images

How can we (fairly) estimate how much each node contributed to the revenue? This is where intrinsic causal influence comes into play. We'll take a closer look at what it is in the next section.

Where did this concept come from?

The concept was originally proposed in a paper in 2020.

This is implemented in the GCM module in the Python package DoWhy.

Personally, I found this concept quite difficult to grasp at first, so in the next section I will explain it step by step.

Causal Graph Summary

Before attempting to understand the intrinsic causal influences, it is important to understand causal graphs, structural causal models (SCM) and additive noise models (ANM). The previous articles in this series should help you gain a deeper understanding.

As a reminder, each node in the causal graph can be seen as a target in the model where its direct parents are used as features. For each node other than the root, it is common to use an additive noise model.

User-generated images
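
To make the additive noise model concrete, here is a minimal sketch in plain NumPy. It is not DoWhy's implementation, and the coefficients, sample size, and variable names are illustrative assumptions. A child node is generated as a function of its parents plus independent noise. Fitting that function and subtracting its prediction gives an estimate of the node's noise term.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical parents and an additive noise model: child = f(parents) + noise
parents = rng.normal(size=(n, 2))
noise = rng.normal(scale=0.5, size=n)
child = 2.0 * parents[:, 0] - 1.0 * parents[:, 1] + noise

# Fit f with ordinary least squares (a linear f is an assumption of this sketch)
X = np.column_stack([parents, np.ones(n)])  # add an intercept column
coef, *_ = np.linalg.lstsq(X, child, rcond=None)

# The residual is the estimated noise term of the child node
noise_hat = child - X @ coef

print(np.round(coef, 2))
print(round(float(np.corrcoef(noise, noise_hat)[0, 1]), 3))
```

The estimated residuals should track the true noise closely, which is what allows the noise of each node to be separated from what it inherits from its parents.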

What exactly is intrinsic causal influence?

Now that we have outlined what a causal graph is, let’s try to understand what intrinsic causal influence actually is.

Intrinsic means “belonging naturally,” according to the dictionary. I picture a funnel: the things at the top of the funnel do the heavy lifting, and we want to give them the causal influence they deserve.

To further illustrate intrinsic causal influence, let’s look at the example graph below.

User-generated images
  • A, B, and C are root nodes.
  • D is a non-root node and can be modeled using its direct parents (A, B, C) and a noise term.
  • E is a non-root node and, like D, can be modeled using its direct parents (A, B, C) and a noise term.
  • F is the target node and can be modeled using its direct parents (D, E) and a noise term.

Let us focus on node D. It inherits some of its influence on node F from nodes A, B, and C. The intrinsic part of its influence on node F comes from its own noise term. We can therefore use the noise term of each node to estimate that node's intrinsic causal influence on the target node. Note that root nodes consist only of noise.
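
The idea can be sketched numerically for the A to F graph. The coefficients and noise scales below are hypothetical, chosen only for illustration, and the relationships are assumed to be linear. Substituting the equations for D and E into F shows the target written purely in terms of noise terms, and with independent noise the variance of F splits into one share per noise term.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Root nodes are pure noise
A, B, C = rng.normal(size=(3, n))

# Noise terms of the non-root nodes (scales are illustrative assumptions)
N_D = rng.normal(scale=2.0, size=n)
N_E = rng.normal(scale=0.5, size=n)
N_F = rng.normal(scale=0.1, size=n)

# Structural equations with hypothetical coefficients
D = 1.0 * A + 0.5 * B + 0.5 * C + N_D
E = 0.5 * A + 1.0 * B + 0.5 * C + N_E
F = D + E + N_F

# Substituting D and E expresses F as a function of noise terms only
F_from_noise = 1.5 * A + 1.5 * B + 1.0 * C + N_D + N_E + N_F
assert np.allclose(F, F_from_noise)

# With independent noise, Var(F) splits into one share per noise term.
# The share of D counts only N_D, not what D passes on from A, B, and C.
shares = {
    "A": 1.5**2, "B": 1.5**2, "C": 1.0**2,
    "D": 2.0**2, "E": 0.5**2, "F": 0.1**2,
}
total = sum(shares.values())
for node, var in shares.items():
    print(f"{node}: {100 * var / total:.1f}%")
print(round(float(F.var()), 2), round(total, 2))
```

In this toy setup D gets credit only for the variation it adds itself, while the variation it merely passes on is credited to A, B, and C.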

The case study provides further details on how intrinsic causal influence is calculated.

How does it help you measure your marketing campaigns?

You probably already see the connection between our acquisition marketing example and intrinsic causal influence. Can intrinsic causal influence help us avoid underestimating brand campaigns and overestimating performance campaigns? Let's take a look at a case study.

Background

With the end of the year approaching, the Marketing Director is under pressure from the Finance team to justify their plans to spend heavily on marketing next year. The Finance team uses a last-click model where revenue is attributed to the last thing a customer clicks. They wonder why they need to spend on TV when everyone is coming through organic and social channels.

The data science team’s task is to estimate the intrinsic causal influence of each marketing channel.

Graph (DAG) Setup

First, we'll use the domain knowledge of experts to set up a DAG, reusing our marketing acquisition example from earlier.

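import matplotlib.pyplot as plt
import networkx as nx
import numpy as np
import pandas as pd
import seaborn as sns
from dowhy import gcm

# Create node lookup for channels
node_lookup = {
    0: 'Demand',
    1: 'TV spend',
    2: 'Social spend',
    3: 'Organic clicks',
    4: 'Social clicks',
    5: 'Revenue',
}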

total_nodes = len(node_lookup)

# Create adjacency matrix - this is the base for our graph
graph_actual = np.zeros((total_nodes, total_nodes))

# Create graph using expert domain knowledge
graph_actual[0, 3] = 1.0 # Demand -> Organic clicks
graph_actual[0, 4] = 1.0 # Demand -> Social clicks
graph_actual[1, 3] = 1.0 # TV spend -> Organic clicks
graph_actual[2, 3] = 1.0 # Social spend -> Organic clicks
graph_actual[1, 4] = 1.0 # TV spend -> Social clicks
graph_actual[2, 4] = 1.0 # Social spend -> Social clicks
graph_actual[3, 5] = 1.0 # Organic clicks -> Revenue
graph_actual[4, 5] = 1.0 # Social clicks -> Revenue

Essentially, the last-click model used by the finance team measures marketing using only the direct parents of revenue.

User-generated images
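
The gap between what last click sees and what actually sits upstream of revenue can be read straight off the adjacency matrix. The sketch below rebuilds the same edges in plain NumPy so that it runs on its own. The helper names are hypothetical, and counting paths with matrix powers is just one simple way to find ancestors.

```python
import numpy as np

nodes = ['Demand', 'TV spend', 'Social spend',
         'Organic clicks', 'Social clicks', 'Revenue']

# Same edges as the DAG above: adjacency[i, j] = 1 means i -> j
adjacency = np.zeros((6, 6), dtype=int)
for parent, child in [(0, 3), (0, 4), (1, 3), (2, 3),
                      (1, 4), (2, 4), (3, 5), (4, 5)]:
    adjacency[parent, child] = 1

target = nodes.index('Revenue')

# Last click only sees nodes with a direct edge into the target
direct_parents = [nodes[i] for i in np.flatnonzero(adjacency[:, target])]

# Reachability: summing matrix powers counts paths of every length
reach = np.zeros_like(adjacency)
power = np.eye(6, dtype=int)
for _ in range(len(nodes) - 1):
    power = power @ adjacency
    reach += power
ancestors = [nodes[i] for i in np.flatnonzero(reach[:, target])]

print(direct_parents)
print(ancestors)
```

Demand, TV spend, and social spend appear only in the ancestor list, which is why a last-click view cannot give them any credit.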

Data Generation Process

We create some data samples following the data generating process of the DAG:

  • Three root nodes, each consisting only of a noise term (demand, TV spend, social spend).
  • Two non-root nodes (organic clicks, social clicks), both of which inherit the influence of the three root nodes plus their own noise term.
  • One target node (revenue), which inherits the influence of the two non-root nodes plus a noise term.

# Create dataframe with 1 column per node
df = pd.DataFrame(columns=node_lookup.values())

# Setup data generating process
df[node_lookup[0]] = np.random.normal(100000, 25000, size=(20000)) # Demand
df[node_lookup[1]] = np.random.normal(100000, 20000, size=(20000)) # TV spend
df[node_lookup[2]] = np.random.normal(100000, 25000, size=(20000)) # Social spend
df[node_lookup[3]] = 0.75 * df[node_lookup[0]] + 0.50 * df[node_lookup[1]] + 0.25 * df[node_lookup[2]] + np.random.normal(loc=0, scale=2000, size=20000) # Organic clicks
df[node_lookup[4]] = 0.30 * df[node_lookup[0]] + 0.50 * df[node_lookup[1]] + 0.70 * df[node_lookup[2]] + np.random.normal(100000, 25000, size=(20000)) # Social clicks
df[node_lookup[5]] = df[node_lookup[3]] + df[node_lookup[4]] + np.random.normal(loc=0, scale=2000, size=20000) # Revenue

SCM Training

Now we can train the SCM using the GCM module from the Python package DoWhy. Because we set up the data generating process with linear relationships, we can use ridge regression as the causal mechanism for each non-root node.

# Setup graph
graph = nx.from_numpy_array(graph_actual, create_using=nx.DiGraph)
graph = nx.relabel_nodes(graph, node_lookup)

# Create SCM
causal_model = gcm.InvertibleStructuralCausalModel(graph)

causal_model.set_causal_mechanism('Demand', gcm.EmpiricalDistribution()) # Demand
causal_model.set_causal_mechanism('TV spend', gcm.EmpiricalDistribution()) # TV spend
causal_model.set_causal_mechanism('Social spend', gcm.EmpiricalDistribution()) # Social spend

causal_model.set_causal_mechanism('Organic clicks', gcm.AdditiveNoiseModel(gcm.ml.create_ridge_regressor())) # Organic clicks
causal_model.set_causal_mechanism('Social clicks', gcm.AdditiveNoiseModel(gcm.ml.create_ridge_regressor())) # Social clicks
causal_model.set_causal_mechanism('Revenue', gcm.AdditiveNoiseModel(gcm.ml.create_ridge_regressor())) # Revenue

gcm.fit(causal_model, df)
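
To see why a linear mechanism is a reasonable choice here, the sketch below re-simulates the parents of organic clicks and fits a closed-form ridge regression in plain NumPy. This is a stand-in for illustration, not what DoWhy runs internally, and the penalty value and variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20_000

# Re-simulate the parents of 'Organic clicks' as in the data generating process
demand = rng.normal(100_000, 25_000, n)
tv = rng.normal(100_000, 20_000, n)
social = rng.normal(100_000, 25_000, n)
organic = 0.75 * demand + 0.50 * tv + 0.25 * social + rng.normal(0, 2_000, n)

# Closed-form ridge on centred data: beta = (X'X + alpha*I)^-1 X'y
X = np.column_stack([demand, tv, social])
Xc, yc = X - X.mean(axis=0), organic - organic.mean()
alpha = 1.0  # illustrative penalty, an assumption of this sketch
beta = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(3), Xc.T @ yc)

# Residuals play the role of the node's noise term
residual_std = np.std(yc - Xc @ beta)

print(np.round(beta, 3))
print(round(float(residual_std)))
```

The fitted coefficients should land close to the 0.75, 0.50, and 0.25 used to generate the data, and the residual spread close to the noise scale of 2,000. If the true relationships were nonlinear, a more flexible mechanism would be needed.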

Intrinsic causal influence

The GCM module makes it easy to calculate the intrinsic causal influence. We calculate it and then convert the contributions to percentages.

# calculate intrinsic causal influence
ici = gcm.intrinsic_causal_influence(causal_model, target_node='Revenue')

def convert_to_percentage(value_dictionary):
    total_absolute_sum = np.sum([abs(v) for v in value_dictionary.values()])
    return {k: round(abs(v) / total_absolute_sum * 100, 1) for k, v in value_dictionary.items()}

convert_to_percentage(ici)

User-generated images

Let's display these in a bar graph.

# Convert dictionary to DataFrame (separate name so the training data in df is not overwritten)
ici_df = pd.DataFrame(list(ici.items()), columns=['Node', 'Intrinsic Causal Influence'])

# Create a bar plot
plt.figure(figsize=(10, 6))
sns.barplot(x='Node', y='Intrinsic Causal Influence', data=ici_df)

# Rotate x labels for better readability
plt.xticks(rotation=45)
plt.title('Intrinsic causal influence on Revenue')
plt.show()

User-generated images

Are the results intuitive? Looking back at the data generating process code, we can see that they are. Notice what each non-root node inherits and how much noise it adds itself.
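
One way to sanity-check the results is to work out by hand what shares the data generating process implies. The sketch below assumes that variance of revenue is the quantity being attributed and relies on the process being linear with independent noise, in which case each noise term's share is its total effect on revenue times its standard deviation, squared. The estimates from gcm.intrinsic_causal_influence are sampled and approximate, so they will not match these numbers exactly.

```python
# Total effect of each noise term on Revenue, read off the data generating process:
# Revenue = Organic clicks + Social clicks + noise
total_effect = {
    'Demand': 0.75 + 0.30,
    'TV spend': 0.50 + 0.50,
    'Social spend': 0.25 + 0.70,
    'Organic clicks': 1.0,
    'Social clicks': 1.0,
    'Revenue': 1.0,
}

# Standard deviation of each node's own noise term in the data generating process
noise_std = {
    'Demand': 25_000, 'TV spend': 20_000, 'Social spend': 25_000,
    'Organic clicks': 2_000, 'Social clicks': 25_000, 'Revenue': 2_000,
}

# Variance contributed by each noise term, then converted to percentages
variance_share = {k: (total_effect[k] * noise_std[k]) ** 2 for k in total_effect}
total = sum(variance_share.values())
expected_pct = {k: round(100 * v / total, 1) for k, v in variance_share.items()}
print(expected_pct)
```

Under these assumptions, organic clicks add almost nothing of their own because their noise term is small, while social clicks keep a sizeable intrinsic share because their noise term is large.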

The intrinsic causal influence function is very easy to use, but using it doesn't help you understand the method behind it. To finish, let's explore the inner workings of intrinsic causal influence.

Intrinsic causal influence — how does it work?

The method estimates how much the noise term of each node contributes to the target node.

  • Note that root nodes consist only of noise terms.
  • In non-root nodes, we separate the noise term from what is inherited from the parents.
  • We also include the noise term of the target node, which can be interpreted as the contribution of unobserved confounders (but could also be due to model misspecification).
  • The noise terms are then used to explain the variance of the target node. This can be thought of as a model with the noise terms as features and the target node as the outcome.
  • This model is used to estimate the conditional distribution of the target node given a subset of noise variables.
  • Shapley values are then used to estimate the contribution of each noise term: if changing a noise term has little effect on the target, then its intrinsic causal influence will be very small.
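
The steps above can be sketched end to end on a toy example. This is not DoWhy's implementation: the target function, its coefficients, and the helper names are hypothetical, and the variance explained by a subset of noise terms is estimated with a simple resampling trick (keep the subset fixed, resample everything else) instead of a fitted prediction model.

```python
import numpy as np
from itertools import combinations
from math import factorial

rng = np.random.default_rng(3)
n_samples, n_noise = 200_000, 3

def target(noise):
    # Hypothetical target written purely as a function of three noise terms
    return 2.0 * noise[:, 0] + 1.0 * noise[:, 1] + 0.5 * noise[:, 2]

noise = rng.normal(size=(n_samples, n_noise))
resampled = rng.normal(size=(n_samples, n_noise))
y = target(noise)

def explained_variance(subset):
    # Var(E[Y | noise in subset]): keep the subset's columns, resample the rest,
    # then take the covariance between the two versions of the target
    mixed = resampled.copy()
    mixed[:, list(subset)] = noise[:, list(subset)]
    return np.cov(y, target(mixed))[0, 1]

def shapley_values():
    # Exact Shapley values: weighted average of each noise term's marginal gain
    values = np.zeros(n_noise)
    for i in range(n_noise):
        others = [j for j in range(n_noise) if j != i]
        for size in range(n_noise):
            weight = factorial(size) * factorial(n_noise - size - 1) / factorial(n_noise)
            for subset in combinations(others, size):
                gain = explained_variance(subset + (i,)) - explained_variance(subset)
                values[i] += weight * gain
    return values

phi = shapley_values()
print(np.round(phi, 2))
```

Because this toy target is linear with independent noise, each Shapley value should sit near the squared coefficient of its noise term, and the values should sum to roughly the variance of the target. Enumerating every subset is only feasible for a handful of nodes, so larger graphs need an approximation.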

Today we've discussed how to estimate the intrinsic causal influence of your marketing campaigns. We conclude with a few observations.

  • Intrinsic causal influence is a powerful concept that can be applied to a variety of use cases, not just marketing.
  • Understanding the inner workings will allow you to apply it more effectively.
  • Identifying the DAG and accurately estimating the graph is key to obtaining reasonable estimates of intrinsic causal influence.
  • In the marketing acquisition example, you may want to consider adding lagged effects for your brand marketing.
