Optimizing Nonlinear Treatment Effects in Pricing and Promotions | Ryan O'Sullivan

Causal AI explores the integration of causal inference into machine learning

Welcome to our series on Causal AI, where we discuss integrating causal inference into machine learning models. We discuss numerous practical applications across a variety of business contexts.

In the previous article, we covered optimizing treatment strategies using double machine learning and linear programming. This time we continue with the optimization theme: optimizing nonlinear treatment effects in pricing and promotions.

If you missed our previous article on double machine learning and linear programming, you can find it here.

In this article, we show you how to optimize nonlinear treatment effects in pricing (though the ideas can also be applied to marketing and other fields).

In this article, you will gain a better understanding of:

Why are nonlinear treatment effects in pricing common?
Which tools in the Causal AI toolbox are suitable for estimating nonlinear treatment effects?
How can nonlinear programming be used to optimize pricing?
A Python case study illustrating how to combine the Causal AI toolbox and nonlinear programming to optimize pricing budgets.

You can find the complete notebook here:

Diminishing returns

Let's take the example of a retailer adjusting the price of a product. When they initially lower the price, they may see a large increase in sales. But if they continue to lower the price, the increase in sales may start to plateau. This is called diminishing returns. The effects of diminishing returns are generally nonlinear, as shown below.

Diminishing returns can be seen in many areas beyond pricing. Here are some common examples:

Marketing: Increasing your spend on social media can increase customer acquisition, but over time it becomes harder to reach new, untapped audiences.
Agriculture: Fertilizing fields greatly increases crop yields at first, but the effect quickly begins to tail off.
Manufacturing: Adding workers to a production process increases output, but each additional worker may contribute less than the one before.

This raises the question: if diminishing returns are so common, which techniques in the Causal AI toolbox can handle them?

Toolbox

To identify which methods from the Causal AI toolbox are appropriate for your pricing problem, ask two key questions:

Can it handle continuous treatments?
Can it capture nonlinear treatment effects?

Below we summarize how each method fares against these two questions.

Propensity Score Matching (PSM): treatment must be binary ❌
Inverse Propensity Score Matching (IPSM): treatment must be binary ❌
T-Learner: treatment must be binary ❌
Double Machine Learning (DML): in its standard partially linear form, the treatment effect is assumed to be linear ❌
Doubly Robust Learner (DR): treatment must be binary ❌
S-Learner: can handle continuous treatments and nonlinear relationships between treatment and outcome, provided an appropriate machine learning algorithm (e.g. gradient boosting) is used 💚

S-Learner

The “S” in S-Learner stands for “single model.” You use any machine learning model to predict an outcome using treatments, confounders, and other covariates as features. You then use this model to estimate the potential outcome difference under different treatment conditions (thus obtaining the treatment effect).
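As a rough illustration of the idea, here is a minimal sketch of an S-Learner on made-up data. The data, the variable names and the choice of a polynomial least-squares model are all assumptions for the example; the polynomial simply stands in for "any machine learning model".

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

# Hypothetical data: one covariate x and one continuous treatment t
x = rng.uniform(size=n)
t = rng.uniform(0.0, 10.0, size=n)
# The true treatment effect is concave (diminishing returns): t - 0.05 * t**2
y = 1.0 + 2.0 * x + 1.0 * t - 0.05 * t**2 + rng.normal(scale=0.1, size=n)

def design(x, t):
    """Feature matrix for the single model: intercept, covariate, treatment terms."""
    return np.column_stack([np.ones_like(t), x, t, t**2])

# Fit ONE model on covariates and treatment together (the "S" in S-Learner)
coef, *_ = np.linalg.lstsq(design(x, t), y, rcond=None)

def predict(x, t):
    return design(x, t) @ coef

# Treatment effect at t_star: predicted outcome at t_star minus predicted
# outcome at zero treatment, holding the covariate at its mean
t_star = 5.0
x_bar = np.full(1, x.mean())
effect = predict(x_bar, np.full(1, t_star)) - predict(x_bar, np.zeros(1))
true_effect = 1.0 * t_star - 0.05 * t_star**2
print(effect[0], true_effect)
```

The key point is that the treatment is just another feature: the effect comes from scoring the same fitted model twice with different treatment values.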

S-Learner offers the following benefits:

It can handle both binary and continuous treatments.
Any machine learning algorithm can be used, giving you the flexibility to capture nonlinear relationships in both the features and the treatment.

One thing to watch out for is regularization bias. Modern machine learning algorithms use regularization to prevent overfitting, but this can be harmful in causal problems. Take max_features in gradient boosted tree methods: when only a subset of the features is considered, many trees may not get to use the treatment at all, which dampens the estimated treatment effect.

When using S-Learner, we recommend thinking carefully about the regularization parameters, e.g. setting max_features to 1.0 (effectively turning off this form of feature regularization).
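To make the idea of regularization bias concrete, here is a small sketch on made-up data. Note that it swaps in ridge regression (an L2 penalty) rather than feature subsampling in boosted trees, so it illustrates the general phenomenon, not the max_features mechanism itself. All names and numbers are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2_000
x = rng.normal(size=n)   # covariate
t = rng.normal(size=n)   # treatment (standardized)
# The true coefficient on the treatment is 2.0
y = 1.5 * x + 2.0 * t + rng.normal(scale=0.5, size=n)

features = np.column_stack([x, t])

def ridge_treatment_coef(penalty):
    """Closed-form ridge fit; returns the coefficient on the treatment."""
    gram = features.T @ features + penalty * np.eye(2)
    coef = np.linalg.solve(gram, features.T @ y)
    return coef[1]

penalties = (0.0, 1_000.0, 10_000.0)
coefs = {penalty: ridge_treatment_coef(penalty) for penalty in penalties}
for penalty, coef in coefs.items():
    print(f"penalty={penalty:>8}: treatment coefficient = {coef:.3f}")
```

With no penalty the estimate sits close to the true value; as the penalty grows, the treatment coefficient is pulled towards zero even though the true effect has not changed.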

Price Optimization

Suppose you have multiple products and want to optimize their prices based on a set promotional budget. For each product, you train S-Learner (using gradient boosting) with the treatment set as the discount level and the outcome as the total number of orders. S-Learner outputs a complex model that can be used to estimate the effect of different discount levels. But how do you optimize the discount level for each product?

Response Curve

Optimization techniques such as linear (or nonlinear) programming rely on a clear functional form for the response. Machine learning techniques such as random forests and gradient boosting do not give you this (unlike linear regression). However, a response curve converts the output of an S-Learner into a closed functional form that describes how the outcome responds to the treatment.

If you're not yet sure how to create a response curve, don't worry, we'll cover this in a Python case study.

Michaelis-Menten Equation

There are several equations that can be used to map the S-Learner output onto a response curve, one of which is the Michaelis-Menten equation.

The Michaelis-Menten equation is often used in enzyme kinetics (the study of how quickly enzymes catalyze chemical reactions) to describe the rate of enzyme reactions.
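For reference, this is the standard form of the equation, followed by the same expression in the alpha and lambda notation used in the code later in this article (x is the treatment, y the response). The symbols are explained in the list that follows.

```latex
v = \frac{V_{\max}\, S}{K_m + S}
\qquad\Longleftrightarrow\qquad
y = \frac{\alpha\, x}{\lambda + x}
```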

v: reaction velocity (this is our transformed response, the total number of orders in the pricing example)
Vmax: maximum reaction velocity (we call this alpha, and it is a parameter we need to learn)
Km: Michaelis constant, the substrate concentration at which the velocity reaches half of Vmax (we call this lambda, and it is a parameter we need to learn)
S: substrate concentration (this is our treatment, the discount level in the pricing example)

This principle can be applied to other areas too, especially when dealing with systems where an increase in input does not increase the output proportionally due to a saturation factor. Below is a visualization of how different values of alpha and lambda affect the curve.

def michaelis_menten(x, alpha, lam):
    return alpha * x / (lam + x)
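A few evaluations give a feel for the two parameters. The parameter values below are picked purely for illustration.

```python
def michaelis_menten(x, alpha, lam):
    return alpha * x / (lam + x)

# alpha sets the ceiling of the curve; lam is the input level at which
# half of that ceiling is reached
for alpha, lam in [(2.0, 5000.0), (4.0, 5000.0), (2.0, 1000.0)]:
    half = michaelis_menten(lam, alpha, lam)      # equals alpha / 2
    near_max = michaelis_menten(1e9, alpha, lam)  # approaches alpha
    print(f"alpha={alpha}, lam={lam}: f(lam)={half}, f(1e9)={near_max:.4f}")

# Diminishing returns: the second 1,000 of input adds less than the first
first_step = michaelis_menten(1000.0, 2.0, 5000.0) - michaelis_menten(0.0, 2.0, 5000.0)
second_step = michaelis_menten(2000.0, 2.0, 5000.0) - michaelis_menten(1000.0, 2.0, 5000.0)
print(first_step, second_step)
```

A larger alpha raises the plateau, while a smaller lambda makes the curve saturate sooner.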

Once we have the response curve, we can think about optimization. The Michaelis-Menten equation gives us a nonlinear function, so nonlinear programming is the appropriate choice.

Nonlinear Programming

In the last article we looked at linear programming. Nonlinear programming is similar, except that the objective function and/or the constraints are nonlinear.

Sequential least squares programming (SLSQP) is an algorithm used to solve nonlinear programming problems, and it allows both equality and inequality constraints, making it a good choice for this use case.

Equality constraints (e.g. the total promotion budget is £100,000)
Inequality constraints (e.g. the discount on each product ranges from £1 to £10)

SciPy has an easy to use implementation of SLSQP.
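Here is a toy sketch of that interface, using scipy.optimize.minimize with method="SLSQP". The square-root payoff, the weights and the budget are invented for illustration and are not part of the case study.

```python
import numpy as np
from scipy.optimize import minimize

total = 10.0
weights = np.array([1.0, 2.0])

def negative_return(x):
    # Concave (diminishing-returns) payoff, negated because minimize() minimizes
    return -np.sum(weights * np.sqrt(x))

result = minimize(
    negative_return,
    x0=np.array([5.0, 5.0]),
    method="SLSQP",
    # Per-variable limits, passed as bounds
    bounds=[(0.1, 10.0), (0.1, 10.0)],
    # Equality constraint: the allocations must sum to the total budget
    constraints={"type": "eq", "fun": lambda x: np.sum(x) - total},
    options={"ftol": 1e-10},
)
print(result.x, -result.fun)
```

For this particular payoff the optimum can be worked out by hand as x = (2, 8), which makes it a convenient check that the solver and the constraint are set up correctly.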

Next we illustrate how powerful the combination of S-Learner, the Michaelis-Menten equation, and nonlinear programming can be.

Background

Historically, the promotions team has used their professional judgment when setting discounts for their top three products. However, given the current economic situation, the promotions team is being forced to reduce their overall promotions budget by 20%. They asked the data science team for advice on how to achieve this reduction while minimizing loss in orders.

Data Generation Process

We set up a data generation process with the following characteristics:

Four features that have a complex relationship with the number of orders
A treatment effect that follows the Michaelis-Menten equation

import numpy as np

def data_generator(n, tau_weight, alpha, lam):
    # Set number of features
    p = 4
    # Create features
    X = np.random.uniform(size=n * p).reshape((n, -1))
    # Nuisance function: baseline orders driven by the features
    b = (
        np.sin(np.pi * X[:, 0])
        + 2 * (X[:, 1] - 0.5) ** 2
        + X[:, 2] * X[:, 3]
    )
    # Create treatment and treatment effect
    T = np.linspace(200, 10000, n)
    T_mm = michaelis_menten(T, alpha, lam) * tau_weight
    tau = T_mm / T
    # Calculate outcome
    y = b + T * tau + np.random.normal(size=n) * 0.5
    y_train = y
    # Stack features and treatment into one training matrix (treatment is the last column)
    X_train = np.hstack((X, T.reshape(-1, 1)))
    return y_train, X_train, T_mm, tau
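Written out as an equation, the generator looks as follows. This is only a transcription of the code above, with w standing for tau_weight and ε for the standard normal noise.

```latex
y_i = \underbrace{\sin(\pi X_{i1}) + 2\,(X_{i2} - 0.5)^2 + X_{i3} X_{i4}}_{b(X_i)}
      + w \cdot \frac{\alpha\, T_i}{\lambda + T_i}
      + 0.5\,\varepsilon_i,
\qquad \varepsilon_i \sim \mathcal{N}(0, 1)
```

Because tau is defined as T_mm / T, the term T * tau in the code collapses back to the scaled Michaelis-Menten curve T_mm.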

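The X features play the role of confounders here, although note that in this simulation the treatment T is generated independently of X, so they influence only the outcome.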

Use the data generator to create samples of three products, each with a different treatment effect.

np.random.seed(1234)
n = 100000
y_train_1, X_train_1, T_mm_1, tau_1 = data_generator(n, 1.00, 2, 5000)
y_train_2, X_train_2, T_mm_2, tau_2 = data_generator(n, 0.25, 2, 5000)
y_train_3, X_train_3, T_mm_3, tau_3 = data_generator(n, 2.00, 2, 5000)

S-Learner

You can train S-Learner using any machine learning algorithm and include treatments and covariates as features.

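from lightgbm import LGBMRegressor
from sklearn.metrics import mean_squared_error, r2_score

def train_slearner(X_train, y_train):
    # Fit a single model on the covariates and the treatment together
    model = LGBMRegressor(random_state=42)
    model.fit(X_train, y_train)
    # In-sample fit metrics
    yhat_train = model.predict(X_train)
    mse_train = mean_squared_error(y_train, yhat_train)
    r2_train = r2_score(y_train, yhat_train)
    print(f'MSE on train set is {round(mse_train, 2)}')
    print(f'R2 on train set is {round(r2_train, 2)}')
    return model, yhat_train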

We train S-Learners for each product.

np.random.seed(1234)
model_1, yhat_train_1 = train_slearner(X_train_1, y_train_1)
model_2, yhat_train_2 = train_slearner(X_train_2, y_train_2)
model_3, yhat_train_3 = train_slearner(X_train_3, y_train_3)

At the moment this is just a predictive model. Below we visualize how well the model performs at this task.

Extraction of treatment effects

We then use S-learner to extract treatment effects for the full range of treatment values (discount amounts) while holding other features at their mean values.

First, extract the expected outcomes (number of orders) for the full range of treatment values.

import pandas as pd

def extract_treated_effect(n, X_train, model):
    # Set features to mean value
    X_mean_mapping = {'X1': [X_train[:, 0].mean()] * n,
                      'X2': [X_train[:, 1].mean()] * n,
                      'X3': [X_train[:, 2].mean()] * n,
                      'X4': [X_train[:, 3].mean()] * n}
    # Create DataFrame
    df_scoring = pd.DataFrame(X_mean_mapping)
    # Add full range of treatment values (last column of the training matrix)
    df_scoring['T'] = X_train[:, 4]
    # Calculate outcome prediction for treated
    treated = model.predict(df_scoring)
    return treated, df_scoring

Do this for each product:

treated_1, df_scoring_1 = extract_treated_effect(n, X_train_1, model_1)
treated_2, df_scoring_2 = extract_treated_effect(n, X_train_2, model_2)
treated_3, df_scoring_3 = extract_treated_effect(n, X_train_3, model_3)

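Next, extract the expected outcome (number of orders) with the treatment set to 0. Note that 0 lies below the smallest discount in the training data (200), and a tree-based model cannot extrapolate, so this baseline is effectively the prediction at the lowest observed discount levels.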

def extract_untreated_effect(n, X_train, model):
    # Set features to mean value and treatment to zero
    X_mean_mapping = {'X1': [X_train[:, 0].mean()] * n,
                      'X2': [X_train[:, 1].mean()] * n,
                      'X3': [X_train[:, 2].mean()] * n,
                      'X4': [X_train[:, 3].mean()] * n,
                      'T': [0] * n}
    # Create DataFrame
    df_scoring = pd.DataFrame(X_mean_mapping)
    # Calculate outcome prediction for untreated
    untreated = model.predict(df_scoring)
    return untreated

Again, do this for each product.

untreated_1 = extract_untreated_effect(n, X_train_1, model_1)
untreated_2 = extract_untreated_effect(n, X_train_2, model_2)
untreated_3 = extract_untreated_effect(n, X_train_3, model_3)

We can now calculate the treatment effect for the full range of treatment values.

treatment_effect_1 = treated_1 - untreated_1
treatment_effect_2 = treated_2 - untreated_2
treatment_effect_3 = treated_3 - untreated_3

Comparing this to the actual treatment effects saved from the data generator, we can see that S-Learner is very effective at estimating treatment effects for the full range of treatment values.

Now that we have the treatment effects, we can use them to create response curves for each product.

Michaelis-Menten

To create the response curve we need a curve-fitting tool, and SciPy has a good implementation in curve_fit.

First, set the function you want to learn.

def michaelis_menten(x, alpha, lam):
    return alpha * x / (lam + x)

Then we use curve_fit to learn the alpha and lambda parameters.

from scipy.optimize import curve_fit

def response_curves(treatment_effect, df_scoring):
    # Allow plenty of function evaluations so the fit can converge
    maxfev = 100000
    # Starting values for the parameters to be learned
    lam_initial_estimate = 0.001
    alpha_initial_estimate = max(treatment_effect)
    initial_guess = [alpha_initial_estimate, lam_initial_estimate]
    popt, pcov = curve_fit(michaelis_menten, df_scoring['T'], treatment_effect, p0=initial_guess, maxfev=maxfev)
    return popt, pcov
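To see curve_fit in isolation, here is a self-contained sketch that fits the same function to synthetic, noisy data with known parameters. The data and the choice of starting values (scaled to the data rather than the values used above) are assumptions for the example, not the settings from the case study.

```python
import numpy as np
from scipy.optimize import curve_fit

def michaelis_menten(x, alpha, lam):
    return alpha * x / (lam + x)

rng = np.random.default_rng(0)
x = np.linspace(200, 10000, 200)
true_alpha, true_lam = 2.0, 5000.0
y = michaelis_menten(x, true_alpha, true_lam) + rng.normal(scale=0.01, size=x.size)

# Starting values on the same scale as the data: the largest observed
# response for alpha and the mid-point of the treatment range for lam
p0 = [y.max(), np.median(x)]
(alpha_hat, lam_hat), pcov = curve_fit(michaelis_menten, x, y, p0=p0)

# The diagonal of pcov holds the parameter variances
alpha_se, lam_se = np.sqrt(np.diag(pcov))
print(f"alpha = {alpha_hat:.3f} (se {alpha_se:.3f})")
print(f"lam   = {lam_hat:.0f} (se {lam_se:.0f})")
```

The standard errors from pcov are a cheap way to judge how well the data pins down each parameter, which matters when the observed treatment range does not reach the plateau of the curve.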

Do this for each product:

popt_1, pcov_1 = response_curves(treatment_effect_1, df_scoring_1)
popt_2, pcov_2 = response_curves(treatment_effect_2, df_scoring_2)
popt_3, pcov_3 = response_curves(treatment_effect_3, df_scoring_3)

We can now input the learned parameters into a Michaelis-Menten function and visualize how well the curve fitting was done.

treatment_effect_curve_1 = michaelis_menten(df_scoring_1['T'], popt_1[0], popt_1[1])
treatment_effect_curve_2 = michaelis_menten(df_scoring_2['T'], popt_2[0], popt_2[1])
treatment_effect_curve_3 = michaelis_menten(df_scoring_3['T'], popt_3[0], popt_3[1])

We can see that the curve fitting worked well.

Now that we know the alpha and lambda parameters for each product, we can start thinking about nonlinear optimization.

Nonlinear Programming

First, collate and set up all the information required for optimization.

Full Product List
Total Promotion Budget
Budget range for each product
Parameters for each product from the Michaelis-Menten response curve

# List of products
products = ["product_1", "product_2", "product_3"]
# Set total budget to be the sum of the mean of each product reduced by 20%
total_budget = (df_scoring_1['T'].mean() + df_scoring_2['T'].mean() + df_scoring_3['T'].mean()) * 0.80
# Dictionary with min and max bounds for each product - set as +/-20% of max/min discount
budget_ranges = {"product_1": [df_scoring_1['T'].min() * 0.80, df_scoring_1['T'].max() * 1.2],
                 "product_2": [df_scoring_2['T'].min() * 0.80, df_scoring_2['T'].max() * 1.2],
                 "product_3": [df_scoring_3['T'].min() * 0.80, df_scoring_3['T'].max() * 1.2]}
# Dictionary with response curve parameters
parameters = {"product_1": [popt_1[0], popt_1[1]],
              "product_2": [popt_2[0], popt_2[1]],
              "product_3": [popt_3[0], popt_3[1]]}

Next we set up our objective function. We want to maximize the number of orders, but because we are using a minimization method, the function returns the negative of the total expected number of orders.

def objective_function(x, products, parameters):
    sum_orders = 0.0
    # Unpack parameters for each product and calculate expected orders
    for product, budget in zip(products, x, strict=False):
        alpha, lam = parameters[product]
        sum_orders += michaelis_menten(budget, alpha, lam)
    # Negate so that minimizing this function maximizes orders
    return -1 * sum_orders

Finally, we run an optimization to determine the optimal budget to allocate for each product.

from scipy.optimize import minimize

# Set initial guess by equally sharing out the total budget
initial_guess = [total_budget / len(products)] * len(products)
# Set the lower and upper bounds for each product
bounds = [budget_ranges[product] for product in products]
# Set the equality constraint - constraining the total budget
constraints = {"type": "eq", "fun": lambda x: np.sum(x) - total_budget}
# Run optimisation
result = minimize(
    lambda x: objective_function(x, products, parameters),
    initial_guess,
    method="SLSQP",
    bounds=bounds,
    constraints=constraints,
    options={'disp': True, 'maxiter': 1000, 'ftol': 1e-9},
)
# Extract results
optimal_treatment = {product: budget for product, budget in zip(products, result.x, strict=False)}
print(f'Optimal promo budget allocations: {optimal_treatment}')
print(f'Optimal orders: {round(result.fun * -1, 2)}')

The output shows the optimal promotion budget for each product.

A closer look at the response curves gives us some intuition for the optimization results.

A slight decrease in the budget for product 1
A drastic decrease in the budget for product 2
A significant increase in the budget for product 3
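One way to read these results is through the standard first-order (Lagrangian) condition for a budget-constrained problem. This is a textbook argument sketched here for intuition, not something taken from the notebook output: at the optimum, the marginal return of an extra unit of budget is the same for every product whose budget is not pinned at one of its bounds.

```latex
\frac{\partial}{\partial x_i}\left(\frac{\alpha_i\, x_i}{\lambda_i + x_i}\right)
  = \frac{\alpha_i\, \lambda_i}{(\lambda_i + x_i)^2}
  = \mu
  \qquad \text{for every product } i \text{ with } x_i \text{ strictly inside its bounds}
```

Here x_i is the budget for product i and μ is the multiplier on the total budget constraint. A product with a higher response curve has a higher marginal return at any given budget, so the optimizer keeps moving budget towards it until the marginal returns level out or a bound is hit.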

Today, I have described a powerful combination of S-Learner, the Michaelis-Menten equation, and nonlinear programming. I will conclude with some thoughts.

As mentioned above, when using S-Learner, be careful of regularization bias.
I have chosen to use the Michaelis-Menten equation to create the response curve, but this may not fit your problem and can be replaced with another transformation that is more appropriate.
Using SLSQP to solve nonlinear programming problems allows the flexibility of using both equality and inequality constraints.
I chose to focus on pricing and promotions, but this framework can be extended to marketing budgets as well.