As a budding Data Scientist, my academic background taught me to honor accuracy as a sign of a successful project. The industry, on the other hand, cares about making and saving money in the short and long term. This article is a lesson on ROI — Return On Investment — the holy grail of business actions.
A large portion of promotional campaigns target segments of customers rather than individuals. Examples are Paid Search, Display Ads, and Paid Social. Direct-To-Consumer (D2C) campaigns, on the other hand, are aimed directly at individual customers through Direct Mail, Email, SMS, or push notifications. Businesses in the banking and fintech space can run massive D2C campaigns because most of their customers have the app installed. Increasingly, though, these businesses want to spend their promotional budgets more efficiently.
With that background, let's talk about a credit card issuer, Flex, which offers a free first year with no annual fee. From the second year of use, it charges the full annual fee. Over the past 3 years, Flex has observed a low yearly retention rate, with only 30% of cardholders keeping the card after their first year. To keep growing its customer base, Flex decides to experiment with renewal offers for select customers. The problem is that this strategy can be costly if we're not careful.
As Data Scientists, we are tasked with preparing the smallest group of target customers for extending these offers from the list of 5 million customers who are up for renewal.
For many years, data scientists built response models to predict the likelihood that a customer would respond to a direct campaign. For newer businesses this may work, but as brands mature, their questions evolve.
Problems that are not solved by response models are:
- How much more likely is a customer to respond if exposed to a campaign?
- How can we prioritize the customers who are at risk of churn? Who are they?
- Are there customers who might respond negatively to promotional messages? Who are they?
- How can we reduce the target customers in the campaign without affecting the incremental revenue?
Enter uplift modeling. It is a machine learning technique that predicts the incremental impact of a treatment on an individual’s purchasing behavior, rather than just the likelihood of the behavior. This way, you can target the customers who are most likely to be influenced by your campaign and avoid wasting resources on those who are not. This boosts the campaign’s return on investment and customer satisfaction.
You may have seen this classification of customers before. The Sure things have a strong affinity for your brand or product and would purchase anyway. The Lost causes have no need for your product. A promotional campaign is unlikely to sway either of these two classes. The Sleeping dogs would have purchased if left alone, but promotion puts them off. It is the Persuadables who present the biggest opportunity: they would ONLY purchase if marketed to. They lift the ROI of the campaign.
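The four segments can be written as a lookup on two hypothetical outcomes: whether a customer buys when treated, and whether they buy when left alone. In practice only one of the two is ever observed for a given customer, which is why uplift has to be estimated. The function and argument names below are illustrative assumptions, not part of any library.

```python
def customer_segment(buys_if_treated: bool, buys_if_untreated: bool) -> str:
    """Map a customer's two potential outcomes to one of the four segments."""
    if buys_if_treated and buys_if_untreated:
        return "Sure thing"
    if not buys_if_treated and not buys_if_untreated:
        return "Lost cause"
    if buys_if_treated:
        return "Persuadable"
    return "Sleeping dog"


def individual_uplift(buys_if_treated: bool, buys_if_untreated: bool) -> int:
    """Uplift for one customer: outcome with treatment minus outcome without."""
    return int(buys_if_treated) - int(buys_if_untreated)


for treated, untreated in [(True, True), (False, False), (True, False), (False, True)]:
    print(customer_segment(treated, untreated), individual_uplift(treated, untreated))
```

Only the Persuadables have a positive individual uplift, and only the Sleeping dogs have a negative one.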
In this task, we have to first identify the Persuadables. Second, find the most suitable offer for each of them.
We have a dataset of 5 million customers who are at a tenure of 10 months, which means they have 2 months to renew. This is simulated customer data that you can create yourself with this Python code.
We have to do some EDA here and I have used the ydata-profiling (formerly called Pandas Profiling) tool to generate an interactive report.
We have 20 customer variables — both qualitative (like age, income tier) and quantitative (transactions, spend in categories). Some of the variables are quite highly correlated.
Flex has already run a pilot campaign on 50K customers with a message like the one below.
We are pleased to inform you that your credit card is eligible for renewal with a special offer. For a limited time, you can renew your credit card with a lowered annual fee of only $49, saving you up to 50% compared to the regular fee. This offer is exclusive to our loyal customers like you, who have been using our credit card for more than a year.
There were 3 offers, defined by the share of the annual fee the customer pays in the second year: 30%, 50% or 70%. From the campaign, it was concluded that the treated segments had a 55% retention rate, a lift of 25 percentage points (55 minus 30) over the control group, who paid the full annual fee. This difference is called the Average Treatment Effect (ATE).
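As a quick illustration, the ATE is just the difference between two retention rates. The toy lists below are made up to mirror the 55% and 30% figures; they are not the pilot data.

```python
def retention_rate(renewed):
    """Share of customers who renewed (1 = renewed, 0 = churned)."""
    return sum(renewed) / len(renewed)


def average_treatment_effect(treated_renewed, control_renewed):
    """ATE = retention rate of the treated group minus that of the control group."""
    return retention_rate(treated_renewed) - retention_rate(control_renewed)


# Toy data (assumption): 100 treated customers, 100 control customers.
treated = [1] * 55 + [0] * 45
control = [1] * 30 + [0] * 70

ate = average_treatment_effect(treated, control)
print(f"ATE = {ate:.2f}")
```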
We have the campaign results, and this data can be used to optimize the next campaign. To do this we have to calculate the Conditional Average Treatment Effect (CATE) for every customer — it is a fancy name for the effect at a customer level.
Note — A pilot campaign is a small-scale test of a promotional or marketing strategy before launching it on a larger scale. It allows marketers to evaluate the effectiveness, feasibility, and costs of the strategy, and to identify and resolve any issues or challenges. A pilot campaign can help to optimize the marketing plan, increase the return on investment, and reduce the risks of failure.
Propensity score matching (PSM) aims to match customers that have similar probabilities of receiving the treatment based on their observed characteristics. PSM can help in reducing the bias caused by confounding variables in observational studies, where random assignment of treatment is not possible. It involves estimating the propensity scores for each customer, which are the conditional probabilities of being treated given the covariates, and then matching treated and untreated customers with similar scores.
Since we have 3 different treatments in the pilot campaign, I will use PSM to approximate an identical control group for each treatment group. For example, we find a set of customers in the control group (who paid the full annual fee) that are similar to the customers who received the Annual Fee x 30% treatment, and do the same for the Annual Fee x 50% and Annual Fee x 70% groups. This reduces the bias from observed confounding variables, as in an experimental setup, so that we can estimate the true lift for each treatment group.
Typically, propensity scores are calculated using simple logistic regression models. I would also recommend packages such as psmpy that do this well and also handle the class imbalance for you.
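To make the two steps concrete, here is a minimal from-scratch sketch using only NumPy: a logistic regression fitted by gradient descent to estimate propensity scores, followed by greedy 1:1 nearest-neighbour matching without replacement. It is not how psmpy works internally, and the caliper of 0.05, the learning rate, and the simulated data are all assumptions chosen for illustration.

```python
import numpy as np


def fit_propensity(X, t, lr=0.1, n_iter=2000):
    """Logistic regression by gradient descent; returns P(treated | X) per row."""
    Xb = np.column_stack([np.ones(len(X)), X])  # add an intercept column
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * (Xb.T @ (p - t)) / len(t)  # gradient of the mean log-loss
    return 1.0 / (1.0 + np.exp(-Xb @ w))


def match_controls(scores, t, caliper=0.05):
    """Pair each treated row with the closest unused control within the caliper."""
    available = list(np.flatnonzero(t == 0))
    pairs = []
    for i in np.flatnonzero(t == 1):
        if not available:
            break
        gaps = np.abs(scores[available] - scores[i])
        j = int(np.argmin(gaps))
        if gaps[j] <= caliper:
            pairs.append((int(i), int(available.pop(j))))
    return pairs


# Simulated data (assumption): customers with a higher x0 are more likely treated.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 2))
t = (rng.random(400) < 1.0 / (1.0 + np.exp(-X[:, 0]))).astype(int)

scores = fit_propensity(X, t)
pairs = match_controls(scores, t)
print(f"{len(pairs)} matched pairs out of {int(t.sum())} treated customers")
```

Treated customers with no control inside the caliper are dropped, which trades sample size for closer matches.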
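After propensity score matching, we have 3 pairs of datasets: one matched treatment and control pair for each offer (Annual Fee x 30%, Annual Fee x 50% and Annual Fee x 70%).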
I have used these pairs to build 3 models, one for each treatment group, using the X-learner algorithm in the CausalML library. SHAP values can then be used to check which features are linked to uplift.
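For intuition about what an X-learner does, the sketch below implements its three stages from scratch with ordinary least squares as the base learner. This is an illustration under stated assumptions, not the CausalML implementation: the outcome is continuous rather than a binary renewal flag, the propensity is fixed at 0.5, and the data are simulated with a known treatment effect.

```python
import numpy as np


def fit_linear(X, y):
    """Least-squares fit with an intercept; returns the coefficient vector."""
    Xb = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return coef


def predict_linear(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef


def x_learner_cate(X, t, y, X_new, propensity=0.5):
    """Estimate CATE for X_new with a linear-model X-learner."""
    X1, y1 = X[t == 1], y[t == 1]
    X0, y0 = X[t == 0], y[t == 0]
    # Stage 1: one outcome model per arm.
    mu1 = fit_linear(X1, y1)
    mu0 = fit_linear(X0, y0)
    # Stage 2: impute individual effects with the other arm's model, then model them.
    d1 = y1 - predict_linear(mu0, X1)
    d0 = predict_linear(mu1, X0) - y0
    tau1 = fit_linear(X1, d1)
    tau0 = fit_linear(X0, d0)
    # Stage 3: blend the two effect models, weighted by the propensity score.
    g = propensity
    return g * predict_linear(tau0, X_new) + (1 - g) * predict_linear(tau1, X_new)


# Simulated data (assumption): the true effect grows with the first feature.
rng = np.random.default_rng(42)
n = 2000
X = rng.normal(size=(n, 2))
t = rng.integers(0, 2, size=n)
true_cate = 0.2 + 0.3 * X[:, 0]
y = 0.5 * X[:, 1] + t * true_cate + rng.normal(scale=0.1, size=n)

cate_hat = x_learner_cate(X, t, y, X)
print("mean absolute error:", float(np.mean(np.abs(cate_hat - true_cate))))
```

Because the simulated effect is linear, linear base learners recover it closely; on real data a more flexible base learner would usually be needed.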
We construct 3 Qini curves, which show the cumulative uplift from adding customers to the target group, ordered from highest to lowest CATE. It plays a role similar to the ROC curve in traditional machine learning. The lower line is the uplift from random assignment into treatment/control. Here we report the Area Under the Uplift Curve, or Qini score: the higher, the better.
As expected, the Annual Fee x 30% treatment, the deepest discount, has the highest Qini score. Now the models are ready, and we can apply them to new data.
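A Qini curve can be computed directly from predicted uplift, the treatment flag, and the observed outcome. The sketch below uses one common convention (treated responders minus control responders rescaled to the treated group size, with the score taken as the mean gap to the random line); libraries differ in normalization, so treat the exact scale as an assumption. The four-customer example is made up.

```python
import numpy as np


def qini_curve(uplift_pred, t, y):
    """Cumulative incremental responders when targeting by descending predicted uplift."""
    uplift_pred, t, y = (np.asarray(a) for a in (uplift_pred, t, y))
    order = np.argsort(-uplift_pred, kind="stable")
    t, y = t[order], y[order]
    n_t, n_c = np.cumsum(t), np.cumsum(1 - t)
    y_t, y_c = np.cumsum(y * t), np.cumsum(y * (1 - t))
    # Rescale control responders to the treated group size (0 until a control is seen).
    ratio = np.divide(n_t, n_c, out=np.zeros(len(t)), where=n_c > 0)
    return np.concatenate([[0.0], y_t - y_c * ratio])


def qini_score(uplift_pred, t, y):
    """Mean gap between the Qini curve and the straight random-targeting line."""
    curve = qini_curve(uplift_pred, t, y)
    random_line = np.linspace(0.0, curve[-1], len(curve))
    return float(np.mean(curve - random_line))


# Toy example (assumption): the only responder is a treated customer.
t = [1, 0, 1, 0]
y = [1, 0, 0, 0]
good_ranking = [0.9, 0.8, 0.2, 0.1]  # puts the responder first
bad_ranking = [0.1, 0.2, 0.8, 0.9]   # puts the responder last

print(qini_score(good_ranking, t, y), qini_score(bad_ranking, t, y))
```

A ranking that surfaces the incremental responders early scores above zero; one that buries them scores below.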
We move on to the 5 million customers who are up for renewal. We can offer them Annual Fee x 30%, Annual Fee x 50% or Annual Fee x 70%, or offer nothing at all (Full Annual Fee). I predict the CATE with each of the three X-learners. The treatment with the maximum CATE is the best treatment. If all treatments have a similar CATE (within ±10% of each other), we pick the Annual Fee x 70% treatment, since it brings in the most revenue. If the maximum CATE is negative, we don't market to this customer (they're a Sleeping dog).
Here are our best assignments. About half a million customers are not recommended for any treatment.
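The assignment rule can be sketched as a small function. The offer keys are hypothetical labels, and "within ±10% of each other" is interpreted here as the gap between the largest and smallest CATE being at most 10% of the largest; that reading is an assumption.

```python
def best_treatment(cate_by_offer, tolerance=0.10, fallback="fee_x_70"):
    """Pick an offer for one customer from their predicted CATE per offer."""
    best_offer = max(cate_by_offer, key=cate_by_offer.get)
    best = cate_by_offer[best_offer]
    if best < 0:
        # Every offer is predicted to hurt retention: leave the customer alone.
        return "no_offer"
    worst = min(cate_by_offer.values())
    if best - worst <= tolerance * abs(best):
        # Offers are roughly equivalent: prefer the one that keeps the most revenue.
        return fallback
    return best_offer


print(best_treatment({"fee_x_30": 0.12, "fee_x_50": 0.06, "fee_x_70": 0.02}))
print(best_treatment({"fee_x_30": 0.100, "fee_x_50": 0.098, "fee_x_70": 0.095}))
print(best_treatment({"fee_x_30": -0.01, "fee_x_50": -0.02, "fee_x_70": -0.03}))
```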
In this type of representation (see below) we split customers into deciles based on CATE. Decile 1 has the highest CATE and decile 10 has the lowest. If we give all customers one single type of treatment, we can see the lower deciles falling below 0 earlier. Hence, we will stick to the best treatment for our next campaign.
The Qini curve tells us that we expect quite a bit of lift from running this campaign. There isn’t a clear cut-off or inflection point in the curve to separate out the Persuadables.
The average lift in the next campaign is expected to be 0.052, or about 5 percentage points. The deciles with an uplift above this average are the targetable customers. But, to be frugal in this campaign, we will take only the top 20% and call them Persuadables. The deciles with negative uplift are the Sleeping dogs. The rest are either Sure things or Lost causes.
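The decile split and the labelling rule can be sketched in a few lines of plain Python. The CATE values are made up, and the helper names are illustrative assumptions.

```python
def assign_deciles(cate):
    """Decile 1 holds the highest CATE values, decile 10 the lowest."""
    n = len(cate)
    order = sorted(range(n), key=lambda i: cate[i], reverse=True)
    deciles = [0] * n
    for rank, i in enumerate(order):
        deciles[i] = rank * 10 // n + 1
    return deciles


def label_deciles(cate, deciles, persuadable_deciles=2):
    """Label each decile from its rank and its mean predicted uplift."""
    labels = {}
    for d in range(1, 11):
        members = [c for c, dec in zip(cate, deciles) if dec == d]
        if not members:
            continue
        mean_uplift = sum(members) / len(members)
        if d <= persuadable_deciles:
            labels[d] = "Persuadable"
        elif mean_uplift < 0:
            labels[d] = "Sleeping dog"
        else:
            labels[d] = "Sure thing or Lost cause"
    return labels


# Toy CATE predictions (assumption): 20 customers from -0.04 to 0.15.
cate = [0.01 * k for k in range(-4, 16)]
deciles = assign_deciles(cate)
labels = label_deciles(cate, deciles)
for d, label in labels.items():
    print(d, label)
```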
It's easier to visualize the Persuadables in this revamped Best Treatment plot. In this case, they are the top 2 deciles, that is, the top 20% of customers.
We cannot report uplift to business teams, so let’s convert this to Incremental ROI and Revenue scale. For decile d, the Incremental ROI is
Revenue is the total amount of renewal fees from the decile. Campaign Cost is the portion of the renewal fees that Flex bears itself. We see that it is only profitable to offer discounts to the first 7 deciles, or the top 70% of customers.
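The article's exact formula is not reproduced here, so the sketch below uses a textbook ROI ratio, (incremental revenue minus campaign cost) divided by campaign cost, as an assumption. Incremental revenue is taken as the full fee from the extra renewers, and campaign cost as the discount absorbed on every renewer who received the offer. The fee of 100 and the decile figures are hypothetical.

```python
def incremental_roi(n_customers, baseline_rate, uplift, full_fee, pay_fraction):
    """Assumed ROI for one decile: (incremental revenue - campaign cost) / campaign cost."""
    discount = full_fee * (1 - pay_fraction)  # fee amount Flex absorbs per renewer
    renewers_with_offer = n_customers * (baseline_rate + uplift)
    incremental_revenue = n_customers * uplift * full_fee  # fees from extra renewers
    campaign_cost = renewers_with_offer * discount
    return (incremental_revenue - campaign_cost) / campaign_cost


# Hypothetical deciles: same size, baseline and offer, but different uplift.
high_uplift = incremental_roi(1000, baseline_rate=0.3, uplift=0.20, full_fee=100, pay_fraction=0.7)
low_uplift = incremental_roi(1000, baseline_rate=0.3, uplift=0.05, full_fee=100, pay_fraction=0.7)
print(f"high-uplift decile ROI: {high_uplift:.2f}, low-uplift decile ROI: {low_uplift:.2f}")
```

Under these assumptions a decile is profitable only when the fees from its extra renewers exceed the discount handed to customers who would have renewed anyway.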
The top 20%, or Persuadables, are expected to bring 80% of the total revenue from the renewal of these 5 million customers. This pattern is often observed in business and is known as the Pareto Principle. Similar bar charts can be constructed for CLV (Customer Lifetime Value) to learn the long-term ROI of the campaign.
So, to answer the question: who do we target? The Persuadables, who number about 1 million customers. How do we personalize their offer? We give each of them the treatment with the highest Conditional Average Treatment Effect.
In this way, uplift modeling identifies the customers who will bring the most incremental ROI to the campaign and targets them accordingly. By doing so, it optimizes the campaign's return on investment and reduces wasteful spending.