Champion / Challenger, It’s a Numbers Game
Automating decisions is valuable mostly when you can change the underlying decision logic as fast as your business changes, whether because of regulatory changes, competitive pressure, or simply new business opportunities. And change goes hand in hand with testing: it would be foolish to change a part of your business model without making sure it is implemented correctly. However, testing is not always sufficient. It is obviously needed, but it has its limitations. How can you test your decision logic when many unknowns are out of your control? What we need is sometimes less a test than a race between different strategies. In this post, I will discuss a technique pioneered a few decades ago, and yet still not widely adopted outside of a few niches. This technique is called Champion / Challenger.
Why Champion / Challenger
Have you ever experimented with Champion / Challenger? Perhaps you have heard of it as A/B testing. The main objective is to compare a given strategy (your champion) with one or more alternatives (the challengers). The technique has been used over and over again in website design. The objective could be to highlight calls to action in different ways, or to drastically change the wording across several page variants, and measure which version yields the best results. While it is the norm in web design, it is not as widely applied in decisioning. Why, you may ask? My hunch is that many companies are not comfortable with how to set it up. I have actually seen companies that used this technique and still tainted their experiment with a careless setup. I would welcome comments from you all on which other industries are making strides in Champion / Challenger experimentation.
Let me briefly explain the basic concept as it applies to decision management. Like web design experiments, decision management experiments aim at comparing different alternatives in a live environment. The rationale is that testing and simulation in a sandbox can estimate the immediate business performance of a decision (approving a credit line, for example), but it cannot predict how people will react over time. Simulation would only tell you how many people in your historical sample would be accepted versus declined. You can approve a population segment, and then discover over time that this segment performs poorly because of high delinquency. Live experimentation allows you to make actual decisions and then measure the business performance of that sample over time.
How Champion / Challenger works
Technically, two or more decision services can actually be deployed in production. Since your system cannot approve and decline the same transaction at the same time, you need infrastructure that routes each transaction randomly to one strategy, and marks the transaction for monitoring. The keyword here is ‘randomly’: it is critical that your setup distributes transactions without any bias. That being said, it is common to exclude entire segments because of their strategic value (VIP customers, for example), or because of regulations (to avoid adverse actions on the elderly, for example, which could result in fines). Your setup will determine what volume of transactions goes to the champion strategy, let’s say 50%, and how much goes to the challengers, let’s say 25% each for two challengers.
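The routing described above can be sketched in a few lines. This is a minimal illustration, not a production router: the strategy names, the 50/25/25 split, and the `vip` flag are all assumptions made up for the example.

```python
import random

# Hypothetical allocation: 50% champion, 25% each for two challengers.
ALLOCATION = {"champion": 0.50, "challenger_1": 0.25, "challenger_2": 0.25}

def route(transaction):
    """Pick a strategy for this transaction and tag it for monitoring."""
    # Excluded segments (e.g. VIP customers) always get the champion
    # and stay out of the experiment.
    if transaction.get("vip"):
        transaction["strategy"] = "champion"
        return transaction
    # Unbiased weighted random assignment for everyone else.
    names = list(ALLOCATION)
    weights = list(ALLOCATION.values())
    transaction["strategy"] = random.choices(names, weights=weights)[0]
    return transaction
```

The essential point is that the assignment depends on nothing but the random draw: routing by time of day, by customer attribute, or by server would quietly bias the comparison.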
It becomes trickier to set up when you need to test multiple parts of your decisions. It is not my objective to describe this issue in detail here; I might do that in a follow-up post. I just want to raise the importance of experimentation integrity as a possible reason for the perceived complexity.
Once the strategies are deployed, you need to wait a week, a month, or whatever time period is appropriate, before you can conclude that one of the strategies is ‘winning’, meaning that it outperforms the others. At that point, you can promote that strategy as the established champion, and possibly start a new experiment.
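Declaring a winner should rest on a statistical test rather than eyeballing the numbers. As a sketch, a standard two-proportion z-test can compare success rates (for example, the share of approved accounts that stayed in good standing) between two strategies; the transaction counts below are invented for illustration.

```python
import math

def two_proportion_z(successes_a, n_a, successes_b, n_b):
    """Two-proportion z-statistic: is strategy B's success rate
    significantly different from strategy A's?"""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    # Pooled proportion under the null hypothesis of no difference.
    p_pool = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Illustrative figures: champion produced 4,200 good accounts out of
# 10,000 routed transactions; a challenger produced 2,250 out of 5,000.
z = two_proportion_z(4200, 10000, 2250, 5000)
# |z| > 1.96 would indicate a significant difference at the 95% level.
```

Only when the statistic clears the significance threshold should the challenger be promoted; otherwise the apparent edge may be noise.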
It’s a Numbers Game
As you process transactions day in and day out, you will allocate a percentage to each strategy. In our earlier example, 50% go to the champion and 25% to each of challengers 1 and 2. For the performance indicators to be statistically significant, you will need ‘enough data’. If your system processes a dozen transactions a day, it will take a long time before enough transactions have gone to each of the challengers, and this becomes increasingly problematic if you want to test even more challengers at once. On the other hand, systems that process millions of transactions per day will get results much faster.
So, basically, you end up with 3 dimensions you can play with:
- Number of transactions per day
- Number of strategies to consider
- Amount of time you run the experiment
As long as the volumes along these 3 dimensions are sufficient, you will be able to learn from your experimentation.
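The interplay between these three dimensions can be made concrete with a standard sample-size calculation for comparing two proportions. This is a sketch: the baseline rate, the uplift to detect, and the power/confidence defaults are illustrative assumptions.

```python
import math

def days_needed(daily_volume, arm_share, p_base, p_alt,
                z_alpha=1.96, z_beta=0.84):
    """Days of traffic needed before one arm of the experiment has
    enough transactions to detect a shift from p_base to p_alt
    (roughly 80% power at 95% confidence)."""
    n_per_arm = ((z_alpha + z_beta) ** 2
                 * (p_base * (1 - p_base) + p_alt * (1 - p_alt))
                 / (p_base - p_alt) ** 2)
    return math.ceil(n_per_arm / (daily_volume * arm_share))

# A challenger receiving 25% of traffic, trying to detect a 40% -> 45%
# improvement, on a dozen transactions a day vs. a million a day:
slow = days_needed(12, 0.25, 0.40, 0.45)
fast = days_needed(1_000_000, 0.25, 0.40, 0.45)
```

Under these assumptions, the low-volume system needs well over a year of traffic per challenger, while the high-volume system gathers enough data almost immediately, which is exactly the trade-off among volume, number of strategies, and duration described above.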
Is that enough? Not quite. While you can learn from any experiment, you, the expert, are the one making sense of these numbers. If you run an experiment in retail for the whole month of December, it is not clear that the best-performing strategy is also applicable outside of the holiday season. If delinquency typically starts only after an account has been open for 2 or 3 months, a shorter experiment will not give you that insight. While the concept of testing several strategies in parallel is fairly simple, it is a good idea to get expert advice on these parameters, and to apply common sense to what it takes to prove that a strategy is actually better than the alternatives. Once you are familiar with the technique, your business performance will soar. Champion / Challenger is a very powerful tool.