A/B testing for pricing
There is no optimization without measurement, and pricing optimization requires both experiments and measurements. Too frequently, pricing boils down to price changes that are never leveraged to gain a more in-depth knowledge of the way the market responds to new prices. Crafting pricing strategies should be a knowledge-driven process, where most changes are analyzed a posteriori in order to fine-tune the existing set of strategies. In this page, we outline an A/B testing protocol for pricing to be executed with the support of Lokad. While the discussion takes advantage of the tooling at hand, it must be noted that the protocol itself is not specific to Lokad and could be replicated with other tools, even with Excel, provided the sheets are maintained with great care.
- 1. Create a reference document
- 2. Create a dedicated project
- 3. Define the test hypothesis
- 4. Positive and negative control groups
- 5. Saving the control groups
- 6. Define the new pricing script
- 7. Price publication and observation
1. Create a reference document
If you are capable of performing one pricing experiment, then soon you will have performed dozens of pricing experiments. Without a structured way to collect all this information, the results will be messy and most of the potential insights will be lost. Thus, we suggest you start every pricing experiment with the creation of a reference document that will collect all the information related to the experiment. This document can be a Microsoft Word file, but nowadays, we strongly encourage you to use Web-native collaborative tools like Google Sites or hackpad, or any of the many Content Management Systems available online.
Then, once the document is created, you should put some effort into finding a memorable name for the experiment. Indeed, an experiment will be effective only if people can easily communicate about it and if its conclusions can be remembered by the pricing team, and possibly by their successors, as people are likely to come and go over time. Boredom is the enemy of knowledge: if the name is too boring to be remembered, then this piece of information will be dismissed even by the most dedicated employees.
2. Create a dedicated project
Within your Lokad account, we suggest you create a project dedicated to the pricing experiment. For the sake of clarity, this project should be named after the memorable name chosen for the experiment. Additionally, we suggest you leverage the hyperlink behavior of the label tile to link directly to the reference document, which is also available online. Consequently, the beginning of your Envision script will typically look like:
// Pricing Experiment: Sevilla
// Start: 2014-07-09 End: 2014-08-08
// Author: Joannes Vermorel
show label "http://example.org/my-reference-document Ref. Document"
With this script, the label tile will appear in the dashboard as a hyperlink leading to the reference document that details the pricing experiment itself.
This very project will serve multiple purposes throughout the different stages of the pricing experiment. It will be used to:
- Create the sample control groups of items
- Persist those groups
- Define the revised pricing strategy
- Check for the correct deployment of the prices
- Compile the visualization of the two control groups
All these steps will be implemented through sections of a script for this project.
3. Define the test hypothesis
Whenever testing a pricing strategy, it is very tempting to re-explain the results afterward, regardless of whether the results are aligned with the initial expectations. This psychological bias is known as the narrative fallacy, best described by Taleb:
The narrative fallacy addresses our limited ability to look at sequences of facts without weaving an explanation into them, or, equivalently, forcing a logical link, an arrow of relationship upon them. Explanations bind facts together. They make them all the more easily remembered; they help them make more sense. Where this propensity can go wrong is when it increases our impression of understanding. —Nassim Nicholas Taleb, The Black Swan.
The problem is particularly acute regarding pricing, because there is no such thing as a controlled environment for market conditions. No matter how much attention is paid to the experimentation protocol, many factors remain irremediably outside our control, starting with the initiatives of our competitors.
Therefore, it is very important to define the hypothesis being tested at the beginning of the experiment, to ensure that we are confirming or invalidating this particular hypothesis, and not some ad hoc hypothesis revised during the course of the experiment itself. You should write this hypothesis at the beginning of the reference document mentioned previously. A good hypothesis looks like: severe drops in sales volumes are not explained by a drop in demand, but by less visible competitors with aggressive prices; hence, if we aggressively lower our margins on those products, sales volumes should ramp up accordingly.
4. Positive and negative control groups
Once the hypothesis has been written down, it is time to design the experiment itself. It is typically impractical, and sometimes even illegal, to display two distinct prices for the same item to two distinct groups of customers. Therefore, a more practical approach in retail consists in selecting two equivalent groups of items that represent only a small fraction of the overall catalog.
These two samples are going to be named respectively:
- The positive control group, where the new pricing strategy is applied.
- The negative control group, where the old pricing strategy remains unchanged.
By comparing the outcome in the positive and negative control groups, you can then assess whether the initial hypothesis was correct. Envision makes it very straightforward to randomly select two control groups thanks to the hash function:
seed = "hello world"
R = rankd(hash(concat(Id, seed)))
where R <= 1000
// positive group here
where R > 1000 and R <= 2000
// negative group here
In the above script, Line 2 computes a random shuffle of all items: the function hash produces a pseudo-random number for the text value passed as input. This is not true randomness, as the same text value yields the same hashed value every time; hence, in order to produce distinct samples, one should change the text passed as the seed. Then, Line 3 selects 1,000 items as the positive control group, and Line 5 does the same for the 1,000 items kept as the negative control group.
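As a small illustration of this point, a fresh, independent shuffle can be obtained simply by changing the seed; the seed value below is an arbitrary example:
seed = "hello world v2" // any new text value re-shuffles the items
R = rankd(hash(concat(Id, seed)))
where R <= 1000
// a new positive control group, unrelated to the previous one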
This logic can easily be adjusted to situations where we want to define two groups that both fit certain conditions. For example, let us assume that we want to establish control groups within the set of items that belong to the brand Fabrikam. This could be done with:
where Brand == "Fabrikam" // scoping only the brand 'Fabrikam'
seed = "hello world"
R = rankd(hash(concat(Id, seed)))
where R <= 50
// positive group here
where R > 50 and R <= 100
// negative group here
5. Saving the control groups
When complex conditions are used for the selection of the control groups, there is a risk that the same sampling logic will not return the same list of items over time. For example, if new items are introduced, they will influence the sampling logic introduced in the previous section. Hence, we advise you to save the control groups in a separate file so that they cannot inadvertently change over time.
seed = "hello world"
R = rankd(hash(concat(Id, seed)))
Control = ""
where R <= 1000
Control = "pos" // positive group
where R > 1000 and R <= 2000
Control = "neg" // negative group
// exporting both control groups
startDate = "2014-07-09"
endDate = "2014-08-09"
where Control != ""
show table "sample" write:"/exp/g-sevilla.tsv" with
Id
startDate as "Date"
endDate as "EndDate"
Control
Price
In the script above, once the groups are established, we save the results to a file named `g-sevilla.tsv`. We will show in the following section how the prefix `g` (or any alternative prefix) can be very handy to avoid overlaps between the samples. `sevilla` is just an example of a memorable name for the experiment. This file is already formatted as an event stream file that you will be able to re-load in Lokad later on. You should run this logic once, and then comment out the file export lines in order to avoid overwriting the previously saved file. Then, as a safety measure, download this file and attach a copy of it to your reference document.
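For the sake of illustration, once the first run has completed, the export lines above would simply be commented out in place:
// show table "sample" write:"/exp/g-sevilla.tsv" with
// Id
// startDate as "Date"
// endDate as "EndDate"
// Control
// Price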
Later on, in order to avoid defining groups that overlap with another ongoing experiment, you can reload the file that has been saved and use this information to exclude the relevant items. This could be done with:
read "/exp/g-*" as Experiments expect [Date] with // loading all experiments so far
Date: date
EndDate: date
today = date(2014,7,9)
IsPartOfGroup = false
where Experiments.EndDate >= today // keep only the ongoing experiments
// flag items that appear in at least one ongoing experiment
IsPartOfGroup = count(Experiments.Date) > 0
where not IsPartOfGroup
seed = "hello world"
R = rankd(hash(concat(Id, seed)))
Control = ""
where R <= 1000
// positive group here
where R > 1000 and R <= 2000
// negative group here
6. Define the new pricing script
In the previous sections, we have covered multiple pricing strategies. At this point of the experiment protocol, it is time to implement the revised pricing script. The details of the pricing logic itself are beyond the scope of the present section; however, you should know that prices can typically be exported using a simple table tile:
read "/exp/g-sevilla.tsv" as Scope with
Control: text
// Original logic to generate the control groups is commented out.
// We reload the control groups directly from the persisted copy.
Control = last(Scope.Control) default ""
where Control != "" // excluding items not part of the scope
// snipped here: actual pricing logic
show table "sample" write:"/exp/p-sevilla.tsv" with Id, today() as Date, Price
In this script, we start by loading the persisted copy of the control groups. Then, we redefine a scope that is narrowed down to those control groups only. Finally, we produce a file export of the new prices.
In practice, persisting the revised prices may prove less simple than persisting the control groups themselves. Indeed, unlike the group memberships, the prices keep changing according to their respective underlying strategies for the duration of the experiment.
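One possible approach, sketched below, is to re-run the price export at every price revision, writing each run to its own dated file; the `p-sevilla-*` file layout here is purely hypothetical. The dated files can then be reloaded together as a single event stream of published prices, mirroring the wildcard pattern used above for the `g-*` files:
// hypothetical layout: one file per run, e.g. p-sevilla-0709.tsv, p-sevilla-0710.tsv
read "/exp/p-sevilla-*" as ExpPrices expect [Date] with
Date: date
Price: number
// ExpPrices now behaves as an event stream of all the prices
// actually produced over the course of the experiment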
7. Price publication and observation
Once the scripts are in place to produce the revised prices, those prices should be published across the various sales channels. Whenever possible, you should favor automated price publication processes. In particular, Lokad offers a REST API that supports the automated trigger of projects. Upon completion of the script execution, the output files can be retrieved by FTP or FTPS from Lokad to be imported into the production systems.
In order to be confident about the sales observations made after revising the prices, it is important to ensure that the prices available on the sales channels are indeed those that have been produced by the Envision scripts. We routinely observe that faulty or partial price imports happen, leading to a discrepancy between the prices actually published and the prices calculated by the pricing strategy.
Ideally, the published historical prices should be fed back to Lokad as inputs. When such a data loop exists, it becomes possible to compare the published prices with the prices calculated within Lokad and to ensure that the values are aligned. In the script below, we load the historical prices retrieved from production, then the scope of the experiment, followed by the prices as originally calculated.
// ... snipped items and other data files ...
read "/prices.tsv" as Prices // from the production systems
read "/exp/g-sevilla.tsv" as Scope
read "/exp/p-sevilla.tsv" as ExpPrices
// The original logic to generate control groups is commented out.
// We reload the control groups directly from the persisted copy.
Control = last(Scope.Control) default ""
where Control != "" // excluding items not part of the scope
// restricting to the relevant publication date of the prices
when date <= date(2014,8,1)
where last(Prices.Price) != last(ExpPrices.Price)
show scalar "Item count with price mismatch" with count(Id)
The script outputs the number of items for which the price retrieved from the production systems does not match the price originally computed by the pricing script.
Finally, after a few days or a few weeks, depending on the applicable timeframe for the pricing experiment, the observations collected on the evolution of sales for the two groups can be compared in order to assess whether the initial hypothesis was correct.
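As a minimal sketch of such a comparison, assuming the sales history is available as an Orders table with a Quantity column (as in a standard Lokad dataset), the daily sales volumes of the two control groups can be charted side by side over the duration of the experiment:
read "/exp/g-sevilla.tsv" as Scope with
Control: text
// reloading the persisted control groups, as in the previous sections
Control = last(Scope.Control) default ""
where Control != ""
when date >= date(2014,7,9) and date <= date(2014,8,8)
where Control == "pos"
show linechart "Sales volume, positive group" with sum(Orders.Quantity)
where Control == "neg"
show linechart "Sales volume, negative group" with sum(Orders.Quantity)
If the two groups are well balanced, a divergence between the two curves over the period can then be attributed, with the usual caution, to the revised pricing strategy. Accumulating conclusive experiments is an asset that merchants can use to design increasingly efficient pricing strategies over time.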