Workshop #2: Sales Analysis

This workshop is intended for Master students willing to learn more about Supply Chain Optimization. This second workshop will be tackling the topic of sales and customers. Starting with a generic overview of usual sales KPIs (key performance indicators) to get a grasp of the company’s activities, we will then dive into the identification of influencing factors to challenge our first naive analysis. Finally, the use of supply chain quantitative tools will be carried out to push the sales analysis even further. The workshop should take between 5 to 10 hours.

Engineer analyzing complex computer system

Table of contents


Company overview

CM is a fictitious European company selling outdoor clothing with retail stores in three countries: Germany, France, and Italy. CM also has a strong online presence. The company’s website offers customers the ability to purchase their products online, with home delivery available for added convenience. In addition to their website, CM also sells their products through various online platforms, such as Amazon and other e-commerce marketplaces. This allows CM to reach a wider audience and connect with customers who prefer to shop online.

The company was founded by a group of friends who were passionate about the outdoors and wanted to make quality outdoor clothing and sporting equipment more accessible to everyone.

The first store opened in Germany, and it was an instant success. The store was filled with high-quality outdoor clothing and equipment that was both stylish and functional. The founders of CM believe that people should be able to enjoy the outdoors without sacrificing style or comfort.

CM quickly expanded to other European countries, opening stores in France, and Italy. Each store has a unique style and atmosphere that reflects the local culture and environment. The stores are more than just retail spaces; they are community hubs where outdoor enthusiasts can come together to share their passion for adventure.

Customers and Sales channels

Now that CM has made their way into the sportswear selling industry, the founders would like to carry out a more in-depth analysis of their sales to identify bottlenecks and areas of improvement. Notably, CM’s popularity and turnover has been gradually increasing every year since its creation, but the founders think that it is time to expand the company’s market share.

CM’s customers are amateur and intermediate level athletes. The usual average order value is thus moderate; however, they sometimes experience periods of higher sales (during promotional events, etc.). Following the struggles CM encountered with their suppliers due to delays in deliveries, the company is periodically experiencing stock outs on various SKUs.

To attract a wider range of customers and reach their new quarterly targets, CM’s owners are looking into analyzing their current sales data to sort out their catalog, assess the profitability of their products and start improving their most critical products.

Although CM’s sales activity started in their retail store only, a substantial part of CM’s sales is now made through their online sales channels (Amazon, store website, other platforms). CM would like to have a better understanding of their sales breakdown by channel to be able to put the necessary resources into developing their online presence.

Objectives of the workshop

The objective of this session is to pinpoint and understand the influence of various factors (economical, logistics…) on CM’s sales:

The workshop is structured in 3 parts:

  1. We start with a general overview of the CM’s sales activities.
  2. Then, we challenge this first approach and refine our analysis with new KPIs and influencing factors.
  3. We ultimately introduce quantitative supply-chain notions to overcome the limitations of average analysis.

Useful definitions :

Additional information :

The cost of a unit in stock for CM is not limited to its buy price: other fees like transportation and stocking costs are to be taken into account. The annual stocking cost is 22% of the BuyPrice per unit for a given item. The transportation cost is 0.05€ per unit from supplier to Paris and 0.1€ by unit from supplier to both Berlin and Milan.


The dataset provided for this workshop consists from 5 tables (see the data schema below). They are preloaded in the Envision playground (see lines of code #3 to #71).

Data schema

In this schema, blue, green and lilac arrows are indicating the relationships between the tables, specifically the foreign key to primary key relationships. These relationships are essential for joining tables in a relational database to query data across different entities.

For example, the Sku table, which stores information about stock keeping units, is a central table with relationships to the StockOuts and Orders tables through their foreign keys, as indicated by the blue arrows. This implies that the .Sku field in these tables references the primary key in the Sku table, allowing for a relational link between the data in these different tables.

The arrow lines start from the foreign key side with the arrowheads pointing towards the primary key side, which is a typical way to denote the direction of the relationship from the foreign key in one table to the primary key in another.

1. Suppliers Table (Suppliers.tsv)

2. Items Table (Catalog.tsv)

3. SKU Table (SKU.tsv)

4. Orders Table (Orders.tsv.gz)

5. StockOuts Table (StockOuts.tsv.gz)

Part 1: Customers Overview – Sales Basic Analysis


a. Orders of Magnitude

  1. Display the following information in a summary tile: First Order Date, Last Order Date, Total Sold Quantity, Total Net Amount Sold.

b. Sales History

  1. Display historical sales order quantities and net amounts at weekly granularity in a linechart.

Hint: necessary aggregation can be done using group by together with monday or week functions, but also by performing straight weekly aggregation.

  1. What is the best period of sales for the company within a year?

c. Best Sellers

Overview by Net Amount

  1. Display the following information in a table tile for the 10 best-selling items (in net amount) at item level: Id, Total Number of Orders, Total Quantity, Total Net Amount, Net Amount for year 2022, Net amount for year 2021, Category, Average Sell Price.

Hint: order the table by Total Net Amount in descending order, then have a look at the documentation on distinct, avg and rank().

  1. Which item is the best-seller for the company and what category does it belong to?

Overview by Quantity

  1. Display the following information in a table tile for the 10 best-selling items (in quantity) at item level: Id, Total Number of Orders, Total Quantity, Total Net Amount, Category, Average Sell Price. Order the table by Total Quantity in descending order.

  2. What can be observed about Ids 13751180 and 15063964 in the two previous tables?

d. Best Customers

  1. Display the following information in a table tile for the top 10 customers (by quantity purchased): total number of orders, total quantity, total net amount, net amount for year 2022, net amount for year 2021. Order the table by total quantity in descending order.

Hint: create Clients table to carry your calculations at aggregated customer level. See line #119 where the table creation statement is provided and understand how by statement works. Continue writing the code from there. You might need to check documentation for aggregators same, distinct, sum, year function and rank() scan process.

  1. Looking at the different values in the previous table tile, is there an indicator that seems more relevant to establish a list of the best customers?

e. Sales by Channel

  1. Display the total quantity sold by sales channel in 2021 and 2022 in a barchart.

Hint: you may create a table SalesChannels with one row per sales channel in the same way the Clients table was created. An aggregator single might be useful as well.

  1. Display on a linechart tile the quantity sold by month in 2021 and 2022 for each sales channel.

  2. Display a table tile containing the following information at sales channel level: channel, average amount of an order, average quantity of products in an order (rounded to the closest integer). Consider only orders in 2022.

  3. Display a summary tile with the evolution of sales amount (%) for each channel between 2021 and 2022.

  4. CM’s marketing team would like to conduct a promotional campaign for their best sellers on one of their sales channels. Which channel do you deem would be the best candidate and why?



CM sales are on a steady positive trend which legitimates the constant increase of their purchases observed in Workshop #1. Be it in terms of sales quantities or sales net amount, this rising trend appears in all categories and all sales channels.

Sales profiles

Most of CM’s sales (in quantities) rely on 10% of their catalog. CM’s most critical category of references is “Footwear”. Among CM’s best sellers however, “Jackets” and “Climbing” categories references are proportionately represented. A focus on a single category to analyze CM’s sales is thus to be excluded.

References sales can be split into two types of profiles: high quantities and high net amounts. Therefore, products called “critical” cannot be considered based on the same KPIs. Likewise, customers buying large quantities of references might not generate the highest turnover for CM. It is important to conduct more in-depth analysis, considering margins and revenue, to introduce profitability indicators.

Sales channels

The sales channels with the highest activity (in terms of quantities sold) are CM’s retail store and website. Customers tend to buy more products on those channels. Even though the other platform channel has the lowest activity, its trend is steeper than Amazon sales trend, and closest to CM’s best sales channels. A marketing campaign to boost sales and general activity on Amazon could be an option to consider.

Hint: To check yourself here is an example of a dashboard that you might built. The exact layout of the dashboard and colors might be different depending on your choices.

Part 1 dashboard

Part 2: Identifying Risks Factors

a. Margins Analysis

  1. Calculate the FullBuyPrice of each SKU in the SKU table, using the information provided about SKUs cost. Consider that FullBuyPrice = BuyPrice + 1year stocking cost + transportation cost.

  2. For the 10 SKUs (which have been sold at least once) with the lowest revenue, display a table containing the following information: SKU, BuyPrice, FullBuyPrice, SellPrice, Revenue and total quantity sold at SKU level. Order the table from lowest to highest revenue.

  3. Display on a linechart tile the revenue by month in 2021 and 2022 for each Sales Channel. Which channel is the most profitable?

b. ABC Analysis on economic values (Pareto)

  1. In a summary tile, display the three following information: percentage of references between class A (<= 70% of the overall Sales Margin), class B (> 70% and <= 90%) and class C (> 90%).

  2. Display in a plot tile the percentage of references sold by percentage of sales margins.

  3. For references with at least one unit sold ever, display in a summary tile: Quantity of References, Highest Total Sales Margin, Lowest Total Sales Margin, Highest Quantity Sold, Lowest Quantity Sold.

  4. For a reference 13751180, display in a summary tile: Total Sales Margin, Total Sold Quantity, Rank by Sales Margin, Rank by Quantity Sold (rank 1 being attributed to the reference with the highest value). What is noticeable about this reference? What is the use of keeping such a reference in the catalog?

c. Stock-Out Periods

  1. Create table SKUsDay that contains one row for each SKU for each day. Use cross operation for SKU and Day tables. Create a column QtySold containing the aggregated quantity sold each day for each SKU.

  2. In a linechart at day granularity, display the sales history for a SKU 10095235/Berlin for 2021 and 2022. Prolonged periods where no sales happen can be observed: what could be the cause of such periods?

  3. Using table StockOuts which contains one row for each day when a SKU has been in stock-out, add a light red background to the two previous linecharts to highlight stock out periods.

Hint: use the column StockOutAsNumber defined in the script, along with stylecode properties seriesType, color and seriesOpacity. Can you confirm or overturn your hypothesis about the cause of zero sales periods for those SKUs?

d. Data Inconsistencies

Duplicate rows

  1. In the table Orders for rows having the same OrderId, SKU, OrderDate and DeliveryQty, display the following information in a table tile: OrderId, SKU, Date, Localisation, Sales Channel, DeliveryQty, LokadNetAMount. What do these rows have in common and what can you infer from these common points?

  2. Display the fraction of duplicated orders sold quantities compared to the overall sold quantity.

Negative margins

  1. For Orders table rows with a negative margin, display a table tile containing the following information: SKU, Date, DeliveryQty, LokadNetAmount, FullBuyPrice, Margin. Order the table by SKU.

  2. Display the list of all SKUs that have at least 30% of their sales with negative margins, along with their total number of sales and their number of sales with negative margin.

  3. What kind of scenario can justify a SKU with negative margins sales?


Margin & Revenue

High quantity sold is not synonymous with high profitability. To better evaluate the contribution of a SKU to CM’s activities, economic indicators such as the margin and the revenue are most appropriate. However, those indicators are not sufficient to completely grasp the behavior and criticality of a SKU.

Relevance of non-economic indicators

We saw that relying solely on quantity sold is not a good factor to model sales activity. Nevertheless, some SKUs can be sold in remarkably high quantities but with low, even negative margins. Setting aside the hypothesis of errors in the data, there are reasons to keep proposing such SKUs in a catalog. Indeed, those SKUs are necessary to initially attract the customers and showcase the rest of the catalog, potentially getting clients to purchase more profitable SKUs. Thus, the sole margin/revenue indicator is not sufficient either: mixing economic and quantitative KPIs provides a more global view of sales.

Stock outs

Revenue and quantity sold cannot be relied on as such to model a reference’s sales: those values must be put into perspective with SKUs’ availability in stock. Lower sales during a given period are not necessarily the consequence of lower activity or lower customers’ interest: it might be a result of poorly managed supply chains or suppliers’ exceptional unavailability. It is consequently of major importance to acknowledge stock out periods when analyzing sales data.

Data inconsistencies

Besides process and management issues, data issues are also responsible for biases in sales KPIs and must be investigated. This actually constitutes an important part of the work of any data analyst: data is bound to contain errors and should not be deemed 100% accurate.

Hint: To check yourself here is an example of a dashboard that you might built. The exact layout of the dashboard and colors might be different depending on your choices.

Part 2 dashboard 1

Part 2 dashboard 2

Part 3: Limitations and In-Depth Observations

a. Average Sales Analysis

  1. In a table tile, display the following information for SKUs 14901389/Milan and 10592438/Berlin: SKU, average monthly quantity sold, overall standard deviation.

Hint: create the SkuMonth table to aggregate sales calculations at a monthly level. See line #121 where the table creation statement is provided and understand how cross keyword works. Continue writing the code from there. Functions avg and stdev might be useful.

  1. Display the following information in a linechart at a monthly level for both SKUs: quantity sold, and overall average quantity sold by month.

  2. Looking at the tiles you have displayed in the two previous steps, explain why the sole use of the average quantity sold per SKU is not enough to infer their sales profile.

b. Exceptional events

  1. Display in a linechart the historical weekly sales and the overall average weekly quantity sold for SKU 14404386/Milan.

Hint: create SkuWeek table in a same manner as you created SkuMonth table.

  1. Can you identify two periods with different sales profiles? What could be the cause of this behavior?

c. Case study

  1. Sales of SKUs 14262733/Paris and 14113811/Paris are correlated: using Items, Orders and StockOuts data, give an explanation of their sales behavior.

  2. Same question for SKUs 14245992/Berlin and 29586777/Berlin. Display also a correlation coefficient between these 2 SKUs.

Hint: use table SkusDay to calculate the correlation coefficient (there is no built-in function for this calculation).

Hint: To check yourself here is an example of a dashboard that you might built. The exact layout of the dashboard and colors might be different depending on your choices.

Part 3 dashboard 1

Part 3 dashboard 2

Conclusion of the Sales Analysis

Sales Profiles

Two products with the same volume of sales can have quite different sales profiles. Thus, product analysis cannot be conducted on the sole basis of their average sales: notions of seasonality (sales pattern depending on the period of the year), trend and exceptional events need to be introduced to define a forecasted demand for all products.

Correlation between references

Depending on the company’s activities, different products’ sales can be dependent. It is the case for products that are sold together (for instance, a shirt and its plastic bag) or new products that are replacing older products. In the case of the replacement of a SKU, if the newest SKU is expected to have the same sales behavior as the oldest one (same sell price, function, color…), the old SKU’s sales history can be used as a “virtual” sales history for the new SKU, thus increasing the number of historical data points to get a more accurate demand forecast.

Necessity of quantitative sales analysis

The main conclusion of the analysis led in this workshop is that a product’s sales behavior cannot be defined by its average demand only. Even though this information is useful, it must be coupled with other quantitative KPIs which can affect sales at separate times of a product’s lifetime. Such variations in a product’s sales should be anticipated to create a demand forecast considering all potential future events.


Get the dataset from the Envision playground

/// ## 0.1 Reading data tables
/// The list of suppliers.
read "/Suppliers.tsv" as Suppliers[Supplier] with
  /// Primary key, identifies each supplier.
  Supplier : text
  /// The delay from the order date to the delivery date, in days.
  Leadtime : number

/// The list of items, purchased and sold.
read "/Catalog.tsv" as Items[Ref] expect [Supplier] with
  /// The primary key, identifies each item.
  Ref : text
  /// Foreign key to `Suppliers`.
  Supplier : text
  /// Unit price to buy 1 unit from the supplier.
  BuyPrice : number
  /// Unit price to sell 1 unit to to a client.
  SellPrice : number
  Category : text
  Brand : text

///The list of SKUS, purchased and sold.
read "/SKU.tsv" as SKU[SKU] expect[Ref,Supplier] with
  /// The primary key, identifies each sku
  Sku : text
  /// The place where the item is stocked
  Loc : text
  /// Foreign key to `Items`.
  Ref : text
  Category : text
  SubCategory : text
  Brand : text
  Supplier : text
  ///The stock available on the shelves (in units)
  StockOnHand : number
  ///The stock currently in transfer from suppliers (in units)
  StockOnOrder : number
  ///The sales orders that need to be fulfilled (in units)
  BackOrder : number
  /// Unit price to buy 1 unit from the supplier.
  BuyPrice : number
  /// Unit price to sell 1 unit to to a client.
  SellPrice : number

///The list of past transactions, sold to customer
read "/Orders.tsv.gz" as Orders expect[SKU,Ref,Date] with
 // Foreign key to `SKU`.
  Sku : text
///The date when the sales orders happened  
  "OrderDate" as Date : date
 // Foreign key to `Items`.
  Ref : text
  Loc : text
///The quantity delivered to the customer
  DeliveryQty : number
  LokadNetAmount : number
///The identifier of the sales transaction
  OrderId : text
///The identifier of the customer
  Client : text
///The channel used by the customer for the sales transaction
  SalesChannel : text

///The history of stock-outs which means that item was missing in a given location
read "/StockOuts.tsv.gz" as StockOuts expect[SKU, date] with
 // Foreign key to `SKU`.
  Sku : text
///The date when the stock was missing
  date : date
///If true, it means that stock was indeed missing
  StockOut : boolean

/// ## 0.2 Introduction
show label "Sales Analysis" a1g1 {textAlign: center ; textBold: "true"}
show markdown "Generic Documentation" a2g2 with """
  Hi there, welcome to your second Envision exercise! Here, you will do a sales analysis.
  You'll start by a basic analysis and draw first conclusions and limits of a
  classic approach. You'll then discover some advanced concepts of the Quantitative Supply Chain!
/// ## 0.3 Displaying raw Data tables

show table "Suppliers" a3b3 with
  Suppliers.Leadtime as "Leadtime"
show table "Items" c3d3 with
  Items.Supplier as "Supplier"
  Items.BuyPrice as "BuyPrice"
  Items.SellPrice as "SellPrice"
  Items.Category as "Category"
  Items.Brand as "Brand"
show table "SKU" e3f3 with
  SKU.Supplier as "Supplier"
  SKU.Ref as "Ref"
  SKU.Loc as "Loc"
  SKU.Category as "Category"
  SKU.SubCategory as "SubCategory"
  SKU.Brand as "Brand"
  SKU.StockOnHand as "StockOnHand"
  SKU.StockOnOrder as "StockOnOrder"
  SKU.BackOrder as "BackOrder"
  SKU.BuyPrice as "BuyPrice"
  SKU.SellPrice as "SellPrice"
show table "Orders" a4b4 with
  Orders.date as "date"
  Orders.Ref as "Ref"
  Orders.SKU as "SKU"
  Orders.Loc as "Loc"
  Orders.DeliveryQty as "DeliveryQty"
  Orders.LokadNetAmount as "LokadNetAmount"
  Orders.OrderId as "OrderId"
  Orders.Client as "Client"
  Orders.SalesChannel as "SalesChannel"
show table "StockOuts" c4d4 with
  StockOuts.date as "date"
  StockOuts.SKU as "SKU"
  StockOuts.StockOut as "StockOut"

////0.4 ---- Coding Tips///

table Clients = by Orders.Client
table SkuMonth = cross(SKU,Month)

/// ## 0.5 Documentation
/// The public Envision documentation is available here: https://docs.lokad.com/"

User Contributed Notes
0 notes + add a note