Cohort Analysis with Power BI and SQL

A simple guide to customer analysis with Cohort Analysis

Olubukunola Akinsola
6 min readJun 18, 2022

This is a step-by-step tutorial for creating a simple cohort analysis with SQL and Power BI

image credit here

Table Content

  • Introduction
  • What is Cohort Analysis?
  • Segment your customers into cohorts with SQL
  • Finding the rate of customers retention with Power BI
  • Visualizing your cohort analysis
  • Conclusion
  • Appendix

Prerequisite: Knowledge of SQL, Power BI (DAX)

Introduction

My first experience with Cohort Analysis was in 2021. Like some of my readers, I was also confused when I heard about the concept. Not to worry, by the end of this article, you’ll no longer be confused.

Cohort Analysis, like RFM analysis, is one of the many customer analysis methods.

Overview of Cohort Analysis?

Before I explain what cohort analysis is, let’s understand what a cohort is.

A cohort is a group of people who share a common characteristic over a certain period of time.

So cohort analysis is analyzing a cohort, yea? That’s pretty much it but maybe I’ll give a detailed definition.

Cohort Analysis is a subset of behavioral analytics in which large volumes of complex data are broken into related small groups to perform Analysis. These smaller groups or Cohorts tend to share common characteristics or experiences within a specific time span.

For this article, we will be putting our customers into Cohorts based on their first month of purchase and determine their retention rate after this first month of purchase.

fig 1: Monthly customer acquisition

Tools used: SQL, Power BI

Dataset: Superstore

Segment your customers into cohorts with SQL

For this analysis, I started by creating a Database using MySQL workbench. I had done this analysis with Power BI in the past but wanted to try something different, so I tried to replicate what I did in Power BI with SQL.

Let’s take a look at the tables.

Sales: this table has a list of all sales made in the year 2014–2017 of a superstore. For this analysis, we will only be working with 2017 data.

fig 2: Sales dataset

Customers: has a list of unique customer information

fig 3: customer dataset

The first step is to put the customers into different cohorts based on their acquisition month.

For this data, we do not have a data field that records the customer acquisition month. We will assume the first month of purchase as the acquisition month.

To do this, we get the minimum order date of each customer. The order date of each customer is recorded on the <Sales> table which will be joined to the <customers> table

SELECT 
b.CustomerID AS CustomerID,
SUM(a.Sales) AS Sales,
b.CustomerName AS CustomerName,
MIN(a.OrderDate) AS MinOrderDate
FROM
Sales AS a
INNER JOIN customers AS b
ON a.CustomerID = b.CustomerID
WHERE
YEAR(OrderDate) = 2017
GROUP BY CustomerName

This is what our query returns

fig 4: table with first order date of each customer

Now that we have the first date that each of our customers ordered, we need to expand our query to put our customers in their respective cohorts. We want this cohort to be recorded in the <Sales> table.

SELECT 
d.OrderID AS OrderID,
d.OrderDate AS OrderDate,
c.CustomerName AS CustomerName,
d.Sales AS SalesAmount,
c.CustomerID,
DATE_FORMAT(c.FirstOrderDate, '%M %Y') AS Cohort
FROM
Sales AS d
INNER JOIN
(
SELECT
b.CustomerID AS CustomerID,
SUM(a.Sales) AS Sales,
b.CustomerName AS CustomerName,
MIN(a.OrderDate) AS FirstOrderDate

FROM
Sales AS a
INNER JOIN
customers AS b
ON
a.CustomerID = b.CustomerID
WHERE
YEAR(OrderDate) = 2017
GROUP BY CustomerName
) AS c
ON d.CustomerID = c.CustomerID
WHERE
YEAR(d.OrderDate) = 2017
ORDER BY
OrderDate ASC;

Importing to Power BI

I imported my data using the Advanced option in the MySQL Get data window. I copied my SQL query from my MySQL Workbench into this quey box and clicked OK

fig 5: Power BI import dialogue box

Here’s a snippet of how the table looks in Power BI

fig 6: the cohort table in Power BI

Some Transformations:

  • I named by table <Orders>
  • Changed [OrderDate] from Datetime to Date format
  • Changed [SalesAmount] from Decimal to Fixed Decimal

The Future Months Table

The <Future months> table is a table that has a list of months number to denote the months after the customer’s first purchase. This table was created in Power query,

Month 0: is the initial purchase month

Month 1–12: is the first-twelfth months after the initial purchase.

fig 7: Future months table

Finding the rate of customers retention with Power BI

First, I want to get the number of customers retained each month after the initial month of purchase and the retention rate of the customers.

CustomerRetention =
VAR CurrentMonthAfter =
SELECTEDVALUE ( 'Future Months'[Value] )
VAR CurrentFirstOrderMonth =
SELECTEDVALUE ( 'Orders'[Cohort] )
RETURN
CALCULATE (
DISTINCTCOUNT ( Orders[CustomerName] ),
FILTER (
'Orders',
EOMONTH ( 'Orders'[OrderDate], 0 )
= EOMONTH ( CurrentFirstOrderMonth, CurrentMonthAfter )
)
)

Customer Retention %

CustomerRetention% =
VAR CurrentMonthAfter =
SELECTEDVALUE ( 'Future Months'[Value] )
VAR CurrentFirstOrderMonth =
SELECTEDVALUE ( Orders[Cohort] )
RETURN
DIVIDE (
CALCULATE (
DISTINCTCOUNT ( Orders[CustomerName] ),
FILTER (
'Orders',
EOMONTH ( Orders[OrderDate], 0 )
= EOMONTH ( CurrentFirstOrderMonth, CurrentMonthAfter )
)
),
DISTINCTCOUNT ( Orders[CustomerName] ),
0
)

The logic of the DAX formulas

We first create two variables that return the selected Future Months and Cohorts. We then calculate the distinct customers and filter to where the end of the month in the selected date equals the end of the month in the <CurentFirstOrderMonth> variable created.

The EOMONTH function takes in two parameters, the start date in DateTime format, or in an accepted text representation of a date, and the months, a number representing the number of months before or after the start_date.

EOMONTH(<start_date>, <months>)

Visualizing your Cohort Analysis.

Let’s create a simple visual with a matrix. We will apply a conditional formating that shows the retention rate. When the format is in a darker shade of blue, it means a high retention rate and when it is lighter, it means a low retention rate.

Month 0, the first month of purchase is at 100% which is the first month of purchase of all the customers in that cohort.

fig 8: Cohort Analysis visual

Conclusion

Cohort analysis is a very useful technique for customer retention and can be used with RFM analysis to gain more insights into your customers’ behaviour.

Appendix

If you are more comfortable with Power BI, here’s a formula that can help you create your cohorts in Power BI using DAX.

In the <Orders> table, create a new calculated column with the formula below. Change the Column format to date format <month year>

Cohort =
VAR CurrentCustomer = Orders[CustomerName]
RETURN
CALCULATE (
EOMONTH ( MIN ( 'Orders'[OrderDate] ), 0 ),
FILTER ( 'Orders', 'Orders'[CustomerName] = CurrentCustomer )
)

Further Readings

https://www.verfacto.com/blog/customer-segmentation/cohort-analysis/

Check out my article in Customer Lifetime Value with Power BI here.

Check out my article on RFM Analysis with Power BI here.

--

--