“People are purchasing Bindle subscriptions, that’s so amazing! I got an email from this venture firm, B Permutator or something. They’re asking what’s our purchase rate, do we know enough SQL to calculate it?” asked Sandra.
Technically speaking – yes. Let’s look at the purchase rate of users who signed up within the first week of Bindle. We already know how to calculate users within a time range:
SELECT
COUNT(*) AS total_users_count
FROM users
WHERE
signup_date BETWEEN '2018-01-01' AND '2018-01-07'
In the same way, we can calculate how many of them became customers:
SELECT
COUNT(*) AS total_customers_count
FROM users
WHERE
signup_date BETWEEN '2018-01-01' AND '2018-01-07'
AND status = 'customer'
The only thing – we need to manually divide the number of customers by the number of users to calculate the purchase rate. Remember, how SQL itself was inspired by accounting? Clearly there must be a way to easily calculate rates and percentages.
The only difference between 2 queries above is the filter for customer status
status = 'customer'
. Wouldn’t it be great if we were able to count all users and customers in one query (and then divide one by the other)? It’s possible via the CASE
statement: we can use conditions (when talking about if/else our well-known filters are often referred to as conditions) in the SELECT
clause. Here’s a query that counts the number of users and customers at the same time:
SELECT
COUNT(*) AS total_users_count,
COUNT(CASE WHEN status = 'customer' THEN id END) AS total_customers_count
FROM users
WHERE
signup_date BETWEEN '2018-01-01' AND '2018-01-07'
In this query we have used a couple of new SQL features (tricks):
COUNT(*)
function will calculate the total number of rows.CASE
statement in the second COUNT
in a way that it’ll only count a user record only if this user has a customer status.CASE
statement without ELSE
. The default value of any CASE
statement is NULL
(refresher: NULL
is a special type in SQL that stands for no value). Think of it as SQL would always add ELSE NULL
to your CASE
statement if you didn’t specify it:COUNT(CASE WHEN status = 'customer' THEN id END)
is equivalent to
COUNT(CASE WHEN status = 'customer' THEN id ELSE NULL END)
And the final SQL feature we used today: counting rows using the COUNT
aggregate function will count only rows with existing values ( it’ll skip all
NULL
values). This is why our CASE
statement for customers returned the user’s id if the user has the customer status.
To have a better feeling how the CASE
statement works inside COUNT
we can imagine it as a 2 step process. First, we run a query to have a new column with the CASE
statement like so:
SELECT
*,
CASE WHEN status = 'customer' THEN id END AS new_user_status
FROM users
and then we run COUNT(new_user_status)
on it (oh yeah, you totally can query … another query Why not? The result of any query is a table like any other SQL table).
The most frequent question I get at this point is “What’s the purpose of this id
column inside the CASE statement?”
SELECT
COUNT(CASE WHEN status = 'customer' THEN id END) AS customers_count
FROM users
Well, we can use anything but NULL
to make this query count users with customer status:
SELECT
COUNT(CASE WHEN status = 'customer' THEN id END),
COUNT(CASE WHEN status = 'customer' THEN 1 END),
COUNT(CASE WHEN status = 'customer' THEN TRUE END),
COUNT(CASE WHEN status = 'customer' THEN FALSE END),
COUNT(CASE WHEN status = 'customer' THEN status END),
COUNT(CASE WHEN status = 'customer' THEN 'customer' END)
FROM users
As you can see, all these
COUNT()
-s return the same value. This is because the COUNT()
function ignores NULL
values. You can visualize it like so: the COUNT()
functions goes row-by-row in our table; if the argument (in our case the result of the CASE
statement) is not NULL
– it’ll increment the counter by 1.
So why
id
then? IMO because it’s the most useful column for the job. First of all, id
-s are unique identifiers. I read such a query as If the CASE condition is TRUE, then count this record in. Second, we often want to count only unique records and this is where id
column is absolutely necessary. More on that later in the course.
That was a looong lesson If you have any questions drop me a message via
chat in the bottom right corner. Don’t forget to practice and run all the queries from this lesson
Hi, it’s Anatoli, the author of SQL Habit.
SQL Habit is a course (or, as some of the students say, “business simulator”). It’s based on a story of a fictional startup called Bindle. You’ll play a role of their Data Analyst and solve real-life challenges from Business, Marketing, and Product Management.
SQL Habit course is made of bite-sized lessons (you’re looking at one atm) and exercises. They always have a real-life setting and detailed explanations. You can immediately apply everything you’ve learned at work.