“People are purchasing Bindle subscriptions, that’s so amazing! I got an email from this venture firm, B Permutator or something. They’re asking what’s our purchase rate, do we know enough SQL to calculate it?” asked Sandra.
Technically speaking – yes. Let’s look at the purchase rate of users who signed up within the first week of Bindle. We already know how to calculate users within a time range:
SELECT COUNT(*) AS total_users_count FROM users WHERE signup_date BETWEEN '2018-01-01' AND '2018-01-07'
In the same way, we can calculate how many of them became customers:
SELECT COUNT(*) AS total_customers_count FROM users WHERE signup_date BETWEEN '2018-01-01' AND '2018-01-07' AND status = 'customer'
The only thing – we need to manually divide the number of customers by the number of users to calculate the purchase rate. Remember, how SQL itself was inspired by accounting? Clearly there must be a way to easily calculate rates and percentages.
The only difference between 2 queries above is the filter for customer status
status = 'customer'. Wouldn’t it be great if we were able to count all users and customers in one query (and then divide one by the other)? It’s possible via the
CASE statement: we can use conditions (when talking about if/else our well-known filters are often referred to as conditions) in the
SELECT clause. Here’s a query that counts the number of users and customers at the same time:
SELECT COUNT(*) AS total_users_count, COUNT(CASE WHEN status = 'customer' THEN id END) AS total_customers_count FROM users WHERE signup_date BETWEEN '2018-01-01' AND '2018-01-07'
In this query we have used a couple of new SQL features (tricks):
COUNT(*)function will calculate the total number of rows.
CASEstatement in the second
COUNTin a way that it’ll only count a user record only if this user has a customer status.
ELSE. The default value of any
NULLis a special type in SQL that stands for no value). Think of it as SQL would always add
ELSE NULLto your
CASEstatement if you didn’t specify it:
COUNT(CASE WHEN status = 'customer' THEN id END)
is equivalent to
COUNT(CASE WHEN status = 'customer' THEN id ELSE NULL END)
And the final SQL feature we used today: counting rows using the
COUNT aggregate function will count only rows with existing values ( it’ll skip all
NULL values). This is why our
CASE statement for customers returned the user’s id if the user has the customer status.
To have a better feeling how the
CASE statement works inside
COUNT we can imagine it as a 2 step process. First, we run a query to have a new column with the
CASE statement like so:
SELECT *, CASE WHEN status = 'customer' THEN id END AS new_user_status FROM users
and then we run
COUNT(new_user_status) on it (oh yeah, you totally can query … another query Why not? The result of any query is a table like any other SQL table).
The most frequent question I get at this point is “What’s the purpose of this
id column inside the CASE statement?”
SELECT COUNT(CASE WHEN status = 'customer' THEN id END) AS customers_count FROM users
Well, we can use anything but
NULL to make this query count users with customer status:
SELECT COUNT(CASE WHEN status = 'customer' THEN id END), COUNT(CASE WHEN status = 'customer' THEN 1 END), COUNT(CASE WHEN status = 'customer' THEN TRUE END), COUNT(CASE WHEN status = 'customer' THEN FALSE END), COUNT(CASE WHEN status = 'customer' THEN status END), COUNT(CASE WHEN status = 'customer' THEN 'customer' END) FROM users
As you can see, all these
COUNT()-s return the same value. This is because the
COUNT() function ignores
NULL values. You can visualize it like so: the
COUNT() functions goes row-by-row in our table; if the argument (in our case the result of the
CASE statement) is not
NULL – it’ll increment the counter by 1.
id then? IMO because it’s the most useful column for the job. First of all,
id-s are unique identifiers. I read such a query as If the CASE condition is TRUE, then count this record in. Second, we often want to count only unique records and this is where
id column is absolutely necessary. More on that later in the course.
That was a looong lesson If you have any questions drop me a message via chat in the bottom right corner. Don’t forget to practice and run all the queries from this lesson
Hi, it’s Anatoli, the author of SQL Habit.
SQL Habit is a course (or, as some of the students say, “business simulator”). It’s based on a story of a fictional startup called Bindle. You’ll play a role of their Data Analyst and solve real-life challenges from Business, Marketing, and Product Management.
SQL Habit course is made of bite-sized lessons (you’re looking at one atm) and exercises. They always have a real-life setting and detailed explanations. You can immediately apply everything you’ve learned at work.