Fundamentals of Data Analysis Calculating purchase rate. Part 1

# 26. Calculating purchase rate. Part 1

“People are purchasing Bindle subscriptions, that’s so amazing! I got an email from this venture firm, B Permutator or something. They’re asking what’s our purchase rate, do we know enough SQL to calculate it?” asked Sandra.

Technically speaking – yes. Let’s look at the purchase rate of users who signed up within the first week of Bindle. We already know how to calculate users within a time range:

``````SELECT
COUNT(*) AS total_users_count
FROM users
WHERE
signup_date BETWEEN '2018-01-01' AND '2018-01-07'
``````

In the same way, we can calculate how many of them became customers:

``````SELECT
COUNT(*) AS total_customers_count
FROM users
WHERE
signup_date BETWEEN '2018-01-01' AND '2018-01-07'
AND status = 'customer'
``````

The only thing – we need to manually divide the number of customers by the number of users to calculate the purchase rate. Remember, how SQL itself was inspired by accounting? Clearly there must be a way to easily calculate rates and percentages.

## CASE statement

The only difference between 2 queries above is the filter for customer status `status = 'customer'`. Wouldn’t it be great if we were able to count all users and customers in one query (and then divide one by the other)? It’s possible via the `CASE` statement: we can use conditions (when talking about if/else our well-known filters are often referred to as conditions) in the `SELECT` clause. Here’s a query that counts the number of users and customers at the same time:

``````SELECT
COUNT(*) AS total_users_count,
COUNT(CASE WHEN status = 'customer' THEN id END) AS total_customers_count
FROM users
WHERE
signup_date BETWEEN '2018-01-01' AND '2018-01-07'
``````

In this query we have used a couple of new SQL features (tricks):

• we have multiple aggregate functions. That’s totally OK, remember how aggregate functions work? They calculate a single value based on all filtered rows. For example, the `COUNT(*)` function will calculate the total number of rows.
• we used the `CASE` statement in the second `COUNT` in a way that it’ll only count a user record only if this user has a customer status.
• we used the `CASE` statement without `ELSE`. The default value of any `CASE` statement is `NULL` (refresher: `NULL` is a special type in SQL that stands for no value). Think of it as SQL would always add `ELSE NULL` to your `CASE` statement if you didn’t specify it:

is equivalent to

And the final SQL feature we used today: counting rows using the `COUNT` aggregate function will count only rows with existing values ( it’ll skip all `NULL` values). This is why our `CASE` statement for customers returned the user’s id if the user has the customer status.

To have a better feeling how the `CASE` statement works inside `COUNT` we can imagine it as a 2 step process. First, we run a query to have a new column with the `CASE` statement like so:

``````SELECT
*,
CASE WHEN status = 'customer' THEN id END AS new_user_status
FROM users
``````

and then we run `COUNT(new_user_status)` on it (oh yeah, you totally can query … another query Why not? The result of any query is a table like any other SQL table).

## Why using id in the CASE statement?

The most frequent question I get at this point is “What’s the purpose of this `id` column inside the CASE statement?”

``````SELECT
COUNT(CASE WHEN status = 'customer' THEN id END) AS customers_count
FROM users
``````

Well, we can use anything but `NULL` to make this query count users with customer status:

``````SELECT
COUNT(CASE WHEN status = 'customer' THEN id END),
COUNT(CASE WHEN status = 'customer' THEN 1 END),
COUNT(CASE WHEN status = 'customer' THEN TRUE END),
COUNT(CASE WHEN status = 'customer' THEN FALSE END),
COUNT(CASE WHEN status = 'customer' THEN status END),
COUNT(CASE WHEN status = 'customer' THEN 'customer' END)
FROM users
``````

As you can see, all these `COUNT()`-s return the same value. This is because the `COUNT()` function ignores `NULL` values. You can visualize it like so: the `COUNT()` functions goes row-by-row in our table; if the argument (in our case the result of the `CASE` statement) is not `NULL` – it’ll increment the counter by 1.

So why `id` then? IMO because it’s the most useful column for the job. First of all, `id`-s are unique identifiers. I read such a query as If the CASE condition is TRUE, then count this record in. Second, we often want to count only unique records and this is where `id` column is absolutely necessary. More on that later in the course.

That was a looong lesson If you have any questions drop me a message via chat in the bottom right corner. Don’t forget to practice and run all the queries from this lesson