Fundamentals of Data Analysis If/else logic via CASE statement

# 25. If/else logic via CASE statement

“Who are these people?” asked Sandra laughing, “So many users signed up! It would be so great to know more about them, for example how old are they. We do ask for age in our signup flow, I bet we can query this data!”

Time to explore one more column in the `users` table – `age`. Inside the `age` column we store a number that users type in after they signed up. It could be used in many ways – we can look at product usage or marketing efforts per age group. It helps better understand your audience (and adjust copy or product strategy based on data) or make sure you target the right audience (imagine you want 30+ people to use your product and your marketing campaigns only bring you 15-year-olds. It’s better to turn these campaigns off).

## Calculating age group with CASE statement

Age itself isn’t very helpful – we’ll most likely have all ages between 10 and 100. Usually, we assign an age group to each user and then use groups for analysis. Typical age groups are:

• Youth (< 17)
• Middle-aged (36-45)
• Aged (46+)

(makes me shiver when I’m writing this, I’m 30 ). All righty, now we have our age groups defined let’s write a query that will assign a group to each user. To do this for each row we’d need to perform the following calculation: “If a user’s age is less than 16 – set age_group as ‘youth’; if age is between 18 and 25 set_age_group to ‘young_adult’ … else set age_group to ‘aged’.” In programming (and in SQL as well) it’s often referred to as if/else logic.

In SQL if/else logic is implemented via the `CASE` statement:

``````SELECT
CASE
WHEN age <= 16 THEN 'youth'
WHEN age BETWEEN 17 AND 25 THEN 'young_adult'
WHEN age BETWEEN 26 AND 35 THEN 'adult'
WHEN age BETWEEN 36 AND 45 THEN 'middle_aged'
ELSE 'aged'
END AS age_group,
*
FROM users
``````

I know this query might be overwhelming, shoot me a message via chat or leave feedback after you finish the lesson.

Let’s break down what’s going on in this query:

• we’re selecting all columns from the `users` table using a wildcard `*`
• we’re adding a new column named age_group using alias operator `AS age_group`
• we’re using comparison operators to determine which age group to set inside new age_group column (less than < and `BETWEEN`
• note how the `CASE` statement ends with ELSE keyword – it works only if all previous conditions are not valid

If you run this query you’ll see that the `age_group` column will be the first one, followed by the rest of the `users` table columns (because we used the wildcard `*` in the `SELECT` statement). We can totally change the order and have `age_group` in the end like this:

``````SELECT
*,
CASE
WHEN age <= 16 THEN 'youth'
WHEN age BETWEEN 17 AND 25 THEN 'young_adult'
WHEN age BETWEEN 26 AND 35 THEN 'adult'
WHEN age BETWEEN 36 AND 45 THEN 'middle_aged'
ELSE 'aged'
END AS age_group
FROM users
``````

Just a refresher: we can select multiple columns from the table, we just need to list them in the `SELECT` statement and separate with commas. From that point of view the `age_group` is just yet another column we’re adding to our result set.

We can organize our query’s result as we want by simply specifying necessary columns in the `SELECT` statement. When it comes to calculating new custom columns (like `age_group`) I always add them in the beginning, because usually they’re the most important ones.

## Case statement

Let’s look closely at the `CASE` statement:

Here are more details on how it works:

• it always starts with `CASE` keyword and ends with `END`
• conditions (condition_1, condition_2, etc) are the same filters we write in `WHERE` clause
• when the `CASE` statement is evaluated it checks conditions top to bottom until the first match and then stops

If/else logic gives you so much power – we’ll use `CASE` statement a lot when calculating rates, percentages, segment users, etc. Go ahead and run these queries in the Playground