“Who are these people?” asked Sandra laughing, “So many users signed up! It would be so great to know more about them, for example how old are they. We do ask for age in our signup flow, I bet we can query this data!”
Time to explore one more column in the users
table – age
. Inside the age
column we store a number that users type in after they signed up. It could be used in many ways – we can look at product usage or marketing efforts per age group. It helps better understand your audience (and adjust copy or product strategy based on data) or make sure you target the right audience (imagine you want 30+ people to use your product and your marketing campaigns only bring you 15-year-olds. It’s better to turn these campaigns off).
Age itself isn’t very helpful – we’ll most likely have all ages between 10 and 100. Usually, we assign an age group to each user and then use groups for analysis. Typical age groups are:
(makes me shiver when I’m writing this, I’m 30 ). All righty, now we have our age groups defined let’s write a query that will assign a group to each user. To do this for each row we’d need to perform the following calculation: “If a user’s age is less than 16 – set age_group as ‘youth’; if age is between 18 and 25 set_age_group to ‘young_adult’ … else set age_group to ‘aged’.” In programming (and in SQL as well) it’s often referred to as if/else logic.
In SQL if/else logic is implemented via the CASE
statement:
SELECT
CASE
WHEN age <= 16 THEN 'youth'
WHEN age BETWEEN 17 AND 25 THEN 'young_adult'
WHEN age BETWEEN 26 AND 35 THEN 'adult'
WHEN age BETWEEN 36 AND 45 THEN 'middle_aged'
ELSE 'aged'
END AS age_group,
*
FROM users
I know this query might be overwhelming, shoot me a message via chat
or leave feedback after you finish the lesson.
Let’s break down what’s going on in this query:
users
table using a wildcard *
AS age_group
BETWEEN
CASE
statement ends with ELSE keyword – it works only if all previous conditions are not valid If you run this query
you’ll see that the
age_group
column will be the first one, followed by the rest of the users
table columns (because we used the wildcard *
in the SELECT
statement). We can totally change the order and have age_group
in the end like this:
SELECT
*,
CASE
WHEN age <= 16 THEN 'youth'
WHEN age BETWEEN 17 AND 25 THEN 'young_adult'
WHEN age BETWEEN 26 AND 35 THEN 'adult'
WHEN age BETWEEN 36 AND 45 THEN 'middle_aged'
ELSE 'aged'
END AS age_group
FROM users
Just a refresher: we can select multiple columns from the table, we just need to list them in the
SELECT
statement and separate with commas. From that point of view the age_group
is just yet another column we’re adding to our result set.
We can organize our query’s result as we want by simply specifying necessary columns in the
SELECT
statement. When it comes to calculating new custom columns (like age_group
) I always add them in the beginning, because usually they’re the most important ones.
Let’s look closely at the CASE
statement:
CASE
WHEN condition_1 THEN result_1
WHEN condition_2 THEN result_2
WHEN condition_n THEN result_n
ELSE default_result
END
Here are more details on how it works:
CASE
keyword and ends with END
WHERE
clauseCASE
statement is evaluated it checks conditions top to bottom until the first match and then stopsIf/else logic gives you so much power – we’ll use CASE
statement a lot when calculating rates, percentages, segment users, etc. Go ahead and run these queries in the Playground
Hi, it’s Anatoli, the author of SQL Habit.
SQL Habit is a course (or, as some of the students say, “business simulator”). It’s based on a story of a fictional startup called Bindle. You’ll play a role of their Data Analyst and solve real-life challenges from Business, Marketing, and Product Management.
SQL Habit course is made of bite-sized lessons (you’re looking at one atm) and exercises. They always have a real-life setting and detailed explanations. You can immediately apply everything you’ve learned at work.