“Who are these people?” asked Sandra laughing, “So many users signed up! It would be so great to know more about them, for example how old are they. We do ask for age in our signup flow, I bet we can query this data!”
Time to explore one more column in the
users table –
age. Inside the
age column we store a number that users type in after they signed up. It could be used in many ways – we can look at product usage or marketing efforts per age group. It helps better understand your audience (and adjust copy or product strategy based on data) or make sure you target the right audience (imagine you want 30+ people to use your product and your marketing campaigns only bring you 15-year-olds. It’s better to turn these campaigns off).
Age itself isn’t very helpful – we’ll most likely have all ages between 10 and 100. Usually, we assign an age group to each user and then use groups for analysis. Typical age groups are:
(makes me shiver when I’m writing this, I’m 30 ). All righty, now we have our age groups defined let’s write a query that will assign a group to each user. To do this for each row we’d need to perform the following calculation: “If a user’s age is less than 16 – set age_group as ‘youth’; if age is between 18 and 25 set_age_group to ‘young_adult’ … else set age_group to ‘aged’.” In programming (and in SQL as well) it’s often referred to as if/else logic.
In SQL if/else logic is implemented via the
SELECT CASE WHEN age <= 16 THEN 'youth' WHEN age BETWEEN 17 AND 25 THEN 'young_adult' WHEN age BETWEEN 26 AND 35 THEN 'adult' WHEN age BETWEEN 36 AND 45 THEN 'middle_aged' ELSE 'aged' END AS age_group, * FROM users
I know this query might be overwhelming, shoot me a message via chat or leave feedback after you finish the lesson.
Let’s break down what’s going on in this query:
userstable using a wildcard
CASEstatement ends with ELSE keyword – it works only if all previous conditions are not valid
If you run this query you’ll see that the
age_group column will be the first one, followed by the rest of the
users table columns (because we used the wildcard
* in the
SELECT statement). We can totally change the order and have
age_group in the end like this:
SELECT *, CASE WHEN age <= 16 THEN 'youth' WHEN age BETWEEN 17 AND 25 THEN 'young_adult' WHEN age BETWEEN 26 AND 35 THEN 'adult' WHEN age BETWEEN 36 AND 45 THEN 'middle_aged' ELSE 'aged' END AS age_group FROM users
Just a refresher: we can select multiple columns from the table, we just need to list them in the
SELECT statement and separate with commas. From that point of view the
age_group is just yet another column we’re adding to our result set.
We can organize our query’s result as we want by simply specifying necessary columns in the
SELECT statement. When it comes to calculating new custom columns (like
age_group) I always add them in the beginning, because usually they’re the most important ones.
Let’s look closely at the
CASE WHEN condition_1 THEN result_1 WHEN condition_2 THEN result_2 WHEN condition_n THEN result_n ELSE default_result END
Here are more details on how it works:
CASEkeyword and ends with
CASEstatement is evaluated it checks conditions top to bottom until the first match and then stops
If/else logic gives you so much power – we’ll use
CASE statement a lot when calculating rates, percentages, segment users, etc. Go ahead and run these queries in the Playground
Hi, it’s Anatoli, the author of SQL Habit.
SQL Habit is a course (or, as some of the students say, “business simulator”). It’s based on a story of a fictional startup called Bindle. You’ll play a role of their Data Analyst and solve real-life challenges from Business, Marketing, and Product Management.
SQL Habit course is made of bite-sized lessons (you’re looking at one atm) and exercises. They always have a real-life setting and detailed explanations. You can immediately apply everything you’ve learned at work.