Product Analytics. Part 2 Aggregating analytics events into daily activity

190. Aggregating analytics events into daily activity

Before we move forward and calculate retention curve (D1-D30 retention rates for a daily cohort) or retention matrix (D1-D30 retention rates for multiple cohorts) let’s learn an important trick. Namely, generating activity data.

Why bother? Same as with activation rate, to calculate retention rate we need to define a metric that tells us that the user is retained. We also could say that if a user is active on a certain day we consider such a user retained on that day.

User activity data based on analytics events

When calculating retention rate, it often helps to imagine a table where we record dates when a user was active. With such table calculations become really simple:

user_id signup_date activity_date
2134 2018-02-01 2018-02-01
2134 2018-02-01 2018-02-02
2134 2018-02-01 2018-02-03
2135 2018-02-01 2018-02-01
2135 2018-02-01 2018-02-02
2137 2018-02-02 2018-02-02

It’s quiz time! 💥 Go ahead and write a query that generates this table from table. Consider a user active if there’s any event happened on this date. Show the query.

Here we go:

WITH users_daily_events AS (
    u.created_at::date AS signup_date,
    e.created_at::date AS activity_date,
    COUNT(*) AS events_counts
  FROM u
    ON e.user_id = u.user_id  
    u.action = 'signup'  
  GROUP BY 1, 2, 3
  ORDER BY signup_date ASC, user_id ASC

FROM users_daily_events

As you can see, it’s a combo of a subquery and our favorite grouping and counting technique. We aggregated all mobile events by user, signup date and the date mobile event was tracked.

This table will be the basis of our further retention analysis.

User activity data without analytics events

⚠ Before we continue with retention curves, let’s talk briefly about the nature of user activity data. Remember our trick with generating events based on production database tables? It allowed us to calculate activation rate without analytics events. Think for a minute, is there a way we could generate activity data based on tables form production database?

Because chances are, we don’t have mobile analytics events. What can we do instead? As you can see, the data in the books_users table is not enough:

FROM books_users

It contains only the timestamp when a user started reading a book. We definitely need something else.

Short answer: it’s impossible to generate user activity data based on existing production tables in the Bindle warehouse. The solution requires some technical support. We need to create a table in the production database and every time a user hits any API endpoint (for login, for example) we’d write to this table that this user was active. As you can see, it’s a very simplified version of a mobile analytics system that would do a good job for retention analytics. It certainly won’t allow us to zoom in and see how users interact with our app (screenviews and interactions often don’t hit API endpoints).

We’re all ready to level up our retention rate game and calculate our first retention curve. See you in the next lesson! 👋 🚀

Anatoli Makarevich, author of SQL Habit About SQL Habit

Hi, it’s Anatoli, the author of SQL Habit. 👋

SQL Habit is a course (or, as some of the students say, “business simulator”). It’s based on a story of a fictional startup called Bindle. You’ll play a role of their Data Analyst 📊 and solve real-life challenges from Business, Marketing, and Product Management.

SQL Habit course is made of bite-sized lessons (you’re looking at one atm) and exercises. They always have a real-life setting and detailed explanations. You can immediately apply everything you’ve learned at work. 🚀

“well worth the money”

Fluent in SQL in a month

Master Data Analysis with SQL with real life examples from Product Management, Marketing, Finance and more.
-- Type your query here, for example this one -- lists all records from users table: SELECT * FROM users
Loading chart... ⏳