This might be a bit silly, I am new to SQL so pardon my simple queries. I was practicing calculating purchase rate in lesson 26 and I decided to create my own queries. The question my own query should answer is “What is the % of users who are trial users and signed up in the first 2 weeks of Bindle existance”.
So I wrote a query that calculates this (Query 1), and I wanted to check this with another query (Query 2) to see if the result is correct. I can’t figure out why I am missing 1% of the users when I run the 2nd query.
Calcualate % of the users who are trial users and signed up in the first 2 weeks of Bindle existance. This query produces a result of 36%:
SELECT 100 * COUNT(CASE WHEN status = 'trial' THEN id END) / COUNT(*) AS trial_users_first_2_weeks_Jan_2018 FROM users WHERE signup_date BETWEEN '2018-01-01' AND '2018-01-14'
Below query should return 64% as the query above returned 36% but it returns 63% and I don’t know why.
SELECT 100 * COUNT(CASE WHEN status = 'customer' OR status = 'free' THEN id END) / COUNT(*) AS non_trial_users_first_2_weeks_Jan_2018 FROM users WHERE signup_date BETWEEN '2018-01-01' AND '2018-01-14'
Below query shows all unique values for status column in the user table for the same time frame as previous queries. I used it to figure out how to structure second query.
SELECT DISTINCT status FROM users WHERE signup_date BETWEEN '2018-01-01' AND '2018-01-14'
WITH spends AS ( SELECT utm_campaign, SUM(amount) AS total_spend FROM marketing_spends GROUP BY 1 ), total_customers AS ( SELECT utm_campaign, COUNT(*) AS customers_count FROM users WHERE utm_campaign IS NOT NULL AND status = 'customer' GROUP BY 1 ) SELECT s.utm_campaign, total_spend / customers_count AS CAC FROM spends s LEFT JOIN total_customers u ON s.utm_campaign = u.utm_campaign ORDER BY 2 ASC NULLS LAST
I think the
utm_campaign IS NOT NULL filter used here is redundant because we’re doing a
LEFT JOIN from the spends CTE and the marketing_spends table has no NULLs under its utm_campaign column. It would, however, make a difference if we did the join the other way round.
I have noticed that there are a lot of places in which people have used subquery within select/from/where or some other clauses. Can you please share some scenrios & guidelines around when to use a sub-query in which clause.
Kind regards, Tushar
With reference to the CTE -
total_users in following query for this chapter, my query is that why isn’t
distinct used with
COUNT(*) AS users_count
WITH spends AS ( SELECT utm_source, SUM(amount) AS total_spend FROM marketing_spends GROUP BY 1 ), total_users AS ( SELECT utm_source, COUNT(*) AS users_count FROM users WHERE utm_source IS NOT NULL GROUP BY 1 ) SELECT s.utm_source, total_spend / users_count AS CPA FROM spends s INNER JOIN total_users u ON s.utm_source = u.utm_source
In 105. a great and important point is made in the difference of limiting a query in the join conditions vs. limiting a query in the where clause.
I understand the scenario of why limiting the query in the join condition makes sense (filtering the purchases table to only consider all purchases that were not refunded) and I see how using the where clause ultimately returns wrong output (filtering in the where clause will ignore the ‘negative space’ of all the NULLs in purchase columns for users who did not make a purchase).
I tried challenging my understanding a bit and came up with the ‘b’ CTE below (‘a’ CTE is the correct query as described in the exercise). My question is then: how am I making a logical error in trying to utilise the where clause in the ‘b’ CTE below? As far as I can see, the difference in counts between ‘a’ and ‘b’ is 104 rows (purchases that were refunded?).
WITH a AS ( SELECT SUM(amount) / COUNT(DISTINCT(u.id)) AS ARPU FROM users u LEFT JOIN purchases p ON u.id = p.user_id AND refunded = FALSE ), b AS ( SELECT SUM(amount) / COUNT(DISTINCT(u.id)) AS ARPU FROM users as u LEFT JOIN purchases AS p ON u.id = p.user_id WHERE refunded IS NULL OR refunded = FALSE ) SELECT * FROM b -- SELECT * FROM a /* The 'a' CTE above is correct. I can see that 'b' CTE is not correct, but I'm unable to wrap my mind around why, for the 'b' version, the 'amount' is greater in relation to the distinct user count. */
Hi:) Why does the following syntax not work and is there a way to write it without CTEs?
SELECT COUNT(DISTINCT(user_id)), custom_parameters ->> 'ab_test_name' AS ab_test_name, custom_parameters ->> 'ab_test_variation' AS ab_test_variation, created_at AS categorized_at FROM mobile_analytics.events WHERE custom_parameters ->> 'ab_test_name' IS NOT NULL AND action = 'signup' AND ab_test_name = 'longer_onboarding_201803' GROUP BY ab_test_variation
Hello, I’m struggling to understand the formula used in the exercise to calculate the churn rate:
SELECT 100 - 100 * COUNT(DISTINCT(b.visitor_id)) / COUNT(DISTINCT(h.visitor_id))::float AS churn_rate FROM web_analytics.pageviews h
I don’t get the 100-….why cannot we leave it as:
100 * COUNT(DISTINCT(b.visitor_id)) / COUNT(DISTINCT(h.visitor_id))::float
I have a question about the funnel analysis query in lesson 129, namely this one:
SELECT COUNT(h.visitor_id) AS homepage_pvs, COUNT(b.visitor_id) AS book_page_pvs FROM web_analytics.pageviews h LEFT JOIN web_analytics.pageviews b ON h.visitor_id = b.visitor_id AND b.url LIKE '%/books/%' AND ( b.referer_url = 'https://www.bindle.com/' OR b.referer_url LIKE 'https://www.bindle.com/?%' ) AND b.created_at BETWEEN h.created_at AND h.created_at + '30 minutes'::interval WHERE h.url = 'https://www.bindle.com/' OR h.url LIKE 'https://www.bindle.com/?%'
My question is: Is the
AND ( b.referer_url = 'https://www.bindle.com/' OR b.referer_url LIKE 'https://www.bindle.com/?%' )
bit necessary, given that we filter using
WHERE further down? Is there a difference? They seem to give the same results.
In the following code from Lesson 109.. how come we dont need to group by utm_campaign at the end when we are calculating CPA? is it because we joined the users to the row by utm_campaign?
WITH spend_per_campaign AS ( SELECT utm_campaign, SUM(amount) AS total_spend FROM marketing_spends GROUP BY 1 ), users_per_campaign AS ( SELECT utm_campaign, COUNT(*) AS users_count FROM users WHERE utm_campaign IS NOT NULL GROUP BY 1 ) SELECT s.utm_campaign, users_count, total_spend / users_count AS CPA FROM spend_per_campaign s INNER JOIN users_per_campaign u ON s.utm_campaign= u.utm_campaign