I am exploring an alternative solution to the one shown for Exercise 204.
My question is, why can we not filter out refunded values within the WHERE
clause?
WITH ab_test_categorization AS (
SELECT
user_id,
custom_parameters ->> 'ab_test_name' AS ab_test_name,
custom_parameters ->> 'ab_test_variation' AS ab_test_variation,
created_at AS categorized_at
FROM mobile_analytics.events
WHERE
custom_parameters ->> 'ab_test_name' IS NOT NULL
AND action = 'signup'
), ab_test_stats AS (
SELECT
ab_test_variation AS variation,
COUNT(DISTINCT(c.user_id)) / SUM(p.amount) AS arpu
FROM ab_test_categorization c
LEFT JOIN purchases p
ON c.user_id = p.user_id
WHERE
ab_test_name = 'longer_onboarding_201803'
AND p.refunded = FALSE
GROUP BY 1
)
SELECT
variation,
arpu
FROM ab_test_stats
Are there any performance differences between INNER JOIN and LEFT JOIN? For instance, this following subquery from lesson 207 could have been written with an INNER JOIN.
WITH user_activity AS (
SELECT
u.user_id,
u.created_at::date AS signup_date,
e.created_at::date AS activity_date,
COUNT(*) AS events_counts
FROM mobile_analytics.events u
LEFT JOIN mobile_analytics.events e
ON e.user_id = u.user_id
WHERE
u.action = 'signup'
GROUP BY 1, 2, 3
ORDER BY signup_date ASC, user_id ASC
)
SELECT *
FROM user_activity
WITH spends AS (
SELECT
utm_campaign,
SUM(amount) AS total_spend
FROM marketing_spends
GROUP BY 1
), total_customers AS (
SELECT
utm_campaign,
COUNT(*) AS customers_count
FROM users
WHERE
utm_campaign IS NOT NULL
AND status = 'customer'
GROUP BY 1
)
SELECT
s.utm_campaign,
total_spend / customers_count AS CAC
FROM spends s
LEFT JOIN total_customers u
ON s.utm_campaign = u.utm_campaign
ORDER BY 2 ASC NULLS LAST
I think the utm_campaign IS NOT NULL
filter used here is redundant because we’re doing a LEFT JOIN
from the spends CTE and the marketing_spends table has no NULLs under its utm_campaign column. It would, however, make a difference if we did the join the other way round.
I’m in lesson 110. CPA Cost per acquisition
In this lesson appears the following query:
WITH spends AS (
SELECT
utm_source,
SUM(amount) AS total_spend
FROM marketing_spends
GROUP BY 1
), total_users AS (
SELECT
utm_source,
COUNT(*) AS users_count
FROM users
WHERE
utm_source IS NOT NULL
GROUP BY 1
)
SELECT
s.utm_source,
total_spend / users_count AS CPA
FROM spends s
INNER JOIN total_users u
ON s.utm_source = u.utm_source
However, I tried to do the exercise by my own using the following query:
SELECT ms.utm_source, SUM(amount), COUNT(DISTINCT(u.id))
FROM marketing_spends ms
LEFT JOIN users u
ON u.utm_source = ms.utm_source
GROUP BY 1
But I get different results for the amount spent, I don’t understand why it is, the users count is correct.
Hi there,
I don’t quite understand why this sub query (below) is turning up zeros, rather than purchase rate to question 87. I see from the answer I could have simplified this into one query, but I don’t understand why this way doesn’t work as well? Thanks for your help!
with tabletime as (
select
utm_campaign,
COUNT(*) AS userscount,
Count(CASE WHEN status = 'customer' THEN id END) AS payingcustomers
from users
WHERE
utm_campaign IS NOT NULL
GROUP BY utm_campaign
)
SELECT
utm_campaign,
100 * (payingcustomers / userscount) as purchase_rate
FROM tabletime
Hello,
with reference to chapter “181. Counting app releases per month” & query - “How many releases Bindle had in February, 2018?” - I’m unable to understand on how’s the following solution different from the one proposed in chapter & why isn’t this correct?
SELECT COUNT( distinct app_version)
FROM adjust.callbacks
where to_char(created_at, 'yyyy-mm') = '2018-02'
Thanks, Tushar
Hello,
With reference to the CTE - total_users
in following query for this chapter, my query is that why isn’t distinct
used with COUNT(*) AS users_count
WITH spends AS (
SELECT
utm_source,
SUM(amount) AS total_spend
FROM marketing_spends
GROUP BY 1
), total_users AS (
SELECT
utm_source,
COUNT(*) AS users_count
FROM users
WHERE
utm_source IS NOT NULL
GROUP BY 1
)
SELECT
s.utm_source,
total_spend / users_count AS CPA
FROM spends s
INNER JOIN total_users u
ON s.utm_source = u.utm_source
I tried solving 106 problem using the following query:
SELECT
utm_campaign,
SUM(amount) AS total_revenue,
COUNT(DISTINCT a.id) AS total_users,
SUM(amount)/COUNT(DISTINCT b.id) AS ARPU
FROM purchases a
JOIN users b
ON a.user_id = b.id
AND a.refunded = FALSE
-- WHERE utm_campaign IS NOT NULL
GROUP BY 1
ORDER BY 4 DESC
which should be quite similar with the proposed solution, the difference is on the order of FROM clause. I used purchase table before user table and it gives a very different results.
I’d like to understand why is this the case as this is a quite surprising SQL behavior to me.
In 105. a great and important point is made in the difference of limiting a query in the join conditions vs. limiting a query in the where clause.
I understand the scenario of why limiting the query in the join condition makes sense (filtering the purchases table to only consider all purchases that were not refunded) and I see how using the where clause ultimately returns wrong output (filtering in the where clause will ignore the ‘negative space’ of all the NULLs in purchase columns for users who did not make a purchase).
I tried challenging my understanding a bit and came up with the ‘b’ CTE below (‘a’ CTE is the correct query as described in the exercise). My question is then: how am I making a logical error in trying to utilise the where clause in the ‘b’ CTE below? As far as I can see, the difference in counts between ‘a’ and ‘b’ is 104 rows (purchases that were refunded?).
WITH a AS (
SELECT
SUM(amount) / COUNT(DISTINCT(u.id)) AS ARPU
FROM users u
LEFT JOIN purchases p
ON u.id = p.user_id
AND refunded = FALSE
),
b AS (
SELECT
SUM(amount) / COUNT(DISTINCT(u.id)) AS ARPU
FROM users as u
LEFT JOIN purchases AS p
ON u.id = p.user_id
WHERE refunded IS NULL OR refunded = FALSE
)
SELECT * FROM b
-- SELECT * FROM a
/*
The 'a' CTE above is correct. I can see that 'b' CTE is not correct,
but I'm unable to wrap my mind around why, for the 'b' version, the
'amount' is greater in relation to the distinct user count.
*/