SQL Habit Forum

Welcome to SQL Habit Community Forum 👋 Ask anything about the SQL Habit platform, Data Analysis, applications of data for running Web or Mobile products, etc.

105. 'Logical fallacy' explanation of my alternative query using 'where' clause

In 105. a great and important point is made in the difference of limiting a query in the join conditions vs. limiting a query in the where clause.

I understand the scenario of why limiting the query in the join condition makes sense (filtering the purchases table to only consider all purchases that were not refunded) and I see how using the where clause ultimately returns wrong output (filtering in the where clause will ignore the ‘negative space’ of all the NULLs in purchase columns for users who did not make a purchase).

I tried challenging my understanding a bit and came up with the ‘b’ CTE below (‘a’ CTE is the correct query as described in the exercise). My question is then: how am I making a logical error in trying to utilise the where clause in the ‘b’ CTE below? As far as I can see, the difference in counts between ‘a’ and ‘b’ is 104 rows (purchases that were refunded?).

    SUM(amount) / COUNT(DISTINCT(u.id)) AS ARPU
  FROM users u
  LEFT JOIN purchases p
  ON u.id = p.user_id
  AND refunded = FALSE

b AS (
    SUM(amount) / COUNT(DISTINCT(u.id)) AS ARPU
  FROM users as u
  LEFT JOIN purchases AS p
  ON u.id = p.user_id
  WHERE refunded IS NULL OR refunded = FALSE


The 'a' CTE above is correct. I can see that 'b' CTE is not correct,
but I'm unable to wrap my mind around why, for the 'b' version, the
'amount' is greater in relation to the distinct user count.
“no fuss, just things you actually need”

Start learning with SQL Habit today

Master Data Analysis with SQL through the story of how a startup succeeded through data.