FORUM Web Analytics Different Results of Amount Spent

Different Results of Amount Spent

I’m in lesson 110. CPA Cost per acquisition

In this lesson appears the following query:

WITH spends AS (
  SELECT
    utm_source,
    SUM(amount) AS total_spend
  FROM marketing_spends
  GROUP BY 1  
), total_users AS (
  SELECT
    utm_source,
    COUNT(*) AS users_count
  FROM users
  WHERE 
    utm_source IS NOT NULL
  GROUP BY 1
)

SELECT
  s.utm_source,
  total_spend / users_count AS CPA
FROM spends s
INNER JOIN total_users u
  ON s.utm_source = u.utm_source

However, I tried to do the exercise by my own using the following query:

SELECT ms.utm_source, SUM(amount), COUNT(DISTINCT(u.id))
FROM marketing_spends ms  
LEFT JOIN users u  
  ON u.utm_source = ms.utm_source
GROUP BY 1

But I get different results for the amount spent, I don’t understand why it is, the users count is correct.

REPLIES

Hi @ecasanova95 👋

Very interesting, let’s see what we have here. 🔍

As usual, let’s start by looking at the raw data:

SELECT 
  ms.utm_source, 
  SUM(amount) AS total_spend, 
  COUNT(DISTINCT(u.id)) AS uniq_users
FROM marketing_spends ms  
LEFT JOIN users u  
  ON u.utm_source = ms.utm_source
GROUP BY 1

As you can see, your query reports a way higher amount of total_spend, hundreds of times more. Actually, if you calculate how much more you’ll see that it’s exactly 421 times more. Your query also report 421 uniq_users. Coincidence? 🙀

Sum and LEFT JOIN == 💔

Let’s debug the issue further: we looked at grouped raw data and we see that the numbers we report are suspiciously high. Let’s go one step back and look at the raw data before aggregation:

SELECT *
FROM marketing_spends ms  
LEFT JOIN users u  
  ON u.utm_source = ms.utm_source

As you can see, the problem is in the LEFT JOIN: we attach multiple marketing spend records to every user. We can’t sum spend in that case because we’ll count the same spend multiple times. 💣

That’s why it’s important to aggregate spends before we join them (as in original query).

Hope that clarifies things @ecasanova95.

💡 Next time try to look at the raw data, then raw data from the previous step of the calculation, etc.

🍻

WRITE A REPLY