FORUM Marketing Analytics 106. How order in FROM clause changes the results

# 106. How order in FROM clause changes the results

I tried solving 106 problem using the following query:

``````SELECT
utm_campaign,
SUM(amount) AS total_revenue,
COUNT(DISTINCT a.id) AS total_users,
SUM(amount)/COUNT(DISTINCT b.id) AS ARPU
FROM purchases a
ON a.user_id = b.id
AND a.refunded = FALSE
-- WHERE utm_campaign IS NOT NULL
GROUP BY 1
ORDER BY 4 DESC
``````

which should be quite similar with the proposed solution, the difference is on the order of FROM clause. I used purchase table before user table and it gives a very different results.

I’d like to understand why is this the case as this is a quite surprising SQL behavior to me.

#### REPLIES

Hi @itslutfi

That’s a great question, JOIN-ing is one of the fundamentals of SQL and Data Analysis. I’d recommend you to skim through the lessons for INNER and LEFT joins.

I’m also glad that you stumbled upon the joining problem with that example, it’s super awesome because you’ll see this pattern in all conversion calculations later in the course and at work.

## Debugging a query

Let’s simplify your query a bit and focus only on joining tables. I think I know what went wrong, let me give you an overview of how I usually debug such problems with queries.

PRO TIP: Use first letters of tables for aliases. That way it’s easy to understand which tables you use for filtering, sorting, selecting, etc.

Here’s our simplified query:

``````SELECT
COUNT(DISTINCT(u.id)) AS users_count,
COUNT(DISTINCT(p.id)) AS purchases_count
FROM users u
JOIN purchases p
ON p.user_id = u.id
AND refunded = FALSE
``````

Here’s the result set:

 users_count purchases_count 805 812

Now let’s change the order of tables:

``````SELECT
COUNT(DISTINCT(u.id)) AS users_count,
COUNT(DISTINCT(p.id)) AS purchases_count
FROM purchases p
ON p.user_id = u.id
AND refunded = FALSE
``````
 users_count purchases_count 805 812

As you can see, the results are identical and the order of tables doesn’t make a difference. It means the problem is somewhere else. Since our query is very simple, the only other place to look at is the type of JOIN we’re using.

## Fixing the problem

Let’s look at the JOIN type in our query. You’re using `INNER JOIN` (`JOIN` is just a shorter alias) and by definition it returns only rows that are present in both tables. That’s why when we change the order of tables the results are identical.

Let’s look at our task again. We need to calculate average revenue per user:

1. We count all users.
2. We sum up their revenue.
3. We divide 2 by 3.

Atm our query has a bug in the first step – `INNER JOIN` removes all users without purchases.

To count all users and still be able to sum up their revenue we need to use `LEFT JOIN` – it keeps all rows from the left table (users) and joins only matching rows from the right table (purchases).

``````SELECT
COUNT(DISTINCT(u.id)) AS users_count,
COUNT(DISTINCT(p.id)) AS purchases_count
FROM users u
LEFT JOIN purchases p
ON p.user_id = u.id
AND refunded = FALSE
``````
 users_count purchases_count 6082 812

As you can see, the amount of purchases stayed the same and the number of users is way higher. In fact, this query counts all users in our database, we can verify it by counting them separately:

``````SELECT COUNT(*) AS users_count
FROM users
``````

Now what if we change the order in the query with `LEFT JOIN`?

``````SELECT
COUNT(DISTINCT(u.id)) AS users_count,
COUNT(DISTINCT(p.id)) AS purchases_count
FROM purchases p
ON p.user_id = u.id
AND refunded = FALSE
``````

Of course, the results will be completely different because of our JOIN type. `LEFT JOIN` will keep all records from the left table (in that case purchases) and only join matching records from the right table (users).

## Summary

To select a correct JOIN type we need to deeply understand what we’re calculating. Ideally, break down our task into a simple algorithm and then validate each step with simpler queries like this one:

``````SELECT COUNT(*) AS users_count
FROM users
``````

Most imporatnly, always remember that `INNER JOIN` returns the records that are present in both table. You can visualize it as an intersection.

The `LEFT JOIN` always keeps all records from the left table and only joins the matching records from the right table. The base of your conversion (the denominator) usually comes from the left table, because you need to count all records from it.

I hope that was useful! Ping me if you have more questions.