DISTINCT keyword in SQL
The DISTINCT keyword in SQL is a powerful tool used to eliminate duplicate rows from a result set.
The DISTINCT command is particularly useful when dealing with large datasets where there may be numerous instances of the same data. These duplicates could be specific values in columns (like user countries) or entire duplicated records (when using LEFT JOIN, for example).
Understanding how to use DISTINCT effectively can help you maintain the integrity of your data analysis and ensure accurate results.
Syntax
When you use the DISTINCT keyword in a SELECT statement, SQL engine filters out all duplicate rows based on the columns you’ve specified and returns only unique rows.
Here’s an example with one column:
SELECT DISTINCT country
FROM users
This query retrieves a list of unique countries from the users table.
Note that you can also use the
DISTINCT command with brackets like so:
SELECT DISTINCT(country)
FROM users
DISTINCT on multiple columns
You can also use DISTINCT with multiple columns to get unique combinations of values across the specified columns.
SELECT DISTINCT country, age
FROM users
This query returns unique combinations of country and age from the users table.
If there are two users from the same country but with different ages, both rows will appear in the result set.
Using DISTINCT in Data Analysis
DISTINCT can be particularly useful in data analysis in many scenarios:
- Identifying Unique Values. When you need to know the different variations of a particular attribute in your data.
- Data Cleaning. Helps in identifying and removing duplicates, which is crucial for accurate analysis (think business revenue reporting).
- Aggregate Functions: Combine
DISTINCTwith functions likeCOUNTto understand the size of a particular subset of data.
For instance, to count the number of unique countries in the users table:
SELECT COUNT(DISTINCT country)
FROM users
Database compatibility
|
MySQL
|
PostgreSQL
|
SQLite
|
Redshift
|
Big Query
|
Snowflake
|
|
|---|---|---|---|---|---|---|
DISTINCT |
|
|
|
|
|
|
DISTINCT