Part 5. SQL & Analysis Fundamental Concepts
- Norma López-Sancho
- Apr 13, 2024
- 6 min read
Updated: May 22, 2024
This will be the last part of the SQL & Analysis Fundamental Concepts series and is an introduction to JOINS
So let’s have fun playing with our 3 mini tables, employees, departments & employee_department that we created on the previous guides.
But first a brief explanation of the most common types of JOINS and their basic use (they have few more uses):
1. INNER JOIN
An inner join returns only the rows where there is a match between corresponding columns in both tables. This is, the intersection (or should we say, the INNERsection.. (such a bad joke!).
It combines rows from both tables based on the specified
join condition.
Example: If you have a table of employees and a table of departments, an inner join would return only the rows where an employee belongs to a department
So if there’s an employee that is not in the department table, will be, sadly, left behind.
2. LEFT (OUTER) JOIN
A left join returns all rows from the left table and the matched rows from the right table.
If there's no match, NULL values are returned for the columns from the right table.
Example: A left join would return all employees, along with their corresponding department if they belong to one.
If an employee doesn't belong to any department, the department-related columns will have NULL values (at least they are not left behind!)
3. RIGHT (OUTER) JOIN
Same principle than left join but the other way around.
Returns all rows from the right table and the matched rows from the left table.
If there's no match, NULL values are returned for the
columns from the left table.
Example: Would return all departments, along with their corresponding employees if they have any.
If a department doesn't have any employees, the employee-related columns will have NULL values.
4. FULL (OUTER) JOIN
This blue yellow gradient colour mix; perfect visual representation!
This returns all rows from both tables.
If there's no match, NULL values are filled in for columns from the table with no matching row.
Example: Would return all employees and all departments, matching them where possible.
If an employee doesn't belong to any department or if a department has no employees, NULL values will be filled in.
5. CROSS JOIN
A cross join returns the Cartesian product of rows from both tables.
Meaning it joins every row from the first table with every row from the second table. It doesn't require any join condition.
Example: If you have a table of colours and a table of shapes, a cross join would return all possible combinations of colours and shapes.
And now, let’s play.
Using JOIN. Practical scenarios
1. INNER JOIN
Bringing the employees table and add the department_id from our index table:
SELECT
e.employee_id
e.first_name,
e.last_name,
e.email,
e.hire_date,
ed.department_id
FROM
employees e
INNER JOIN
employee_department ed
ON
e.employee_id = ed.employee_id;
In the FROM we are creating an alias (e) for our table employees (aka giving it a temporal name)
We are doing the same for employee_department (ed)
These aliases aids on writing a simpler queries and makes them more readable (imagine if every time we have used e or ed we need to write the full name of the table.. phew!)
We decided to bring only these columns, because we don’t want the timestamps, you could always do SELECT * and bring all the ammo!
In the SELECT clause, we are specifying some columns we want from e (=employees table) and from ed (=employee_department index table)
INNER JOIN specifies which column we want to use to join it with our employees table,
ON specifies which is the matching parameter (same concept than Vlookup or Index Match in Excel); You always need a matching column that is reflected in both tables, in this case is our Unique Identifier (ID).
Now, this does not tell us much, right? Given that the index table only contains the ID of the department and not the name.
Let’s complicate the query a bit further:
SELECT
e.employee id,
e.first_name,
e.last_name, e.email,
e.hire_date,
d.department_name
FROM
employees e
INNER JOIN
employee_department ed
ON
e.employee_id = ed.employee_id
INNER JOIN
department d
ON
ed.department_id = d.department_id;
Well, now we are talking!
In here we first join our employees table with the index table so we can link each employee to their respective department.
Then we join the result with the department table based on the department_id that we brought in from employee_department table and is what allows us to include the department name in the final result (which was included on the SELECT as d.department_name)
Given that we are doing an INNER JOIN it has only brought the department names of those who have employees in it
In my employee_department junction table, I have the ID of all employees of my table, which are all the ones you see on the previous image.
What would it happen if we delete couple of records on my index table?
DELETE FROM employee_department
WHERE employee_id IN (7,8);
Let’s see the result now:
As you can see now the employees with ID 7 and 8 are no longer appearing in our result.
This is because there’s no match in our index table, therefore it excludes them
2. LEFT JOIN
Let’s continue with the same example and see what would happen now if we use the LEFT JOIN instead:
SELECT
e.employee_id,
e.first_name,
e.last_name,
e.email,
e.hire_date,
d.department_name
FROM
employees e
LEFT JOIN
employee_department ed
ON
e.employee_id = ed.employee_id
LEFT JOIN
department d
ON
ed.department_id = d.department_id;
There we go, we got them back!
In this instance we don’t lose the information of employee but given that they are not related to any department, we get a NULL value in department name for our IDs 7 & 8
3. RIGHT JOIN
Let’s have one more go with the same example
SELECT
e.employee_id,
e.first_name,
e.last_name,
e.email,
e.hire_date,
d.department_name
FROM
employees e
RIGHT JOIN
employee_department ed
ON
e.employee_id = ed.employee_id
RIGHT JOIN
department d
ON
ed.department_id = d.department_id;
Ekk! We definitely have a lot of departments without employees!
Being serious, may not look pretty, but this way we make sure we get the full information about all our departments.
But wait a minute? Where are our IDs 7 & 8?
We could do a FULL (OUTER) JOIN but MySQL does not support it..
Instead we can do a combination of LEFT and RIGHT with a tiny addition: UNION
Buckle up, this is a long one!
SELECT
e.employee_id,
e.first_name,
e.last_name,
e.email,
e.hire_date,
d.department_name
FROM
employees e
LEFT JOIN
employee_department ed
ON
e.employee_id = ed.employee_id
LEFT JOIN
department d
ON
ed.department_id = d.department_id
UNION
SELECT
e.employee_id,
e.first_name,
e.last_name,
e.email,
e.hire_date,
d.department_name
FROM
employees e
RIGHT JOIN
employee_department ed
ON
e.employee_id = ed.employee_id
RIGHT JOIN
department d
ON
ed.department_id = d.department_id;
And here we have, a perfectly simulated FULL JOIN.
Basically we are combining both statements we did before so:
LEFT JOIN (x2) To bring the whole LEFT table (with the selected columns, of course) and use index table as match between employees and department tables.
Where there’s a match with the index table, will fill the department name
Where there’s no match with the index table, will set as NULL
RIGHT JOIN (x2) To bring the whole RIGHT table (again, with the selected columns only), using index table as link between employees and department tables.
Where there’s a match on my index table, it will fill the employee
Where there’s no match on my index table, will set as NULL
And between these two sets of double JOINS, we UNION the results of both the LEFT and the RIGHT.
Resulting basically, on smashing both tables together, using the index as the link between both main tables.
4. CROSS JOIN
We could use our employees and departments to scramble them, but that would be a bit of a weird example!
Let’s create 2 mini tables with shapes and colours better:
CREATE TABLE colours (
colour_id INT AUTO_INCREMENT PRIMARY KEY,
colour_name VARCHAR(255) NOT NULL
);
INSERT INTO colours (colour_name) VALUES ('blue');
INSERT INTO colours (colour_name) VALUES ('red');
INSERT INTO colours (colour_name) VALUES ('yellow');
CREATE TABLE shapes (
shape_id INT AUTO_INCREMENT PRIMARY KEY,
shape_name VARCHAR(255) NOT NULL
);
INSERT INTO shapes (shape_name) VALUES ('square');
INSERT INTO shapes (shape_name) VALUES ('triangle');
INSERT INTO shapes (shape_name) VALUES (‘circle’);
And now the CROSS JOIN:
SELECT
c.colour_id,
c.colour_name,
s.shape_id,
s.shape_name
FROM
colours c
CROSS JOIN
shapes s;
There we have, all possible combinations!
By default has ordered it by the shape_id because is the table we have specified in the CROSS JOIN you can always do it the other way around (FROM shapes s CROSS JOIN colours c):
Of course we could add a ORDER BY c.colour_id instead but why adding an extra operation that the server needs to perform, less is more!
Though.. given that in our first image you can see it has ordered by shape_id but colour_id came up as descendant, we could indeed sort that out with an ORDER BY:
SELECT
c.colour_id,
c.colour_name,
s.shape_id,
s.shape_name
FROM
colours c
CROSS JOIN
shapes s
ORDER BY
s.shape_id, c.colour_id ASC;
So if this is what we need, then is worth the extra command.
On the workbench you can click the column names on your result grid to sort them, but where’s the fun on that!
And with this we wrap up our series on basic statements surrounded by an intermediate knowledge bringing the analyst mind into them that hopefully has helped on getting that explorer hat and a bit of a more rounded picture than regular courses provide.
Hope you enjoyed the little traps as well!
Happy coding!
Great tutorial on the JOIN operations. Very glad that you included CROSS JOIN! It's a lesser-known type of JOIN that people should know.