SQL JOINing a Table to Itself
Consider that tried-and-tested example: employees and managers. Here’s the staff
table from the database (today’s imaginary data isn’t particularly imaginative, sorry):
mysql> select * from staff; +----+------------+-----------+------------+ | id | first_name | last_name | manager_id | +----+------------+-----------+------------+ | 1 | Hattie | Hopkins | 4 | | 2 | Henry | Hopkins | 4 | | 3 | Harry | Hopkins | 5 | | 4 | Helen | Hopkins | NULL | | 5 | Heidi | Hopkins | 4 | | 6 | Hazel | Hopkins | 1 | +----+------------+-----------+------------+ 6 rows in set (0.00 sec)
In order to see who reports to whom, we need to query the table to get the employees, and again to get the managers’ names. MySQL will get horribly confused if we use the same table twice, so we alias it each time to indicate which one we mean. I do this by drawing the picture of which tables I need and labelling them – here I’ve used “e” for employees and “m” for managers:
mysql> select e.first_name as employee, m.first_name as reports_to -> from staff e -> inner join staff m on e.manager_id = m.id; +----------+------------+ | employee | reports_to | +----------+------------+ | Hattie | Helen | | Henry | Helen | | Harry | Heidi | | Heidi | Helen | | Hazel | Hattie | +----------+------------+ 5 rows in set (0.02 sec)
Did you spot that there’s one row less in this output? That’s because we did an inner join, but Helen has no manager since she owns the company, so she doesn’t appear in the second set of results. We could easily have left joined instead to include her if we wanted to.
You can use this technique regardless of how many other tables are needed for the query and whether you need to join the table to itself or onto other places. You might have a table of users, and a table of user_friends, linking one user to another. You can start from the users table, join to user_friends, then join to user again with a different alias to pull back the information you need – the possiblities are almost endless.
Thanks!
This particular example is really a hierarchy, and does not lend itself well to unknown depths. What if the company grows, and you need to add another layer of management? You’d have to go in and change all your queries that access the hierarchy.
I’ve been struggling with this problem for goat breeding. In this case, you have two hierarchies — one for each animal’s sire (father) and one for the dam (mother). I have found no practical solution in SQL. Although my dataset is small enough that performing multiple queries using recursion in a procedural language like PHP is possible, it grates on my aesthetic sense.
One possible solution I’ve been playing with is the OQGRAPH storage engine, which can be used as a plugin for MariaDB or MySQL 5.5. It isn’t really a storage engine per se, but rather, a computational engine, specialized for graph tracing.
In operation, OQGRAPH has exactly one allowable schema, which can be used any number of times (with different table names, of course):
[code]
CREATE TABLE db.tblname (
latch SMALLINT UNSIGNED NULL,
origid BIGINT UNSIGNED NULL,
destid BIGINT UNSIGNED NULL,
weight DOUBLE NULL,
seq BIGINT UNSIGNED NULL,
linkid BIGINT UNSIGNED NULL,
KEY (latch, origid, destid) USING HASH,
KEY (latch, destid, origid) USING HASH
) ENGINE=OQGRAPH;
[/code]
In graph theory terms, each OQGRAPH row describes an edge in a directed graph, with “origid” and “destid” being the primary keys of connected nodes. You populate this schema using INSERT… SELECT on the table with the self-join relationship, then query the OQGRAPH table by joining it back to your original table.
They have an example based on the Tree Of Life, where you can (for example), show the taxonomic path between humans and a banana. It seems to be lightning-fast, with no arbitrary depth-of-query limit. But I have not been able to bend OQGraph to my will in solving my rather simple goat breeding problem.
LornaJane, have you any experience with OQGRAPH for such things?
Jan Steinman: have a look at nested sets – http://en.wikipedia.org/wiki/Nested_set_model
By numbering the left & right of every node in the hierarchy, to get the descendants of any node is a simple case of using BETWEEN left AND right.
Jan —
Have you looked into Nested Sets at all? I’m not sure if it’s appropriate for your use case or not, but they’re good for representing hierarchical data.
kabel
It sounds like something you could probably do in PostgreSQL with CTEs (common table expressions). I’m not 100% sure, but look into it.
thank you…
Thank you very much. you save my day! :)
This is very good post! Jan Steinman there are many databases that do not have hierarchies
very helpful.
Nested query in a select statement, can it be done in MySql?
I have a table that shows a list of items ordered, however the item name is stored in column fp1, fp2 …etc. Column Fp1q, fp2q….etc stores the quantilty.
What i want to do is use item name (Apples) in place of the column name i.e. fp1q
This does not work:
Select fp1q as (SELECT fp1 from gift_shop.orders) , fp2q, fp3q
FROM gift_shop.orders
Any help would be very much appreciated.
Fp1 is always the same value, but I don’t want to hard code it in case it changes.
It’s not possible and it doesn’t make much sense. How can you have a result set where one of the columns has a different column name in each row? Better just to select from the orders table and handle the rest in your application, in my opinion
This was incredibly helpful and you’re absolutely right that it seems so obvious now that I’ve used. Thanks a ton for writing it up. It saved me a ton of time on a WordPress project.
lornajane, this was quite helpful. Thanks kindly for taking the time to write it up in detailed manner and sharing
Nice Post, Thanks!
thank you it’s so very helpful
It’s almost 5 years from when this post published and I’ll would like just to make sure that you know that it still helps people around :) !
thanx a lot , very helpful ^_^
Fantastically useful and simple post, thank you so much just saved me hours (and possibly my degree!)
thank you for the clear explanation !