SQL JOINing a Table to Itself

Getting two sets of information from one table in a select statement often leads people to write subselects, but it really doesn’t matter that this is the same table twice, we can just give it a new alias and treat it as if it were a different table. This is one of those techniques where, once you’ve seen it, it’s really obvious, but until that point it can be very confusing. I explained this to someone else recently, so I thought I’d capture it here in case it’s helpful to anyone else.

Consider that tried-and-tested example: employees and managers. Here’s the staff table from the database (today’s imaginary data isn’t particularly imaginative, sorry):

mysql> select * from staff;
| id | first_name | last_name | manager_id |
|  1 | Hattie     | Hopkins   |          4 |
|  2 | Henry      | Hopkins   |          4 |
|  3 | Harry      | Hopkins   |          5 |
|  4 | Helen      | Hopkins   |       NULL |
|  5 | Heidi      | Hopkins   |          4 |
|  6 | Hazel      | Hopkins   |          1 |
6 rows in set (0.00 sec)

In order to see who reports to whom, we need to query the table to get the employees, and again to get the managers’ names. MySQL will get horribly confused if we use the same table twice, so we alias it each time to indicate which one we mean. I do this by drawing the picture of which tables I need and labelling them – here I’ve used “e” for employees and “m” for managers:

mysql> select e.first_name as employee, m.first_name as reports_to
    -> from staff e
    -> inner join staff m on e.manager_id = m.id;
| employee | reports_to |
| Hattie   | Helen      |
| Henry    | Helen      |
| Harry    | Heidi      |
| Heidi    | Helen      |
| Hazel    | Hattie     |
5 rows in set (0.02 sec)

Did you spot that there’s one row less in this output? That’s because we did an inner join, but Helen has no manager since she owns the company, so she doesn’t appear in the second set of results. We could easily have left joined instead to include her if we wanted to.

You can use this technique regardless of how many other tables are needed for the query and whether you need to join the table to itself or onto other places. You might have a table of users, and a table of user_friends, linking one user to another. You can start from the users table, join to user_friends, then join to user again with a different alias to pull back the information you need – the possiblities are almost endless.

19 thoughts on “SQL JOINing a Table to Itself

  1. This particular example is really a hierarchy, and does not lend itself well to unknown depths. What if the company grows, and you need to add another layer of management? You’d have to go in and change all your queries that access the hierarchy.

    I’ve been struggling with this problem for goat breeding. In this case, you have two hierarchies — one for each animal’s sire (father) and one for the dam (mother). I have found no practical solution in SQL. Although my dataset is small enough that performing multiple queries using recursion in a procedural language like PHP is possible, it grates on my aesthetic sense.

    One possible solution I’ve been playing with is the OQGRAPH storage engine, which can be used as a plugin for MariaDB or MySQL 5.5. It isn’t really a storage engine per se, but rather, a computational engine, specialized for graph tracing.

    In operation, OQGRAPH has exactly one allowable schema, which can be used any number of times (with different table names, of course):

    CREATE TABLE db.tblname (
        latch   SMALLINT  UNSIGNED NULL,
        origid  BIGINT    UNSIGNED NULL,
        destid  BIGINT    UNSIGNED NULL,
        weight  DOUBLE    NULL,
        seq     BIGINT    UNSIGNED NULL,
        linkid  BIGINT    UNSIGNED NULL,
        KEY (latch, origid, destid) USING HASH,
        KEY (latch, destid, origid) USING HASH

    In graph theory terms, each OQGRAPH row describes an edge in a directed graph, with “origid” and “destid” being the primary keys of connected nodes. You populate this schema using INSERT… SELECT on the table with the self-join relationship, then query the OQGRAPH table by joining it back to your original table.

    They have an example based on the Tree Of Life, where you can (for example), show the taxonomic path between humans and a banana. It seems to be lightning-fast, with no arbitrary depth-of-query limit. But I have not been able to bend OQGraph to my will in solving my rather simple goat breeding problem.

    LornaJane, have you any experience with OQGRAPH for such things?

  2. Nested query in a select statement, can it be done in MySql?

    I have a table that shows a list of items ordered, however the item name is stored in column fp1, fp2 …etc. Column Fp1q, fp2q….etc stores the quantilty.

    What i want to do is use item name (Apples) in place of the column name i.e. fp1q

    This does not work:
    Select fp1q as (SELECT fp1 from gift_shop.orders) , fp2q, fp3q
    FROM gift_shop.orders
    Any help would be very much appreciated.

    Fp1 is always the same value, but I don’t want to hard code it in case it changes.

    • It’s not possible and it doesn’t make much sense. How can you have a result set where one of the columns has a different column name in each row? Better just to select from the orders table and handle the rest in your application, in my opinion

  3. This was incredibly helpful and you’re absolutely right that it seems so obvious now that I’ve used. Thanks a ton for writing it up. It saved me a ton of time on a WordPress project.

  4. lornajane, this was quite helpful. Thanks kindly for taking the time to write it up in detailed manner and sharing

  5. It’s almost 5 years from when this post published and I’ll would like just to make sure that you know that it still helps people around :) !

Leave a Reply

Please use [code] and [/code] around any source code you wish to share.

This site uses Akismet to reduce spam. Learn how your comment data is processed.