MySQL 5.7 Introduces a JSON Data Type

There’s a new JSON data type available in MySQL 5.7 that I’ve been playing with. I wanted to share some examples of when it’s useful to have JSON data in your MySQL database and how to work with the new data types (not least so I can refer back to them later!)

MySQL isn’t the first database to offer JSON storage; the document databases (such as MongoDB, CouchDB) work on a JSON or JSON-ish basis by design, and other platforms including PostgreSQL, Oracle and SQL Server also have varying degress of JSON support. With such wide adoption as MySQL has, the JSON features are now reaching a new tribe of developers.

Why JSON Is Awesome

Traditional database structures have us design table-shaped ways of storing all our data. As long as all your records (or “rows”) are the same shape and have the same sorts of data in approximately the same quantities, this works brilliantly well. There are some common problems that aren’t a good fit, however. How about freeform website content? Or sparsely populated sets of attributes for each row?

A classic example is an online shop which sells a variety of products. A t-shirt is sold by size or colour, but handbags don’t have a size and trousers also have length. We end up with either a very, very wide table which is mostly empty since most attributes don’t apply to most products, or we use a solution like the Entity-Attribute-Value (EAV) pattern, which can be cumbersome to work with.

Enter JSON data: a way of storing nested data with all the required information, and no raft of blank fields. In the web and mobile worlds, JSON is widely used and loved so it makes sense to use it at the database level also. MySQL’s new features allow us to do just that, but also gives us access to query on the values nested inside that JSON data.

An Example Data Set

For these examples, I’m working with a sample data set that I created from JSON Generator to give me something to play with. If you want to try it out too, you can find both my table definition and the data from it in this gist.

Storing JSON Data

To store JSON, we’ll need to use the new JSON data type. To add the tags field to my table, I used the following SQL:

ALTER TABLE people ADD COLUMN (tags json);

Once the column is there, you can insert a JSON string as the value of the field and it will be stored appropriately. When we select a JSON field from the table, we’ll see the value as the string representation of JSON (as shown in the example below) – so far, it just looks like a standard text field, but it does have superpowers!

SELECT name, tags FROM people LIMIT 5;

This gives me the name and (Lorem Ipsum inspired) tags data for the first five people in the table:

| name           | tags                                                                                       |
| Howard Ortega  | ["officia", "et", "anim", "dolore", "ut", "duis", "quis"]                                  |
| Miller Gamble  | ["et", "officia", "culpa", "excepteur", "ullamco", "exercitation", "in"]                   |
| Harriett Leon  | ["laboris", "consectetur", "mollit", "dolore", "aute", "consectetur", "adipisicing"]       |
| Claudia Durham | ["ad", "veniam", "sunt", "eiusmod", "pariatur", "veniam", "reprehenderit"]                 |
| Cox Huff       | ["reprehenderit", "Lorem", "adipisicing", "ipsum", "cupidatat", "deserunt", "consectetur"] |

So the JSON data is alive and well, what else can we do with it?

Work Effectively with JSON Arrays

The tags field has a simple JSON array in it – a list of values without keys. MySQL knows how to work with the data so we can just ask it to add or remove values as we wish. To add a value, we can use the JSON_ARRAY_APPEND() function. Here’s an example of adding the “Lorem” tag to the first person in the table:

UPDATE people SET tags = JSON_ARRAY_APPEND(tags, "$", "Lorem") WHERE id = 0;

The three arguments to JSON_ARRAY_APPEND() are:

  • The column to append to
  • The path within the column, using the JSON Path Syntax
  • The value to append

That second argument can be quite confusing, the single “$” sign here just means “at the top level”. If we run this query, then the dataset from before now looks like this, with the extra “Lorem” entry for the “Howard Ortega” record.

| name           | tags                                                                                       |
| Howard Ortega  | ["officia", "et", "anim", "dolore", "ut", "duis", "quis", "Lorem"]                         |
| Miller Gamble  | ["et", "officia", "culpa", "excepteur", "ullamco", "exercitation", "in"]                   |
| Harriett Leon  | ["laboris", "consectetur", "mollit", "dolore", "aute", "consectetur", "adipisicing"]       |
| Claudia Durham | ["ad", "veniam", "sunt", "eiusmod", "pariatur", "veniam", "reprehenderit"]                 |
| Cox Huff       | ["reprehenderit", "Lorem", "adipisicing", "ipsum", "cupidatat", "deserunt", "consectetur"] |

We can also use the data to filter our results, for example if we needed to find the users with the tag “Lorem”, we could use the JSON_SEARCH() function. This one also takes three arguments (not the same three as before, that would be too easy!):

  • The column to search
  • Either 'one' or 'all', depending on whether you want MySQL to just return the first match it finds, or return all matches
  • The value we’re looking for

The JSON_SEARCH() function actually returns the path of where it found the value – in this case we don’t need that information since we only care if there was a value found or not. We can search for people who are tagged “Lorem” using this query:

SELECT name, tags from people WHERE JSON_SEARCH(tags, 'one', 'Lorem') IS NOT NULL;

There are only 9 people in my whole data set with this tag (including Howard Ortega of course as we modified him ourselves):

| name               | tags                                                                                       |
| Howard Ortega      | ["officia", "et", "anim", "dolore", "ut", "duis", "quis", "Lorem"]                         |
| Cox Huff           | ["reprehenderit", "Lorem", "adipisicing", "ipsum", "cupidatat", "deserunt", "consectetur"] |
| Fitzpatrick Hinton | ["esse", "nostrud", "proident", "laborum", "Lorem", "minim", "sit"]                        |
| Austin Yates       | ["id", "ullamco", "Lorem", "aliquip", "aute", "proident", "laboris"]                       |
| Beasley Mccarty    | ["ullamco", "officia", "minim", "Lorem", "laboris", "culpa", "et"]                         |
| Hamilton Zamora    | ["irure", "sit", "reprehenderit", "anim", "deserunt", "Lorem", "consequat"]                |
| Norton Russo       | ["quis", "Lorem", "enim", "sunt", "proident", "labore", "ea"]                              |
| Savannah Hunter    | ["officia", "non", "Lorem", "officia", "mollit", "ad", "enim"]                             |
| Betty Webb         | ["non", "enim", "et", "cupidatat", "Lorem", "ut", "fugiat"]                                |

Having functionality like this available gives us the ability to store document data with our traditional relational database records, and still be able to use the data within those fields for finding and filtering data. We can also perform updates on the data very easily. This example, of an undefined number of arbitrary tags being applied to each record, is a pretty common pattern and hopefully this example gives you insight into how the implementation could look in the newer versions of MySQL.

We already mentioned that any kind of JSON can be stored so let’s take a look at more complicated data in the next section.

Associative Data in JSON Columns

In addition to the simple array shown above, this table also has a profile column that holds a variety of information about each user. The same fields aren’t present every time and there are various data types, each with a named key. Let’s look at the data in the first few rows:

SELECT name, profile FROM people LIMIT 5;

And the result of the query:

| name           | profile                                                                                         |
| Howard Ortega  | {"email": "[email protected]", "salary": 52000, "twitter": "@estvelit", "direct_reports": 7}     |
| Miller Gamble  | {"salary": 52000, "join_date": "2013-05-13T10:02:22 -01:00"}                                    |
| Harriett Leon  | {"email": "[email protected]", "driver": true, "salary": 63000, "twitter": "@commodoproident"} |
| Claudia Durham | {"email": "[email protected]", "salary": 50000, "vegetarian": false, "direct_reports": 12}      |
| Cox Huff       | {"salary": 61000, "twitter": "@mollitconsequat"}                                                |

Again, MySQL has some great features built in for us to work easily with the elements inside the JSON data. The JSON_SET() function will update an existing value, or insert it if it doesn’t exist. For example, if we use JSON_SET to give the user called “Claudia Durham” (with id = 3) a pay rise and also a field called first_aid on her record, we use the same syntax each time. The existing salary field will update but the first_aid is a new entry in the JSON structure.

Here are the queries to use to achieve this:

UPDATE people SET profile = JSON_SET(profile, "$.salary", 52000) WHERE id = 3;
UPDATE people SET profile = JSON_SET(profile, "$.first_aid", true) WHERE id = 3;

Once we’ve made these changes, you can inspect the state of the dataset again:

| name           | profile                                                                                                       |
| Howard Ortega  | {"email": "[email protected]", "salary": 52000, "twitter": "@estvelit", "direct_reports": 7}                   |
| Miller Gamble  | {"salary": 52000, "join_date": "2013-05-13T10:02:22 -01:00"}                                                  |
| Harriett Leon  | {"email": "[email protected]", "driver": true, "salary": 63000, "twitter": "@commodoproident"}               |
| Claudia Durham | {"email": "[email protected]", "salary": 52000, "first_aid": true, "vegetarian": false, "direct_reports": 12} |
| Cox Huff       | {"salary": 61000, "twitter": "@mollitconsequat"}                                                              |

The JSON_SET() function takes care of the inserting and updating operations, but what about removing unwanted entries? To remove entries from within the JSON data structure, use the JSON_REMOVE() function, it takes the same first two arguments as JSON_SET() does: the column containing the json, and the path.

Once the data is correct, we can use the data inside the JSON field in our queries as we wish. As an example, let’s query the table for each user’s salary, and filter by only those people who have 10 or more direct reports. This example uses the column->path syntax, which is equivalent to the JSON_EXTRACT() function, simply accessing the data at that path from the JSON document held in that column.

SELECT name, profile->"$.direct_reports" reports, profile->"$.salary" salary FROM people WHERE profile->"$.direct_reports" >= 10;

The query shows both accessing fields from within the dataset as fields in our query (note that I’ve aliased these to give rather more humane column headings), and the use of a numeric field in the where clause to filter our results. And the dataset? Here it is:

| name              | reports | salary |
| Claudia Durham    | 12      | 52000  |
| Schwartz Bowers   | 10      | 66000  |
| Carrillo Michael  | 10      | NULL   |
| Miriam Mcgowan    | 10      | 39000  |
| Austin Yates      | 12      | 59000  |
| Beasley Mccarty   | 12      | 45000  |
| Norton Russo      | 12      | 70000  |
| Mccullough Patton | 12      | 46000  |

Using the MySQL JSON features, we can bring the document-style data into our existing database work very easily.

MySQL, JSON and the Future

This introduction has given you a taste of what can be done in the newest MySQL editions with JSON. JSON is pretty familiar, and crucially it’s very approachable, every programming language I know has native handling for the format – so to bring the two together is a very powerful tool for developers.

Recommended Reading

If you want to try these examples for yourself, you’ll need MySQL 5.7 installed. Either use the MySQL Installer, your usual operating system package manager, or there is also a MySQL 5.7 docker image that would make a good quick start.

The MySQL documentation is good, but doesn’t have a lot of realistic examples in it. However, I still found it very helpful to find my way around which features are available – start here on the JSON Function Reference and see where the links lead you.

I am often asked what the performance of the JSON columns is like in MySQL and the answer is: better than I expected! There are also some important techniques to use when working with data from inside the JSON columns. Another feature in the MySQL 5.7 release was the introduction of virtual columns – the ability to have columns made from SQL expressions that are either calculated on the fly or stored (think SQL expression meets materialised view, in a column). It’s possible to index on virtual columns, so by creating a virtual column with a JSON expression and then adding an index, we can improve the performance of queries like these that work with JSON significantly. There’s a good writeup of this approach over on the Percona blog.

5 thoughts on “MySQL 5.7 Introduces a JSON Data Type

  1. Pingback: Community News: Recent posts from PHP Quickfix (04.27.2016) – SourceCode

  2. mmm I don’t know, I don’t like quite much the idea of adding a JSON datatype, I think it will be used by many not because of its intentions, but rather be misused. (Like… A LOT). I mean, it’s good for things like your social media list, but that’s a rather small subset of data. Tags and other stuff for example have to go in a apart table, mainly because it’s easier to keep track of it later on. (Think big).

    I would personally be soooo much happier with an IP, a CIDR, a timezone, or a daterange data type. These are the type of data I actually miss from PostGreSQL because they are actually pretty useful, without me having to call up functions each time I want to try to understand the actual data. Heck, to be able to define your own data structures would be super awesome!


  3. I’ve been using text fields in MySQL to store json encoded associative arrays (sometimes to the degree of a ‘junk drawer’). I use this approach for data that normally wouldn’t be used in queries but would be used in reporting or details of a transaction. This provides a lot of flexibility regarding additional fields of data without having to redefine a scheme every time the application’s storage requirements change (this allows many versions of my code to use the same schemas). As Andraz pointed out this also is great for custom fields (which I use extensively for client projects – where moving targets are a fact of life).

    The ability to have native json support in MySQL means I can now refine the queries that occasionally need to include these fields (I’ve been using special post-select filtering when necessary). I’ve used PostgreSQL’s native json support and had recently migrated two projects to Postgres in order to take advantage of the native json support.

    Another use for this is flexible logging of user requests. Every one of my ajax requests sends data via json and I store every request for auditing purposes. With native json support it will be much easier to report on these requests without having to unwind every text field every time I’m reporting specific information.

    Native json support also lets MySQL (as well as Postgres) play both sides in the SQL vs No SQL game. I don’t have a true need for No SQL very often but I’ll no longer need to compromise when I do now that MySQL supports json.

    Regarding Camilo’s concerns, every datatype can be abused (such as what I’ve done by using text fields as pseudo-json). Not implementing useful features because some may not use them as designed is no reason to invalidate their necessity. Json data is pervasive throughout intranet & internet applications. Native support for json is a welcome change for me (and I assume many others).

  4. Regarding Camilo’s concerns, every datatype can be abused (such as what I’ve done by using text fields as pseudo-json). Not implementing useful features because some may not use them as designed is no reason to invalidate their necessity. Json data is pervasive throughout intranet & internet applications. Native support for json is a welcome change for me (and I assume many others).

Leave a Reply

Please use [code] and [/code] around any source code you wish to share.

This site uses Akismet to reduce spam. Learn how your comment data is processed.