Let's begin by looking at a table called
mysql> desc recipes; +-------------+--------------+------+-----+---------+----------------+ | Field | Type | Null | Key | Default | Extra | +-------------+--------------+------+-----+---------+----------------+ | id | int(11) | NO | PRI | NULL | auto_increment | | name | varchar(400) | NO | MUL | NULL | | | description | text | YES | | NULL | | | category_id | int(11) | YES | MUL | NULL | | | chef_id | int(255) | NO | MUL | NULL | | | created | datetime | YES | | NULL | | +-------------+--------------+------+-----+---------+----------------+
If we want to see recipes that have been added this month, we might do something like:
select name, created from recipes where created > '2011-11-01 00:00:00';
This query will work perfectly fine, and on a test data set of perhaps a few thousand rows, you probably won't think it's slow at all! However, if it is called repeatedly, or if the data set becomes very large, you'll start to notice the performance impact of this query. The techniques shown here apply to all select queries, regardless of how big and complicated they are.
We want to know what is taking all the time when we run this query, and to find out the answer, we simply ask MySQL to
explain itself, like this:
EXPLAIN select name, created from recipes where created > '2011-11-01 00:00:00'\G
The elements to notice here are at the beginning and at the end of the line:
EXPLAINat the beginning of the line - this means we'll get information about this query instead of the results of it
- at the end of the line, the semicolon is replaced by
\G- this can be used with any SQL statement and makes a list of column names with values next to them, vertically rather than horizontally. This is useful because the output of EXPLAIN can be hard to read horizontally (and it wraps horribly on this site!)
The output looks like this:
*************************** 1. row *************************** id: 1 select_type: SIMPLE table: recipes type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 5 Extra: Using where
There are a couple of key elements that you want to pick out here. The
table entry tells you which table this relates to - not at all useful in this example but when you run a select statement with multiple tables through explain, you'll get one row per table so this column tells you which table it is. The
possible_keys shows which indexes apply to this query and the
key tells us which of those was actually used - here there are no keys that apply. Finally the
rows entry tell us how many rows MySQL had to look at to find the result set. Five rows isn't very many but in this case, there were five rows in the table (and three in the result of the query), so MySQL has had to check every row in the table, which is a Bad Thing (TM).
To improve the performance of the query, we want to reduce the number of rows MySQL has to examine in order to find our results, and we'll do that by adding an index. Mostly, the EXPLAIN plan shows you which indexes there are, which are used, and how many rows were checked to find the results. The ideal outcome is that the
rows number is the same as the number of results from the query, although this won't always be possible!
I intend to follow up this example with a post about adding good indexes, having written about this for PHP Master and also for some new training courses recently, I'm rapidly realising that if you're starting from nothing, you can do a lot with a little knowledge! What advice would you give to someone starting to look at the area of database performance?