Indexes on Tables

Increasingly I find a very binary split between the professionals I come into contact with. One group of people are very database-aware and take the design of their storage quite seriously - with good results. The other group are more concerned with the functionality of their application, and have little regard for how it is stored other than considering it a keeping-place and making useful table and column names.

Too often though, they don't think about how that data will be retrieved or what the implications are when it gets beyond the thousand records that were used for testing. This is where having an idea of how the data will be retrieved can really help application performance. (note: this article is aimed at users of traditional relational databases, and ignores all other possibilities). This post takes a look at the various index types and when to use them. Continue reading

3 Top Tips for Database Naming

Perhaps this is more of a rant than a post but I do keep running into issues with databases with names that are inconsistent - which makes them really difficult to work with. When designing a database, there are a few points to consider:

Singular and Plural

This goes for table names, and also for the names of join tables. If you call your tables "user" and "group" then you probably want your linking tables to be "user_group". If you go for plurals (my personal favourite) then be consistent over whether the linking tables are called "user_groups" or "users_groups".

ID Columns

I've seen two main variations on the column names for primary keys, one is to call them all simply "id", and the other is to name them after their table name such as "user_id" or "group_id". It doesn't really matter but my recommendation is for the latter - that way, the user_id column in any other table clearly joins on to the user_id column in the users table, making it easy to read and understand.

Case and Capitalisation

Due to my EXtreme DOuble CApitalitis, I prefer everything to be lower case, but the key is consistency, so that it is easy for developers to get used to the patterns in the database setup and to develop against your schema without having to refer back to it all the time.

Consistency is Key

In general, I like database schemas which are predictable and well-laid-out. Although I have my own preferred conventions, I don't mind what is used so long as it is predominantly in step with itself - this makes my life as a developer so much easier! What's your top tip for sane database naming conventions? Leave a comment and let me know!

Simple Database Patching Strategy

One problem that seems to need solving too often is how to keep databases in sync across different platforms, different developers working on a project, and deploying database changes along with code changes. There are lots of ways of approaching this, none of them are really excellent however and personally I tend to err on the side of simple being better. Fewer dependencies means a solution more likely to work on every platform (and no additional complications for the live platform). Usually this means patch files of some kind - here's an outline of my usual approach. For the impatient:

  • add a table for meta data to the database, set a database version parameter to 0
  • export structure and any essential data (lookup tables, etc) into an sql file
  • for any changes, create a numbered patch file e.g. patch-1.sql, including the change and an update statement to bring the version meta data to match the patch file
  • store all of the above in source control
  • for bonus points, create another sql file with some nice sample data

Give the Database Some Version Awareness

Naming is completely changeable but mine looks something like this:

CREATE TABLE `meta_data` (entry varchar(255) primary key, value varchar(255));

INSERT INTO `meta_data` SET entry="version", value="0";

This new row will hold information about what patch version the database is up to. Every patch that is created will update this number so that it is possible to tell which patches have and have not been applied.

Create an Initial Data Structure

Firstly create a database directory for all these files to live in. This should be outside your web root but inside your source control project.

Take your database and dump just the structure of the tables using the --no-data switch to mysqldump like this:

mysqldump -u  -p  --no-data > create.sql

You will also want to populate tables which hold things like lookup values, country lists, that sort of thing, so that these are set up. People starting to develop with this project, or if the application needs to be deployed to somewhere new, can use this file as a starting point.

Create Patches for Changes

This is where the discipline element comes in - no direct edits on the database are allowed! Instead, write the SQL for your change and place it in the patch file, then run it against the database. If that sounds too much like hard work then copy/paste the SQL you use to make changes, or the SQL generated by whatever SQL tool you use, and place it in the file.

Every file must have its own incrementing version number in its file name, e.g. patch-1.sql, patch-2.sql etc. Within the file the version must also be updated to match, with a statement like:

UPDATE `meta_data` SET value="1" WHERE entry = "version";

Recommended Practice

Here are a few pointers on getting the most out of something like this:

  • Under no circumstances is it acceptable to edit a patch file that has been committed to source control. Someone might have run it already and you'll confuse everyone completely.
  • Having matching undo scripts alongside the patches can be really useful in case a change gets deployed and needs to be undone.
  • Make a script to look up the database settings in the config file of your application, query the database for what version it is at, and run any outstanding scripts. This makes life much easier especially if you have large numbers of scripts (I've seen the patch numbers hit the hundreds)
  • A possible adaptation of this approach is to create patch files for use for updating a database, but to also update the install.sql file to make it correct at any point in time, this means a much shorter and easier setup time for new deployements/developers. The version awareness works in the same way regardless
  • Creating a sample database which creates a few records in each table can really help for development purposes - its quicker for people to get set up and attain a working system that they can make progress with.

I'm sure there are many improvements and variations on this theme of simple database patching, leave a comment and let me know what works for you and why!