Taking on a Database Change Process
There are two things you need to come to terms with:
- There is no silver bullet
- Rigid process is required
If you’re still reading, then I assume I haven’t scared you off with the above, so let’s get into the more practical side of things and look at the steps to follow for coherent database versioning …
Hands Off The Database
If you are serious about keeping track of the changes to your database, then you will never make another change to it manually. Never. Not even the test or development databases, nor “to just quickly try” something.
When you want to make a change, make a patch script. Then when you run it against your development database and it works, you know exactly how to replicate the same excellent manoeuvre against any other platform that needs it – including sharing it with other developers in your team. The patches are included in your source control, but ideally not inside your web root (you don’t want to serve these to the outside world!).
Identifying Patches
I wrote a while ago about how to work with database patches, and I’m using basically the same approach on most of my projects today. Each patch file contains: a comment on what this file does, the change(s) to make, and another SQL statement that records that this patch has been run for this database, and when that happened (I had a phase of recording only “latest patch” information but I now prefer a log of what was run and when).
What you name your patch files is up to you – I like sequential numbers but of course they do sometimes collide when branches are merged together so human intervention will be needed in that situation. We are smart and can handle this :)
Some changes can take more than one deploy to enforce – for example I recently deployed a major change to one of my own sites, which changed the database schema enormously and needed me to move data from one table to another. I wrote a patch to do that, and a separate one to drop the old copies of the data. The first patch was deployed with the relevant code changes, but the other patch is currently called “patch-future.sql” – it will be deployed once I’m certain the change is live and I won’t need to roll it back (more on rollbacks in a moment).
Make It Painless
This is really a general piece of advice for deployment and other development process: make your preferred path the easiest choice. This does mean a bit of time automating the processes, but I have yet to regret spending time on these tasks. For database patching we’ll write a script which will look at the database and the patch files, and run any that haven’t already run. Anyone pulling in these changes only needs to run one script, and you can automate that step in your deployment stages.
Do observe this one golden rule: never edit a patch file after it has been pushed!
Tools for Database Patching
There are lots of tools that wrap up what you’ve seen here; but the basics are the same:
- Do not touch the database directly
- Write patch files
- Record what patches are applied
- Automate patching
I’ve used or heard good things about DbDeploy which integrates with Phing very nicely, Rob Allen’s Akrabat_Db_Schema_Manager for ZF, so either of those is worth a look and if you have more recommendations, then please share them in the comments!
Rolling Back
As a final note, I will mention rolling back of database changes; this is a really tricky one and I haven’t got any solutions that I think have been worth the hassle. Personally I like to do lots and lots of incremental changes, deploying as often as humanly possible (often several times each day) so large breakages are less likely. Academic theory says that for every forward patch, you should have an undo one, but practical experience tells me that this makes it harder to finish features and the rollbacks are almost never used. For my own projects (none of them especially high traffic or commercially critical) I have undertaken to run without any rollback scripts required. If the unthinkable happens (and it will, one day!), I will write the script to mend the damage as a new patch and deploy it. How you choose to approach this issue will vary hugely for different scenarios – what do you do?
Have a look at the DCM solution TeamWork http://www.dbmaestro.com/dbmaestro/448.aspx
– allows you to both enforce change policy and best practices and execute merge and deploy automation – with quick and easy rollback options when needed.
> For my own projects (none of them especially high traffic or commercially critical) I have undertaken to run without any rollback scripts required.
I’ve been using Phing and DbDeploy for over a year now, and my patches always stop at –//@UNDO. Although theory dictates you should be able to “go back in time”, this does create more of a hassle than it is worth. I have one giant “roll-back” strategy though: I dump the database in a backup file before executing a patch, so if it does destroy everything, I can always recover from backup. If I made an error in a patch, I create a new patch rectifying that error and deploy that. Effectively, that’s a really good roll-back strategy.
I use Phing and DbDeploy and I actually do build the rollback. That said I also have my entire database backed up each night so I can easily pull down last night’s and start using it as my dev database with just a quick phing command to bring it up to date with my development branch.
Have you ever needed to rollback? If so, what exactly is the difference between rolling back and restoring a dump? I imagine the outcome is the same, yes?
On the one occasion I messed up the database update (no where clause …) and needed to roll back, I got to learn about MySQL’s binary log files. With a bit of messing, you can restore a known date old dump of your table, then run the bin log file data forwards (on just that table) to the last thing before you broke it all. The key requirement is that you have a database dump which is newer than the old end of your binary logs (assuming you have some mechanism for managing server binlogs, and access to them).