Wednesday, May 2, 2007

Development (not news per se)

I'm getting my ass kicked by Aufero today.

Aufero has a database which contains all the data gathered over time, and due to it taking quite some time to build a somewhat large dataset (which is what you want) the database cannot be rebuilt from scratch every release. This means that anyone can pick up any version of Aufero at any time and take for granted that their database will always work with that version, unless you downgrade. So, there had to be some functionality in Aufero that upgraded the database without any action from the user.

Sounds simple enough? Yeah, I thought so too. The other day I was at the point where I had to rewrite some sections of the code and I was realizing that in order to keep things tidy I had to rebuild large parts of the database.

The database provider I am using does not support renaming or deleting of columns, which you can work around by doing the following CPU-consuming operation:
1. Create *temporary* table
2. Copy original data over to the temporary table
3. Delete original table
4. Create new table with the final layout (eg, the renamed column)
5. Copy temporary data back to new table
6. Delete temporary table

[You can argue about whether to use a temporary table or using "alter table" instead of the steps 4/5. Not much difference in execution time for me. But knock yourself out.]

At the end of the day, it was quite depressing to have to go through all that just to rename a column. And this was in six places affecting virtually all data.

Throw in the fact that there can be quite a few threads in Aufero that works independently of each other on the same data. This was something I had never thought of as a problem prior to doing time consuming operations like the one above. Right after setup was finished there would only be one thread working on the database and whatever got to the database first would start patching happily and be done in a few milliseconds. Now that the patch took up to 2-3 minutes I noticed that I had several threads patching the same database at the same time.

The dataset this happened to is some 150 megabytes and let's just say that it created a bit of a mess.

The lesson here? If you end up writing something like this, think about it first -- don't just throw something together because you *have* to deploy a patch right now. And use transactions which will roll-back if something fails. I was stupid for not having done that from day one, to be honest.

Well, at least this has taught me a thing or two about deploying software to end-users. It's much easier to deploy software for some local server in the organization, which is something I've been doing a lot more frequently. This is fairly obvious thing to say, but I had never really given it any thought except for the fact that you need to worry about more configurations and creating a real installer. Alas. It didn't stop there.

But, others are having a worse day: Tech Magazine Loses June Issue, No Backup

Until next time... :)

No comments: