Rules of Database App Aging «
»


Code: , , ,
26 comments

I mentioned I’ve learned some rules of how database apps change over time, now that I’ve done a few dozen. They are:

  1. All Fields Become Optional

    As your dataset grows, exceptions creep in. There’s not enough research time to fill in all your company profiles, there’s one guy in Guam when you expected everyone to be in a U.S. state, there’s data missing from the page you’re scraping, you have to pull updates from a new source.

    Every field eventually loses that beautiful NOT NULL sheen, your code gets filled up with guard clauses of one kind or another and every <div> in your template is wrapped by an if statement. And this happens to foreign keys, too, so OR IS (NOT) NULL sneaks in and left joins mutate into outer joins.

    This is by far the biggest effect on apps over time. It’s getting to the point that I can gauge age by eyeballing the number of fields that retain their NOT NULL constraint.

  2. All Relationships Become Many-to-Many

    Some guy works in DC but lives in Virginia, so he needs two Locations. A new type of incoming email needs to be shoveled out to different feeds. A state has both a primary and a caucus. Someone eventually realizes categories never really were mutually exclusive.

    The modern database paradigm is defined by relations, so of course that's what falls apart as soon as you get an app into production. The urge to hack is overwhelming, fudge in a little denormalization or duplicate a row and the pressure's off for now, but it's like freezing a bottle of water, it always grows and breaks worse in the end.

  3. Chatter Always Expands

    All the little oddities that change database schemas affect the user presentation as well. Chatter is the intro and outro text around the content of a page that almost no one ever reads. But it has to be there to explain what's going on, the source of information, why things may seem peculiar, the limitations of the dataset, etc. Add in the difficulty of writing succintly and chatter grows until you burn it all down by rebuilding the app.

And when you do rebuild the app from the ground up, you have your chance to slip some NULLs back on, renormalize your data to have easy one-to-many relationships, and present the data in a self-evident and consistent fashion. Then, about a week later, there's a politician who's a Democrat but running for re-election as an Independent...


Comments

  1. One more: constraints on related data are replaced by multiple, overlapping, non-orthogonal columns. Where once you had a finished boolean flag, you now have three different fields which in various combinations can mean something is finished. Your beautiful sql statements for pulling out all finished records instead turn into enormous requests and code-side filtering.

  2. Good article. Thanks. And the client can’t really afford to pay you for the real time it takes to avoid these problems so you end up doing a lot of it for free.

  3. One more you missed:

    4. Existing fields get “re-purposed”

    So the Notes column suddenly has a bunch of State Codes in it, “Select distinct Gender from SiteUser” returns seven rows, and some clever junior dev notices that the TripID column in the Image table doesn’t get used in the case of Outfitters, so really we can just store OutfitterIDs in that column too. Of course we’ll need to drop that foreign key constraint…

  4. Another one:
    – enumerations in applications often become user definable, usually resulting in a mess

  5. We try as best as we can to continually refactor our code and database in step with feature requests. I’ve seen far too many projects where that doesn’t happen at an appropriate pace to changes in an application and virtually every point you listed happens to some extent or another and more.

  6. Relations are not relationships. Relations is the logic/math geekspeak for what SQL calls tables (though tables aren’t _quite_ the same as proper relational relations).

    The “relational” in relational databases has nothing to do with relationships, rather it’s called relational because the data is stored in relations (tables).

  7. This sounds more like database aging when your original schema was poorly planned.

  8. Pingback: Pinderkent
  9. Jonathan: Yes and no, sometimes the information for great planning doesn’t exist or isn’t available. The best plan is then to fit the circumstances as you know them and change when they change, which is what’s described above. It only looks like poor planning in hindsight.

  10. Then, about a week later, there’s a politician who’s a Democrat but running for re-election as an Independent…

    Ahh, Lieberman. I witnessed that blow up the Library of Congress’ data set. He insisted on being called an “Independent Democrat” but party was only 1 char, so they had to use “J”. Of course, they’re running up against their schema-imposed hard limit of 26 parties, and a lot of guard clauses will have to be concocted when that happens.

  11. You young folk and your text-based HTML templates! You would never get away with an unmatched <code> tag if you were using Genshi or some other XML-based templating system… :-)

    Gay marriage: the database engineering perspective illustrates these trends well. I would add one other change I have occasionally made, which is replacing a boolean (bit) field with a nullable timestamp.

Leave a Reply

Your email address will not be published.