Legacy Bitmask Puzzle «
»


Code: , , ,
1 comment

My friend David had a puzzle in his legacy app. There’s a bitmask called ErrorCode. The table ErrorCodes lists the meaning of each bit:

David needs to look up a complete list of the descriptions of every flag from the database. Normally this would just be a simple select * from ErrorCodes.

However, there’s a wrinkle that adds difficulty to the puzzle. The site was maintained by a DBA named George who didn’t understand bitmasks and degraded the code rather than learn. (George is a psuedonym, it came to mind when I thought about an incompetent steward).

An ErrorCode of 6 indicates both an invalid entity and an invalid address, but that confused George, so now the table actually looks like:

There are 17 error codes, so the table could have 131,072 rows instead of the 17 we want to select. (Though some combinations are mutually exclusive or didn’t happen to occur in production before George departed.) The table is a kludge, a redundant mess, and there’s an unknown amount of code depending on it.

David’s task was to build this query and get out with minimal risk and change, not find and fix all the misguided code that George left behind. This is how legacy code stays legacy: it’s too risky or expensive to fix, but too valuable to delete. In this case, not only would it cost too much developer time to fix all the code, it would be too risky to add a column to the table to mark which errors are the primary errors we want to select out.

(If you want to solve it for yourself, do so now. Here’s how we solved it.)

David’s first idea was to create a user function that uses bit shifting to check to see if only one bit was set, but that didn’t pass his smell test. When IM’d for my help, I realized a less ugly way to do it and responded with:


select * from errorcodes where log(code, 2) == floor(log(code, 2))

Because there’s just 1 bit set, the primary error codes are all powers of two, so the log base 2 of any of them is a whole integer. It means doing math for every row in the table, but that’s one of those small, annoying pains that accumulate when you don’t pay down technical debt.

If you’re curious, David translated this into a LINQ query for his codebase (and accounted for error code 0, “No Error”, an odd exception in its own right), getting:


PpErrors.Where(pe => pe.ErrorId == 0 || Math.Log(pe.ErrorId, 2) == Math.Floor(Math.Log(pe.ErrorId, 2))).OrderBy(pe => pe.ErrorId)

David notes: it’s mostly only “long” because I’m putting it in method notation, instead of query notation, and because I have to reference libraries instead of just having a “floor” and “log” static method in the local namespace. I could write it 12 different ways. You know how that works.


Comments

  1. While I do think the log version is a little bit easier to understand, In my opinion using a Standard Hamming Weight algorithm is better.

    http://stackoverflow.com/questions/109023/best-algorithm-to-count-the-number-of-set-bits-in-a-32-bit-integer

    The article implies this code will be used later, so efficiency is a concern, but it’s written in C#, so perfect efficiency might not be important enough to counteract using bit math.

    One comment on the stack overflow accepted answer sums up my feeling on the matter nicely:

    “It’s write-only code. Just put a comment that you are not meant to understand or maintain this code, just worship the gods that revealed it to mankind. I am not one of them, just a prophet. :)”

Leave a Reply

Your email address will not be published.