Spreadsheet Errors

Last week the magazine The Nation hurried to correct a story claiming that it had suffered the worst drop in advertising of any weekly magazine. Its loss was actually in the middle of the pack, but the story was written from a spreadsheet that overstated its advertising for last year, giving it the appearance of a steep decline.

Spreadsheets fascinate me. Lisp proponents point out that “code is data,” but in a spreadsheet data is code. Formulas inhabit the cells of the data they represent, a much closer pairing than code files on the same drive as data files.

Spreadsheets are used in institutions of every size, from businesses to government agencies and all the individuals along the way. They were the first killer app and are nearly as ubiquitous as desktop computing. I suspect that most small businesses would be out of business in weeks if they lost the key spreadsheets that get forwarded between office workers, perhaps with only one employee allowed to update the master copy. (This is part of why I’m not surprised that DropBox is doing very well.)

Errors are endemic. I found a nice roundup of research into spreadsheet errors, and even in constrained experimental conditions the cell error rate is above 1%. In a spreadsheet with 100 cells of data, there’s probably at least one error; if there are formulas the rate jumps to 5-15%.
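To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes every cell goes wrong independently and at the same rate, which is my simplification and not something the research claims, and it reads the 5-15% figure as a per-cell rate. The function names are made up for illustration.

```python
def expected_errors(cells: int, cell_error_rate: float) -> float:
    """Average number of bad cells, assuming a uniform per-cell error rate."""
    return cells * cell_error_rate


def prob_at_least_one_error(cells: int, cell_error_rate: float) -> float:
    """Chance that a sheet has one or more bad cells.

    Assumes errors in different cells are independent, so the chance
    that every cell is correct is (1 - rate) ** cells.
    """
    return 1.0 - (1.0 - cell_error_rate) ** cells


if __name__ == "__main__":
    # 1% is the floor quoted above; 5% and 15% are the formula-cell range.
    for rate in (0.01, 0.05, 0.15):
        print(
            f"rate={rate:.0%}  "
            f"expected bad cells in 100: {expected_errors(100, rate):.1f}  "
            f"P(at least one): {prob_at_least_one_error(100, rate):.3f}"
        )
```

Under those assumptions, a 100-cell sheet at a 1% error rate averages one bad cell and has roughly a 63% chance of containing at least one. At the formula rates an error-free sheet is the rare exception.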

Sometimes it disturbs me to think of how many errors are creeping along in everything I rely on. How many numbers could I type before I transposed two digits without knowing? Formulas are small, untested programs written in a haphazardly designed language by people who have not studied programming. The design of the software makes mistakes trivial to commit (for example, click a formula term, change your mind, hit the arrow keys, and you’ve changed the cell the formula is reading data from). It’s astonishing that so many institutions rely on something so unreliable.

But on reflection I think it’s a sign that institutions are more robust than they might appear at first glance. The lights are still on, police still go to the right neighborhoods, products arrive for store shelves. I’ve seen a few attempts to make more rigorous spreadsheets, with data typing, validity checks, and embedded programming languages. But I don’t see people lining up to pay that overhead. Either the benefits those controls would bring aren’t worthwhile, or at least the losses they would avert are not conspicuously attributable to the lack of them.

Thanks to Chrys Wu for the Nation story.