Skyscrapers and Doghouses

Jorg Breu I, illustration to "Fortunatus", Plate VIII, p. 154

If you want to build a skyscraper, you need math, physics, engineering, prototypes, studies: in short, you need rigorously applied formal methods. Mistakes can waste millions of dollars or kill people.

If you want to build a doghouse, you need some lumber, a hammer, and a beer.

Generally, software developers act like our problems are towards the doghouse end of the spectrum. It looks like developers in the aeronautics and space industries understand they have skyscraper problems and build accordingly. But in general business and consumer software, even in fields with lives on the line like medicine and automobiles, we use slipshod practices to produce unreliable, unmaintainable software slowly.

We’ve built software as loose bags of features atop a teetering tower of unreliable abstractions. Functionality is slow, regressions are common, and that’s even before you throw in the now-standard business practice of shipping as part of figuring out what’s worth building and how to build it. Maintenance often means burning it down and rebuilding from scratch.

Enterprise software development has a lot of formality around process for organizing software, but little around the methods of writing it. Inside the culture it seems like the only rational way to work, but from the outside it’s a tremendous amount of added process for marginal improvement in quality. I think this has the rest of us shying away from anything that smells formal as “ceremony”.

“There’s no engineering without dollars in the equation”

I fell in love with this quote from a coworker’s professor. The harder I work to understand and build complex software systems, the better I understand the larger systems of businesses and communities that we write our software for. Technical decisions are infrequently wrong when they’re business decisions. I’ve seen a number of startups make the correct decision to write a lot of terrible code when they have no idea if their basic business will work.

When I make technical decisions I try to think about return on investment (ROI). Weighing the options, what are the risks and the potential rewards? We have colorful aphorisms like “penny-wise and pound-foolish”, “picking up pennies in front of a steamroller”, and “boiling frogs” to refer to common mistakes in ROI calculations.

That last one has really captured my attention. It’s a hyperbolic way of describing the way changes creep up on us and incremental costs are ignored as our definition of normality drifts. I see a lot of relevance to software quality, especially when we pay costs in time rather than directly in dollars.

That happens sometimes. Software’s always late. Try reloading. Give it more time. The update broke it. It worked after I rebooted a few times. Printers are always crappy. It works on my machine.

Software developers are boiled frogs. I’m not surprised we’re comfortable with low quality: it’s hard to say you could do better but won’t, especially when you see low quality reflected back in other people’s work.


I remember the third time I tried writing unit tests, when I finally got it, when I had enough basic techniques and tools to write tests that prevented bugs from slipping into production. I didn’t feel proud, confident, professional. I felt relieved. Finally, I could write code without being afraid I was breaking things. I could stop fearing that I’d forgotten two pieces of code were related, or to try some corner case, or let a typo slip in at the last minute.

The return I get from the investment in testing hasn’t only been good feelings, of course. As I got better I let fewer bugs into production, I finished faster, and the code made more sense and was more easily changed. Most developers still don’t agree testing is worthwhile, so it’s not a standard technique of development.

Tests are too hard. Tests won’t catch everything anyways. You’re just doubling the work. I tested it by hand. The other system is unreliable anyways. You can’t really write a test for graphics.

If you read these statements generously, almost all of them address ROI and claim that the cost outweighs the likely returns. A debate on that shared metric can be productive, where we weigh the evidence of better code, measure how much extra time it takes to write tested code, or do studies on how many bugs show up in the modules. Too often we talk past each other and polarize into mutually incomprehensible camps. (A nice rule of thumb: be able to summarize your opponent’s argument in a way they’d agree with.)

Tests are overwhelmingly popular in the Ruby world I’ve been working in for years. They’re accepted as having a valuable ROI. They’re culturally expected. This isn’t true in other languages, even similar dynamic languages like Python and JavaScript. I think that’s where that word “culture” is important.

I was convinced of the value of tests by my experience in the Ruby community. As a hobbyist and consultant in dozens of environments I saw tests return my investment over and over. But outside of a niche that expects you to test before you’re convinced? We don’t have much high-quality empirical research in software development; generally studies have tiny samples of inexperienced developers with so many variables that it’s impossible to find meaning in the noise. There’s no compelling public evidence that tests are valuable.

A lot of the developers who popularized Ruby in English were strong advocates for testing, making it the community default. Python didn’t have those influencers and JavaScript had a unique path to incredible popularity that skipped the usual formation of a culture, so now neither community sees the value of tests the way Ruby’s does. (Though both have growing subcultures that do, much luck to them.)

Without excellent empirical evidence the decision of whether to write automated tests is an open question, settled by personal anecdata and cultural influence. I see unquestionable value from tests, but outside of the minority of developers who read long-winded essays on programming and attend conferences to grill each other in the hallways and do personal experiments and make noise to drag the industry forwards, developers generally don’t write tests. Ruby was popularized by a few of that tribe, so some open questions default to different answers.

Speaking of Ruby, let’s jump into a thought experiment. It’s secretly on the same topic and you can figure out where this essay will end up, but if you do me a favor and don’t think too hard I’ll get to look smart in a couple hundred words.

What if Ruby didn’t verify a program had correct syntax before running it?

Same language, change nothing, but it doesn’t look for missing parentheses or extra quote marks until it happens to try to run that line of code. You can imagine it. It would work, right? Programs would look the same as they do now.

You’d even have a bit of extra convenience: if you’re working in a few files and you want to run tests that will only touch one, you can ignore the invalid syntax elsewhere. Easy-peasy. Programming gets easier.

Except you’ll pay for that convenience. Whether you count production defects or measure time spent on maintenance, the return on that particular investment is obviously a negative number. Hitting syntax errors at runtime is part of why eval is avoided. No one wants Ruby-but-with-runtime-syntax-checking. It would be terrible.
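A minimal sketch of the difference, written in Python rather than Ruby (both check a whole file’s syntax before running any of it). The variable names and the two tiny programs are made up for illustration. The first program has a broken line on a branch that never runs and is still rejected up front; the second hides the same broken line inside exec, so nothing complains until that line is actually reached, which here is never.

```python
# A broken line on a branch that never runs still stops the whole program,
# because syntax is checked before any code executes.
checked_up_front = '''
if False:
    x = (1 + 2
result = "ran fine"
'''

try:
    compile(checked_up_front, "<checked_up_front>", "exec")
    compiled = True
except SyntaxError:
    compiled = False

# Hiding the same broken line inside exec defers the check to runtime.
# The branch is never taken, so the error is never found.
checked_at_runtime = '''
if False:
    exec("x = (1 + 2")
result = "ran fine"
'''

namespace = {}
exec(compile(checked_at_runtime, "<checked_at_runtime>", "exec"), namespace)

print(compiled)             # False
print(namespace["result"])  # ran fine
```

The second program looks healthy right up until someone flips that condition in production.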

I think the reason it’s so obviously terrible is that Ruby’s not compiled and is (in)famously flexible at runtime. Syntax checking is the only writetime prevention of errors Ruby has at all.

Ruby leans very hard towards giving you power and convenience at the expense of reliable, predictable code. Tests are so valuable in Ruby because we need to protect our programs from the mistakes those conveniences allow.

Ruby without tests is also terrible, but it’s not quite as obviously terrible as runtime syntax checking. It looks like programming is Just Really Hard, that defects are hard to track down, that problems sprawl throughout the codebase. Except for the occasional spectacular bug, we pay in time spent writing and maintaining and fixing, and developers are happy to boil ourselves in a pot of lost hours.

Cheat codes

Make illegal states unrepresentable.

I never understood this common saying in functional programming. Martian moon alien talk.

My programs were all about taking invalid data from users and rejecting or transforming it to save into the database. Objects and data structures are complex and have interdependent attributes. If the Order has a finalization_date, you’re not allowed to add more LineItems even if ActiveRecord will let you call order.line_items.create until the Accounting department rats you out to the IRS for unbalancing the books. Illegal states are always possible, you just don’t do them and you write tests to reduce the chance someone accidentally does.

And then I spent a lot of time talking to functional programmers and applying their ideas to Ruby and studying Haskell and reading their books and generally trying to crawl inside their heads, because I was seeing fantastic claims that a proper type system makes testing practically irrelevant. That’s so laughably false on its face, why would so many otherwise smart people say that? I can’t even argue against it until I can explain it in a way they’d agree with, so I had a lot to learn.

I started my professional career in PHP, which at the time was an incredibly practical language useful for web programs up to about a hundred lines of code and then an insane impossible mess thereafter. Everything I built was in a constant state of collapsing under its own weight. But it mostly worked, and I was driven to learn more, to write the next function a little better, to reflect on how I could’ve caught that bug faster, to find techniques that would’ve prevented that bug in the first place, to write tests.

And after a few years of this I had some reusable code that I copied and pasted between projects. It had some functions for creating clean URLs like /users/3 instead of /users.php?id=3, to do object-relational mapping between the User class and the user table, and a command-line script to run a test suite, organized into subdirectories of the test folder. Instead of three days to add a table to the database and wire it into the web app, it’d take me an afternoon and probably not even have any bugs in it.

And then I bummed a ride from Adrian Holovaty to a ChiPy meeting to hear him talk about some kind of web software because I wrote web software all day and played with Python when I was sick of PHP confusing hashes and arrays (which was convenient for a month but terrible forever). Adrian talked about this new thing he’d made called Django. It had clean URLs and object-relational mapping and a test harness and things I’d never thought of but instantly recognized from the dim shapes I’d made out of the fog of my own code.

And then a couple months later I saw a guy redo what I knew perfectly well to be a full day of work in my codebase in this language I’d never heard of called Ruby and he did it in 15 minutes!

And that’s it, right? Django and Rails, that’s what I was stumbling towards and they were right there in front of me. If I took my whole codebase that I’d been refining and threw it the fuck away I’d be a year ahead of where I was. It was a cheat code for programming. I’d have to learn their code, but there was low cost, low risk, and great rewards. I wanted that return on investment.


It’s been a long time, but I had that cheat code feeling again last week.

I’ve been reading about functional programming for a couple years, practicing Haskell more or less in earnest for 8 months, trying to understand the wild claims and abstract ideas.

Thought experiment: what if we didn’t have arrays? You could have a slew of individual variables in your program, user1 to user50, and work with them. It would be tricky, but it would work. When listing the users, load the first one from the database into user1, the next into user2, and so on.

Maybe after a while we’d use eval to write slightly more general code, to climb up out of the error-prone tedium to more general, reliable code. As we had more and more places in our code working with many similar values we’d build towards the abstraction of arrays. Other developers might shrug at us.
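Here is roughly what that world might look like, sketched in Python with three users instead of fifty. The function names and user values are invented for the example. The first function builds variable names as strings and looks them up with eval; the second does the same job with a list.

```python
def greet_all_with_eval():
    # No arrays: one variable per user, looked up by building its name.
    user1, user2, user3 = "ada", "grace", "alan"
    output = ""
    i = 1
    while i <= 3:
        output += "hello " + eval(f"user{i}") + "\n"
        i += 1
    return output


def greet_all_with_list():
    # With arrays: the collection is a value we can loop over directly.
    users = ["ada", "grace", "alan"]
    output = ""
    for user in users:
        output += "hello " + user + "\n"
    return output


print(greet_all_with_eval() == greet_all_with_list())  # True
```

Both produce the same output, but only the first one breaks when someone adds user4 and forgets to change the 3.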

Variables work. They’re easy to reason about. What are these array things? What’s the value? Only mathematicians talk about arrays. A user never talks about an array, are they even useful in real programs for real problems? You need specially trained programmers to maintain that program. Even if arrays were useful, would we get a return on investment for learning such a vague, abstract concept?

When you’re building a doghouse and you’ve only ever built doghouses and all your friends build doghouses and the other houses you see look like doghouses, all that skyscraper stuff is bullshit. It’s not talking about anything real. It can’t be made to work in practice, and if it did, it wouldn’t be worth the effort.

Arrays are an abstraction over a bunch of variables like user23. Solutions that are more general are more powerful. They give us vocabularies for thinking about problems and we can adopt other people’s reusable solutions for our problems. Arrays are worth the investment. They are a step towards better programs. The confusing things that functional programmers talk about like monoids are the next steps.

But it’s so abstract! What customer asks for a monoid? There’s no return on investment! Other ungrounded objections that also applied to the array example!

Monoids are an abstraction over many types of data and lots of different data structures. When we know what a monoid is we go one more step up in abstraction and get more general, powerful programs. We get other people’s solutions for our problems. There are more abstractions alongside and built on top of monoids, more to use and learn.
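To make that a little less abstract: a monoid is a way of combining two values of the same kind that is associative, plus an "empty" value that changes nothing when combined. A small sketch in Python, where combine_all is a hypothetical helper written for this example and not part of any library:

```python
from functools import reduce


def combine_all(combine, empty, values):
    """Fold any monoid: an associative combine plus its identity value."""
    return reduce(combine, values, empty)


# The same function works for numbers, strings, dictionaries, and maximums,
# because each pair below is a monoid.
total = combine_all(lambda a, b: a + b, 0, [1, 2, 3])
sentence = combine_all(lambda a, b: a + b, "", ["make ", "it ", "work"])
merged = combine_all(lambda a, b: {**a, **b}, {}, [{"x": 1}, {"y": 2}])
largest = combine_all(max, float("-inf"), [3, 9, 4])

print(total)     # 6
print(sentence)  # make it work
print(merged)    # {'x': 1, 'y': 2}
print(largest)   # 9
```

One function written once covers all four cases, and an empty list is handled for free because the identity value is the answer.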

The danger is that it’s ceremony, or it’s so far up the spectrum of abstractions from the doghouse that you drift off into space, you become an architecture astronaut who designs useless castles in the sky that only make life harder for those of us with the hammers and beers. But that’s not my estimation of functional programming, I expect a lot of value for my time.

And after all the studying and practicing I’ve done to write good tests, I saw an incredible cheat code.

The value of tests is the value of types

I write tests because I want reliable, maintainable programs faster. That’s the return on investment. A test is an example ensuring the code behaves as expected. It’s a runtime proof that the system works.

A powerful type system with sum and product types checks things about a program at writetime rather than runtime. It’s more general than a test, it says the pieces of the system are connected correctly.

That recommendation I quoted to “make illegal states unrepresentable” isn’t martian moon alien talk, it’s something we can do more cheaply than we can write an exhaustive test suite to ensure our software doesn’t put data in illegal states. As Amanda Laucher mathematically put it:

Types = for all

Tests = there exists

Types give us universal assertions instead of piecemeal examples, before runtime. Types can’t give every assurance we use tests for, but they give a lot of assurance for not much cost. There’s a real return in reliable programs, and there’s even more when they give us ways to talk about higher abstractions like monoids.
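As a rough sketch of what making an illegal state unrepresentable can look like, here is the Order example from earlier split into two types. This is Python, so the class names OpenOrder and FinalizedOrder are my own invention and the guarantee is weaker than in a language like Haskell: Python only notices a bad call at runtime unless you add a separate type checker. The shape of the idea is the same, though. A finalized order simply has no way to add a line item.

```python
from dataclasses import dataclass, field, FrozenInstanceError
from datetime import date


@dataclass
class OpenOrder:
    line_items: list = field(default_factory=list)

    def add_line_item(self, item):
        self.line_items.append(item)

    def finalize(self, on):
        # Finalizing produces a different type instead of setting a flag.
        return FinalizedOrder(tuple(self.line_items), on)


@dataclass(frozen=True)
class FinalizedOrder:
    # No add_line_item method, and the fields cannot be reassigned.
    line_items: tuple
    finalization_date: date


order = OpenOrder()
order.add_line_item("widget")
finalized = order.finalize(date(2014, 1, 1))

print(hasattr(finalized, "add_line_item"))  # False

try:
    finalized.line_items = ()
except FrozenInstanceError:
    print("finalized orders cannot be changed")
```

There is no "finalized order with a new line item" to write a test against, because there is no code path that builds one.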

Last week at the Lambda Jam conference, Ranjit Jhala gave a workshop on refinement types. You can annotate a function with its preconditions and postconditions beyond what a basic type system allows and it gives assurances about what can and can’t happen in your program.

The absolute value function abs takes a number and strips off a negative sign if it has one. A basic type system tells you that it takes a number and returns a number. A test tells you that passing in -3 results in 3, but it doesn’t say anything about -4 unless you write that example, too.
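A deliberately silly sketch of that gap, in Python. buggy_abs is a made-up function that is only correct for the one example somebody tested, and the test still passes.

```python
def buggy_abs(n):
    # Hypothetical bug: correct only for the example someone thought to test.
    if n == -3:
        return 3
    return n


def test_abs_of_negative_three():
    assert buggy_abs(-3) == 3


test_abs_of_negative_three()  # passes
print(buggy_abs(-4))          # -4, wrong, and no test noticed
```

Real bugs are subtler than this, but the test suite has the same blind spot: it only knows about the inputs it was given.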

A refinement type lets you annotate the function to say that negative numbers are transformed into positive ones, and that every number returned is non-negative (zero or positive).

When you take the first n elements from an array, you’re required to pass in a non-negative n; it makes no sense to take a negative number of elements from an array. A test can tell you that you get the right exception for the example negative numbers you thought of.

A refinement type keeps track of the conditions that must be true or false for the values in the system. You’ll know at writetime that every value you use to take elements from an array is non-negative. The compiler ensures either that n has passed through a function like abs, which only produces non-negative integers, or that you’ve written the if statement to account for the possibility. More exciting: even if the array is a list of characters the user typed in, you can ensure that you never try to take more elements than exist in the array. The take function doesn’t need to detect and report these errors and you don’t have to write exhaustive tests for every boundary and off-by-one. You think about those preconditions as you write the function and the program cannot compile if you haven’t ensured those possibilities are handled properly.
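For contrast, here is a sketch of what take has to look like without refinement types, in Python. This is my own illustrative version, not the code from the workshop: both preconditions become runtime checks inside the function, and every caller needs tests for the boundaries.

```python
def take(n, items):
    # Precondition 1: a refinement type could state this as "n >= 0".
    if n < 0:
        raise ValueError("n must not be negative")
    # Precondition 2: a refinement type could state this as "n <= len(items)".
    if n > len(items):
        raise ValueError("n must not exceed the number of items")
    return items[:n]


typed = list("hello")
print(take(2, typed))  # ['h', 'e']

try:
    take(6, typed)
except ValueError as error:
    print(error)  # n must not exceed the number of items
```

With refinement types, the two if statements become annotations on the function’s type, and the caller who might pass 6 is the one who gets the error, before the program ever runs.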

The big example in the workshop was writetime checking that runtime pointers can’t overrun array bounds. If the program had a buffer overflow, it wouldn’t compile. Instead, at runtime, it was working safely with pointers without a system of automatic runtime checks. I would not have believed that was possible before I saw the work leading up to it.

Every few slides I saw another whole class of errors disappear. Tests I don’t have to write, errors I can’t make, maintenance I don’t have to do. Real problems, real bugs I’ve seen happen in programs in the decades I’ve been programming, real bugs I’ve seen happen in the last month, were impossible. Refinement types are a cheat code for writing reliable programs.

The system I saw was in progress, young and unpolished. The syntax wasn’t final, the error messages were sometimes obtuse, the standard libraries weren’t annotated, the environment was finicky to set up, and compiles were ten times slower. The developers are still figuring out what refinement types are capable of, what to offer, how to make sense of it, how to change the practice of programming.

But that’s it, that’s the future. Maybe it’s 5 years out from a fairly stable release, maybe it’s 10 years before most functional programmers use it, maybe it’s 30 years before it’s in Java 18. But I’m certain it’s coming. The return on that investment is huge.

On the journey

More and more, I’m focusing my attention on functional programming. The concepts and tools have a great return for the investment of my time and effort. I want to start on a safe foundation, take abstractions from the toolbox of functional programming, and build rigorously.

The understanding I got of mutability and side effects has already helped me write noticeably higher quality programs outside of functional programming languages, as I explored in my talk, and those topics are covered in the first chapter or two of any textbook on functional programming.

I started this blog to build on my talk, to share the practical things I’m continuing to research, and to help other folks learn the concepts and write better code with me. If you want that return on investment of functional programming and computer science, please enter your email below to join my announcement list.

Not all of our programs are skyscrapers, but they’re office buildings and apartment buildings, houses and parks. People carry our software in their pockets, work in software, are managed by software, get paid by software, are intimate with loved ones via software, live and die with software.

Let’s get out of the doghouse.