Integration as Composition
Code: Chibrary, composition, design, functional programming, literate programming, pointfree
I’m puzzling over the design for a worker and would appreciate your comments on it. I started with the pain of an ugly test, made an interesting refactoring, and decided to drop the test entirely, but I’m not at all sure this is the right decision.
In my mailing list archive Chibrary, I want to sum up the number of threads and messages in a month to present on archive pages. The MonthCountWorker takes a Sym, a unique identifier for a list’s slug + year + month, fetches the threads for that month, sums them, and stores the sum. The code is trivial, right?:
Two Designs for SequenceIdGenerator
Code: Chibrary, design, object orientation, Ruby, state
In my previous, marathon post on id generation, I mentioned that I was generating unique sequence numbers. It was straightforward to write a class to generate them:
Distributed ID Generation and Bit Packing
Code: bit packing, bit twiddling, Chibrary, concurrency, design, stemwinder
There are two ways for programs to collaborate: they can communicate their shared state or they can partition the work so they don’t need to know each other’s state. As I’ve been rehabbing the code for my mailing list archive Chibrary I ran into this issue, and the design constraints made for an interesting puzzle. I need to generate unique IDs in multiple programs at once, and they need to be short enough to fit in URLs.
Writing is nature’s way of letting you know how sloppy your thinking is.
At more than 3,000 words, this is an unusually long blog post for me. This problem has a lot of considerations to balance (and you’ll see a lot of parenthetical caveats). I like to work through problems by writing out my thoughts, constantly tinkering with possibilities and revisiting ideas in light of new constraints or insight. I thought my (heavily) revised notes might make for an interesting post as I attack the problem from several different approaches and end with a solution that nicely connects theoretical and practical considerations. As a bonus, maybe someone will tell me I’m approaching this all wrong or ignorant of a known best practice for this sort of thing before I go live. With that preamble rambled, let’s dig into the problem.
Extracting Immutable Objects
Code: design, design patterns, email, ListLibrary.net, mailing lists, object orientation, Ruby
In the last few weeks I’ve been rehabilitating some of the first object-oriented code I wrote in the hopes of bringing my mailing list archive back online. Lately I’ve been refactoring some of the earliest, core code: the Message class. It manages the individual emails in the system and, because I didn’t understand how to extract functionality, had turned into something of a God Class.
Yesterday I tweeted about a really satisfying cleanup:
I’ve been pondering, and it’s related to my old Rules Of Database App Aging post, which has aged well (unfortunately). One more thing I’ve recognized is that those join tables between what I think of as two records very quickly pick up their own life.
To use the examples from that post, the guy who lives in one state and works in another needs to track what dates he started that relationship in each for taxes. The email going to multiple feeds needs different formatting applied for each. A state’s caucus may refer back to the results of the previous caucus to determine ballot order or election fund matching. Things with multiple categories may care about the order their categories are displayed in.
In all these cases and more, it’s plausible for that relationship to become a domain object in its own right (most often with timestamps and active/inactive flags, in my experience). It happens often enough that I find myself almost never using the implicit relationships provided by has_and_belongs_to_many in favor of new, explicitly-named objects with two belongs_to/has_many relationships. These can take on data and responsibilities in a natural way as needed and make the app easier to talk about.
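Here is a minimal sketch of that idea in plain Ruby rather than ActiveRecord, so it runs on its own. The names `Person`, `State`, and `Residency` and the fields `started_on` and `active` are hypothetical, chosen to mirror the lives-in-one-state example above. In a Rails app the same shape would presumably be a model holding the two belongs_to relationships.

```ruby
require "date"

# Hypothetical stand-ins for the two records a bare join table would connect.
Person = Struct.new(:name)
State  = Struct.new(:code)

# The relationship promoted to a named object that carries its own data.
Residency = Struct.new(:person, :state, :started_on, :active) do
  # Whole days the relationship had existed as of the given date.
  def days_active(as_of)
    (as_of - started_on).to_i
  end
end

person = Person.new("Pat")
residencies = [
  Residency.new(person, State.new("VA"), Date.new(2008, 1, 15), true),
  Residency.new(person, State.new("MD"), Date.new(2005, 6, 1), false),
]

# Questions about the relationship itself now have an obvious home.
current = residencies.select(&:active).map { |r| r.state.code }
```

Once the relationship has a name, the start date and the active flag have somewhere to live, and so does a method like `days_active` that neither `Person` nor `State` could reasonably own.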
I liked this blog post Solving vs. Fixing (via). In my first job out of college I did support and maintenance on a medium-sized (250kloc) system that had spent a year looked after by a developer who only fixed things, never solved them. The code had started poor and gotten steadily worse, but I always tried to fix bugs twice and slowly ground out improvements in the system.
So this blog post caught my eye, though I think it misses two vitally important steps.
After you have fixed the bug and committed your code, stop and think, “How could I have found this bug faster?” Are there tools that you could’ve used to locate it quicker? Hypotheses you discarded too quickly? Some pattern in the code you overlooked? Why did it take you the time it did to find this bug, and what have you learned about how to do it better?
And then second, “Where did this bug come from?” Was it a typo? When you were originally writing the code, what test could you have written to catch it? If you misunderstood a requirement, what question could you have asked that would’ve exposed the mismatch? How could the system be redesigned to make this kind of bug impossible? What have you learned about how to code better?
Software development is design; it’s all mental process, not a production process. The hard problems are all in the thinking, not in the typing. So you have to introspect and improve your thinking to get better at it. I don’t feel like an amazing developer, but when I recognize a design issue or a bug as something I’ve seen before I can jump ahead of colleagues to an answer. There’s no magic developer talent at work. It’s just deliberate, conscious practice, and these two questions are most of how I do that.
(If you’re wondering, the blog has been quiet lately because I’ve been putting all my free time into an art project. I’m now waiting on production and have spare brainpower again. I plan to post about it in six weeks or so.)
I mentioned I’ve learned some rules of how database apps change over time, now that I’ve done a few dozen. They are:
All Fields Become Optional
As your dataset grows, exceptions creep in. There’s not enough research time to fill in all your company profiles, there’s one guy in Guam when you expected everyone to be in a U.S. state, there’s data missing from the page you’re scraping, you have to pull updates from a new source.
Every field eventually loses that beautiful NOT NULL sheen, your code gets filled up with guard clauses of one kind or another and every <div> in your template is wrapped by an if statement. And this happens to foreign keys, too, so OR IS (NOT) NULL sneaks in and inner joins mutate into outer joins.
This is by far the biggest effect on apps over time. It’s getting to the point that I can gauge age by eyeballing the number of fields that retain their NOT NULL constraint.
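A small plain-Ruby sketch of what that erosion tends to look like once fields can be missing. `CompanyProfile`, its fields, and `summary` are hypothetical names for illustration, not code from any app mentioned here.

```ruby
# Hypothetical record where only the name is still reliably present.
CompanyProfile = Struct.new(:name, :state, :founded_year)

# Every optional field needs its own guard before it can be displayed.
def summary(profile)
  parts = [profile.name]
  parts << "based in #{profile.state}" if profile.state
  parts << "founded #{profile.founded_year}" if profile.founded_year
  parts.join(", ")
end

complete = CompanyProfile.new("Acme", "IL", 1999)
sparse   = CompanyProfile.new("Guam Widgets", nil, nil)
```

Each field that loses its NOT NULL constraint adds another `if` like these, in the code and again in the template.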
All Relationships Become Many-to-Many
Some guy works in DC but lives in Virginia, so he needs two Locations. A new type of incoming email needs to be shoveled out to different feeds. A state has both a primary and a caucus. Someone eventually realizes categories never really were mutually exclusive.
The modern database paradigm is defined by relations, so of course that's what falls apart as soon as you get an app into production. The urge to hack is overwhelming: fudge in a little denormalization or duplicate a row and the pressure's off for now. But it's like freezing a bottle of water: it always grows and breaks worse in the end.
Chatter Always Expands
All the little oddities that change database schemas affect the user presentation as well. Chatter is the intro and outro text around the content of a page that almost no one ever reads. But it has to be there to explain what's going on, the source of information, why things may seem peculiar, the limitations of the dataset, etc. Add in the difficulty of writing succinctly and chatter grows until you burn it all down by rebuilding the app.
And when you do rebuild the app from the ground up, you have your chance to slip some NULLs back on, renormalize your data to have easy one-to-many relationships, and present the data in a self-evident and consistent fashion. Then, about a week later, there's a politician who's a Democrat but running for re-election as an Independent...
Code: AASM, Bort, ConfReader, design, haml, open_id_authentication, Paperclip, planning, Rails, RailsRumble, resource_this, restful_authentication, RSpec, scheduling, teamwork, web
I failed to launch my Rails Rumble project ConfReader. Why?
Couldn’t stay up because work is demanding in the election season, so working late would mean I’d limp through the next week of important coverage. More than that, though, was a bad mistake in scheduling a social outing. I didn’t figure in some of the travel time and the friend driving wanted to spend a lot more time out, so instead of having a nice morning out I got home at 7 PM tired and in a foul mood. I was able to relax a bit and get some code done, but it cut my available time by more than half.
Too many new toys
I knew I hadn’t played with encoding video before, so I asked an expert for some tips and tinkered around breaking test videos down into short, easily-transcribed segments. Not coincidentally, this is basically the only feature that works. I started with bort to get a lot of basic functionality quickly, but I had no experience with most of the plugins it provided (and those I otherwise planned to use).
- Differences between acts_as_state_machine (which I started with) and aasm (the successor that bort ships).
- Translating scaffolding from erb to haml and a (still-unsolved) bug that causes it to render content twice on one page. (And I’ve even used haml before.)
- I wanted users to be able to contribute anonymously or use OpenID to receive credit for their contributions without having to set up an account, and the plugins came configured for conventional user accounts.
- I had to fix a templating bug, and there’s a fair amount of magic to learn to work with.
- Scaffolded tests needed slow, tedious tweaks to work with resource_this. I think I like the idea of rspec more than the implementation.
- Sorting out finicky path/url issues, and I couldn’t find a way to mock validates_attachment_presence so several models went unspeced.
Each of these issues ate 30-60 minutes, and there went the time to build. Some plugins (acts_as_versioned, footnotes, random_finders) just worked great, but none was really related to core functionality. Speaking of which:
Core featureset too big
The line between nothing and application includes site graphic design, listing conferences, uploading presentation videos, splitting videos into segments, adding transcripts, viewing all of these, editing transcripts… I had a lot of features listed as optional, but there was still a long set of base features and each uncovered another one of those frustrations.
It would’ve helped a lot to have someone to work with. Not just because it would have taken less time to finish the core features, but for the enjoyment of teamwork and to complement my weaknesses.
I’m glad I spent the time I did hanging out in #railsrumble or on the RailsRumble present.ly. Even if it didn’t directly result in items checked off the todo list, it’s been a fun community to be in, and I was glad to help folks out. I’m eager to see what other people came up with.
I didn’t enter RailsRumble to win, I entered for the excuse to create what I thought was an interesting small project. I’ll pick ConfReader back up in a little while, probably after the election. I still think it’s too good an idea not to build, but I’m a bit tired right now.
This morning, at about 4:30 AM, I awoke and just knew the Right Way to rebuild RegistryPro to be completely reliable, even more compact, and provide meaningful reporting. It would take less than two weeks of coding time and the pitfalls are well-demarcated and avoidable. It would be really great if I’d thought of it two years ago when I still worked there. Thanks, brain.