Wrapping Large-Scale Refactors

Published: 2024-01-27
Category: Code
Tags: large codebases practices refactoring

I really liked “Long Term Refactors” by Max Chernyak, which explains a nice development practice. It reminded me of a thing that surprised me about refactors and dependency management.

My last job had a codebase large enough (~50M LOC) that there were always large-scale refactors in flight, which was a new experience for me.

(Edit 2025-02: I recently reread this post. I think this number must be too high, but I can’t remember what the right number was anymore, maybe 7M? But I don’t know how I would’ve made this mistake, it’s not clearly a simple typo. I guess I’ll just reiterate that it was by far the largest codebase I’ve worked in and not alter this post.)

One thing the article doesn’t specifically call out is that any kind of dependency update or replacement, whether of an internal or external library, is a large-scale refactor. You benefit enormously if you can do this incrementally, as the author describes, instead of with a flag day or One Giant Merge to update all uses. A counterintuitive result is that replacing one dep with another (foolib -> barlib) is easier than updating one (foolib 1 -> foolib 2)! Most languages do not allow you to depend on multiple versions of a package and have different sections of your codebase call different versions, so an in-place update has to land everywhere at once, while two differently named packages can coexist during a migration. Sometimes internally-maintained dependencies will be renamed just to get around this limitation.

There’s a style of managing dependencies that mandates you must wrap usage of libraries or APIs. Rather than calling Foolib::Thing.new, you’ll create your own FooThing (maybe using the decorator or facade patterns) and that class is the only place allowed to import from or call into foolib. With less exposure of foolib, it’s easier to create internal documentation, audit or control usage, or replace foolib with barlib. I don’t find this a cost worth paying in smaller codebases, but easily worth it in large ones.
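
As a rough sketch of the wrapping style: foolib is a placeholder, so the `Foolib::Thing` API below (including `render_text`) is invented purely for illustration, and the stand-in module takes the place of a real `require "foolib"`. The point is only that `FooThing` is the single place that touches the library.

```ruby
# Stand-in for the third-party library. In a real codebase this would be
# `require "foolib"`; Foolib::Thing and render_text are hypothetical.
module Foolib
  class Thing
    def initialize(name)
      @name = name
    end

    def render_text
      "thing:#{@name}"
    end
  end
end

# The only class allowed to reference Foolib. Everything else in the
# codebase calls FooThing instead.
class FooThing
  def initialize(name)
    @thing = Foolib::Thing.new(name)
  end

  # Expose only the narrow surface the codebase actually needs.
  def render
    @thing.render_text
  end
end

puts FooThing.new("invoice").render
```

Callers never see `Foolib` at all, which is what makes auditing usage or swapping in barlib a change to one file. Enforcing the "only place allowed to import" rule would need a lint or code-review convention, which this sketch doesn't show.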

Part of why it’s worthwhile is that it gives you two new methods for dealing with dependency updates. First (hopefully), you have a single codesite that uses foolib, so a single team can make a small change to update foolib. Second, if there are extensive changes that mandate changes at callsites, you can rename FooThing to FooThing1 (usually an easy, if large, diff), introduce FooThing2 with the new API, and then use a process like the one the article describes to make that change incrementally across the entire codebase. Either you update foolib at the start of this process and FooThing1 maps old usage to new, or FooThing2 maps new usage to old and you bump foolib at the end. This process works quite well, whether foolib is an internal or external dependency.

By contrast, emailing all-dev@example.com a link to the foolib release notes and a dictum that all foolib usage must be updated by some particular date will inevitably produce significant internal discord and never, ever an on-schedule completion. An even worse and more common failure mode for internal libs is to quietly mark foolib deprecated and direct people to rewrite to barlib when they show up with urgent questions about foolib during an outage - but of course good sense and steps 6-8 of the process described in the post would avoid such an outlandish footgunning.
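
The FooThing1/FooThing2 split might look something like this, for the variant where foolib is bumped at the start and FooThing1 maps old usage to new. Everything about the library here is assumed: the stand-in `Foolib` module plays the role of a hypothetical foolib 2 whose made-up breaking change is that `render_text` became `render(format:)`.

```ruby
# Stand-in for foolib 2 (hypothetical API): render_text was replaced by
# render(format:).
module Foolib
  class Thing
    def initialize(name)
      @name = name
    end

    def render(format:)
      case format
      when :text then "thing:#{@name}"
      when :json then %({"thing":"#{@name}"})
      else raise ArgumentError, "unknown format: #{format.inspect}"
      end
    end
  end
end

# The old wrapper, renamed from FooThing. It keeps the old call shape and
# maps it onto the new library, so unmigrated callsites keep working.
class FooThing1
  def initialize(name)
    @thing = Foolib::Thing.new(name)
  end

  def render
    @thing.render(format: :text)
  end
end

# The new wrapper exposing the new API. Callsites move here one at a time.
class FooThing2
  def initialize(name)
    @thing = Foolib::Thing.new(name)
  end

  def render(format: :text)
    @thing.render(format: format)
  end
end

puts FooThing1.new("invoice").render                 # unmigrated callsite
puts FooThing2.new("invoice").render(format: :json)  # migrated callsite
```

Both wrappers sit on the same library version, so old and new callsites coexist in one build. Once nothing references FooThing1 it can be deleted.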

(This post was originally a comment on Lobste.rs, but then I realized it’s a nice excuse to break the 5.5-year dry spell here.)