Extracting Immutable Objects
In the last few weeks I’ve been rehabilitating some of the first object-oriented code I wrote in the hopes of bringing my mailing list archive back online. Lately I’ve been refactoring some of the earliest, core code: the Message
class. It manages the individual emails in the system and, because I didn’t understand how to extract functionality, had turned into something of a God Class.
Yesterday I tweeted about a really satisfying cleanup:
{.aligncenter .size-full .wp-image-2067 .content width=”517” height=”99”}
Scott Vesely replied to ask to see the code, so I figured I’d post it.
First off, I broke the Message
down into many classes:
{.aligncenter .size-full .wp-image-2068 .content width=”838” height=”558”}
The MessageStorage
, EmailStorage
, and ListAddressStorage
classes are part of a Table Data Gateway layer that separates object de/serialization, saving, and searching from the domain objects themselves.
I’ve been increasingly dissatisfied with the Active Record pattern. Domain and persistence code comingled in the same object quickly become hopelessly intertwined, leading to tests that have to persist large object graphs to test simple things (eg. any time FactoryGirl is needed outside of a full-stack integration test something is very wrong), and code that treats the database as the ultimate global variable. Gary Bernhardt’s videos “What Goes In Active Records” Part 1 and Part 2 crystallized this realization. In them he strips out his domain logic to service classes in a Rails-sick implementation of the Row Data Gateway pattern (though he never uses that name).
An Aside About PoEAA
You may have noticed I’ve linked to the site for Martin Fowler’s book Patterns of Enterprise Application Architecture three times in the last two paragraphs. It’s a great book on organizing code, and I wanted to point out two cute things.
First, the site mentions “Many of these diagrams demonstrate the rather poor GIF output of Visio. The nice diagrams were redrawn for me by David Heinemeier Hansson” and you can see the patterns that appear in Rails are all nicer.
Second, the book presages the last two years of the Rails community struggling loudly with “fat models” and “fast tests”:
As the domain logic gets more complicated and you begin moving toward a rich Domain Model (116), the simple approach of an Active Record starts to break down. The one-to-one match of domain classes to tables starts to fail as you factor domain logic into smaller classes. Relational databases don’t handle inheritance, so it becomes difficult to use strategies [Gang of Four] and other neat OO patterns. As the domain logic gets feisty, you want to be able to test it without having to talk to the database all the time.
What about Immutability?
Right, the reason I started this post. (I know I’ve gone too long between blog posts when I can’t say anything without digressing into mini-rants).
Every email client should generate a globally unique “Message Id” for every email it sends; in practice there are significant numbers of omissions, mis-formattings, and duplications to deal with. The Message
class had a fair amount of code related to extracting them from headers, validating them, and generating them. And it was a soupy mess:
# If you skimmed down here: this is awful old code I would never write now class Message attr_reader :message_id
# ... lots of other code ...
# called from initialize() def load_message_id if message_id = get_header(‘Message-Id’) and message_id =~ /\^<?[a-zA-Z0-9%+-.=]+@[a-zA-Z0-9-.]+>?$/ and message_id.length < 120 @message_id = /\^<?([\^@]+@[\^\>]+)>?/.match(message_id)[1].chomp else generate_message_id end end
def get_header header match = /\^#{header}:\s*(.*?)\^\S/mi.match(headers + “\n.”) return nil if match.nil? # take first match so that lines we add_header’d take precedence match.captures.shift.sub(/(\s)+/, ‘ ‘).sub(/\n[ \t]+/m, “ “).strip end
def headers return ‘’ if message.nil? message.split(/\n\r?\n/)[0] end
def add_header(header) name = header.match(/\^(.+?):\s/).captures.shift new_headers = “#{header.chomp}\n” new_headers += “X-ListLibrary-Added-Header: #{name}\n” unless name.match(/\^X-ListLibrary-/) @message = new_headers + message end
def generate_message_id @message_id = “#{call_number}@generated-message-id.listlibrary.net” add_header “Message-Id: <#{@message_id}>” end end
There’s a lot wrong with this code, and it can be laid at the lack of separation of concerns. My representation of a Message
is bound to the email it came from, so Message
knows how to parse email headers, validate message ids, generate replacements (and this gets called by outside code to prevent duplicates), and — perhaps worst for an archive — it edits its generated ids into the original email headers! I knew that was bad when I wrote it (hence adding an extra header to tell me which I edited so I could later clean them out), but with messages so closely linked I felt like I didn’t have anywhere else to store the data.
I extracted this to a MessageId
class (which a Message
’s Email
creates from its EmailHeaders
), but it continued to bother me that load_message_id
(now MessageId.extract_id
) would set its @message_id
instance variable to what it extracted (a simple memoization) or, if that turned out to be invalid, generate a new message id and overwrite @message_id
with that.
Mutation makes code much harder to reason about. Tests have to check interactions and properties rather than just return values. So I pushed that conditional up into a new class factory method extract_or_generate
that creates a MessageId
with the given message id and, if it’s not valid, instead returns a new, generated MessageId
. Here’s the complete listing; you’ll see I’ve also broken things like the validity check down into small, well-named methods:
class MessageId attr_reader :raw, :message_id
def initialize raw @raw = raw || ‘’ end
def valid? !raw.empty? and raw.length <= 120 and has_id? end
def has_id? raw =~ /\^<?[a-zA-Z0-9%+-.=]+@[a-zA-Z0-9-.]+>?$/ end
def extract_id @message_id ||= /\^<?([\^@]+@[\^\>]+)>?/.match(raw)[1].chomp end
def to_s if valid? extract_id else “[invalid or missing message id]” end end
def == other other = MessageId.new(other.to_s) return false unless valid? and other.valid? to_s == other.to_s end
def eql? other self == other end
def hash to_s.hash end
def inspect “#” % [(object_id << 1), to_s] end
def self.generate_for call_number raise ArgumentError, “Missing call number to generate MessageId” if (call_number || ‘’) == ‘’
new “#{call_number}@generated-message-id.chibrary.org” end
def self.extract_or_generate str, call_number mid = new(str) return mid if mid.valid? return generate_for call_number end end
MessageId
objects are now small value objects. They’re easy to use and understand. Previously the tests were a slow, cluttered mess that instantiated Message
s from fixtures or reached deep inside to set headers. As Sandi Metz said, “When testing is hard, I’m missing an object.”.
Now I test more conditions more thoroughly and they’re totally straightforward reading. Because you really get to see code shine when it’s used, here’s the complete spec for MessageId
:
describe MessageId do describe ‘#valid?’ do it ‘is valid if well-formatted’ do expect(MessageId.new(‘’)).to be_valid end
it ‘is invalid if nil’ do expect(MessageId.new nil).to_not be_valid end
it ‘is invalid if empty string’ do expect(MessageId.new(‘’)).to_not be_valid end
it ‘is invalid if longer than 120 characters’ do str = (‘x’ * 109) + “@example.com” expect(MessageId.new str).to_not be_valid end
it ‘is invalid if it does not include an id’ do expect(MessageId.new ‘I like cats’).to_not be_valid end end
describe ‘#has_id?’ do it ‘does with and without angle brackets’ do expect(MessageId.new ‘’).to have_id expect(MessageId.new ‘without@example.com’).to have_id end
it ‘does not with multiple @s’ do expect(MessageId.new ‘bad@heart@example.com’).to_not have_id end
it ‘does not with no text before or after the @’ do expect(MessageId.new ‘bad@’).to_not have_id expect(MessageId.new ‘@example.com’).to_not have_id end end
describe ‘#to_s’ do it ‘returns the extracted id’ do expect(MessageId.new(‘’).to_s).to eq(‘id@example.com’) end it ‘does not pass invalid ids’ do s = MessageId.new(‘cats are great’).to_s expect(s).to_not include(‘cats’) expect(s).to include(‘invalid’) end end
describe ‘#inspect’ do it ‘includes the .to_s’ do expect(MessageId.new(‘id@example.com’).inspect).to include(‘id@example.com’) expect(MessageId.new(‘cats rule’).inspect).to include(‘invalid’) end end
describe ‘#==’ do it ‘considers equal based on extracted id, not raw’ do expect(MessageId.new(‘id@example.com’)).to eq(MessageId.new(‘’)) end
it ‘coerces strings to MessageIds to test’ do expect(MessageId.new(‘id@example.com’)).to eq(‘id@example.com’) end
it ‘does not consider invalid ids equal to themselves’ do expect(MessageId.new(‘cat’)).to_not eq(MessageId.new(‘cat’)) end end
describe ‘#hash’ do it ‘hashes consistently’ do expect(MessageId.new(‘id@example.com’).hash).to eq(MessageId.new(‘id@example.com’).hash) end
it ‘uniqs’ do # http://stackoverflow.com/questions/20388090/arrayuniq-ignoring-identical-hash-values a = [MessageId.new(‘id@example.com’), MessageId.new(‘id@example.com’)] a.uniq! expect(a.length).to eq(1) end end
describe ‘::generate_for’ do it ‘creates based on call number’ do expect(MessageId.generate_for(‘0123456789’).to_s).to include(‘0123456789’) end
it ‘raises without a call_number’ do expect { MessageId.generate_for ‘’ }.to raise_error(ArgumentError) end end
describe ‘::extract_or_generate’ do it ‘given a valid message id, creates from that’ do mid = MessageId.extract_or_generate(‘id@example.com’, ‘call’) expect(mid.to_s).to include(‘id@example.com’) end
it ‘given an invalid message id, generates from call_number’ do mid = MessageId.extract_or_generate(‘srsly cats man’, ‘call’) expect(mid.to_s).to include(‘call’) end end
end
Reward for scrolling down here: a sneak peek of the keyboard I assembled myself and typed this post on.