Extracting Immutable Objects

In the last few weeks I’ve been rehabilitating some of the first object-oriented code I wrote in the hopes of bringing my mailing list archive back online. Lately I’ve been refactoring some of the earliest, core code: the Message class. It manages the individual emails in the system and, because I didn’t understand how to extract functionality, had turned into something of a God Class.

Yesterday I tweeted about a really satisfying cleanup:

tweet{.aligncenter .size-full .wp-image-2067 .content width=”517” height=”99”}

Scott Vesely replied to ask to see the code, so I figured I’d post it.

First off, I broke the Message down into many classes:

Classes Extracted{.aligncenter .size-full .wp-image-2068 .content width=”838” height=”558”}

The MessageStorage, EmailStorage, and ListAddressStorage classes are part of a Table Data Gateway layer that separates object de/serialization, saving, and searching from the domain objects themselves.

I’ve been increasingly dissatisfied with the Active Record pattern. Domain and persistence code comingled in the same object quickly become hopelessly intertwined, leading to tests that have to persist large object graphs to test simple things (eg. any time FactoryGirl is needed outside of a full-stack integration test something is very wrong), and code that treats the database as the ultimate global variable. Gary Bernhardt’s videos “What Goes In Active Records” Part 1 and Part 2 crystallized this realization. In them he strips out his domain logic to service classes in a Rails-sick implementation of the Row Data Gateway pattern (though he never uses that name).

An Aside About PoEAA

You may have noticed I’ve linked to the site for Martin Fowler’s book Patterns of Enterprise Application Architecture three times in the last two paragraphs. It’s a great book on organizing code, and I wanted to point out two cute things.

First, the site mentions “Many of these diagrams demonstrate the rather poor GIF output of Visio. The nice diagrams were redrawn for me by David Heinemeier Hansson” and you can see the patterns that appear in Rails are all nicer.

Second, the book presages the last two years of the Rails community struggling loudly with “fat models” and “fast tests”:

As the domain logic gets more complicated and you begin moving toward a rich Domain Model (116), the simple approach of an Active Record starts to break down. The one-to-one match of domain classes to tables starts to fail as you factor domain logic into smaller classes. Relational databases don’t handle inheritance, so it becomes difficult to use strategies [Gang of Four] and other neat OO patterns. As the domain logic gets feisty, you want to be able to test it without having to talk to the database all the time.

What about Immutability?

Right, the reason I started this post. (I know I’ve gone too long between blog posts when I can’t say anything without digressing into mini-rants).

Every email client should generate a globally unique “Message Id” for every email it sends; in practice there are significant numbers of omissions, mis-formattings, and duplications to deal with. The Message class had a fair amount of code related to extracting them from headers, validating them, and generating them. And it was a soupy mess:

# If you skimmed down here: this is awful old code I would never write now class Message attr_reader :message_id

# ... lots of other code ...

# called from initialize() def load_message_id if message_id = get_header(‘Message-Id’) and message_id =~ /\^<?[a-zA-Z0-9%+-.=]+@[a-zA-Z0-9-.]+>?$/ and message_id.length < 120 @message_id = /\^<?([\^@]+@[\^\>]+)>?/.match(message_id)[1].chomp else generate_message_id end end

def get_header header match = /\^#{header}:\s*(.*?)\^\S/mi.match(headers + “\n.”) return nil if match.nil? # take first match so that lines we add_header’d take precedence match.captures.shift.sub(/(\s)+/, ‘ ‘).sub(/\n[ \t]+/m, “ “).strip end

def headers return ‘’ if message.nil? message.split(/\n\r?\n/)[0] end

def add_header(header) name = header.match(/\^(.+?):\s/).captures.shift new_headers = “#{header.chomp}\n” new_headers += “X-ListLibrary-Added-Header: #{name}\n” unless name.match(/\^X-ListLibrary-/) @message = new_headers + message end

def generate_message_id @message_id = “#{call_number}@generated-message-id.listlibrary.net” add_header “Message-Id: <#{@message_id}>” end end

There’s a lot wrong with this code, and it can be laid at the lack of separation of concerns. My representation of a Message is bound to the email it came from, so Message knows how to parse email headers, validate message ids, generate replacements (and this gets called by outside code to prevent duplicates), and — perhaps worst for an archive — it edits its generated ids into the original email headers! I knew that was bad when I wrote it (hence adding an extra header to tell me which I edited so I could later clean them out), but with messages so closely linked I felt like I didn’t have anywhere else to store the data.

I extracted this to a MessageId class (which a Message’s Email creates from its EmailHeaders), but it continued to bother me that load_message_id (now MessageId.extract_id) would set its @message_id instance variable to what it extracted (a simple memoization) or, if that turned out to be invalid, generate a new message id and overwrite @message_id with that.

Mutation makes code much harder to reason about. Tests have to check interactions and properties rather than just return values. So I pushed that conditional up into a new class factory method extract_or_generate that creates a MessageId with the given message id and, if it’s not valid, instead returns a new, generated MessageId. Here’s the complete listing; you’ll see I’ve also broken things like the validity check down into small, well-named methods:

class MessageId attr_reader :raw, :message_id

def initialize raw @raw = raw || ‘’ end

def valid? !raw.empty? and raw.length <= 120 and has_id? end

def has_id? raw =~ /\^<?[a-zA-Z0-9%+-.=]+@[a-zA-Z0-9-.]+>?$/ end

def extract_id @message_id ||= /\^<?([\^@]+@[\^\>]+)>?/.match(raw)[1].chomp end

def to_s if valid? extract_id else “[invalid or missing message id]” end end

def == other other = MessageId.new(other.to_s) return false unless valid? and other.valid? to_s == other.to_s end

def eql? other self == other end

def hash to_s.hash end

def inspect “#” % [(object_id << 1), to_s] end

def self.generate_for call_number raise ArgumentError, “Missing call number to generate MessageId” if (call_number || ‘’) == ‘’

new “#{call_number}@generated-message-id.chibrary.org” end

def self.extract_or_generate str, call_number mid = new(str) return mid if mid.valid? return generate_for call_number end end

MessageId objects are now small value objects. They’re easy to use and understand. Previously the tests were a slow, cluttered mess that instantiated Messages from fixtures or reached deep inside to set headers. As Sandi Metz said, “When testing is hard, I’m missing an object.”.

Now I test more conditions more thoroughly and they’re totally straightforward reading. Because you really get to see code shine when it’s used, here’s the complete spec for MessageId:

describe MessageId do describe ‘#valid?’ do it ‘is valid if well-formatted’ do expect(MessageId.new(‘’)).to be_valid end

it ‘is invalid if nil’ do expect(MessageId.new nil).to_not be_valid end

it ‘is invalid if empty string’ do expect(MessageId.new(‘’)).to_not be_valid end

it ‘is invalid if longer than 120 characters’ do str = (‘x’ * 109) + “@example.com” expect(MessageId.new str).to_not be_valid end

it ‘is invalid if it does not include an id’ do expect(MessageId.new ‘I like cats’).to_not be_valid end end

describe ‘#has_id?’ do it ‘does with and without angle brackets’ do expect(MessageId.new ‘’).to have_id expect(MessageId.new ‘without@example.com’).to have_id end

it ‘does not with multiple @s’ do expect(MessageId.new ‘bad@heart@example.com’).to_not have_id end

it ‘does not with no text before or after the @’ do expect(MessageId.new ‘bad@’).to_not have_id expect(MessageId.new ‘@example.com’).to_not have_id end end

describe ‘#to_s’ do it ‘returns the extracted id’ do expect(MessageId.new(‘’).to_s).to eq(‘id@example.com’) end it ‘does not pass invalid ids’ do s = MessageId.new(‘cats are great’).to_s expect(s).to_not include(‘cats’) expect(s).to include(‘invalid’) end end

describe ‘#inspect’ do it ‘includes the .to_s’ do expect(MessageId.new(‘id@example.com’).inspect).to include(‘id@example.com’) expect(MessageId.new(‘cats rule’).inspect).to include(‘invalid’) end end

describe ‘#==’ do it ‘considers equal based on extracted id, not raw’ do expect(MessageId.new(‘id@example.com’)).to eq(MessageId.new(‘’)) end

it ‘coerces strings to MessageIds to test’ do expect(MessageId.new(‘id@example.com’)).to eq(‘id@example.com’) end

it ‘does not consider invalid ids equal to themselves’ do expect(MessageId.new(‘cat’)).to_not eq(MessageId.new(‘cat’)) end end

describe ‘#hash’ do it ‘hashes consistently’ do expect(MessageId.new(‘id@example.com’).hash).to eq(MessageId.new(‘id@example.com’).hash) end

it ‘uniqs’ do # http://stackoverflow.com/questions/20388090/arrayuniq-ignoring-identical-hash-values a = [MessageId.new(‘id@example.com’), MessageId.new(‘id@example.com’)] a.uniq! expect(a.length).to eq(1) end end

describe ‘::generate_for’ do it ‘creates based on call number’ do expect(MessageId.generate_for(‘0123456789’).to_s).to include(‘0123456789’) end

it ‘raises without a call_number’ do expect { MessageId.generate_for ‘’ }.to raise_error(ArgumentError) end end

describe ‘::extract_or_generate’ do it ‘given a valid message id, creates from that’ do mid = MessageId.extract_or_generate(‘id@example.com’, ‘call’) expect(mid.to_s).to include(‘id@example.com’) end

it ‘given an invalid message id, generates from call_number’ do mid = MessageId.extract_or_generate(‘srsly cats man’, ‘call’) expect(mid.to_s).to include(‘call’) end end

end

Reward for scrolling down here: a sneak peek of the keyboard I assembled myself and typed this post on.