Extracting Immutable Objects

In the last few weeks I’ve been rehabilitating some of the first object-oriented code I wrote in the hopes of bringing my mailing list archive back online. Lately I’ve been refactoring some of the earliest, core code: the Message class. It manages the individual emails in the system and, because I didn’t understand how to extract functionality, had turned into something of a God Class.

Yesterday I tweeted about a really satisfying cleanup:


Scott Vesely replied to ask to see the code, so I figured I’d post it.

First off, I broke the Message down into many classes:

Classes Extracted

The MessageStorage, EmailStorage, and ListAddressStorage classes are part of a Table Data Gateway layer that separates object de/serialization, saving, and searching from the domain objects themselves.

I’ve been increasingly dissatisfied with the Active Record pattern. Domain and persistence code comingled in the same object quickly become hopelessly intertwined, leading to tests that have to persist large object graphs to test simple things (eg. any time FactoryGirl is needed outside of a full-stack integration test something is very wrong), and code that treats the database as the ultimate global variable. Gary Bernhardt’s videos “What Goes In Active Records” Part 1 and Part 2 crystallized this realization. In them he strips out his domain logic to service classes in a Rails-sick implementation of the Row Data Gateway pattern (though he never uses that name).

An Aside About PoEAA

You may have noticed I’ve linked to the site for Martin Fowler’s book Patterns of Enterprise Application Architecture three times in the last two paragraphs. It’s a great book on organizing code, and I wanted to point out two cute things.

First, the site mentions “Many of these diagrams demonstrate the rather poor GIF output of Visio. The nice diagrams were redrawn for me by David Heinemeier Hansson” and you can see the patterns that appear in Rails are all nicer.

Second, the book presages the last two years of the Rails community struggling loudly with “fat models” and “fast tests“:

As the domain logic gets more complicated and you begin moving toward a rich Domain Model (116), the simple approach of an Active Record starts to break down. The one-to-one match of domain classes to tables starts to fail as you factor domain logic into smaller classes. Relational databases don’t handle inheritance, so it becomes difficult to use strategies [Gang of Four] and other neat OO patterns. As the domain logic gets feisty, you want to be able to test it without having to talk to the database all the time.

What about Immutability?

Right, the reason I started this post. (I know I’ve gone too long between blog posts when I can’t say anything without digressing into mini-rants).

Every email client should generate a globally unique “Message Id” for every email it sends; in practice there are significant numbers of omissions, mis-formattings, and duplications to deal with. The Message class had a fair amount of code related to extracting them from headers, validating them, and generating them. And it was a soupy mess:

# If you skimmed down here: this is awful old code I would never write now
class Message
  attr_reader :message_id
  # ... lots of other code ...
  # called from initialize()
  def load_message_id
    if message_id = get_header('Message-Id') and message_id =~ /^<?[a-zA-Z0-9%+\-\.=_]+@[a-zA-Z0-9_\-\.]+>?$/ and message_id.length < 120
      @message_id = /^<?([^@]+@[^\>]+)>?/.match(message_id)[1].chomp
  def get_header header
    match = /^#{header}:\s*(.*?)^\S/mi.match(headers + "\n.")
    return nil if match.nil?
    # take first match so that lines we add_header'd take precedence
    match.captures.shift.sub(/(\s)+/, ' ').sub(/\n[ \t]+/m, " ").strip
  def headers
    return '' if message.nil?
  def add_header(header)
    name = header.match(/^(.+?):\s/).captures.shift
    new_headers = "#{header.chomp}\n"
    new_headers += "X-ListLibrary-Added-Header: #{name}\n" unless name.match(/^X-ListLibrary-/)
    @message = new_headers + message
  def generate_message_id
    @message_id = "#{call_number}@generated-message-id.listlibrary.net"
    add_header "Message-Id: <#{@message_id}>"

There’s a lot wrong with this code, and it can be laid at the lack of separation of concerns. My representation of a Message is bound to the email it came from, so Message knows how to parse email headers, validate message ids, generate replacements (and this gets called by outside code to prevent duplicates), and — perhaps worst for an archive — it edits its generated ids into the original email headers! I knew that was bad when I wrote it (hence adding an extra header to tell me which I edited so I could later clean them out), but with messages so closely linked I felt like I didn’t have anywhere else to store the data.

I extracted this to a MessageId class (which a Message‘s Email creates from its EmailHeaders), but it continued to bother me that load_message_id (now MessageId.extract_id) would set its @message_id instance variable to what it extracted (a simple memoization) or, if that turned out to be invalid, generate a new message id and overwrite @message_id with that.

Mutation makes code much harder to reason about. Tests have to check interactions and properties rather than just return values. So I pushed that conditional up into a new class factory method extract_or_generate that creates a MessageId with the given message id and, if it’s not valid, instead returns a new, generated MessageId. Here’s the complete listing; you’ll see I’ve also broken things like the validity check down into small, well-named methods:

class MessageId
  attr_reader :raw, :message_id
  def initialize raw
    @raw = raw || ''
  def valid?
    !raw.empty? and raw.length <= 120 and has_id?
  def has_id?
    raw =~ /^<?[a-zA-Z0-9%+\-\.=_]+@[a-zA-Z0-9_\-\.]+>?$/
  def extract_id
    @message_id ||= /^<?([^@]+@[^\>]+)>?/.match(raw)[1].chomp
  def to_s
    if valid?
      "[invalid or missing message id]"
  def == other
    other = MessageId.new(other.to_s)
    return false unless valid? and other.valid?
    to_s == other.to_s
  def eql? other
    self == other
  def hash
  def inspect
    "#<MessageId:%x '%s'>" % [(object_id << 1), to_s]
  def self.generate_for call_number
    raise ArgumentError, "Missing call number to generate MessageId" if (call_number || '') == ''
    new "#{call_number}@generated-message-id.chibrary.org"
  def self.extract_or_generate str, call_number
    mid = new(str)
    return mid if mid.valid?
    return generate_for call_number

MessageId objects are now small value objects. They’re easy to use and understand. Previously the tests were a slow, cluttered mess that instantiated Messages from fixtures or reached deep inside to set headers. As Sandi Metz said, “When testing is hard, I’m missing an object.”.

Now I test more conditions more thoroughly and they’re totally straightforward reading. Because you really get to see code shine when it’s used, here’s the complete spec for MessageId:

describe MessageId do
  describe '#valid?' do
    it 'is valid if well-formatted' do
      expect(MessageId.new('<valid@example.com>')).to be_valid
    it 'is invalid if nil' do
      expect(MessageId.new nil).to_not be_valid
    it 'is invalid if empty string' do
      expect(MessageId.new('')).to_not be_valid
    it 'is invalid if longer than 120 characters' do
      str = ('x' * 109) + "@example.com"
      expect(MessageId.new str).to_not be_valid
    it 'is invalid if it does not include an id' do
      expect(MessageId.new 'I like cats').to_not be_valid
  describe '#has_id?' do
    it 'does with and without angle brackets' do
      expect(MessageId.new '<with@example.com>').to have_id
      expect(MessageId.new 'without@example.com').to have_id
    it 'does not with multiple @s' do
      expect(MessageId.new 'bad@heart@example.com').to_not have_id
    it 'does not with no text before or after the @' do
      expect(MessageId.new 'bad@').to_not have_id
      expect(MessageId.new '@example.com').to_not have_id
  describe '#to_s' do
    it 'returns the extracted id' do
      expect(MessageId.new('<id@example.com>').to_s).to eq('id@example.com')
    it 'does not pass invalid ids' do
      s = MessageId.new('cats are great').to_s
      expect(s).to_not include('cats')
      expect(s).to include('invalid')
  describe '#inspect' do
    it 'includes the .to_s' do
      expect(MessageId.new('id@example.com').inspect).to include('id@example.com')
      expect(MessageId.new('cats rule').inspect).to include('invalid')
  describe '#==' do
    it 'considers equal based on extracted id, not raw' do
      expect(MessageId.new('id@example.com')).to eq(MessageId.new('<id@example.com>'))
    it 'coerces strings to MessageIds to test' do
      expect(MessageId.new('id@example.com')).to eq('id@example.com')
    it 'does not consider invalid ids equal to themselves' do
      expect(MessageId.new('cat')).to_not eq(MessageId.new('cat'))
  describe '#hash' do
    it 'hashes consistently' do
      expect(MessageId.new('id@example.com').hash).to eq(MessageId.new('id@example.com').hash)
    it 'uniqs' do
      # http://stackoverflow.com/questions/20388090/arrayuniq-ignoring-identical-hash-values
      a = [MessageId.new('id@example.com'), MessageId.new('id@example.com')]
      expect(a.length).to eq(1)
  describe '::generate_for' do
    it 'creates based on call number' do
      expect(MessageId.generate_for('0123456789').to_s).to include('0123456789')
    it 'raises without a call_number' do
      expect {
        MessageId.generate_for ''
      }.to raise_error(ArgumentError)
  describe '::extract_or_generate' do
    it 'given a valid message id, creates from that' do
      mid = MessageId.extract_or_generate('id@example.com', 'call')
      expect(mid.to_s).to include('id@example.com')
    it 'given an invalid message id, generates from call_number' do
      mid = MessageId.extract_or_generate('srsly cats man', 'call')
      expect(mid.to_s).to include('call')

Reward for scrolling down here: a sneak peek of the keyboard I assembled myself and typed this post on.

Want more? I'm not as good at forgetting to update @pushcx on Twitter.