How side effects make ActiveRecord models unreliable

Published: ← 2016-03-04 →
Category: ← Code →
Tags: ← ActiveRecord → ← Rails → ← side effects

The underlying faults in familar code that breaks in surprising ways

Let’s look at some code from an app that processes Orders. It’s not a simple online cart system, an Order for tens or hundreds of thousands of dollars is built up over weeks by customers and sales agents interacting. The LineItems might have many Adjustments that nudge up or down the price of an Order as the customers negotiate or prices from suppliers fluctuate.

Deep in the Order model I found:

class Order < ActiveRecord::Base
  has_many :line_items
  has_many :adjustments, through: :line_items

  # ...500 lines of code...

  def adjustments_subtotal
    sum = adjustments.sum('amount_cents')
    Money.new(sum)
  end
end

A simple two-line method, right? The first line finds the adjustments and fetches the overall sum. The second line wraps it up in Money object.

(If you don’t recognize that call to Money.new, the app uses Money-Rails, your best choice for handling amounts of money. It prevents subtle bugs caused by storing money in floats and, if your app needs it, keeps track of multiple currencies.)

This method looks totally boring, but I was looking at it because the accounting team was far from bored: they were freaked out.

Customers were sometimes paying the wrong amounts for their Orders. Or the accounting department was trying to bill the wrong amount. Wait a minute: the invoices the app gave to customers and accounting didn’t agree! Who was even right?

Soon enough the customers were freaked out because they’d finished negotiating an order but the salesmen were calling back to “double-check the numbers” and sometimes increasing the total. C’mon, you’re pulling this shady sales stunt even after I’ve got a printed invoice?

And accounting was getting even more freaked out and throwing around terms like “prior period adjustment” and “restatement” and “lawyers”. It turns out the government cares quite a bit that your books balance so they get paid the right amount in taxes. (Who knew?)

What went so wrong with this simple two-line method?

Well, I opened by saying I wanted to talk about why I want to write Adapters for ActiveRecord models, so you know the high-level answer is “side effects”. Let’s dig in.

Starting from the biggest symptom: the invoices that went to customers and accounting didn’t agree. Customers had theirs displayed after they finalized their Orders and tended to immediately print them out so their accounts payable can sit on them until the “net 60” period ends and it’s time to pay. The accounting team generally looks at the invoices when the checks come in.

So someone changed numbers in the month between those two, right? There’s a bunch of code to prevent someone changing an Order after it’s finalized, but a month is a long time for things to go wrong, it’s the likeliest place for a bug.

Except that as incorrect invoices got reconciled, accounting’s copy of the invoice was always right. The customer’s was always incorrect (and wow was that fun to explain to the customer). That’s the opposite of what we’d see if Orders were changed after they’re finalized.

Reconciliation also revealed that the invoices only differed in their Adjustments. And never in adjustments negotiated by sales, only the automatic ones that were applied based on supplier inventory, season, or favorite-customer discounts. An error that only occurs in an automatic process is very likely to be a bug rather than human error.

So I went over the code applying adjustments and their tests, but both were solid. I couldn’t even reproduce this bug manually.

I investigated from the outside in. The Order view displaying the invoice had two places where it dealt with adjustments:

# when displaying all the line items:
<% @order.line_items.each do |lineitem| %>
  <tr class="lineitem">
    <td><%= lineitem.description %></td>
    <td><%= lineitem.price %></td>
    <td><%= lineitem.quanity %></td>
    <td><%= lineitem.total %></td>
  </tr>
  <% lineitem.adjustments.each do |adjustment| %>
    <tr class="adjustment">
      <td><%= adjustment.description %></td>
      <td></td>
      <td></td>
      <td><%= adjustment.amount %></td>
    </tr>
  <% end %>
<% end %>

# and later, when displaying subtotals:
<tr class="subtotal-lineitems">
  <td>Line Items subtotal:</td>
  <td></td><td></td>
  <td><%= @order.lineitems_subtotal %></td>
</tr>
<tr class="subtotal-adjustments">
  <td>Adjustments subtotal:</td>
  <td></td><td></td>
  <td><%= @order.adjustments_subtotal %></td>
</tr>
<tr class="total">
  <td>Final total:</td>
  <td></td><td></td>
  <td><%= @order.total %></td>
</tr>

This looks tame. There’s no complex business logic that might conditionally hide anything. There’s also no crazy code updating models as they get displayed to give me flashbacks to my PHP days.

As long as they’re mentioned, here’s those two other Order methods that work with adjustments_subtotal:

class Order < ActiveRecord::Base
  def lineitems_subtotal
    sum = line_items.sum('total_cents')
    Money.new(sum)
  end

  def total
    lineitems_subtotal + adjustments_subtotal
  end
end

Let’s move from the view down into the controller:

class OrdersController < ActionController::Base
  def finalize
    @order = Order.find(params[:id])
    @order.finalize!
    CustomerMailer.order_finalized(@order).deliver
    render :invoice
  end
end

We could debate all day about whether that should be wrapped up in a service object (which is a big part of why I don’t like the term), but this is also boring. Let’s look at Order.finalize!:

class Order < ActiveRecord::Base
  def finalize!
    Order.transaction do
      adjustments.each(&:finalize!)
      line_items.each(&:finalize!)
      line_items.each(&:create_automatic_adjustments)
      self.status = :final
      save!
    end
  end
end

A bit more work, but there’s no conditionals and a transaction tells us that Orders weren’t saved halfway through a change and left in a weird state.

Adjustments are our big suspect and they’re up first:

class Adjustment < ActiveRecord::Base
  def finalize!
    self.status = :final
    save!
  end
end

OK, that’s boring, there must be something happening in those LineItem methods?

class LineItem < ActiveRecord::Base
  belongs_to :product

  def finalize!
    self.total = price * quantity
    self.status = :final
    save!
  end

  def create_automatic_adjustments
    if SupplierInventory.count_for(product) == 0
      Adjustment.create!({
        status: :final,
        description: "Restocking fee",
        amount: Adjustment::RESTOCK_FEE
      })
    end

    standing_dicount = order.customer.standing_discount_for(product.type)
    if standing_discount.present?
      Adjustment.create!({
        status: :final,
        description: "Standing discount",
        amount: standing_discount.amount
      })
    end

    # ...and more similar business rules with the same struture...
  end
end

Let me admit I simplified create_automatic_adjustments for this example: those conditionals were encapsulated in other methods and the Adjustment creation actually happened in a method on Adjustment. I wanted to save the distaction of following this code another layer or two down the stack.

And really, however it’s broken down, all this code is all obvious and boring. There’s no smoking gun. But believe it or not, you’ve seen the bug.

I’ve told you it’s caused by side effects and I’ve cleaned away a lot of distractions, but the bug is still subtle. If you want the practice, take a minute to scroll up and down the code before you go on to the answer.
Last chance to solve it for yourself.

The bug is happening between calls. The first call to adjustments in Order#finalize! triggered an SQL query, and cached it. Later, only for some customers and some Orders, LineItem#create_automatic_adjustments created Adjustments.

When the view got around to calling #adjustments_subtotal (directly and via #total) it used the cached list of Adjustments instead of find them all again. If the customer printed the invoice right then and there, they didn’t see those automatic Adjustments.

Five things kept this bug subtle and alive long enough to cause a lot of stress:

Customers had to print the very first view of an finalized invoice
Few Orders got automatic Adjustments
Customers were used to them and taxes, so they weren’t bothered to see an invoice change a small bit
Customers paid net 60, so accounting didn’t see it for two months
It struck at the end of a quarter when Order volume spiked up as sales cut deals to make their bonuses

The integrated tests failed for a few reasons: they trigged an extra page load, they they had their own copy of the Order model that had an empty cache, or they fetched the adjustments from the database to see that they were correct.

From the description of the bug, you might think the problem is caching and go to the one-line band-aid:

class Order < ActiveRecord::Base
  def adjustments_subtotal
    sum = adjustments(true).sum('amount_cents')
    Money.new(sum)
  end
end

All you have to do is add (true) to the end of the association call, which flushes the cache. Poof, bug gone. Right?

The bug is not gone, it’s lurking. Any other piece of code that looks through adjustments might need to flush the cache. We could add some more (true)s to our associatoins, or call .reload on the model. More explicitly, Order#finalize! could call to clear_association_cache after it called create_automatic_adjustments.

Oh, but there’s a subtle race condition remaining: if outside code had looked through an Order#adjustments before #finalize! and created one (or, less likely, even another user to the site), another user created one), that new Adjustment wouldn’t have its #finalize! method called. So we need another clear_association_cache.

I’m going on about what other pieces of code might do because that’s the real problem here: it’s possible to write totally innocent code elsewhere in the codebase and bring back this bug. What we need is a way to write software that is more predictable and reliable, that doesn’t need more and more integrated tests. Bugs like this aren’t in associations or caches, they’re lurking every place that any model has the side effects that look normal and familiar to us. I want to isolate them to Adapters because they’re always doing surprising things in caching, in updating the database, in tweaking other objects. When we have side effects we lose the ability to reason about what’s happening when in our programs.

This is the source of that fear you feel when you don’t know what other code might do to your models, or whether you’ll dig into a simple-looking call and find slow calls to an external network API four methods down, or wonder whether an ActiveRecord callback is a good idea but you can’t see any other way to make sure code runs every time a model is saved. All these things look innocuous and might be introduced months from now by someone who can’t possibly know all the consequences of tweaking a side effect.

What’s the real fix? Look back to the previous post about how to extract side effects to Adapters.

When ActiveRecord models don’t have side effects they become Entities. Those side effects all live in Adapters and Shells, which corral and minimize the parts of our code that are hardest to reason about. You can use and test your models without them having access to the database (or even knowing if they’re persisted in the first place), which makes their tests far easier: no saving to the database, no mocks, few or no stubs, and simple assertions that check the right value is returned from a method or that an Entity’s attributes have the correct values. That’s the value of Adapters.

Pop quiz, hotshot. Open a terminal to your Rails app and run:

grep -r '(true)|breloadb|clear_association_cache' app

This will find most of these band-aid fixes for side effect bugs. It will miss roundabout patches like order = Order.find(order.id)) and .save calls that are carefully planned to leave records in the database before another method happens to search for them. It will also miss tests that are “fixed” by changing FactoryGirl.build to FactoryGirl.create, but it still gives us a rough idea of how much pain there is.

Do you have a handful or were you scrolling the terminal back up to see them all?

I haven’t gathered statistics on these band-aids, but it when I look at lines of non-test code (run ‘rake stat’ and look at the leftmost column of numbers) in apps that feel reliable I’ll see 1 or 2 hits per 10,000 lines of app code. In the apps where I’m constantly firefighting I’ll see 5-20, and they tend to multiply. In the apps I want to restart from scratch I see even more. The highest I’ve seen so far is 39, and that app was only half-finished. I think the ideal number is probably zero.