Should multiple Rails Apps and APIs share a database?

Published: 2016-03-05
Category: Code
Tags: ActiveRecord, Rails

Having two (or more) Rails apps/APIs talk to the same database is a bad idea I’ve seen a few times in the last few years at larger/older clients. While it is a nice project for a consultant like me to undo (long but not risky = good profitability), you’re better off not doing it in the first place.

The technical approach is simple: you set up database.yml (or DATABASE_URL in the environment variables) to point to the same database. Voila! You’re sharing the database between two apps.

Except the only constant is change, and now you have to revise two codebases in sync. Let’s look at some common problems and their problem-causing solutions:

Migrations: Which app do they live in? “Both” is the most common answer when the two apps are maintained by different teams. Someone who only cares about one app has to remember to have a copy of the other and run migrations or their local development will mysteriously be out of sync with prod.
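One way to notice that drift before it bites is to compare the migration versions in the two checkouts. This is only a sketch in plain Ruby: it assumes both apps use the standard `db/migrate/<timestamp>_<name>.rb` filenames, and the method names are made up for illustration.

```ruby
# Collect migration version prefixes (e.g. "20160305120000") from a directory.
def migration_versions(migrate_dir)
  Dir.glob(File.join(migrate_dir, "*.rb"))
     .map { |path| File.basename(path)[/\A\d+/] }
     .compact
     .sort
end

# Report which versions each codebase has that the other lacks.
# app_dir and api_dir are the roots of the two (hypothetical) checkouts.
def migration_drift(app_dir, api_dir)
  app = migration_versions(File.join(app_dir, "db", "migrate"))
  api = migration_versions(File.join(api_dir, "db", "migrate"))
  { only_in_app: app - api, only_in_api: api - app }
end
```

Calling something like `migration_drift("../app", "../api")` from a pre-deploy check would return two empty arrays when the codebases agree. It tells you that they differ, not which one is right.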

schema.rb: Which is canonical? The answer is both must be, which means they need to be in sync. Whenever one app has run a migration the other hasn’t, they’re not in sync. Hopefully you can limit this to just a few seconds during deployments. Now, I know the git experts are saying:

Git submodules/subtrees: I can guarantee you will not commit submodules and their dependencies correctly. No one can. The command-line UI is best understood as a temple from an Indiana Jones movie: it has a glittering treasure, but every ordinary-seeming hallway is full of deathtraps. Subtrees are more reliable, but you are going to make merge conflicts a part of your daily life. I’m not aware of any of the pretty git GUIs supporting them either, so you’re looking at extra command-line training for your entire team. Oh, but we’ve missed something important:

Models: they have to stay in sync. Or shouldn’t. What? They have to stay in perfect sync on validations because Rails doesn’t have good support for database constraints, so any time the two have different validations one of the apps will be surprised and dismayed to learn that Model.find(123).valid? can return false. If the two apps have different purposes for models they can (and should!) have different code in models, though devs who work on both will have a hard time remembering that Order#finalize only sends email confirmations on the app but not the API because the API is used by dropshippers who handle their own customer engagement, etc. ActiveRecord violates the Single Responsibility Principle, so really what you need to do is understand every purpose of models (user input sanitization, database access, data normalization, domain logic, subtype junk drawer…) and keep exactly the right subset of them in sync.
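To make the validation surprise concrete, here is a plain-Ruby stand-in (no ActiveRecord) for two apps reading the same row with different rules. The `ROWS` hash, the class names, and the rules themselves are all invented for illustration; in real life the two classes would live in separate codebases rather than inherit from each other.

```ruby
# Stand-in for the shared database: one table keyed by id.
ROWS = { 123 => { email: "buyer@example.com", phone: nil } }.freeze

# The API's idea of a customer: an email is enough.
class ApiCustomer
  def self.find(id)
    new(ROWS.fetch(id))
  end

  def initialize(attrs)
    @attrs = attrs
  end

  def valid?
    !@attrs[:email].to_s.empty?
  end
end

# The web app's idea of a customer: it also insists on a phone number.
# (Inheritance is only a shortcut to keep the sketch short.)
class AppCustomer < ApiCustomer
  def valid?
    super && !@attrs[:phone].to_s.empty?
  end
end

ApiCustomer.find(123).valid? # => true: the API was happy to save this row
AppCustomer.find(123).valid? # => false: the app loads a record it considers invalid
```

Nothing in the database stops the API from writing that row, so the app only finds out when it next tries to save the record.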

Gems: Move the migrations and models into a private shared gem. Now you have a third git repo to keep commits in sync with… with the bonus that you have to manually maintain the Gemfile’s version specification. And your deployment story needs to account for having read access to your GitHub repo. And now a bunch of your dependencies are either transitively loaded from the gem’s gemspec (so anyone touching the gem needs to know how they’re used in the apps) or repeated in the app Gemfiles (they’ll drift out of perfect sync and you will have some incredibly subtle and awful bugs when two versions are loaded at once). But there’s still one last handhold to scrabble at as we fall off the cliffs of despair:

One repo to rule them all: Surprisingly decent! Put both apps in one git repo and have one symlink what it needs from the other: schema.rb, db/migrate, app/models, spec/models, maybe parts of lib. You can’t do different things in the models unless you’re really un-railsy kinda–bernhardty about how you use AR models, but that may not be a huge loss. I haven’t had to put up with Windows professionally for a few years, but I think its symlink support is still not great. I have seen this approach work if Windows is not a deal-breaker.
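A minimal sketch of that symlinking, using only Ruby’s standard library. It assumes a hypothetical layout with the two apps in sibling directories at the repo root, and the method and constant names are made up. The list of shared paths is simply the one suggested above.

```ruby
require "fileutils"
require "pathname"

# Paths the second app borrows from the first, relative to each app's root.
SHARED_PATHS = %w[db/schema.rb db/migrate app/models spec/models].freeze

# For each shared path that exists in `owner`, create a relative symlink
# at the same location inside `consumer`.
def link_shared_paths(owner, consumer, paths = SHARED_PATHS)
  paths.each do |relative|
    source = Pathname.new(owner).join(relative)
    target = Pathname.new(consumer).join(relative)
    next unless source.exist?
    # Never clobber something already there; remove it by hand first.
    next if target.exist? || target.symlink?

    FileUtils.mkdir_p(target.dirname.to_s)
    # Relative links keep working wherever the repo is checked out.
    File.symlink(source.relative_path_from(target.dirname).to_s, target.to_s)
  end
end
```

Git stores the links themselves, so this only needs to run once and be committed. On platforms where Ruby cannot create symlinks, `File.symlink` raises `NotImplementedError`, which is the Windows caveat in practice.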

OK, Smartypants Mc GrizzledVeteran, what’s the right approach?

Let’s take a step back. There’s a lot of pain, and everything we can think of doing to address it comes with its own pains. This is usually a sign that we’re violating a fundamental assumption somewhere.

Rails assumes that it’s the only app talking to the database and that it controls access to the database. It enforces validations and other database concerns up at the app layer.

One thing worth recognizing is that your app probably already talks to many databases: you can think of contacting some third party’s API as accessing their database via the wrapper of their API. Many apps access that database, but only one has direct access and it has the responsibility of enforcing lots of business rules that would otherwise be duplicated in those apps. This points us towards two viable approaches:

First, put one Rails app in charge of database access and make the rest consumers of that app’s API. Use ActiveResource or one of the other API consumption gems to hit the API. You can rev separately, model data separately, and you don’t have to keep them in lockstep sync. The cost of the trade-off is you have a slower web app with new failure modes like API down/slow, breaking changes, etc. But you’re eating your own dogfood on the API, which is good if you’re providing an API to external customers.
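Here is a rough sketch of what the consuming side might look like. It deliberately uses only Ruby’s standard library instead of ActiveResource, so the new failure modes are visible rather than hidden in a gem. The base URL, the `/orders/:id.json` route, and the class names are all assumptions; the app that owns the database defines the real API.

```ruby
require "json"
require "net/http"
require "uri"

# Raised when the owning app's API cannot give us a usable answer.
class OrdersApiError < StandardError; end

class OrdersClient
  # base_url is hypothetical, e.g. "https://app.example.com".
  def initialize(base_url, open_timeout: 2, read_timeout: 5)
    @base_uri = URI(base_url)
    @open_timeout = open_timeout
    @read_timeout = read_timeout
  end

  # Fetch one order as a Hash, or raise OrdersApiError.
  def find(id)
    parse(get("/orders/#{Integer(id)}.json"))
  end

  private

  def get(path)
    Net::HTTP.start(@base_uri.host, @base_uri.port,
                    use_ssl: @base_uri.scheme == "https",
                    open_timeout: @open_timeout,
                    read_timeout: @read_timeout) do |http|
      response = http.get(path)
      raise OrdersApiError, "HTTP #{response.code}" unless response.is_a?(Net::HTTPSuccess)
      response.body
    end
  rescue Timeout::Error, SocketError, SystemCallError => e
    # The new failure modes: the API is down, slow, or unreachable.
    raise OrdersApiError, "API unavailable: #{e.class}"
  end

  def parse(body)
    JSON.parse(body)
  rescue JSON::ParserError => e
    # A breaking change on the other side often shows up here first.
    raise OrdersApiError, "unexpected response: #{e.message}"
  end
end
```

Every caller of `OrdersClient#find` now has to decide what to do when `OrdersApiError` is raised, which is exactly the cost that a direct database connection never made you think about.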

The better approach is to scrap the idea of separate apps. Have one Rails monolith that serves your app and API needs. I know, microservices are so cool and you want to be able to scale them separately because the API is so much thinner than the app, but you need a scaling plan for your app anyways, and it’s worth minimizing moving parts by having one generic “app server” that you crank the dial on. Everything’s tightly coupled and poorly delineated: that’s the Rails Way.

Yeah, that was kind of snarky, but it’s a not-too-weird thing to do (especially if only one app writes and the others only read) and all the approaches have non-obvious failure modes because they’re violating a Rails assumption. The worst part is that because of that assumption Rails made some design choices that lead to unreliable database use – but that’s a topic for a new talk I have out for consideration by a few Ruby conferences now.