The Well-Sorted Version


The Well-Sorted Version is a longtime project I recently finished. I wanted to blog a bit about the technical production of it, so please check out that link if you want this discussion to make any sense.

[Image: a page from the Well-Sorted Version]

I’m glad I didn’t start this project ten or twenty years ago or it would’ve been an order of magnitude harder to prepare camera-ready copy for printing. Instead, I only had to send a PDF and a check to receive my 26 bound volumes. Let me work backwards from that and down into the lowest levels of what it took to produce the WSV.

I found Grimm Bindery after an exhaustive search of several hundred print-on-demand printers. Most had websites that made it clear they weren’t able to print with the quality and materials I needed, but I had to email and call dozens to find the handful that met my needs. I picked Grimm because they were the best-organized and had no trouble answering my many questions. If I were printing a run that was larger or using more common options, though, I think the only way to pick a printer would be to winnow down to finalists and have each print a single copy of your work so that you can judge based on actual output. POD is cheap enough that this is entirely reasonable.

Before I had a PDF to send them I had to write the code to generate it. I believe almost all books are produced with Adobe InDesign nowadays, but I was already familiar with the free LaTeX typesetting system and it was easy to integrate into the alphabetizing code I was writing, so I never seriously considered InDesign.

Typesetting the book was a long, interesting challenge. I eventually had a 32-item task list of things to address, from laying out each type of text (book headings, chapter headings, verse and chapter numbers, main text) to laying out the cover to checking the letterspacing on every combination of lower- and upper-case letters (104 in all). Accomplishing this meant finding TeX packages to achieve the effects I wanted (like lettrine for the inset chapter numbers). As an aside, TeX produces significantly better output if you use semantic markup like \par instead of \hspace{0pt}\newline.
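To give a flavor of what that looked like, here is a minimal lettrine sketch — the package drops an oversized character into the opening lines of a paragraph. The option values here are illustrative, not the ones the book actually used:

```latex
\documentclass{book}
\usepackage{lettrine}
\begin{document}
% Inset the chapter number two lines deep into the opening paragraph;
% lettrine's lines/findent/lraise keys control size and spacing.
\lettrine[lines=2,findent=2pt]{1}{}In the beginning God created the
heaven and the earth.
\end{document}
```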

TeX shows its age with some annoying misfeatures (still there for backwards compatibility), though: it assumes packages are installed system-wide rather than bundled with your project, and it treats relative imports as relative to the directory you ran the tex command from rather than to the file doing the include. In the end I had a shell script to work around these and generate the PDF:

set -e
cp input/wov.tex output/wov.tex
cp garamond/*ggm* .
TEXINPUTS=.:input//:/usr/share/texlive// /usr/bin/pdflatex -jobname kjv wov.tex
TEXINPUTS=.:output//:/usr/share/texlive// /usr/bin/pdflatex wov.tex
/usr/bin/pdflatex two-up.tex
#/usr/bin/pdflatex sample.tex
rm -f *.aux *.log *ggm*

That sample.tex you see commented out there was a simple test file I used for experimenting with packages or small snippets of markup. I also organized the bible by breaking out each book into its own .tex file that would be included by a master project file that included all the styling. When I wanted to test layout or book-specific issues I could comment out all the other includes and render a pdf with just one book (~3s) instead of the entire bible (~70s). Small, rapid iteration was vital as I pushed text around by 72nds of an inch.
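The master project file pattern looked roughly like this (the file names here are illustrative, not the project's actual layout):

```latex
% wov.tex, sketched: styling first, then one \input per book.
\input{styles}
\input{genesis}
%\input{exodus}     % comment out books for a fast single-book build
%\input{leviticus}
```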

I had to write a program to alphabetize TeX files with the bible in them. Here’s an example:

\BBook{The First Book of Moses, called Genesis}\BFont
\BChap{1}\BVerseOne{}In the beginning God created the heaven and the 
earth. \BVerse{2}And the earth was without form, and void; and darkness was
upon the face of the deep. And the Spirit of God moved upon the face 
of the waters.

The text is clearly there to alphabetize, but the alphabetization code had to be smart. It couldn’t alphabetize the text of commands (\BBook) but it DID have to alphabetize the arguments to some of those commands (\BBook but not \begin). I was tempted to do awful things with regular expressions (it’s a weakness) but instead I wrote my first real parser using Parslet.

I slowly built up the parser from individual rules to recognize commands and text. I ignored a lot of the complexity of TeX (like optional arguments and all math) because I was only interested in parsing this one set of files that didn’t use those things. A lot of this work was a dance between improving the parser and tweaking the source markup to make it easier to parse.

class Tex < Parslet::Parser
  # Just enough TeX to parse these files: commands, braced arguments,
  # comments, and runs of plain text.
  rule(:backslash) { str('\\') }
  rule(:command)   { (backslash >> match('[a-zA-Z]').repeat(1)).as(:command) >> option?.as(:options) }
  rule(:command?)  { command.maybe }
  rule(:option)    { str('{') >> (command | match('[a-zA-Z0-9\[\]. ,\r\n]').repeat).as(:option) >> str('}') }
  rule(:option?)   { option.repeat(0) }
  rule(:comment)   { str('%') >> match('[^\n]').repeat } # comment runs to end of line
  rule(:texspace)  { str('\\/').as(:texspace) }
  rule(:text)      { (backslash.absent? >> any).repeat(1).as(:text) }
  rule(:line)      { comment | (command | texspace | text).repeat }
  root(:line)
end

Then there’s a bunch of glue code to schlep in the data from the files and out to sorted versions. This was originally a big imperative mess, but I used some of the ideas in Gary Bernhardt’s excellent Functional Core, Imperative Shell screencast to separate the logic of extracting and replacing letters from the reading and writing of files.
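In that style, the letter-shuffling becomes a pure string-to-string function and all file IO is pushed out to a thin shell. A minimal sketch — these method names are mine, not the project's:

```ruby
# Functional core: pure, no IO, trivially unit-testable.
# Pours a queue of replacement letters back into the letter
# positions of the original text, leaving everything else alone.
def replace_letters(text, letters)
  text.gsub(/[A-Za-z]/) { letters.shift }
end

# Imperative shell: all filesystem access lives out here.
def rewrite_file(in_path, out_path, letters)
  File.write(out_path, replace_letters(File.read(in_path), letters))
end
```

The payoff is that the core can be specced exhaustively without touching the disk, while the shell stays too simple to need much testing.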

I’m not going to paste all that code, but it looks pretty much like you’d expect. It pulls in all the files, parses them, extracts and sorts all the letters, and then pours them back into files. There are also a couple hundred lines of specs.
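The sort step itself can be sketched on a plain string (the real code operates on the parse tree so TeX commands are left untouched, and the stable, case-insensitive tie-breaking here is my assumption, not a detail from the project):

```ruby
# Extract every letter, sort case-insensitively but stably
# (ties keep their original order), and pour the sorted letters
# back into the letter positions of the text.
def well_sort(text)
  letters = text.scan(/[A-Za-z]/)
                .each_with_index
                .sort_by { |c, i| [c.downcase, i] } # stable sort
                .map(&:first)
  text.gsub(/[A-Za-z]/) { letters.shift }
end

well_sort("The Holy Bible")  # => "Bbe ehHi lloTy"
```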

I’ve wanted to type the words “the hard part was” several times while writing this, but really there were several large, hard parts: parsing TeX, typesetting, finding a reliable printer, and having the endurance to keep moving forward one small step at a time over 14 months. This blog has been pretty quiet lately, in part because I’ve been busy with the WSV and in part because I’ve been discouraged by feeling like I can’t finish things. Now I’ve finished a big thing and I’m feeling really good about it, so fingers crossed for more blog posts. :)

“You sit in a room for years making up a story & drawing little pictures & then someone asks ‘what was the hardest part?’ … It was the years.” — Bryan Lee O’Malley
[Image: the rather thick WSV]


