Compressing Sudoku

At least a few times a week I see someone playing Sudoku while commuting on the L, and it’s always annoyed me. The problem is that a Sudoku puzzle is a small 9x9 grid with a couple dozen numbers on it, and a book is a chunk of dead tree. How inefficient! There’s a tiny amount of very compressible data in a large, incompressible physical object. I can’t do anything about the book, but I could imagine one with a grid on the back, a dry-erase marker in the spine, and two dozen puzzles per page in some kind of notation. I’m not going to start publishing specialty books, but I’d like to know what that notation would be.

[Figure: Example Sudoku puzzle]

The straightforward approach is to treat the puzzle as a Cartesian grid, as seen here. (I’m going to skip obviously bad approaches, like a string with one character per cell, e.g. “53--7----6...”.)
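For comparison, the cell-by-cell string is trivial to produce. Below is a small Python sketch; the `to_cell_string` name is hypothetical, and the grid is the example puzzle transcribed from the post’s own region notation for it, with `.` marking blanks. One thing it makes concrete: the string is always 81 characters long, however few cells are actually filled in.

```python
# The example puzzle, one string per row; "." marks an empty cell.
PUZZLE = [
    "53..7....",
    "6..195...",
    ".98....6.",
    "8...6...3",
    "4..8.3..1",
    "7...2...6",
    ".6....28.",
    "...419..5",
    "....8..79",
]


def to_cell_string(grid: list[str]) -> str:
    """Concatenate the rows into one character per cell, using '-' for blanks."""
    return "".join(row.replace(".", "-") for row in grid)


flat = to_cell_string(PUZZLE)
print(flat[:10])  # 53--7----6
print(len(flat))  # 81
```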

The first problem is numbered axes: it’s confusing to keep track of which numbers are coordinate labels and which are the puzzle’s own digits. Replacing those labels with letters is a step in the right direction, though the halfway step of lettering only one axis is actually a big step backwards: who can remember which one it is?

A problem with coordinates is that while nearly everyone will number the x-axis left to right, the y-axis could plausibly run either way. The puzzle still works, but it’s awkward to compare notes with someone who has done the puzzle upside-down. And really, there’s no sense in doing the puzzles if it’s inconvenient to gloat. Another problem is that lots of people simply don’t understand coordinates. They don’t know which is x and which is y, and they have terrifying flashbacks to high school algebra. Sudoku already looks dangerously like math. Coordinates, even with letters, don’t work.

[Figure: Example Sudoku puzzle with lettered regions]

This is my approach, drawing on what makes Sudoku more than just a grid: its nine 3x3 regions. Each region gets a capital letter, and the nine cells in each region are labeled with lowercase letters in the same pattern (shown only for region C). So the Harkins Notation for this puzzle is: Aa5b3d6h9i8 Bb7d1e9f5 Ch6 Da8d4g7 Eb6d8f3h2 Fc3f1i6 Gb6 Hd4e1f9h8 Ia2b8f5h7i9. It may not quite fit on one line, but it’s decently compact and straightforward. An especially nice feature is that it’s easy to go the other way, too, from puzzle to notation.
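Going from notation to puzzle and back is mechanical enough to sketch in a few lines. The Python below is a minimal illustration, not code from the original post: the names (`cell_position`, `encode`, `decode`) are hypothetical, and since the lettered figure isn’t reproduced here, it assumes regions are lettered A through I and cells a through i in reading order (left to right, top to bottom). That ordering does reproduce the example notation above exactly.

```python
import re

REGIONS = "ABCDEFGHI"  # assumed: regions lettered left to right, top to bottom
CELLS = "abcdefghi"    # assumed: cells lettered in the same order within a region


def cell_position(region: int, cell: int) -> tuple[int, int]:
    """Map a (region, cell) pair, both 0-8, to a (row, column) in the 9x9 grid."""
    row = (region // 3) * 3 + cell // 3
    col = (region % 3) * 3 + cell % 3
    return row, col


def encode(grid: list[str]) -> str:
    """Write a grid ('.' = empty) in region notation, skipping empty regions."""
    groups = []
    for r, region_letter in enumerate(REGIONS):
        pairs = []
        for c, cell_letter in enumerate(CELLS):
            row, col = cell_position(r, c)
            if grid[row][col] != ".":
                pairs.append(cell_letter + grid[row][col])
        if pairs:
            groups.append(region_letter + "".join(pairs))
    return " ".join(groups)


def decode(notation: str) -> list[str]:
    """Rebuild the grid from region notation; unmentioned cells stay '.'."""
    cells = [["."] * 9 for _ in range(9)]
    for group in notation.split():
        r = REGIONS.index(group[0])
        # Each given is a lowercase cell letter followed by a digit.
        for cell_letter, value in re.findall(r"([a-i])([1-9])", group[1:]):
            row, col = cell_position(r, CELLS.index(cell_letter))
            cells[row][col] = value
    return ["".join(row) for row in cells]


NOTATION = ("Aa5b3d6h9i8 Bb7d1e9f5 Ch6 Da8d4g7 Eb6d8f3h2 "
            "Fc3f1i6 Gb6 Hd4e1f9h8 Ia2b8f5h7i9")
grid = decode(NOTATION)
print(grid[0])                   # 53..7....
print(encode(grid) == NOTATION)  # True
```

Because `encode` walks regions and cells in a fixed order, any grid has exactly one spelling, which is what makes comparing answers with someone else straightforward.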

[2006-02-16]{.date} Snarky has done what I didn’t bother to do: solve the puzzle I used as an example. He gave me the answer in my own notation, which made it clear that the lowercase letters are worth dropping when all nine cells in a region are known. So the solution to the example above is: A534672198 B678195342 C912348567 D859426713 E761853924 F423791856 G961287345 H537419286 I284635179
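The full-region shortcut fits the same kind of sketch: when a region is complete, write its nine digits in cell order; otherwise fall back to letter-digit pairs. This is again a hypothetical Python illustration with made-up names (`encode_compact`, `decode_compact`), assuming regions A through I and cells a through i are both lettered in reading order, left to right and top to bottom. The decoder accepts either form, so puzzles and solutions can share one parser.

```python
import re

REGIONS = "ABCDEFGHI"  # assumed: regions lettered left to right, top to bottom
CELLS = "abcdefghi"    # assumed: cells lettered in the same order within a region


def cell_position(region: int, cell: int) -> tuple[int, int]:
    """Map a (region, cell) pair, both 0-8, to a (row, column) in the 9x9 grid."""
    return (region // 3) * 3 + cell // 3, (region % 3) * 3 + cell % 3


def encode_compact(grid: list[str]) -> str:
    """Region notation that drops the cell letters for any full region."""
    groups = []
    for r, region_letter in enumerate(REGIONS):
        positions = [cell_position(r, c) for c in range(9)]
        values = [grid[row][col] for row, col in positions]
        if "." not in values:
            # Every cell is known: the nine digits alone, in cell order a-i.
            groups.append(region_letter + "".join(values))
        else:
            pairs = [letter + v for letter, v in zip(CELLS, values) if v != "."]
            if pairs:
                groups.append(region_letter + "".join(pairs))
    return " ".join(groups)


def decode_compact(notation: str) -> list[str]:
    """Parse either form of a region: 'A534672198' or 'Aa5b3d6h9i8'."""
    cells = [["."] * 9 for _ in range(9)]
    for group in notation.split():
        r, body = REGIONS.index(group[0]), group[1:]
        if re.fullmatch(r"[1-9]{9}", body):
            pairs = list(zip(CELLS, body))  # full region: digits in cell order
        else:
            pairs = re.findall(r"([a-i])([1-9])", body)
        for cell_letter, value in pairs:
            row, col = cell_position(r, CELLS.index(cell_letter))
            cells[row][col] = value
    return ["".join(row) for row in cells]


SOLUTION = ("A534672198 B678195342 C912348567 D859426713 E761853924 "
            "F423791856 G961287345 H537419286 I284635179")
solved = decode_compact(SOLUTION)
print(solved[0])                           # 534678912
print(encode_compact(solved) == SOLUTION)  # True
```

Telling the two forms apart is unambiguous: a compact group is exactly nine digits, while the lettered form always starts with a lowercase letter.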