Hard Lessons

Having worked on email-related code before, I have been morbidly fascinated by one of the founders of handmade.network writing an email client. Handmade Network is trying to reinvigorate programming by emphasizing small teams and from-scratch performant code. It’s a great way to write small, self-contained projects (games, libraries, utilities) that can be finished, but it fell out of favor two decades ago for complex user-facing software.

This update included a few sentences I’ve been waiting for:

The biggest lesson is that not everyone is RFC-compliant. It was a shock seeing some companies accept ill-formed e-mail addresses, developers showing their best-but-still-inaccurate regular expressions for compliance, and security agents from company’s mail servers trumping simple IMAP requests that should have yielded a proper response, but didn’t. Look, I always knew commercial software packages don’t fully adhere to a spec—not even language compilers achieve 100% accuracy—but seeing violations led to unfortunate wrinkles and hard-coding in specific recovery points when I try to talk to some servers.

From what he lists, he’s only seen the tip of the iceberg. For example, he hasn’t mentioned some of the fun problems of IMAP or talked about the woes of email encoding and attachments. Specifically, this strategy of “seeing violations led to unfortunate wrinkles and hard-coding in specific recovery points when I try to talk to some servers” is really, really not going to scale. C is a tough language for the tower of abstractions he’s going to build and rebuild in the face of unexpected inputs and dusty corners of the spec.

And email is a particularly hard domain because it’s old and *looks* simple, so there’s an incredible number of errors you have to cope with from version 0.1. Users will never accept “Yeah, you just can’t read email from people using Outlook, it’s Microsoft’s bug.” And then on top of all that, many emails are shifting to HTML-only with increasing expectations of CSS support, so you’re implementing or talking to a huge browser engine. Email was a big factor in ending my support for Postel’s maxim.

It might be another 5 months before I reach a working prototype for that [GUI], and probably another two months of polish before I consider the possibility of releasing some build publicly.

I wish him a lot of luck and there’s a tiny, windmill-tilting bit of me that hopes he’ll succeed, but I’m watching this race for the crash.

Posted in Code at 2016-09-28 10:39 | No comments

Queue Zero

Almost exactly a year ago, I posted about Sizing Up My Queue to count up how much video and audio I had downloaded to watch. The final tally?

queue: 877 files, 674007 seconds = 7d:19h:13m:27s total duration

I’ve been running that script almost every day, and for the first time it said:

queue: 0 files, 0 seconds = 0d:00h:00m:00s total duration
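The duration strings above pack a count of seconds into days, hours, minutes, and seconds. The original script isn’t shown in this post, so this is a hypothetical Ruby sketch of just the formatting step (`format_duration` and `queue_summary` are made-up names), which reproduces the 877-file tally:

```ruby
# Hypothetical helper, not the original script: format a number of seconds
# as days:hours:minutes:seconds, like the queue report above.
def format_duration(total_seconds)
  days, rest    = total_seconds.divmod(86_400) # 24 * 60 * 60
  hours, rest   = rest.divmod(3_600)
  minutes, secs = rest.divmod(60)
  format("%dd:%02dh:%02dm:%02ds", days, hours, minutes, secs)
end

# Build the one-line summary from a file count and total duration.
def queue_summary(file_count, total_seconds)
  "queue: #{file_count} files, #{total_seconds} seconds = " \
    "#{format_duration(total_seconds)} total duration"
end

puts queue_summary(877, 674_007) # prints "queue: 877 files, 674007 seconds = 7d:19h:13m:27s total duration"
puts queue_summary(0, 0)         # prints "queue: 0 files, 0 seconds = 0d:00h:00m:00s total duration"
```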

I did track values over time, and after a lot of frustration LibreOffice permitted this hideous graph – the Y axis is how many days of media remain:

[Graph: days of media remaining in the queue over the year]
  • Most of the early drop was me shrugging and saying “yeah, OK, I’m really not interested enough in that podcast to actually listen to it”
  • The half-day jump in December is when I fixed the script to include .mov files
  • Big gap and accumulation in March/April is when I was working on my talk and book
  • About half of the drop at the end was archiving a video site I finished scraping
  • I almost exclusively listen to podcasts when doing chores or playing video games, so I’d have hit zero a month earlier if I hadn’t played ~65 hours of Crypt of the Necrodancer, a rhythm game I can’t play with a podcast on
  • The last file was the recording of my 2015 RailsConf talk – watching my own presentations really makes me squirm, though it’s invaluable for improving as a speaker

This was a fun little project. There are still a few thousand files in ~/queue. It’s a bit of a junk drawer (games waiting for me to have Windows again, photos to file away, archived web pages), but the majority of it is books and papers. I suppose next I could write a script to take the word count of epub/mobi/pdf/html files… it’d be a bit of fiddling running different commands to dump word counts from the various formats, but it could work.

Well, it could work in that, yes, I could technically write that script. I’ve known for decades that my to-read list has been growing faster than I read.
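For the curious, the word-count script imagined above might start like this minimal sketch. It’s hypothetical (`words_in` and `queue_word_count` are made-up names), handles only plain text and HTML, and strips tags with a crude regex rather than a real HTML parser; epub, mobi, and pdf would need external converters, so they’re skipped.

```ruby
require "find"

# Extensions this sketch can count without external tools.
COUNTABLE = %w[.txt .html .htm].freeze

# Whitespace-separated word count for one file, or nil if unsupported.
def words_in(path)
  ext = File.extname(path).downcase
  return nil unless COUNTABLE.include?(ext)

  text = File.binread(path).force_encoding("UTF-8").scrub # drop invalid bytes
  text = text.gsub(/<[^>]*>/, " ") unless ext == ".txt"  # crude tag stripping
  text.split.size
end

# Walk a directory tree and total words per file extension.
def queue_word_count(dir)
  totals = Hash.new(0)
  Find.find(dir) do |path|
    next unless File.file?(path)
    count = words_in(path)
    totals[File.extname(path).downcase] += count if count
  end
  totals
end
```

Pointed at the queue, `queue_word_count(File.expand_path("~/queue"))` would return a hash of word totals per extension.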

Posted in Life at 2016-06-15 09:32 | No comments

Vim: highlight word wrap column in insert mode

I like vim’s colorcolumn for highlighting where word wrap will occur, but I consider it a distraction when I’m not in insert mode. After some tinkering, I wrote this in my .vimrc:

" highlight textwidth column in insert mode
highlight ColorColumn ctermbg=0*
function! HighlightOn()
  if &textwidth > 0
    " the +1 feature in the 'colorcolumn' docs doesn't work for me
    let &colorcolumn=&textwidth + 1
  else
    let &colorcolumn=""
  endif
endfunction
autocmd InsertEnter * :call HighlightOn()
autocmd InsertLeave * let &colorcolumn=""

That note about +1 is me working around a bug. I should be able to just write:

" highlight textwidth column in insert mode
highlight ColorColumn ctermbg=0*
" quote +1: unquoted, it evaluates to the number 1 and highlights column 1
autocmd InsertEnter * let &colorcolumn="+1"
autocmd InsertLeave * let &colorcolumn=""

Unfortunately, some tweak or plugin breaks this feature for me. I wrote this workaround rather than diagnose and fix it properly because the process just seemed too tedious.

Posted in Code at 2016-06-09 13:07 | No comments

Recursive Sum

In #ruby on Freenode, platzhirsch asked how to total an array of Transactions when the Transactions may have parents. The two obvious approaches have pitfalls when there are a lot of Transactions, and he said he expects to have 23 million that may be deeply nested. Here’s his sample code:

 
class Transaction
  attr_accessor :amount, :parent
 
  def initialize(amount, parent)
    self.amount = amount
    self.parent = parent
  end
end
 
t1 = Transaction.new(100, nil)
t2 = Transaction.new(250, t1)
t3 = Transaction.new(300, t2)
 
current = t3
all = []
while current != nil
  all << current
  current = current.parent
end
total = all.inject(0) { |total, transaction| total + transaction.amount }

Spoiler warning: last chance to solve it for yourself before I dig into the solutions.

This code sample expressed the first obvious solution: build a list of all the transactions. The problem is that you’ll spend RAM and time building a data structure you expect to use once.

One person offered an addition to Transaction to solve it recursively, the second obvious approach:

 
class Transaction
  def total_amount
    (parent ? parent.total_amount : 0) + amount
  end
end

The pitfall is that this risks blowing the stack when Transactions are deeply nested: recurse too many times and you’ll run out of stack space (Ruby raises SystemStackError). It’s also super-specialized: if you want to do anything else with every Transaction, you’ll have to write another custom method. And despite the specialization, you might end up writing this logic again if you have a collection of child transactions:

 
t4 = Transaction.new(400, nil)
 
total = [t3, t4].inject(0) { |sum, t| sum + t.total_amount }
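To make the stack pitfall concrete, here is a sketch that puts the recursive `total_amount` on a Struct-based Transaction and builds one very deep chain. The depth of 1,000,000 is an arbitrary choice meant to exceed Ruby’s default VM stack; the exact depth at which it fails depends on the Ruby version and settings.

```ruby
# The recursive approach from above, on a Struct version of Transaction.
Transaction = Struct.new(:amount, :parent) do
  # One stack frame per ancestor: fine when shallow, fatal when deep.
  def total_amount
    (parent ? parent.total_amount : 0) + amount
  end
end

shallow = Transaction.new(300, Transaction.new(250, Transaction.new(100, nil)))
puts shallow.total_amount # prints 650

# A chain of 1,000,000 Transactions, each the parent of the next.
deep = (1..1_000_000).inject(nil) { |parent, _| Transaction.new(1, parent) }

begin
  deep.total_amount
  outcome = :finished
rescue SystemStackError
  outcome = :stack_overflow
end
puts outcome
```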

Here’s my approach:

 
# because Transaction doesn't have any logic, I made a shorter version:
Transaction = Struct.new(:amount, :parent)
 
# And I like using powers of two when testing recursion, because the sum
# will come out obviously different for different combinations of items:
t1 = Transaction.new 1, nil
t2 = Transaction.new 2, t1
t3 = Transaction.new 4, t2
t4 = Transaction.new 8, nil
 
TransactionEnumerator = Struct.new(:collection) do
  include Enumerable
 
  def each
    collection.each do |t|
      yield t
      yield t while t = t.parent
    end
  end
end
 
ts = TransactionEnumerator.new [t3, t4]
total = ts.inject(0) { |total, transaction| total + transaction.amount }

This little wrapper doesn’t recurse, doesn’t duplicate Transactions, and doesn’t build a data structure. It can work on any collection of Transactions that exposes each; the sum logic is expressed only once and separately from the control flow; and it provides the powerful Ruby Enumerable interface.
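Because TransactionEnumerator includes Enumerable, the rest of that interface comes along for free. This usage sketch repeats the definitions above so it runs on its own:

```ruby
Transaction = Struct.new(:amount, :parent)
t1 = Transaction.new 1, nil
t2 = Transaction.new 2, t1
t3 = Transaction.new 4, t2
t4 = Transaction.new 8, nil

TransactionEnumerator = Struct.new(:collection) do
  include Enumerable

  # Yield each Transaction, then walk up its parents.
  def each
    collection.each do |t|
      yield t
      yield t while t = t.parent
    end
  end
end

ts = TransactionEnumerator.new [t3, t4]
p ts.map(&:amount)                         # prints [4, 2, 1, 8]
p ts.count                                 # prints 4
p ts.max_by(&:amount).amount               # prints 8
p ts.inject(0) { |sum, t| sum + t.amount } # prints 15
# lazy stops walking once it has enough, so no full list is built
p ts.lazy.map(&:amount).first(2)           # prints [4, 2]
```

One caveat: because TransactionEnumerator is a Struct, methods that Struct defines itself, such as select, to_a, and size, act on the struct’s members (just the collection) instead of going through each. The methods used here come from Enumerable and do go through each.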

Hope you enjoyed this little puzzle! If you had an alternate solution, please wrap it in <code></code> tags below.

And let me tack on my own fun exercise: add an ID field to Transaction and implement TransactionEnumerator#uniq to yield the transactions exactly once, so this returns true (Array has #uniq, but you shouldn’t assume the collection is an Array):

 
TransactionEnumerator.new([t1, t1]).uniq == [t1]
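Spoiler for the exercise: one possible sketch, not the only answer. The `:id` field and its values are hypothetical, and this `uniq` returns an Array and ignores any block, unlike Enumerable#uniq. It only calls `collection.each`, so the collection doesn’t have to be an Array:

```ruby
require "set"

# Transaction with a hypothetical id field added for the exercise.
Transaction = Struct.new(:id, :amount, :parent)

TransactionEnumerator = Struct.new(:collection) do
  include Enumerable

  def each
    collection.each do |t|
      yield t
      yield t while t = t.parent
    end
  end

  # Each Transaction once, by id, in first-seen order. Reaching an id we've
  # already seen means its ancestors were collected too, so we stop climbing.
  def uniq
    seen = Set.new
    unique = []
    collection.each do |t|
      while t && seen.add?(t.id) # add? returns nil if the id was present
        unique << t
        t = t.parent
      end
    end
    unique
  end
end

t1 = Transaction.new 1, 1, nil
t2 = Transaction.new 2, 2, t1
t3 = Transaction.new 3, 4, t2
t4 = Transaction.new 4, 8, nil

p TransactionEnumerator.new([t1, t1]).uniq == [t1]         # prints true
p TransactionEnumerator.new([t3, t2, t4]).uniq.map(&:id)   # prints [3, 2, 1, 4]
```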
Posted in Code at 2016-02-13 09:36 | No comments

2016 Media Reviews

I’ve appreciated when people take the time to write reviews and highlight connections to other good works. This post will be regularly updated through 2016. Previously: 2014 2015

Posted in Life at 2016-02-04 13:26 | No comments

The Plan

In January I planned to blog every two weeks. This is my 26th post of 2015. A few of them were finished last-minute, but they were finished. It was a great writing exercise, and I’m going to let my posting frequency drop a bit as I write elsewhere.

I didn’t mention the plan because I didn’t want to jinx it. When I worked on the Well-Sorted Version I didn’t talk about it. I sometimes said I was working on an art project. When the final printing was in progress I slipped and called it “my book” a couple times, but otherwise I only said anything about it after the boxes arrived.

[Photo: the printed books arrived]

When I started the WSV I was scared I wouldn’t finish it. I’m pretty sure I got the idea from Derek Sivers writing “Keep your ideas to yourself”: talking about a project relieves the pressure to finish it. The idea seemed sound, the research was plausible, so I shut my mouth and worked. I succeeded.

Lately I’ve been thinking a lot about projects and my long-term plans. I’ve thought hard about my motivations, picked my goals, winnowed my projects, and planned my systems.

There is a particular glory to a called shot, and I’ve envisioned at least a decade of work. So the compromise is this: I’m going to name these projects, but it’s the last I’m going to say about them until they’re well on the way to completion.

  • Fulcrum
  • Solver
  • Formula
  • Eleven
  • Control
  • Workbook
  • bhsh
  • From A to B
  • Math
  • Bibliography
  • Glossary
  • Edit
  • Terminal
  • Historiography
  • Typesetter
  • Hypertext

Life happens. Maybe a project will be superseded by someone else’s work, maybe I’ll add a good idea along the way. But I’m certain I want to see these done.

If you’ll excuse me, I have some work to enjoy.

Posted in Life at 2015-12-28 19:06 | No comments

Advice for First-Time Attendees to MicroConf

In April, I attended MicroConf. The talks and conversations were invaluable to my business. As the tickets for MicroConf 2016 are going on sale shortly, I wanted to write up advice for first-time attendees, especially those who are early in their business and want to learn a lot.

If you’re attending, read the “Preparation” section now to get ready. The rest is best read in the days before the conference to start off right, but might also be useful if you’re on the fence about whether to get a ticket. But the short version is that if you’re starting or growing a tech-related, self-funded business, yes, you absolutely want to go.

Preparation

First, read How to Win Friends and Influence People. This mindset of being curious about and generous to people is exactly right for MicroConf. You’ll get practical advice that significantly improves your experience (not to mention the rest of your life). Prefer the original version to the 1970s revised edition, but don’t skip it if you can’t find the original.

The only other thing you need to do more than a week in advance is get business cards. There are only ~250 attendees and you’ll meet a quarter to half of them, so you don’t need a ton, but you really don’t want to be without them. Leave the back blank and stick a pen in your pocket. Then, when you talk to someone, write them a note to jog their memory about who you are, what you’d like to hear more about, and what you can do for them. This will dramatically increase the quality of the connections you make and the conversations you have after the conference.

Think hard about what you’re working on or could be working on, and choose the most important thing for the “business” field when you register. This will be printed on your badge, so almost everyone will ask you about it.

Speaking of your badge, when you pick it up, they’ll offer a ribbon you can stick on reading “First-Time Attendee”. Don’t decline it because you feel self-conscious. People will use it as a conversation starter and be happy to see you; no one will roll their eyes at the newbie.

You should prepare a short description of each of your business projects. Not an “elevator pitch” (it’s rude to try to sell to other attendees), but a few sentences to introduce people to what you’re doing and what you’d like advice on. It’s worthwhile to finish by saying that you’re looking for practical, unfiltered advice rather than polite questions.

Some questions are very common, so also think of what your answers to them will be:

  • Has that launched?
  • Who are your users? How many do you have? Are they all paying customers?
  • Are you full-time on it?
  • Where are you going with that?
  • What’s your goal for the business?
  • What’s your marketing plan?
  • What brought you to MicroConf?
  • What’s your revenue? (It’s OK to decline to answer or be vague, but there’s a trusting atmosphere and sharing more will get you better info. Likewise, keep the revenue/profit amounts that the speakers or other attendees share with you private.)

Some questions you should be ready to ask are:

  • Design questions specific to your business
  • How should I market this?
  • What else can I offer my customers?
  • What’s going to get your unlaunched business bringing in revenue?
  • What’s going to make your existing business many times more profitable?

Mark three hours on your calendar for the day after the conference to reread your notes, move your to-do list into whatever you use to track to-dos, and email all the people you met.

Presentation

The talks are generally goldmines of real business experience. Most slides will be available after (though unfortunately not every speaker will mention this in the first minute…), and if the trend of previous MicroConfs holds, someone will be taking detailed notes.

So you should plan to take notes on things that are especially interesting to you or relevant to your business. Don’t try to take everything down, it’ll distract you from learning. If you’re using a laptop, close your email, close Slack, close Twitter, close everything that might distract you.

The conference will probably have an official “backchannel” website or iOS/Android app for attendees to chat amongst themselves. I was frustrated in 2015 because I didn’t have an iOS/Android device to follow it, so I missed out on kibitzing and some social planning. This year I plan to try setting up an Android emulator so I can get it, though maybe it’ll be easier to buy a cheap Android phone to use for a few days, I dunno. Phones are awful.

In any case, Twitter will also be a backchannel. Set your client to search for “MicroConf” and “MicroConf2016”.

As you listen to talks you’ll start putting things on your to-do list. Don’t intersperse these with your talk notes, you want all the to-dos in one list so you can review them after the conference.

The Hallway Track

This is the best part of MicroConf. While it’s fun to meet the people who are famous in our little community, it’s not worth your time to seek them out. You’ll get far more out of a random chat with someone you’re surprised to learn has done something exactly like what you’re doing, and the event is small enough you’ll run into the famous people anyways.

MicroConf is not mercenary. Don’t pitch, even if your thing is valuable for other entrepreneurs.

When you talk to people, take notes during the conversation. It’s not rude, it’s practical. Get their name, business, major challenges, and top-of-mind topics. Follow up after the conference with more information, advice, or questions for them. Don’t forget to give them your card.

Use the talks as conversation topics, but mostly think about and ask how you can help them. Think about who you could introduce them to here at the conference or by email. If you thought it sounded silly and skipped it, seriously, read How to Win Friends and Influence People. The conference is about generous collaboration, be prepared to give as much as you get.

If you need polite conversation-enders: say that you’re going to wander around some more, you see someone you’re meaning to catch up with, you want a cup of coffee, you’re heading to the restroom, or simply, “It was good talking to you, I’ll send you an email about [topic].”

Then look for an unfamiliar face and do it again. It’s loose and unstructured and far more social than technical conferences. If you keep talking to new people you will have conversations that significantly improve your business and your life.

Evening Socializing

In the evenings folks head out to dinner and social activities in groups. These are generally informal, arranged after the last talk as clumps of people head out. It’s not impolite to ask a group where they’re headed and if they’d like one more person.

If you’re organizing a group, don’t have a democratic group where you try to poll everyone about what they’d like to eat and reach a considered consensus. If nobody else is leading the group, you are the leader. Nobody cares much where they go as long as they can keep talking. Use Yelp to pick a place you can walk to, say that’s the plan to give anyone with dietary restrictions a chance to object, and go.

If your clump of people has 4-6, go; don’t wait to grab one more person. A restaurant can easily seat up to 6 people without a reservation. Past 6 you get coordination problems: while one person stops off to do a thing, grabs one more person, or finishes a conversation, someone else thinks of some little thing they need to do, and you all stand around until you starve to death. If you walk into a restaurant and it’s too noisy for easy conversation, turn right back around and eat next door.

After dinner, check the backchannel or contact someone you met earlier to ask what they’re up to. People will often socialize over drinks, in the hotel bar/restaurant, or at a gambling table, so you’ll recognize faces and can join groups by walking around. No one will pressure you to drink alcohol or gamble, even if you are the only person in the group not doing so. I was at a craps table with a handful of people who were drinking and betting $20-200 per roll and nobody batted an eye that I was sipping water, totally ignorant of the rules, not betting, happily chatting about life and business, or that I called it a night earlier than they did.

Don’t go to play casino poker in a group. There’s always a bunch of people who want to play poker, and a casino will not seat you all together (you might be cheating collaborators) or start a private game for you (apparently that’s illegal). You’d walk all the way over there, get shot down, and then stand around at loose ends. This happened to at least three groups last year.

Leave time at the end of the night to glance at your notes and flesh out anything that’s occurred to you or that you can jot down about the people you met. Then go to bed early enough that you can get up and do it all again.

After the conference, email the people you met to talk about what they’re doing and what they told you, and to thank them for the conversation. If the previous 1,600 words haven’t made it clear, it’s all about the people. Humans are socially driven, so the connections you make and the community you participate in are vital to your success. That’s the moral of MicroConf.


That’s all I got about MicroConf, I hope to see you there. And if you’re living in Chicago and reading this, you are now required to email me at ph@ this domain so we can meet up.

Posted in Biz at 2015-12-14 04:09 | No comments

Peter Bhat Harkins

On the happy occasion of our marriage, my spouse and I have adopted the shared last name Bhat Harkins. (I’m also dropping my little-used middle name.) Please do us the favor of updating your contact lists and email clients, and we’ll get started on the exciting task of updating all the state agencies, businesses, and sites over the next few weeks. Thanks!

(Though it may bring some hassles due to poor programming, we’re not hyphenating, as double-barreled names are not part of either of our traditions.)

Posted in Life at 2015-11-30 00:24 | 1 comment

Battery Longevity

I switched to a Lenovo X1 Carbon (3rd gen) in January, and one of the delights of a new laptop was a new laptop battery. I chuckle when I get a stern notification that my battery is running low: it’s fallen to 20% charge! And it can only last for another… two hours and ten minutes. Well, I’m not in a big hurry to find a plug when I see that.

In September I read a story titled The Life and Death of a Laptop Battery, with a grim chart of a laptop battery losing charging capacity:

[Graph: a laptop battery’s charging capacity falling over two years]

That only covers two years!

The lifetime of a lithium ion battery is reduced by the cycle of charge and discharge… but also by being completely discharged, and by being completely charged. (Older readers may remember being told to occasionally completely discharge batteries; that advice was for nickel-cadmium and nickel-metal hydride batteries, which had a memory effect, and it’s bad for modern batteries.) Manufacturers put a circuit in to stop discharge shortly before hitting 0% charge (though they present this stop point as 0%), but with every reviewer judging devices based on battery life they’ll charge the battery as high as possible.

I wanted a graph to track my battery status, so I installed Petter Reinholdtsen’s battery-status package to my home directory, tweaking the battery-status-collect and battery-status-graph scripts to point to a log file in my home directory, and the sleep.d config files to point to my personal install, like 20_hjemmenett-battery-status:

 
#!/bin/sh
# Action script to collect battery status just before and just after
# going to sleep.
#
# Copyright: Copyright (c) 2013 Petter Reinholdtsen
# License:   GPL-2+
#
 
PATH=/sbin:/usr/sbin:/bin:/usr/bin
 
case "$1" in
    hibernate|resume|thaw)
        if [ -x /home/pushcx/.bin/battery-status-collect ]; then
            /home/pushcx/.bin/battery-status-collect
        fi
        ;;
esac

All-in-all that required three config files to make sure the battery status is recorded regularly as the machine sleeps and wakes (and one fiddly bit of debugging):

  1. /etc/pm/power.d/20_hjemmenett-battery-status
  2. /etc/pm/sleep.d/20_hjemmenett-battery-status
  3. /etc/pm/utils/sleep.d/20_hjemmenett-battery-status

That’s it for monitoring.
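battery-status does the logging itself. If you only want a quick reading of how much capacity a battery has lost, a hypothetical Ruby helper (not part of battery-status) could read Linux’s sysfs battery files directly. It assumes the battery is BAT0 and exposes either energy_full/energy_full_design or charge_full/charge_full_design; which pair appears varies by hardware.

```ruby
# Hypothetical helper: current full-charge capacity as a percentage of the
# design capacity, read from sysfs. Tries energy_* files, then charge_*.
def capacity_percent(battery_dir = "/sys/class/power_supply/BAT0")
  %w[energy charge].each do |prefix|
    full_path   = File.join(battery_dir, "#{prefix}_full")
    design_path = File.join(battery_dir, "#{prefix}_full_design")
    next unless File.readable?(full_path) && File.readable?(design_path)

    full   = File.read(full_path).to_i
    design = File.read(design_path).to_i
    return nil if design.zero? # avoid a meaningless ratio

    return (100.0 * full / design).round(1)
  end
  nil # no recognizable capacity files
end
```

On the laptop itself, `puts capacity_percent` would print the percentage.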

My next step was to reduce charge cycling and time spent at maximum charge. For my Lenovo X1 Carbon 3rd Gen, I had to install tpacpi-bat and its system dependency acpi_call.

I installed tpacpi-bat from AUR using aura:

 
$ sudo aura -A tpacpi-bat

Then I tweaked /usr/lib/systemd/system/tpacpi-bat.service to add two ExecStop lines so that stopping the service sets the battery to immediately charge fully. I switched to Arch to get back in touch with low-level system settings, so this was a nice excuse to learn more about systemd unit files. (I found “ExecStop” in the man page for systemd.unit.)

[Unit]
Description=sets battery thresholds

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/bin/tpacpi-bat -s ST 0 40
ExecStart=/usr/bin/tpacpi-bat -s SP 0 80
ExecStop=/usr/bin/tpacpi-bat -s ST 0 0
ExecStop=/usr/bin/tpacpi-bat -s SP 0 0

[Install]
WantedBy=multi-user.target

Finally, I enabled it at startup and started it:

 
$ sudo systemctl enable tpacpi-bat
$ sudo systemctl start tpacpi-bat

One quirk: when the laptop is running off the charger but not charging the battery, acpi -s and cbatticon both say the battery status is “Unknown” instead of the usual “Charging” or “Discharging”. I haven’t looked into patching those to say “Not charging” or “Powered” or something.
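The “Unknown” state makes sense if the two thresholds act as a hysteresis band. This is a hypothetical model, not tpacpi-bat’s or the firmware’s documented logic: below the start threshold (ST, 40 in the unit file) the battery charges, at the stop threshold (SP, 80) it stops, and in between it keeps doing whatever it was doing, so a battery that ran down to 60% and is then plugged in just sits at 60% on AC power.

```ruby
# Hypothetical model of charge thresholds as a hysteresis band. Values mirror
# the ST 40 / SP 80 settings; the exact comparisons (< vs <=) are assumed.
START_THRESHOLD = 40 # begin charging below this percentage
STOP_THRESHOLD  = 80 # stop charging at or above this percentage

# Should a battery on AC power be charging, given its level and prior state?
def charging?(percent, currently_charging)
  return true  if percent < START_THRESHOLD
  return false if percent >= STOP_THRESHOLD
  currently_charging # inside the band: keep the previous state
end

p charging?(35, false) # prints true: below the start threshold
p charging?(60, true)  # prints true: keeps charging toward 80
p charging?(60, false) # prints false: on AC but idle, the "Unknown" case
p charging?(80, true)  # prints false: reached the stop threshold
```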

Otherwise, it’s worked great the last 8 weeks. You can see near the left side of the graph where I started using tpacpi-bat and two periods in the middle where I had it off for maximum charge:

[Graph: my battery’s charge and capacity (red line) over the last 8 weeks]

There’s a lot of noise in the red battery capacity line, but I hope it’ll stay quite level. Unfortunately this is an experiment with no control group: I don’t know what it would look like if I’d never made this change. But charging to 80% is only a minor inconvenience and charging fully, mostly for plane flights, is one command (sudo systemctl stop tpacpi-bat), so it will very likely be worth it over the next few years.

Posted in Code at 2015-11-16 20:53 | No comments

Have You Seen This Cache?

It looks like syntax highlighting, image thumbnails, and compiling object files. Let me explain.

 
$ time vi -i NONE -u NONE app/models/god_object.rb -c ":quit"
 
real    0m0.020s
user    0m0.010s
sys     0m0.007s

The client’s GodObject is 2,253 lines long and Vim takes .020 seconds to load it.

 
$ time vi -i NONE -u NONE --cmd "syn on" app/models/god_object.rb -c ":quit"
 
real    0m0.079s
user    0m0.070s
sys     0m0.007s

Syntax highlighting adds .059 seconds. A twentieth of a second is barely noticeable to humans. At twice the speed of the fastest blink, it feels like the smallest possible pause.

That was enough time to plant the seed of this idea.

A function is “referentially transparent” when its result depends only on its arguments and it has no side effects, so any later call with the same arguments could be replaced by the value returned by the first call.

Common referentially transparent functions do things like perform arithmetic, split a string into an array, or parse the bytes of a file into a data structure representing how to color Ruby source code.

That last one is exactly the situation Vim is in: there’s some uncertainty to reading a file off disk, maybe it’s there one run and not the next, but somewhere downstream there’s a function that takes the contents of the file as its argument and returns a data structure annotating where every token starts and ends so that the frontend can highlight them in the proper colors. Any time this function is given the same bytes it generates the same data structure.

It doesn’t care what day of the week it is, how many rows are in my postgres tables, what a random number generator invents, or anything else. Stable input equals stable output.

This is very similar to a key -> value dictionary. The key is the arguments to the function. The value is whatever the function returns for those keys. Looking up the answer is the same as calculating it and, indeed, many dictionaries can be used as caches this way. For an arithmetic example in Ruby:

 
square_of = Hash.new do |hash, key|
  hash[key] = key * key
end
 
square_of[3] # => 9

When you call square_of[19] you might be running a function, or you might be retrieving a cached value. It doesn’t matter unless you have a practical reason to care about the details of CPU and memory usage. This isn’t useful for a simple operation like squaring numbers, but when there are thousands of slow steps it’s quite valuable.
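To watch the cache at work, here’s the same `square_of` hash with a counter (added purely for illustration) that goes up only when the block runs, that is, on a cache miss:

```ruby
computations = 0 # illustration only: counts cache misses

square_of = Hash.new do |hash, key|
  computations += 1      # the block runs only when the key is missing
  hash[key] = key * key  # store the result so later lookups skip the block
end

square_of[19] # miss: runs the block
square_of[19] # hit: returns the stored value
square_of[3]  # miss

puts square_of[19] # prints 361
puts computations  # prints 2
```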

Every time I open god_object.rb in vim it reparses the Ruby to figure out how to highlight it. Even if the data hasn’t changed, the function runs again. It’s referentially transparent, it’s slow enough to be noticeable, so why not cache it?

Well, maintaining this kind of cache (a “read-through cache”) has a lot of busywork. Aside from the reading and writing to some data structure, there has to be an eviction policy to determine when to throw away data that’s unlikely to be requested or to free up room for new data. People get grumpy when their text editor or web browser swells to eat two gigabytes of RAM, and they don’t connect this to usage being 10 or 50% faster as the program avoids repeating work.
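An eviction policy doesn’t have to be elaborate. Here’s a minimal sketch of an in-memory read-through cache with Least Recently Used eviction; `ReadThroughCache` and `max_entries` are hypothetical names. It leans on Ruby hashes keeping insertion order, so re-inserting a key on every hit leaves the least recently used key first in line for eviction.

```ruby
# Hypothetical read-through cache with LRU eviction and a fixed entry limit.
class ReadThroughCache
  def initialize(max_entries, &compute)
    raise ArgumentError, "max_entries must be positive" unless max_entries.positive?
    @max_entries = max_entries
    @compute = compute
    @store = {} # insertion order doubles as recency order
  end

  def [](key)
    if @store.key?(key)
      value = @store.delete(key) # hit: remove so it's re-added as newest
    else
      value = @compute.call(key) # miss: do the slow work
      @store.shift if @store.size >= @max_entries # evict least recently used
    end
    @store[key] = value
  end

  def keys
    @store.keys
  end
end

calls = []
squares = ReadThroughCache.new(2) { |n| calls << n; n * n }
squares[2]
squares[3]
squares[2]     # hit: 2 becomes the most recently used
squares[4]     # miss: the cache is full, so 3 (least recently used) goes
p squares.keys # prints [2, 4]
p calls        # prints [2, 3, 4]
```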

Additionally, Vim would really like that cache to persist across program runs. Why re-parse a file that hasn’t changed because someone quit Vim for a few minutes?

This prompts a whole new round of busywork managing disk quota and, as large as hard drives are getting, you’d have increased hassles because a program wouldn’t be able to free up space until it happened to run again.

I was kicking this around in my head, and I realized I’d seen it done before.

When I browse my folders and see thumbnails for images, they’re stored in ~/.cache/thumbnails so that when I re-open the folder they appear instantly instead of taking a half-second per file.

When I build a C or C++ project, the compiler outputs a bunch of object (.o) files, one per input source code. If I build the project a second time, only the source files that have changed are rebuilt (though this is based on the timestamp on the source code rather than its contents – with a whole host of predictable bugs ensuing).
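Keying on contents instead of timestamps sidesteps those bugs. A hypothetical helper that hashes a file’s bytes with SHA-256 might look like this; the `namespace` argument is an assumption, a version tag so that a newer parser wouldn’t reuse entries written by an older one.

```ruby
require "digest"

# Hypothetical helper: a cache key derived from a file's contents, not its
# timestamp. Touching the file leaves the key alone; editing it changes it.
def content_key(path, namespace: "v1")
  Digest::SHA256.new
    .update(namespace)
    .update("\0")               # separator between the tag and the bytes
    .update(File.binread(path))
    .hexdigest
end
```

Something like `content_key("app/models/god_object.rb")` would then name the cached parse of that file.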

In fact, Python is quite similar to Ruby and generates .pyc files to cache its compilation of source code.

Which reminds me, every time I start rails server to load up my development server for this client, Ruby has to re-parse source code like Vim. (That’s not to say they should share a cache, they build different data structures and don’t want to have to synchronize releases, but it’s the same problem again.) Wait, how many files is that each time?

 
$ bundle clean --force
$ find app lib -name "*\.rb" | wc -l
750
$ find $GEM_HOME/gems -wholename "*/lib/*\.rb" | wc -l
6247

Oh, it’s 6,997 files. That’s going to take a little while. And Ruby’s going to do it all from scratch every time it starts, even though the parsing is a referentially transparent, temptingly cacheable function.

Over in the web world, there’s a really nice cache system available called memcached that’s often used as a read-through cache. Memcached is a key -> value store. Memcached will evict data from the cache when it needs room, generally on a “Least Recently Used” (LRU) basis as old data is least likely to be asked for again. The usual memcached use looks like this with the dalli gem:

 
def action
  key = request.url
  page = Rails.cache.fetch key do
    # page wasn't found, so generate it
    # whatever the block returns is cached under the key,
    # and is returned for the `page` variable
  end
  render html: page, layout: false
end

Let me generalize that a little:

 
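require 'digest/md5'

def read_or_generate *args, &blk
  # join with a separator so ("ab", "c") and ("a", "bc") get different keys
  key = Digest::MD5.hexdigest(args.map(&:to_s).join("\0"))
  Rails.cache.fetch key, &blk
end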
 
def action
  page = read_or_generate request.url do
    # generate and return page, may not be called
  end
  render html: page, layout: false
end

Squint a little and this is our pattern again: read_or_generate takes arguments and generates or retrieves the value; we don’t care which happens. (And squint a lot more for the fact that the block is unlikely to be referentially transparent; it probably queries a database but that input is stable until the cache is deliberately cleared, or “stable enough” until it expires.)
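The “stable enough until it expires” case is usually handled with a time-to-live. Rails.cache.fetch accepts an `expires_in:` option for this; the standalone sketch below (the `ExpiringCache` class is hypothetical) shows the idea without Rails, with an injectable clock so it can be checked without sleeping.

```ruby
# Hypothetical read-through cache whose entries expire after ttl_seconds.
class ExpiringCache
  Entry = Struct.new(:value, :expires_at)

  def initialize(ttl_seconds, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @ttl = ttl_seconds
    @clock = clock
    @store = {}
  end

  # Return the cached value if it hasn't expired; otherwise run the block.
  def fetch(key)
    now = @clock.call
    entry = @store[key]
    return entry.value if entry && now < entry.expires_at

    value = yield
    @store[key] = Entry.new(value, now + @ttl)
    value
  end
end

now = 0
cache = ExpiringCache.new(60, clock: -> { now }) # fake clock for the demo
renders = 0
render_orders = lambda do
  cache.fetch("/orders") { renders += 1; "<html>render #{renders}</html>" }
end

render_orders.call # miss: renders the page
now = 30
render_orders.call # still fresh: cached
now = 61
render_orders.call # expired: renders again
p renders          # prints 2
```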

I’d like to see a filesystem-level cache like this for Vim, for Ruby, for Python, for C, for every random program that has a referentially transparent function that might as well be a cached value. It’s enough functionality that an individual program doesn’t want to take on the problem; it wants to call a cache system. (The programs that do cache usually dump to files like the image thumbnails and object files, ignoring expiration: browsing my 556M thumbnail folder shows tons of images I deleted months ago; `find ~ -name "*.o" | wc -l` turns up 1,020 object files littered through my home directory.)

The computer would run a daemon like memcached that saved entries to disk, managed expiration, and kept the cache to a particular size. Vim doesn’t have to take on the whole problem and I don’t have to run out of disk space because a program cached two gigs of data when I last ran it a year ago.

I went looking for this software and couldn’t find it. I’d love to set aside a gig or two of disk space for faster operations and for keeping my directories free of .o and .pyc clutter. There’d have to be some locking (like holding file handles) so that when, say, gcc finishes compiling 30 files, it doesn’t go to link them into a binary only to find that half of them have been evicted from the cache because I was downloading podcasts at the same time.
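Leaving out the daemon, the core of such a cache is small. This is a hypothetical sketch (the `DiskCache` class and its methods are made up): each value is a file named by a SHA-256 digest of its key, the directory has a byte budget, and least recently used entries, judged by file mtime refreshed on every read, are evicted first. Pinning stands in for the locking idea. It does no cross-process locking, and recency is only as precise as the filesystem’s timestamps.

```ruby
require "digest"
require "fileutils"
require "set"

# Hypothetical disk-backed read-through cache with a size cap and pinning.
class DiskCache
  def initialize(dir, max_bytes)
    @dir = dir
    @max_bytes = max_bytes
    @pinned = Set.new
    FileUtils.mkdir_p(dir)
  end

  # Return cached bytes, or run the block, store its result as a string,
  # and evict old entries if the cache is now over budget.
  def fetch(key)
    path = path_for(key)
    if File.exist?(path)
      FileUtils.touch(path) # mark as recently used
      return File.binread(path)
    end
    value = yield.to_s
    File.binwrite(path, value)
    evict!
    value
  end

  # Keep an entry from eviction while a program still needs it,
  # e.g. object files waiting to be linked.
  def pin(key)
    @pinned << path_for(key)
  end

  def unpin(key)
    @pinned.delete(path_for(key))
  end

  def cached?(key)
    File.exist?(path_for(key))
  end

  private

  def path_for(key)
    File.join(@dir, Digest::SHA256.hexdigest(key.to_s))
  end

  # Delete unpinned files, oldest mtime first, until the total fits.
  def evict!
    files = Dir.glob(File.join(@dir, "*")).select { |f| File.file?(f) }
    total = files.sum { |f| File.size(f) }
    files.sort_by { |f| File.mtime(f) }.each do |f|
      break if total <= @max_bytes
      next if @pinned.include?(f)
      total -= File.size(f)
      File.delete(f)
    end
  end
end
```

A compiler could pin its 30 object files before linking and unpin them afterward, so a podcast download filling the cache would evict something else.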

Does this system sound useful to you?

Before you answer, I thought of something clever for a second version.

Back when Vim read god_object.rb off the system, the kernel did quite a bit of clever caching to speed up reads. The short version is that the kernel caches recent file reads and writes in RAM. Rather than allocate some amount of RAM for this, the kernel uses all the free RAM that programs haven’t asked for. When a program requests more RAM, the kernel shrinks the file cache and gives RAM to the program. There’s as much room for the cache as possible, and when there’s no room free everything continues to work (but slower).

This cache system I’m considering gets a nice benefit from this feature: if Vim caches the couple kilobytes of parsed Ruby code, it’ll probably be accessed via very fast RAM instead of even having to hit the disk. The kernel has lots of very clever and reliable code for doing this responsibly; it’s a wheel that shouldn’t be reinvented.

But the clever thing is that if this cache system were in the kernel, it could use all free disk space as a cache like the kernel file cache uses all free RAM. There’d be no fixed-sized allocation to weigh convenience against resources.

This seems like a nice big win to me. Enough of one that I’m puzzled that I haven’t seen anything like it. Maybe I’m not searching well, maybe I haven’t explored enough unix esoterica. Would anyone be able to point me to something like this?

Or be able to build it with me?

Posted in Code at 2015-11-02 23:38 | 2 comments