Caching Dictionaries in Python vs. Ruby «
»


Code: , , ,
8 comments

A while ago I made a slightly-underinformed post (see the corrections in the comments) trying to draw a difference between Python and Ruby. I’ve finally got a decent example and can explain what I’m getting at.

I’m processing all the items in a big list, and part of that is performing some expensive calculation on an item attribute. There are only a few dozen values for the attribute, so I cache them:

results = {}
 
for item in really_big_list:
    if item.attribute in results:
        result = results[item.attribute]
    else:
        result = expensive_calculation(item.attribute)
    # ... do something useful with item and result

It’s simple and readable, but it’s also lengthy. If I’m doing several expensive calculations on different attributes (and I am), my actual work gets lost in the noise. So I defined a dictionary that can do the heavy operation and cache the result:

class LambdaDict(dict):
    def __init__(self, l):                                                                                                                       
        super(LambdaDict, self).__init__()
        self.l = l
 
    def __getitem__(self, key):                                                                                                                  
        if key in self:
            return self.get(key)
        else:
            self.__setitem__(key, self.l(key))
            return self.get(key)
 
# and now my code becomes
 
results = LambdaDict(lambda  key:expensive_calculation(key))
 
for item in really_big_list:
    result = results[item.attribute]
    # ... do something useful with item and result

That’s really nice, clean code that I’m satisfied with. The difference in Python and Ruby here is that Ruby hashes (the equivalent of dicts) include this behavior by default, just pass a block (lambda) when constructing the hash and it’ll be called for every missing key.

I’ll have the same behavior in Python or Ruby, it’s just that the default Python object doesn’t give me a handy method for building a dictionary that might perform some arbitrary expensive method on a simple lookup. Ruby is in favor of implicit magic, so it holds out the hook. This is the difference I was trying to get at: Python’s builtin objects have fewer methods and convenient hooks for me to do weird and useful things than Ruby’s, and Ruby will even let me tinker with them. Ruby has open classes, so I can extend both the builtin and defined classes with my own methods:

module Enumerable # included by Arrays and similar objects
  def sum
    inject(0) { |x, y| x + y }
  end
end

This adds a sum() method every Array in my program, whether I construct it (like I created my own results dict) or get it from some library code. In Python I’d have to define my own Array type and I’d be out of luck if I’m getting back arrays from a library. Monkey patching is Python’s (deliberately clunky) name for adding a method to a single object at runtime. I’d have to monkey patch every object as it’s returned from the library rather than just being able to declare that every object of that class should have my method.

If this is the first time you’ve seen open classes: yes, when I first saw it I felt exactly the same way you do. Dangerous unreliable hackery, a recipe for disaster. But I’ve seen a lot of useful things happen in Rails because of it, and my Ruby projects have benefited from being able to add a method to an existing library’s objects, to overwrite (or just chain my code before or after) another method.

It’s something of a last resort that lets me build the cleanest object system I can as I interface with builtin objects and library code. I don’t have any utility methods floating around or have to tweak each object as I get it back from an API call. As I cook, Ruby’s metaprogramming is the garnish that finishes the dish. I called Python academically inconvenient because it feels like there’s an academic designer at my shoulder saying, “No, you don’t want to do that, it’d be messy and might be be abused. Pedagogically unsound.” I love that Python pushes me to build an explicit and obvious code, but I find myself wanting to tuck in one little bit of magic to make the code perfect.


Comments

  1. Yes, some of us have seen open classes in Python. For example, Plone’s Archetypes takes a schema declaration and dynamically generates methods on a class based on that information, similar to much of the class re-opening stuff that goes on in ActiveRecord. We have even seen open classes before Rails since development of Archetypes goes back to 2002 :)

    http://plone.tv/media/430268733

    Note that in the Zope world where the term monkey patch originated from, the dynamic class modifications of Archetypes is *not* called monkey patching. Monkey patching only applies where your reason for re-opening a class is in order to patch it to fix bugs or mis-features similar to traditional software patches:

    http://www.catb.org/esr/jargon/html/P/patch.html

  2. I think one of the reasons why Python does not have a default way of doing this is because there are many ‘correct’ ways to do it depending on the problem you are trying to solve.

    For instance, your implementation has some issues. It will not work if your attribute is a dictionary as dictionaries are not hashable and can’t be used as keys. If it is a complex object, the hash will be on the instance id (pointer) verses the data it contains unless you override __hash__. I have seen a number of examples of caching using decorators and descriptors (I often use a descriptor decorator to do dynamic caching of attributes).

    I have used ruby extensively and run into similar issues with using the builtin caching behavior. In the end, most of the time I had to write my own (as I rarely deal with simple things).

    In the end, ruby has some native helpers for the most common use cases, but it is easy enough to write generic python decorators or descriptors to do the same thing in python.

    As for monkey patching… Well one of the things I hate about Ruby is injection. It has cost the company I work for hundreds of hours of grief. The real problem are the people using the language all injecting their crap without regard to what other people are doing, but that is what researchers do. And I hate it in python even more (as ruby gives you some protection from bad things where python gives you none).

    yes its cool, yes it allows you to do some very powerful things… but it is a bad thing to do in a large codebase with multiple people who don’t talk to each other about every change they make (which is 90% of ‘professional’ software development.) As one researcher said after a week of debugging, ‘That is some beautiful rope we hung ourselves with.’

  3. Python 2.5 already includes collections.defaultdict which is exactly what you’re looking for.

    Open classes are an entirely different issue. It can be really handy when tinkering around with things. Though, like Doug points out, when it comes to a large application with many libraries, all maintained by multiple developers, it becomes an easy way to shoot yourself in the foot

  4. Why not:

    results = {}
     
    for item in really_big_list:
        result = results.setdefault(key, expensive_calculation(key))
        # … do something useful with item and result

    Then you don’t need a custom class.

  5. @philip: How would you do this with defaultdict? AIUI defaultdict lets you specify a default, but doesn’t do any kind of caching – in fact, it doesn’t even let you specify the default as a function of the key.

    @david: I like that solution.

  6. The setdefault will not work, since the arguments are evaluated before the call to setdefault. So expensive_calculation will always be called, even if the result is already in the dictionary.

    The pythonic solution is to write a memoizing decorator – or use one of the many already in the ASPN Python Cookbook.
    Then you can do

    @memoize
    def expensive_calculation(...):
        ....
  7. Justin: You’re correct there. I simplified from my examples that passed keyword and other arguments to expensive_operation.

    Doug: It sounds like your criticism is that the LambdaDict can’t cache things the builtin Dict it’s based can’t cache. I’m OK with that limitation. I agree that injection is a great way to shoot yourself in the foot, that’s why I talked about it as garnish rather than a side dish. I’ve avoided the pitfalls, though I admit that’s at least partially luck.

    Philip: Defaultdict doesn’t work here, it can’t pass arguments or do the other things that Justin noticed I elided from my example.

    Dave: Great, got a link? I spent a while searching for code before writing my dict, but I didn’t think to search for “memoize” (obvious now that you’ve said it).

    (Thanks for the great comments, folks. I’ve add <code lang=”python”>…<code> where needed.)

Leave a Reply

Your email address will not be published.