Craftsmanship Tour: New York Times

In May, while visiting New York City, I dropped by the New York Times to code with Derek Willis and, impromptu, Dan Berko. I worked with both at the Washington Post (and saw many other familiar names on doors; online journalism is a small town).

Derek’s got a great career arc. He climbed up the ranks of journalism, covered Congress, and got involved in data-heavy projects. “Computer assisted reporting” is one of those terms that nobody quite loves but nobody’s successfully replaced (though it seems “database journalism” is gaining ground) and refers to collecting and analyzing data in databases. Derek got as interested in the “computers” as the “reporting” and has deliberately pushed his skills and career into software development. (*cough* sounds like a good topic for a quiet blog, eh, Derek? *cough*). I look at his GitHub profile today and he’s been busily merging in contributions to his open source projects like his USA Today Census gem.

We started out the day looking around the FEC scraping code. In the States, the Federal Election Commission gives out lots of data from candidates filing required disclosure statements. We tidied up the database a little and then turned to the project for the day, which has just been publicly announced:

Today we’re announcing the addition of paper campaign filings from Senate candidates and two party committees to our Campaign Finance API, which previously had only provided details of electronically filed reports. Now users can request and view the filings of any committee registered with the Federal Election Commission.

Unlike House and presidential candidates, current and would-be senators file their campaign reports first with the Secretary of the Senate, who then forwards them to the F.E.C. That agency then scans in the images from the paper filings and makes them available for viewing (an example). While an effort to require electronic filing for Senate candidates hasn’t gotten much traction this year, we have at least made the API’s set of filings more complete.
New in the Campaign Finance API: Paper Filings

The scraper was previously ignoring the paper-only reports, but we updated it to recognize and categorize them. The categorization was a huge bit of nostalgia for me: take the noisy and sometimes inconsistent provided categories and map them onto a standard set of database categories (form_type, in the screenshot in the announcement).

When we ran the scraper, it would complain and halt each time it reached an unknown category. We’d add that to a mapping table and restart from that point, but it was frustrating to have to keep an eye on it. So we set the scraper to ignore records that it didn’t have a mapping for and warn about the problem. We set the scraper to run (and hit the amazing Shake Shack for burgers) and came back to find a list of missing mappings. After adding that, we ran the scraper again to fill in the missing entries.
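
The warn-and-skip approach can be sketched roughly like this. It is a minimal illustration, not the Times’ scraper: the labels in FORM_TYPE_MAP, the record layout, and the categorize name are all made up for the example.

```ruby
require 'set'

# Hypothetical mapping from noisy source labels to standard form_type values.
FORM_TYPE_MAP = {
  'quarterly report' => 'quarterly',
  'QUARTERLY RPT'    => 'quarterly',
  'year-end report'  => 'year_end',
}

# Returns [categorized_records, unmapped_labels]. A record with an unknown
# label is skipped instead of halting the run, and each unknown label is
# warned about once and collected so it can be added to the map later.
def categorize(records, mapping = FORM_TYPE_MAP)
  unmapped = Set.new
  categorized = []
  records.each do |record|
    label = record[:label]
    form_type = mapping[label]
    if form_type.nil?
      warn "no mapping for #{label.inspect}" unless unmapped.include?(label)
      unmapped << label
      next
    end
    categorized << record.merge(:form_type => form_type)
  end
  [categorized, unmapped.to_a]
end
```

The second return value is the list of missing mappings you find waiting when you get back from lunch.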

This worked because the scraper only added entries it didn’t already have recorded. The term for this is idempotency, and it’s useful from the level of individual functions up to large, fairly complex programs like web scrapers. Every program fails; an idempotent approach means you don’t have to keep careful track of the many ways it can fail, because you can fix things and re-run your program without worrying about duplicate records or updating things twice.
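
As a rough sketch of what idempotency looks like in code, here is an in-memory stand-in for the filings table. FilingStore, add_unless_present, and scrape are hypothetical names for illustration, and it assumes each filing carries a unique :id; a real scraper would check the database instead.

```ruby
# In-memory stand-in for the filings table, keyed by a unique filing id.
class FilingStore
  def initialize
    @rows = {}
  end

  # Adds the filing only if its id is not already recorded.
  # Returns true when a row was added, false when it was skipped.
  def add_unless_present(filing)
    return false if @rows.key?(filing[:id])
    @rows[filing[:id]] = filing
    true
  end

  def count
    @rows.size
  end
end

# Returns how many filings were newly added. Re-running over the same
# input adds nothing, so a failed run can simply be started again.
def scrape(filings, store)
  filings.count { |filing| store.add_unless_present(filing) }
end
```

Running scrape twice over the same list adds rows the first time and none the second, which is the property that made "fix the mapping and run it again" safe.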

Derek had to run off to catch a train, so I dropped in on Dan Berko. He was on one of the Post’s several other “web innovation” teams while I was there, so we helped each other with code occasionally but didn’t spend a lot of time coding together.

The New York Times has large and well-maintained internal tools for reporters and editors. The reporters have a CMS for writing stories and the editors have a budgeting system for planning what goes where in the paper. We improved communication between these two a bit, so the budgeting tool could refer to a story in the CMS and pull metadata from there instead of requiring an editor to re-input it.

The UI was simple: if the editor links a story, several fields should be grayed out and a checkbox should indicate the link. If the editor unchecks the box, the link is broken and the fields become editable again. This started out with two code paths – one for linking a story, one for unlinking – and making sure on pageload that the UI was in the proper state. We’d barely started writing that when we saw it could be implemented even more simply:

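var Budget = {
  // Disable the CMS-backed fields whenever the story is linked (box checked).
  disable_if_linked_to_cms: function() {
    var checked = $('asset_cms_id').checked;
    ['asset_home_status_id', 'headline' /* , etc. */].each(function(id) {
      $(id).disabled = checked;
    });
  }
};

// Sync the UI on page load and again on every toggle of the checkbox.
document.observe('dom:loaded', function() {
  Event.observe('asset_cms_id', 'click', Budget.disable_if_linked_to_cms);
  Budget.disable_if_linked_to_cms();
});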

When the checkbox is checked, all the form fields are disabled. When it’s unchecked, they’re not. The code runs on pageload and anytime the checkbox is toggled. I really liked this bit of code: we started out writing the simplest thing that came to mind, but soon we realized it could be reduced. The resulting code probably elicits a “So what, it’s not doing much?” reaction, which is far better than the previous “Now, let’s see, what’s this doing?” we would’ve had at first. The sign of the best code is that you immediately understand it, not that you have to stretch yourself to follow its solution.

Software Craftsmanship Tour: Aidan Rogers

A few weeks ago I met up with Aidan Rogers to hack on some code. Aidan and I were coworkers at Cambrian House a few years ago.

Aidan and I hacked away at adding a feature to a Web App That Shall Not Be Named. User authentication was handled by Facebook or Twitter, so the feature was to fetch and list their friendships from their social network. The code in app/models/user.rb was straightforward, taking an object from authentication:

# Update the database with the user's current list of friends.
# Compares a fresh list to the existing list to minimize database writes.
def update_friends(auth)
  # fetch current list of friendships as hashes
  provider_friendships = auth.retrieve_friendships
  # load already-stored friendships to hashes
  stored_friendships = friendships.send("on_#{auth.provider}").map { |friend| {:id => friend.friend_id, :name => friend.friend_name} }
 
  # compare the two lists of hashes
 
  # add any new friends to the database
  (provider_friendships - stored_friendships).each do |friend|
    friendships.create!({
      :friend_id => friend[:id],
      :friend_name => friend[:name],
      :provider => auth.provider,
    })
  end
 
  # delete any removed friends from the database
  (stored_friendships - provider_friendships).each do |friend|
    friendships.destroy_all(:conditions => { :friend_id => friend[:id], :friend_name => friend[:name] })
  end
end

That code isn’t very complicated, but it’s worth noting before you read the next snippet that we were both new to the Facebook/Twitter APIs and spiking functionality rather than trying to write production-quality code.

Keep that in mind as you read the code we added to the authentication object. It works, but it could definitely be better written (left as an exercise to the reader, because, well, we only spiked, so I haven’t actually written a better version to present).

def retrieve_friendships
  if provider == "facebook"
    # make the Facebook API call
    user = FbGraph::User.me(self.data['credentials']['token'])
    # map it to an array of hashes
    return user.friends.map do |friend|
      # check against the db to load their id
      existing = Authentication.where(:uid => friend.identifier, :provider => 'facebook').first
      { :friend => existing ? existing.user : nil,
        :provider_id => existing ? nil : friend.identifier,
        :provider => 'facebook' }
    end
  elsif provider == "twitter"
    # Returns only 20 users at a time - is it important enough at this point to add all the code for iterating? (aidan)
    # return Twitter.friends(nickname).users.map{ |friend| Hash[ :id => friend.id, :name => friend.screen_name, :provider => 'twitter' ]}
    # empty list for now, so the Array#- comparisons in update_friends still work
    return []
  else
    raise "Refactor me"
  end
end

Having the authentication object test what provider API it should be loading from is an obvious code smell, but it’s not the one I want to talk about.

There’s an implicit type in use. Aidan and I referred to it as a FriendHash: a normalization for comparing friendship objects from the app’s database and from different API calls. FriendHash would be a data class with just a smidge of validation code to account for all three attributes.
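
To make the idea concrete, here is one way such a class might look. This is a sketch we never wrote during the spike; the attribute names (id, name, provider) are taken from the commented-out Twitter line above, and the validation rules are assumptions.

```ruby
# Hypothetical value object for the implicit "FriendHash" type.
class FriendHash
  PROVIDERS = %w[facebook twitter].freeze

  attr_reader :id, :name, :provider

  def initialize(id, name, provider)
    raise ArgumentError, "id is required" if id.nil?
    raise ArgumentError, "name is required" if name.to_s.empty?
    raise ArgumentError, "unknown provider #{provider.inspect}" unless PROVIDERS.include?(provider)
    @id, @name, @provider = id, name, provider
  end

  def to_a
    [id, name, provider]
  end

  # Compare and hash by value so Array#- treats two FriendHashes with the
  # same attributes as the same friend.
  def ==(other)
    other.is_a?(FriendHash) && to_a == other.to_a
  end
  alias_method :eql?, :==

  def hash
    to_a.hash
  end
end
```

With value equality in place, the two list subtractions in update_friends would work on FriendHash objects the same way they do on plain hashes, and a missing attribute fails loudly at construction instead of quietly producing a mismatch.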

I’ve noticed these sneaking around code in the interfaces between objects. I don’t quite have a name for them, but I realize it touches a bit on design by contract. The part that smells to me is that this code is spread amongst many places that need to be updated in sync with each other (not that this is terribly better if you have a class for it, but at least you can grep for references to it). I’m not a big fan of data classes, so I don’t love this solution.

What really frustrates me is that I’ve got enough experience to have a tingling intuition when I am attempting to re-solve some problem that surely someone else has analyzed and offered solutions to, but my research hasn’t turned anything up.

So I’m posting to hear what folks think of this situation. Have you read anyone writing about these quiet but complex data types at interfaces, or do you have ideas for how to address them?

Craftsmanship Tour: Jim Ray

Last Tuesday I spent the afternoon with Jim Ray at the excellent Hooked coffeeshop in Denver, CO (and then I spent the next several days sick from a bad meal and recovering, so this post got delayed).

We worked on FindAPair, a small web app he’s developing with Jeff Powell. Developers can list their city and link to their Twitter and GitHub profiles to find other devs to pair with. (Yeah, this basic idea felt pretty familiar.)

First, Jim and I added some acceptance tests using Steak to test profile editing. Steak is a DSL for acceptance testing in BDD, but closer to Ruby code than the popular Cucumber. Jim explained he chose Steak because this project does not have a non-technical customer who’d read, verify, and help write specs. Cucumber’s translation between text files and method calls would be unnecessary complexity.

FindAPair uses the wonderful Devise library, so we followed the steps on the wiki to allow people to log in by username as well as email address. It went pretty well, but we needed to do a little tweaking for Rails 3. Of course, we contributed back to the docs. One of the nice things about GitHub is how frictionless it is to share these small improvements.

Another interesting bit of coding was planning the URL structure of the site. The two major resources are users and the cities they live in. Jim wanted to have really simple URLs like findapair.me/jimiray or findapair.me/denver. He didn’t want to overlap the namespaces, but he couldn’t decide between them. As he was explaining the problem I thought back to the early days of the web and suggested putting cities at the root and users at findapair.me/~jimiray. Jim laughed, decided the audience was nerdy enough to remember it, so we committed it.
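
The rule is simple enough to sketch in plain Ruby. This is not FindAPair’s routing code (that lives in the Rails router); resolve is a hypothetical helper that only shows how the tilde keeps the two namespaces apart.

```ruby
# Hypothetical resolver: "/~name" is a user, any other single path segment
# is a city. Returns [kind, name], or nil when the path fits neither.
def resolve(path)
  segment = path.sub(%r{\A/}, '')
  return nil if segment.empty? || segment.include?('/')

  if segment.start_with?('~')
    name = segment[1..-1]
    name.empty? ? nil : [:user, name]
  else
    [:city, segment]
  end
end

resolve('/~jimiray')  # => [:user, "jimiray"]
resolve('/denver')    # => [:city, "denver"]
```

A user who happens to be named "denver" lives at /~denver and never collides with the city.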

I’m not going to recount all the little things we did, but it was a fun afternoon building what could be a great resource.

While I was programming with Jim, I was reminded of the importance of studying. I knew a lot more vim commands than Jim, but it’s not because I’m any smarter. When I started learning vim I quickly realized it was a far bigger topic than I could absorb at once, so I only learned enough to take care of my daily routine. After I was comfortable with that, I started reading the vim manual, but I read at most one chapter a week. Read a little, experiment, give it time to settle, refresh my reading in a few days, and then move on to something else. In the decade since then, I still watch for new information about vim, reading the vim subreddit and Twitter accounts like learnvim and dotvimrc. It’s easy to skip over the stuff I already know. It’s easier to learn in small amounts over years than read one book and hope to retain everything.

Craftsmanship Tour: David W. Allen

I’m in Grand Junction, Colorado because it seemed as good as any a place to start traveling. I have family in Denver and plans to ski, so why not tour the state a while? Once I knew I was coming to Grand Junction, I remembered GitHub can be searched by location and I got curious, so I did the search.

One result? If there’d been a hundred people I’d have been done right there, but if there’s exactly one person, well, why not say hi? So I sent an email that I think sounded like this:

I realize I’m a complete stranger and the idea of wandering into people’s offices to work with them for a day sounds perfectly ridiculous, but would you like to grab coffee, talk code, do some pair programming?

Perhaps I didn’t come off quite so badly because David said “what the hell” and invited me to meet him for his workday at Traders Coffee.

A side story: I happened to arrive before him and was looking around for him, and couldn’t see him. But I saw a woman surfing Stack Overflow! I thought maybe this was someone David knew and invited along, so I had an awkward conversation where it quickly became clear she had no idea who I was or why I was introducing myself. If I’d been thinking quicker I’d have invited her to join us or at least given a business card, but my generic nerdly social phobia overwhelmed me and I excused myself. If you’re reading this and someone interrupted your Stack Overflow session to ask if you knew a guy you’d never heard of, sorry for the confusion, but if you’d like to grab coffee, talk code, maybe do some pair programming, I’m in town until Friday afternoon.

As David and I sipped delicious lattes, I learned that he’s a mechanical engineer who got into software development around ten years ago as his job changed. He’s self-taught: he cracked the books and started experimenting. Along the way he attended some Construx Seminars. Construx is the software firm started by Steve McConnell, the giant who strides above programmers bestowing books of wisdom and lore, so that gave David a good grounding.

A lot of our conversation was about how David has found it hard to improve his craft without other, ideally better, programmers around to work with. It’s referenced by the community aspect of the Software Craftsmanship Manifesto. Programmers improve their craft by improving each other’s craft.

As an example, David told me about his startup project veloGraf. He and another developer are using graph algorithms to analyze social networks and I soon realized I only know enough graph theory to hang onto the ankles of the conversation. I hadn’t thought about it before, but David pointed out that game tags on NearbyGamers form a highly cyclical graph. Gamers add edges to the graph by listing what games they’re interested in playing and tags self-referentially describe themselves (e.g. Dungeons and Dragons is tagged as an RPG).

I demoed the site’s most computationally-expensive page, which shows gamers who share tags, sorted by distance. This page required some hacking to avoid a gigantic couple of joins that would’ve dragged in most of the rows in the RDBMS, but David chuckled because it’s the sort of problem a proper graph database doesn’t even notice. He’s tracking millions of nodes with many times more edges in order to extract interesting connections. The simple analyses he showed me pointed out flaws in common algorithms that were obvious to him but surprising to me. I managed to give him some fodder for experiments to improve his algorithms, which says more about the value of a little collaboration than it does about my graph chops.
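
For a feel of the graph view of that page, here is a toy in-memory version. It is neither the NearbyGamers query nor anything from veloGraf; the data layout (a hash of gamers with :tags and a precomputed :distance from the viewer) and the method name are invented for the example.

```ruby
# Walks viewer -> tags -> other gamers, then sorts those gamers by distance.
# gamers is a hypothetical hash: name => { :tags => [...], :distance => miles }.
def gamers_sharing_tags(viewer, gamers)
  # Build the tag -> gamers edges once.
  by_tag = Hash.new { |hash, tag| hash[tag] = [] }
  gamers.each do |name, info|
    info[:tags].each { |tag| by_tag[tag] << name }
  end

  # Follow the viewer's tags out to everyone else who lists them.
  neighbors = gamers[viewer][:tags].flat_map { |tag| by_tag[tag] }.uniq - [viewer]
  neighbors.sort_by { |name| gamers[name][:distance] }
end
```

Following edges from one node only touches the gamers who actually share a tag, which is the part a relational join has to work much harder to do.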

David told me a tale of (mild) woe from a previous job. An experienced cowboy coder was developing the core of a product while supposedly mentoring interns. In reality, they were restricted to working on small, ancillary tasks so as not to destabilize the core or distract the cowboy. How do you fix this?

My best guess (share yours in the comments, please – it’s not an uncommon scenario) was to task the interns with adding automated testing. The cowboy will think of it as menial work, but the interns will be exposed to the whole system and have reasons to have conversations with him. And probably better to start with acceptance testing than unit testing to preempt the cowboy from whining that using unit tests to verify that his code works (as opposed to hack and hope and oh yeah, I forgot those two things were related) slows down his brilliance.

There’s a complacency I’ve seen that’s sort of related to the Blub Paradox. When a programmer succeeds at building their first product or two, they often get a Panglossian sense of competence. They may think, “I’m good at my job, there’s people telling me I should learn these things, but I’m good at my job without them, ipso facto I don’t need to learn more.” Unless they’re around a noticeably better programmer or especially introspective, they may never realize how much more there is to learn and the possibility of much greater skill.

I think this is where net negative producing programmers come from. I joined a project with an NNPP once. I watched all commits to the repository (everybody does this, right?) and I realized nearly every one of his commits added a bug or the strong potential for future problems. There was feedback available in the rising bug count and frustration of trying to improve the codebase, but he didn’t have the experience to know that this situation was abnormal. (Or at least improvable.) As I and other developers started cleaning up the code, he recognized what had been happening and decided to reinvest in his career by attending training.

I think this is why the nascent software craftsmanship movement has such potential. Programmers need to make steady improvement, however small, a core value of the profession. Currently there’s a “well, it works” attitude where practice and mindful improvement could be a precept. The likeliest way I see that happening is masters of the craft mentoring apprentices. Every craft has its share of “just get it done”, but it doesn’t have to be the dominant culture.

Update: David has blogged his thoughts on our wide-ranging conversation.

Craftsmanship Tour: 8th Light

The second stop on my craftsmanship tour was last Friday at 8th Light. They’re a local Chicago consultancy that’s active in the software craftsmanship community, especially in building the new Chicago SC group.

I started out pairing with Colin Jones, which was a bit lucky because he’s also a vim user. After I shared the joy of matchit.vim we got down to the task at hand.

8th Light is one of a few companies that do work for Groupon, the local wunderkind. Our task was to write a small script to sync data between services for a small percentage of subscribers, but when a site has over 40 million subscribers, you can’t just hack out something fast and hope for the best.

The work needed was very similar to an existing script, but Colin’s first instinct wasn’t a quick copy and paste and tweak. We analyzed the script to find the points of commonality and extracted them to a superclass, and we grew the existing tests (yes, the one-off script already had tests) to cover the new functionality.

I get the feeling that by the time I finish this tour I’m going to be pretty familiar with every combination of testing and mocking library out there. Last Monday it was Cucumber; this time I dropped into learning rspec and its built-in mocking. I’m glad I experimented with writing my own mocking library a few years ago: I earned a deeper understanding of mocking than one library’s API and was able to explain what I was trying to do and find the right bit of rspec docs quickly.

Every Friday lunch is “8th Light University” and a developer presents on an interesting topic. Colin gave a presentation on Clojure, walking through the basics of functional programming style like map, filter, and the importance of understanding when Clojure will or won’t walk an entire sequence to produce its result (as demonstrated by using (range 10000000000000)). It ended with a little exercise to add up all the numbers between 1 and 1000 that are evenly divisible by 3 or 5. My solution wasn’t too bad, though I should’ve used reduce in place of filter/apply:

(defn evenly_divisible? [x] (or (= 0 (rem x 3)) (= 0 (rem x 5))))
(print (apply + (filter evenly_divisible? (drop 1 (range 1000)))))

That refresher of Lisp syntax and FP came in handy immediately after lunch, when I paired with Micah Martin. He was working on adding a Clojure interface to the GUI toolkit Limelight.

We refactored some Java and Clojure to remove some duplicated logic for locating and loading GUI elements (“players”). I don’t really have any experience in Java, but it only took me about five minutes of removing duplication before I ran into hassles with the type system.

As soon as I was distracted by the interesting problem of structuring code in an unfamiliar language, my hands reverted to vim commands. We were using IntelliJ IDEA on OS X, so I flailed a bit, at one point griping “I feel like I’m wearing mittens”. It’s disorienting to lose tools I’ve tweaked and practiced with almost daily for a dozen years.

The Java started like this:

public void recruit(PropPanel panel, String playerName, CastingDirector castingDirector)
{
  final Scene scene = panel.getRoot();
  final String scenePlayersPath = scene.getResourceLoader().pathTo("players");
 
  final boolean couldRecruitFromPScene = recruitFrom(panel, playerName, castingDirector, scenePlayersPath);
  if(!couldRecruitFromPScene)
  {
    final String productionPlayersPath = scene.getProduction().getResourceLoader().pathTo("players");
    final boolean couldRecruitFromProduction = recruitFrom(panel, playerName, castingDirector, productionPlayersPath);
    if(!couldRecruitFromProduction)
    {
      boolean couldRecruitDefaultPlayer = recruitFrom(panel, playerName, builtinCastingDirector, BuiltinBeacon.getBuiltinPlayersPath());
    }
  }
}

I found this code really hard to read, in part because there were nested conditionals when it’s really doing the sequential work of trying different paths. It’s a bad application of single exit point because Java is a garbage collected language. It’s OK to have multiple return points from a function because you don’t have to worry you’re leaking memory. So I rewrote it as:

public void recruit(PropPanel panel, String playerName, CastingDirector castingDirector)
{
  final Scene scene = panel.getRoot();
  final String scenePlayersPath = scene.getResourceLoader().pathTo("players");
  final String productionPlayersPath = scene.getProduction().getResourceLoader().pathTo("players");
 
  if (recruitFrom(panel, playerName, castingDirector, scenePlayersPath))
    return;
 
  if (recruitFrom(panel, playerName, castingDirector, productionPlayersPath))
    return;
 
  recruitFrom(panel, playerName, builtinCastingDirector, BuiltinBeacon.getBuiltinPlayersPath());
}

There are fewer variables and I moved their initialization together, so your eye picks up the parallel structure and just focuses on the differences.

Of course, that’s not just parallel structure but repeated code in the conditionals and call to recruitFrom(). I fell into Ruby syntax trying to explain what I wanted to do as the next step of refactoring:

public void recruit(PropPanel panel, String playerName, CastingDirector castingDirector)
{
  final Scene scene = panel.getRoot();
  [
    [scene.getResourceLoader().pathTo("players"),                 castingDirector],
    [scene.getProduction().getResourceLoader().pathTo("players"), castingDirector],
    [BuiltinBeacon.getBuiltinPlayersPath(),                       builtinCastingDirector],
  ].each { |path, director|
    if (recruitFrom(panel, playerName, director, path))
      return;
  }
}

Now the code doesn’t have any extra variables or repeated method calls. If the recruitFrom signature changed, there’d be exactly one place to change it.
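
For comparison, here is roughly what that shape looks like in real Ruby. It is a sketch, not Limelight code: recruit takes a list of [path, director] pairs, and each director is assumed to be any callable that returns true when it finds the player at that path.

```ruby
# Tries each [path, director] pair in order and stops at the first success.
# Returns true if some director recruited the player, false otherwise.
def recruit(player_name, sources)
  sources.any? { |path, director| director.call(player_name, path) }
end

# Hypothetical directors for demonstration: one that never finds the
# player and one that always does.
never_finds  = lambda { |player_name, path| false }
always_finds = lambda { |player_name, path| true }

recruit('button', [
  ['scene/players',      never_finds],
  ['production/players', always_finds],
  ['builtin/players',    always_finds],  # never reached
])
```

Because any? short-circuits, the later sources are not consulted once one succeeds, which is the behavior the early returns spelled out by hand.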

Alas, it was not to be. I couldn’t simply bundle a player and a canvas in an array of tuples because they were of different types. Micah started typing a simple inner class to gather them together, but I vetoed it. Even though it would remove duplication, I didn’t think it worth the tradeoff of added complexity of a new, data-only class. We left it at the previous revision.

Refactoring isn’t a blind set of tools to apply while (code.has_refactoring_possible?) because they’re about managing complexity by moving it to explicit places without duplication. You can spend hours smoothing things out only to be left with a stubborn bit of complexity that you can leave a warning on rather than spend a disproportionate amount of time trying to get that last bit. I find that when I do this, a future change will often expose what the real trouble is and I can advance the refactoring then.

All in all, a great day at 8th Light. I’m still looking for places to visit in Chicago before the end of the year, so please check out my calendar and suggest places here or on Twitter.

Craftsmanship Tour: Obtiva

Last month when I started planning my travels of indefinite duration, I ran into the blog On Being a Journeyman Software Developer by Corey Haines. He spent the end of 2008 and most of 2009 traveling around the United States pair programming in exchange for room and board, trading knowledge and having interesting discussions. I saw it and thought, “Hey, I could do that.”

I chatted with Corey about his experience and he encouraged me (and every other developer, really) to visit and learn from as many software developers as I could. To this end, he and Scott Parker introduced me to folks at Obtiva and 8th Light, two local consultancies that are part of the software craftsmanship movement, with the encouragement that I visit places even before I leave Chicago.

Obtiva

I’ve gotten my programming tour started. I spent yesterday at Obtiva pairing with Andy Maleh on some client work. We tracked down and fixed a bug, added a straightforward feature, and added a feature that exposed a hidden assumption in the spec and required some refactoring.

This was the first day I’ve spent pair programming and I really enjoyed the experience. There were some tools I hadn’t used, and Andy’s ability to answer a question, give an example, or push me to try it (“Here, you drive”) at the exact time I was wondering how to do something made for quick learning. We traded off responsibilities smoothly, with one of us writing the implementation and the other watching for typos and thinking of different possibilities. There were a few moments when I felt like I had a second brain because I had Andy’s knowledge to draw on or because he pointed out a problem I wouldn’t have noticed for a few minutes because I was heads-down on the current task.

Cucumber

One of those new tools I saw was Cucumber, a test framework for behavior-driven development and acceptance testing. You write your tests in lightly structured English and then write code to implement each step.

I’d seen a lot of talk around Cucumber but ignored it because I’m happy to write my high-level tests in Ruby. I don’t have a non-technical customer to show them to, so I didn’t see a benefit. I’m very glad to have gotten a little experience using Cucumber. It forces you to write a DSL for testing your app, with small, well-named methods to encapsulate each step of the process. It’s a great exercise that I’d recommend for anyone who writes automated tests (which is all developers, right? *cough*).

Dehumanize

As we were tidying up some Cucumber tests, I noticed a bit of repeated code that transformed an English description like “Phone number” into a column name like “phone_number”. We were about to abstract it to a small method, when Andy mentioned he’d submitted a small patch to Rails a while ago that was not accepted. Perhaps inspired by the recent accomplishment of a former coworker, I said, “Why don’t we package a gem?”

So that’s just what we did, creating the dehumanize gem for this snippet to perform the inverse of ActiveSupport::Inflector#humanize. Jeweler made it the work of a minute to package and publish; we spent more time retrieving Andy’s lost login to publish to RubyGems.
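
The transformation itself is tiny. Here is a minimal sketch of the idea as a standalone method; it is not necessarily how the gem implements it, and the whitespace handling is my assumption.

```ruby
# Hypothetical helper: turns a human-readable label into a column-style name.
def dehumanize(text)
  text.strip.downcase.gsub(/\s+/, '_')
end

dehumanize('Phone number')  # => "phone_number"
```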

Aside from all these new toys, we had a nice running conversation on the size of commits and using feature branches. My tendency is to make much smaller and more frequent commits than Andy, and to break work off into branches much more often. Look for a blog post about it in the next few days.

Are you in Chicago?

If you’re in Chicago now and would like to spend a day pair programming with me, please contact me. There’s a lot to learn from each other, and it’s a lot of fun. At the least, you can follow this blog’s RSS feed or me on Twitter to see how it goes.

