Open Source Bridge presentation

In case anyone stumbles here looking for the notes and examples from my Open Source Bridge talk, here they are:

osbridge_2009.zip

Note: this is a ~30MB download, since it contains (amongst other things) a full copy of JRuby 1.3.1 and the ActiveMQ runtime. The actual presentation and example code are very light.

You can also just view the talk slides, though they aren’t terribly informative without the code.

Update: video is available on blip.tv now. My apologies for the long delay while everyone downloaded the demo archive in minutes 3:00-8:15 or so.

Better late than never

I was reading a brief but interesting post surveying the current state of the art in security as programming language features, and realized that a lot of the links overlapped with the material from the paper I wrote for my security theory course a while back. Rather than re-post all of those as a blog entry, I thought I should probably just put a link to the finished PDF.

Given that this was a school paper, I hope that folks will forgive the somewhat stilted grammar and obviously-academic format. If you get nothing else from it, though, the bibliography may at least be of interest.

Never do today what you can put off ’til tomorrow

In many ways, this is a golden age for web developers: we have a bunch of good, high-level frameworks for writing apps in highly-productive dynamic languages and a solid corpus of best practices for testing, service API design, and data serialization. We don’t have to deal with dog-slow CGI scripts, complicated J2EE stacks, or proprietary ColdFusion code that only runs atop expensive application servers.

Unfortunately, all is not wine and roses (or scotch and bacon, or whatever). The major dynamic webapp frameworks push you by convention into doing the bulk of your application work synchronously in the request-processing loop, rather than asynchronously in a background thread. All of the accumulated wisdom about building responsive graphical user interfaces gets thrown out and re-discovered by each framework’s user community, resulting in a multitude of solutions for the basic problem of pushing work into a queue and dealing with it later.
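For anyone who hasn't met the pattern, here is a minimal sketch of that basic problem in plain Ruby: a request handler pushes a job onto a thread-safe Queue and answers immediately, and a background thread works through the queue later. The job names and the sleep standing in for real work are invented for illustration; a production setup would use a proper message broker rather than an in-process queue.

```ruby
# In-process job queue: handlers enqueue and return; a worker thread drains it.
# Job names and the short sleep are made up for illustration.
jobs    = Queue.new   # thread-safe FIFO from Ruby core
results = Queue.new

worker = Thread.new do
  # Queue#pop blocks until a job arrives; a nil job tells the worker to stop.
  while (job = jobs.pop)
    sleep 0.1                        # stand-in for slow work (resizing, mailing, ...)
    results << "#{job[:name]} done"
  end
end

# What a request handler would do: enqueue the work and answer right away.
def handle_request(jobs, name)
  jobs << { name: name }
  "queued #{name}"
end

responses = [handle_request(jobs, 'resize-avatar'),
             handle_request(jobs, 'send-welcome-email')]
puts responses

jobs << nil      # shut down once the queued jobs are finished
worker.join

completed = []
completed << results.pop until results.empty?
puts completed
```

The handler's response never waits on the slow part; the worker finishes the jobs in the order they were queued.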

As the fine folks at Twitter so famously discovered, synchronous processing puts a hard upper limit on how much (and how quickly) you can scale an application. Even at the much more modest loads my current project at work receives, there are quite a few performance problems that can’t be solved by simply throwing more stuff in memcached and hoping for the best.

Some folks are starting to catch on, and bake asynchronous processing into their frameworks by default, but the solutions tend to either be limited to very particular deployment and application models, or esoteric in the extreme. Meanwhile, desktop application authors continue to politely chuckle at all of our bumbling, and old-skool enterprise developers look at our hackish background-worker implementations and (rightly) consider them to be toys compared to the classic “big boy” message queueing solutions, or even the newer open source alternatives.

The next generation of web application frameworks should be designed around the idea that work is done asynchronously by default, with a fallback to synchronous jobs only in cases where a user needs to see the result immediately. Since applications also need to scale across a potentially large and heterogeneous set of CPUs and servers, those delayed jobs may not even run in the same memory space as the web application itself. That means machine- and language-agnostic serialization, fast network IPC, and callback- and event-driven programming.
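A rough sketch of what "asynchronous by default, synchronous on request" might look like, assuming jobs are serialized as JSON so that a worker on another machine, or written in another language, could decode the same payloads. The HANDLERS registry, the thumbnail job type, and the Array standing in for a network queue are all hypothetical.

```ruby
require 'json'

# Hypothetical job registry: job type names map to handler lambdas.
HANDLERS = {
  'thumbnail' => ->(args) { "thumbnail #{args['w']}x#{args['h']} of #{args['image']}" }
}

# Jobs travel as plain JSON, so any machine or language can decode them.
def encode_job(type, args)
  JSON.generate({ 'type' => type, 'args' => args })
end

# What a worker (possibly remote) does with a payload taken off the queue.
def perform(payload)
  job = JSON.parse(payload)
  HANDLERS.fetch(job['type']).call(job['args'])
end

# Asynchronous by default: enqueue the payload and return nil. Run the job
# inline only when the caller needs the result immediately (sync: true).
def submit(queue, type, args, sync: false)
  payload = encode_job(type, args)
  return perform(payload) if sync

  queue << payload
  nil
end

queue = []   # stand-in for a network message queue
submit(queue, 'thumbnail', { 'image' => 'cat.png', 'w' => 64, 'h' => 64 })
puts submit(queue, 'thumbnail', { 'image' => 'dog.png', 'w' => 32, 'h' => 32 }, sync: true)
queue.each { |payload| puts perform(payload) }   # later, in a worker
```

Keeping the payload to plain JSON types is the design choice that matters here: nothing in it depends on Ruby objects, so the consumer doesn't have to be a Ruby process.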

Developers who grok these concepts now will have a leg up on the competition when building tomorrow’s crop of web applications.

Two steps forward, one step back

Once upon a time, there was RCS, and then CVS. They tracked normal edits to a set of text files reasonably well and, coupled with telnet or ssh, even made it relatively straightforward for a trusted group of collaborators to share their changes with each other. Some people used proprietary tools (Perforce, Visual SourceSafe, etc.), but these tended to be a) expensive, b) really, really lousy, or c) both. Among the open source crowd, at least, CVS dominated the version control space for many years.

Then came Subversion. It improved on many of the failings of CVS — notably, Windows support was dramatically better, repositories could be shared over HTTP, and many operations that just didn’t work in CVS (renames, binary diffs, etc.) performed reasonably well out of the box. To this day, Subversion is a reasonable choice for many projects, especially given the advanced level of support for it in IDEs, graphical repository browsers, and the like.

Much of the reason for that diversity of useful tooling built atop Subversion, of course, is that it was written in C, and built with an eye towards allowing high-level languages to use bindings into the same runtime libraries upon which the ‘svn’ command itself relied. In fact, Python, Perl, Java, and Ruby are all supported by the core Subversion maintainers, and additional bindings using those same underlying libraries are available for a number of other languages.

Enter the distributed version control systems: Git, Mercurial, Bazaar, Darcs, and their ilk. The basic workflow they offer is in some ways more like RCS than it is Subversion: each developer works locally against their own copy of a repository, and they share their work via patch files and periodic synchronization. (This is of course a gross oversimplification, as all of them offer much more sophisticated change-tracking under the hood than RCS did, but the user-visible behavior is still reminiscent.) However, their ability to maintain change history across many developers and systems without forcing everyone to eventually squash their work down into a single source tree makes a number of new modes of project management possible, or at least much easier than before.

All of the above DVCS systems potentially offer a huge productivity gain for many developers, since you can easily experiment with changes locally, selectively share only those modifications you wish to, and continue working without being connected to the central repository. (This is especially significant for those whose employers maintain draconian firewall rules and disallow off-site access to their source control.)

Unfortunately, none of the popular DVCS systems have anything resembling the level of cross-language API support that Subversion does. Mercurial and Bazaar are both implemented in Python, which makes them fast to use from other Python code and painfully slow to drive from any other language. Git is implemented in C, but has no supported, documented core library designed for access from other languages. Darcs is written in Haskell, which means only crazy mathematicians and CS majors have any ability or interest in using it. (I’m kidding here, but the point remains that Haskell isn’t exactly the most useful substrate for scripting language bindings.)

The fallout from all of this is that we’re left using wrapper libraries which fork out to the command-line tools for each DVCS. Such wrappers have a number of problems: the performance sucks, the internal APIs are usually only as robust as the set of regular expressions you write to parse the output of the commands, and almost no work is shared between the various wrapper implementations.
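To make the complaint concrete, here is a minimal sketch of that wrapper style in Ruby: fork git log with a custom --pretty format and pick each line apart with a regular expression. The Commit struct and the helper names are invented for illustration. Note how much the regexp quietly assumes (40-character SHA-1 hashes, exactly four fields in a fixed order); any change to the tool's output breaks it, which is precisely the brittleness described above.

```ruby
# Hypothetical record for one commit, filled in by scraping git's console output.
Commit = Struct.new(:sha, :author, :date, :subject)

# Ask git for one line per commit: full hash, author name, ISO-like date and
# subject, separated by the ASCII unit separator (%x1f) so spaces in names or
# subjects don't confuse the parser.
GIT_LOG_FORMAT = '%H%x1f%an%x1f%ai%x1f%s'

# Assumes 40-character SHA-1 hashes and exactly this field order.
LINE_RE = /\A(\h{40})\x1f([^\x1f]*)\x1f([^\x1f]*)\x1f(.*)\z/

def parse_git_log(output)
  output.each_line.map do |line|
    m = LINE_RE.match(line.chomp) or raise "unexpected git log line: #{line.inspect}"
    Commit.new(*m.captures)
  end
end

# Fork git (no shell involved) and scrape the result.
def git_log(repo_dir, limit = 10)
  out = IO.popen(['git', '-C', repo_dir, 'log', "-#{limit}",
                  "--pretty=format:#{GIT_LOG_FORMAT}"], &:read)
  raise 'git log failed' unless $?.success?
  parse_git_log(out)
end
```

Usage would be something like git_log('.').each { |c| puts "#{c.sha[0, 7]} #{c.subject}" }. Every other DVCS needs its own format string, regexp, and parser, and none of that work is shared.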

Don’t get me wrong: as a simple version control tool, I’ve found Git in particular (and distributed version control in general) to be a big step up from the old centralized-repository model. However, the very eighties-esque fork-and-regexp-scrape model for IPC — coupled with the lack of an obvious “best of breed” leader in the DVCS space — means that I (along with anyone else trying to support DVCS in a general-purpose way) end up doing a lot of low-level grunt work when we could be building real value for users.

Even something as simple as a standard dump format for a common subset of the information available from the popular DVCS types would be a start. I do know that, for the time being, I’m stuck supporting a bunch of very brittle code which relies on the various idiosyncratic console output formats of each version-control system.

Playing prognosticator, I would even go so far as to suggest that the first DVCS system to provide supported, documented interfaces in a number of popular programming languages could climb to the top of the dogpile that exists currently and emerge as a clear standard.

Inauguration playlist

We had a little family dance party in the street (no, seriously, we did — pictures forthcoming) after re-watching all the coverage of the inauguration tonight.

Our playlist:

  1. The Payback — James Brown
  2. Song 2 — Blur
  3. Gone Daddy Gone — Gnarls Barkley
  4. The Yeah Yeah Yeah Song — The Flaming Lips
  5. The Golden Path (Ewan Pearson Extended Vocal) — The Chemical Brothers Featuring The Flaming Lips
  6. Paper Planes — M.I.A.
  7. All My Friends — LCD Soundsystem
  8. My People — The Presets
  9. Fear Not Of Man — Mos Def
  10. Work On You — MSTRKRFT
  11. Rawnald Gregory Erickson the Second — Starfucker
  12. My Favorite Things — Outkast

The theme is obvious, but we all enjoyed the hell out of it.

Daily git-svn

My team at Sun uses Subversion to host our “authoritative” source repository for Project Kenai. However, since most work is done on the trunk, many of us find it more convenient to work locally with Git, using the git svn subcommand heavily to keep ourselves up-to-date without interfering with others’ work.

When I first started using this combo, I had some early trouble keeping my local Git repository from getting horribly b0rked whenever there were edits made to the same files I had been working on locally. Having used CVS and Subversion for so long, I initially assumed such conflicts (and the manual merge steps they entailed) were simply part of the equation, even when working with a proper DVCS. However, by applying a little more discipline to my use of local branches, I’ve been able to basically eliminate manual merges, except in cases where the exact same line has literally been edited by multiple people.

My first, most critical discovery was to never use git svn fetch on its own. Instead, use git svn rebase, which fetches new revisions from Subversion and then rebases your current branch on top of them. Second, never pull the latest changes from Subversion into a working feature branch; instead, switch to your master branch, create a new branch for merging (I usually call mine “svn-merge”), and do your rebase there. After the rebase has finished, merge in your feature branch changes, and then use git svn dcommit to push your changes upstream.

As an example, here are the commands I would use to check out a new local Git clone of the main Subversion repository, work on a single change, and then push it back into SVN:

viper:Work$ git svn clone https://example.com/svn/repo/trunk -r500:HEAD repo
# ... lots of Git output here ...
viper:Work$ cd repo
viper:repo$ git checkout -b issue-123
# ... hack, hack, hack...
viper:repo$ git commit -m "fix for issue #123"
viper:repo$ git checkout master
viper:repo$ git checkout -b svn-merge
viper:repo$ git svn rebase
# ... watch results for conflicts ...
viper:repo$ git merge --squash issue-123
viper:repo$ git commit -m "ISSUE-123: fixed"
viper:repo$ git svn dcommit -e
# ... $EDITOR launches, allows you to write useful commit message for svn ...
viper:repo$ git checkout master
viper:repo$ git merge svn-merge

This may seem like a lot of extra branch switching, localized commits, etc., but the end result has been worth it (for me, at least). If you follow this process, you can be relatively certain that your master branch will only ever mirror changes that have been made in Subversion.

Ensuring that the master branch is always “clean” (i.e., has no conflicting commits) with regard to the shared svn tree makes it easy to switch temporarily to another feature branch if you have an urgent bugfix or simple change to make, while your bigger changes happily sit on another feature branch waiting to be pushed.

Updated Mar. 4, 2009: Changed merge to use --squash option, so that many local Git commits can be combined into a single upstream revision.
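Since the sequence is the same every time, it can be scripted. The following is a rough, hypothetical Ruby sketch (not part of the original workflow) that runs the steps from the session above for a given feature branch and commit message, stopping at the first command that fails, such as a git svn rebase that hits a conflict. It assumes it is run inside a git-svn clone whose master branch only mirrors Subversion, and it adds one step the session doesn't include: deleting the svn-merge branch at the end so the script can be run again.

```ruby
# Hypothetical helper replaying the svn-merge workflow for one feature branch.
# Assumes a git-svn clone whose master only mirrors svn, and no existing
# branch named merge_branch.
def svn_merge_steps(feature, message, merge_branch = 'svn-merge')
  [
    %w[git checkout master],
    ['git', 'checkout', '-b', merge_branch],
    %w[git svn rebase],                       # pull in the latest svn revisions
    ['git', 'merge', '--squash', feature],    # stage the feature branch as one change
    ['git', 'commit', '-m', message],
    %w[git svn dcommit -e],                   # opens $EDITOR for the svn log message
    %w[git checkout master],
    ['git', 'merge', merge_branch],           # master now matches svn again
    ['git', 'branch', '-d', merge_branch]     # extra cleanup, not in the session above
  ]
end

# Run each step in order, stopping at the first failure (e.g. a rebase
# conflict) so it can be resolved by hand.
def run_steps(steps)
  steps.each do |argv|
    puts "+ #{argv.join(' ')}"
    system(*argv) or abort "step failed: #{argv.join(' ')}"
  end
end

if __FILE__ == $PROGRAM_NAME && !ARGV.empty?
  feature, message = ARGV
  abort "usage: #{$PROGRAM_NAME} FEATURE-BRANCH COMMIT-MESSAGE" unless message
  run_steps(svn_merge_steps(feature, message))
end
```

Saved as, say, svn_merge.rb, it would be invoked as ruby svn_merge.rb issue-123 "ISSUE-123: fixed". Passing each command as a separate argument list to system keeps branch names and commit messages away from shell quoting.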

Testing

Is this thing on?

Looks like mobile blogging is a go.

Quick status update

It’s been two weeks since I posted, which is a bit embarrassing. I’ve been pretty busy with the new job, though, and generally sticking to Twitter to get the word out about how I’m doing.

The quick version is: the Project Kenai team is turning out to be just about as good a group to work with as I could hope for. We’ve got enough to do to keep things from getting boring, but it’s the good kind of work: interesting + challenging, but nothing that feels like a death march. Plus, every time I fire up an editor to look at a new piece of code and see the GPL license in the header comments, I feel a little better about the company as a whole. Being somewhere that open source is the default (rather than a special case for which you have to lobby) is a pretty cool feeling for an OSS nerd like me.

Otherwise, things are pretty normal. As my family and friends all give in to the gravitational pull of SE Portland, I also find that my social life is less and less about going out, and more about staying in for social meals + conversation, which suits me just fine, especially in the winter.

That’s it for now. Expect more on the technical side of the work I’m doing after Christmas, when I have some time to write up my impressions of doing JRuby on Rails, and working in a heterogeneous Solaris/Linux/OS X environment.

Home charcuterie

Those who have followed this blog for a while (or have met me in person) already know that I’m a fan of home-made charcuterie: bacon, sausage, ham, pastrami, etc. Over the last few months, the quality and consistency of said DIY projects has gone up considerably from our initial wonderful-but-inconsistent results.

Behold:

[Photos: DSC_0012; bacon]

While I didn’t actually prepare the cure for either of these batches, I did take at least my share of time at the smoker to ensure their juicy-salty-goodness.

Killfile 2.0

There’s been a persistent blog-wank-fest making the rounds over the last few weeks about the state of the Ruby community: whether it’s become more or less “fun”, “creative”, etc. I’m not going to reward any of the participants with a link, but I do offer the following balm if you, like me, are a bit sick of hearing about it:

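%w(open-uri rss resolv uri rubygems nokogiri).each {|lib| require lib }

blog_url = ARGV.shift or abort "usage: #{$0} BLOG_URL"

# Find the feed via the page's RSS autodiscovery <link> (Nokogiri stands in
# for the original Hpricot, which is no longer maintained).
blog_doc = Nokogiri::HTML(URI.open(blog_url, &:read))
rss_link = blog_doc.at_css('head link[type="application/rss+xml"]')
abort "No RSS feed advertised at #{blog_url}" unless rss_link

# The href may be relative, so resolve it against the blog's own URL.
feed_url = URI.join(blog_url, rss_link['href']).to_s
feed_rss = RSS::Parser.parse(URI.open(feed_url, &:read))

rants = feed_rss.items.select {|i| t = i.title.to_s; t =~ /rant/i && t =~ /ruby/i }

if rants.empty?
  puts "Okay, you get a pass."
else
  puts "Bad blogger! No biscuit!"
  hostname = URI.parse(blog_url).host
  ip_addr = Resolv.getaddress(hostname)
  # Null-route the blog's address (Linux net-tools syntax). Passing each
  # argument separately keeps the shell out of it.
  system('sudo', 'route', 'add', '-host', ip_addr, 'gw', '127.0.0.1')
end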