Archive for October, 2008

Document replication: CouchDB vs. DVCS

My friend (and CouchDB committer) Chris just posted an excellent overview of the application-hosting potential of CouchDB on his blog. My first response was: okay, you’ve convinced me. Post-election, I’m porting the minimal Sinatra app backing Misfict to CouchDB, since the app is really just a small JSON storage engine at its core.

My second reaction was to find it a bit funny to see E4X making an appearance in this day and age; as with most XML-centric tech, I had assumed that the coming of JSON and YAML had more or less killed it, at least amongst the web-dev early adopters. I guess it just goes to show that everything old is new again, especially in the fast-moving world of web development tools.

Regardless, perhaps the most compelling picture Chris paints in his post is the idea of capitalizing on the off-line replication features of CouchDB to allow groups of people to separately work on a collection of documents, then merge their changes together at some point in the future. He leans heavily on a classroom metaphor, but I think the real potential may be more in the area of groupware and collaborative editing. Knowledge workers have been looking for the “holy grail” tool which combines the power of Word’s “track changes” with mixed on- and off-line authoring for a long time, and I think we’re finally building the infrastructure that will make that class of application relatively easy to build.

Looking over the CouchDB documentation, though, I still think there’s one major piece missing from their replication and conflict-resolution story: automatic merging of non-conflicting edits. Unlike a DVCS like Git, CouchDB still doesn’t (AFAIK) allow multiple contributors to edit different elements of a single document, and then commit those changes, without manually replaying edits from other contributors.

Since JSON is much more structured than raw text (which Git and other DVCSs deal with handily enough), it seems tractable to examine potentially conflicting updates and to see if they’re isolated to different child nodes of the JSON document. Furthermore, given the degree to which CouchDB has already embraced the map/reduce model, I think you should be able to distill the conflict-resolution algorithm down to two steps: a “map” which generates a “diff” noting the original document ID and the changed attribute/subtree elements, and then a “reduce” which attempts to create a new document by applying those changes to the original document.
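To make that two-step idea a bit more concrete, here’s a rough sketch in plain Python. It isn’t CouchDB’s API or anything the CouchDB team has proposed; the names (diff, map_edit, reduce_edits, MergeConflict) are all made up for illustration, and it assumes every contributor started from the same base revision of a document whose top level is a JSON object. Lists and scalars are compared as whole values.

```python
import copy

_MISSING = object()  # sentinel: key is absent on one side (added or deleted)


class MergeConflict(Exception):
    """Raised when two contributors changed overlapping parts of a document."""


def diff(base, edited, path=()):
    """Return {path: new_value} for each subtree of `edited` that differs."""
    if isinstance(base, dict) and isinstance(edited, dict):
        changes = {}
        for key in base.keys() | edited.keys():
            changes.update(diff(base.get(key, _MISSING),
                                edited.get(key, _MISSING),
                                path + (key,)))
        return changes
    return {} if base == edited else {path: edited}


def map_edit(base, edited):
    """'Map' step: emit the original document ID plus the changed paths."""
    return base["_id"], diff(base, edited)


def reduce_edits(base, mapped):
    """'Reduce' step: combine the diffs and apply them to a copy of `base`."""
    merged = {}
    for doc_id, changes in mapped:
        if doc_id != base["_id"]:
            continue  # this diff belongs to a different document
        for path, value in changes.items():
            for seen, seen_value in merged.items():
                shared = min(len(path), len(seen))
                identical = path == seen and value == seen_value
                # Paths overlap when one is a prefix of (or equal to) the other.
                if path[:shared] == seen[:shared] and not identical:
                    raise MergeConflict("/".join(map(str, path)))
            merged[path] = value

    doc = copy.deepcopy(base)
    for path, value in merged.items():
        node = doc
        for key in path[:-1]:
            node = node[key]
        if value is _MISSING:
            node.pop(path[-1], None)  # the key was deleted by a contributor
        else:
            node[path[-1]] = copy.deepcopy(value)
    return doc


base = {"_id": "story-1", "title": "Draft",
        "body": {"intro": "Once", "end": "Fin"}}
alice = {"_id": "story-1", "title": "Final",
         "body": {"intro": "Once", "end": "Fin"}}
bob = {"_id": "story-1", "title": "Draft",
       "body": {"intro": "Once upon a time", "end": "Fin"}}

merged = reduce_edits(base, [map_edit(base, alice), map_edit(base, bob)])
```

Here Alice and Bob touched different child nodes, so their edits combine without anyone replaying anything by hand. If two people had changed the same attribute to different values, the sketch raises MergeConflict and leaves the decision to a human, which is about where CouchDB’s existing conflict handling would take over.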

Regardless, I think it’s an interesting time to be involved in web development. The idea that you could grab just a subset of a larger data store, work with it both on- and off-line, then share your changes with a group of colleagues is a powerful one, and I applaud anyone (like Chris and the rest of the CouchDB team) working to make it possible.

Big changes

So, I’ve been sitting on this for a while now, but I finally get to make a wider announcement, now that the “i”s have been dotted and the “t”s have been crossed:

I’m leaving Reed College in a few weeks, and starting a new job at Sun Microsystems to work on Project Kenai. It’s a big change for me — I’ve been hiding out in academia for almost four years now, so switching back into the commercial world is both exciting and scary.

Kenai is a fascinating project, which I hope to talk about a lot more in the near future. I can say already that it’s one of the more ambitious JRuby on Rails projects out there, and that I’m excited to see what we can do with the full Sun hardware + open source software stack underpinning a high-volume Rails site. In addition, I’m going to get the chance to work more closely on UI and interaction design, which is an area in which I look forward to expanding and updating my skills.

Reed has been a great place to work, and I can’t say enough good things about everyone else in the IT organization here. That being said, I’m really psyched about getting to focus almost entirely on writing code and implementing features, and working in a small, distributed group within the larger Sun umbrella.

New toy: misfict

Being home alone with a head cold doesn’t leave one with a lot of excuses not to knock off a quick project. I had been mulling over the idea of building a version of the classic “storytime” party game as a webapp for a long time, and since I also wanted to spend a little more time working with jQuery’s AJAX and JSON support, it seemed reasonable to tackle both at the same time.

So, without further ado, I present misfict, the micro-serial-fiction engine. The process is simple: read the last line someone else wrote, then post your own idea for the next sentence in the story. Eventually, we should end up with a lovely stream-of-consciousness story co-authored by anyone who cares to drop a few words into the bucket.

I may build in some sort of cap for the number of sentences before a story is finished, or periodically declare a “chapter break”, but for the time being, the story will keep going as long as anyone is writing.

PS. Any perceived relation between the release of this project and the upcoming start of NaNoWriMo is strictly a coincidence.

PPS. If you’re interested, all the code is available on GitHub.

PPPS. (last one, I promise) I got the misfict.com domain, so the link above has been corrected to point there. Also, there’s an RSS feed. Now, go write something.

Tradition

2008

A Well-Deserved Pint, cont.

2007

take 3

2006

another well-earned pint

2005

a well-earned pint

There are security holes, and security holes…

I was reviewing a Perl CGI script a co-worker sent to me for troubleshooting last week, and came across this little gem (excerpted but not changed in any meaningful way):

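use CGI;
use LWP::UserAgent;

my $ua  = LWP::UserAgent->new;
my $req = CGI->new;

# Both parameters come straight from the query string, unvalidated.
my $res_id = $req->param('rid');
my $img    = $req->param('img');
my $url    = "http://somehost/cgi-bin/fetch.cgi?id=$res_id";

# Writes the response body to whatever path the client named in 'img'.
$ua->get($url, ':content_file' => $img);

# Two-argument open on that same client-supplied name, then delete it.
open FH, $img;
unlink $img;

print $req->header(-type=>'application/octet-stream');

while (<FH>) {
        print $_;
}
close FH;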

How horribly bad is this script? Well, it allows no less than the deletion/overwriting of any file writable by the web server user. While that won’t allow injection of shellcode under most configurations, it would allow an attacker to delete logfiles, insert malicious replacements to files in upload directories, and generally mess with your system in all kinds of ways.

Even better, it misuses the Content-Type HTTP header to force a download rather than an inline view, when the semantically appropriate way to make the client show a download dialog is Content-Disposition: attachment.

There are doubtless millions of lines of code like this out there in the world, and (at least in Perl-land) almost all of them could be caught with the simple addition of the -T (“taint check”) flag to the #!/usr/bin/perl line at the top of the script.

Registration, ACORN, and fraud

I’ve twittered about this already, but I think that it’s worth repeating: collecting redundant and invalid voter registration cards is not the same thing as fraudulent voting. I repeat: by registering people multiple times, or even submitting invalid registration cards, ACORN (and every other voter reg group) is not committing voting fraud. They are simply doing a bad job of actually registering voters.

The whole reason that voter registration is required before you’re allowed to vote is so that the local and state election boards can have a chance to verify your eligibility. If you submit more than one registration, it may cost them a few minutes of work to update or reject your registration records, but it won’t allow you to vote more than once.

So, if it doesn’t let people “vote early and often,” why in the hell would ACORN, as an organization, have such a poor record regarding bad registrations? It’s simple: they require their workers to meet a certain quota in order to get paid. I’ve done a bit of volunteer voter registration, and while it’s easier than many other types of direct-contact political work — fundraising and candidate canvassing are both harder, if only because of their inherent partisan focus — you still have good days and bad days out on turf.

Try to imagine yourself in the following position: you’re a high school or college student trying to make a little money over the summer, while still doing something a bit more proactive than flipping burgers. Furthermore, your meager paycheck is dependent on hitting your quota each and every week, and you’ve heard horror stories of the poor-performing workers who got canned just last month.

Now, imagine you’re two registrations short of your quota for the week. Would you be even a little bit tempted to fake one, or press that nice stranger who insisted they were already registered to do so again, in order to save your job? I suspect that most people, if they’re being honest with themselves, would admit that they might be at least a little bit tempted. I know that I would, which is part of the reason that I’m not very interested in doing any paid political work: as a volunteer, the temptation to cheat goes away.

(Please note that I’m not trying to defend this sort of behavior as ethical, but it is at least understandable.)

Personally, I think the interesting argument isn’t even about whether ACORN does a good or bad job of supervising their staff, or encouraging the right behaviors. The real debate we should be having is whether paid voter registration does more harm than good. The same question extends to signature-gathering, an especially hot issue in Oregon given the recent flood of bad ballot measures.

I personally haven’t decided one way or the other, but I think that a much more constructive discussion is there, waiting to be had, once we get past the current baseless accusations being leveled at ACORN and its partner groups.

Why we need universal health coverage

<political-rant>

I had surgery to repair my broken ankle this summer. I also have fairly good health insurance through my employer. So, when the pre-billing statements from the hospital, surgeon, and anesthesiologist started showing up, I didn’t panic, even though the total bill (>$15K) would have been pretty tough for me to pay out-of-pocket.

When all was said and done, and my insurance had been fully invoked, I ended up owing a little under $1500. Seems pretty good, right? 90% coverage is good enough for all but the most expensive medical issues, and even significantly more expensive bills might be manageable under some sort of payment structure.

However, of the more than $13,000 that was “covered,” my insurance company actually only paid about $4000. That’s because they drive down the cost of everything else via contracts held with the hospitals, along with a healthy dose of strong-arming of doctors and specialists. (“If you want us to cover any of your patients, you’re going to have to accept 30 cents on the dollar for this procedure.”)

While that doesn’t directly affect me, it does mean that prices have to be that much higher for everyone without insurance. Without the insurance companies to negotiate on their behalf, they’re stuck paying extra to cover the gap between the real cost of care and what the insurers will pay.

We need a comprehensive Federal health plan that covers everyone if we’re going to have any chance of getting a handle on the cost of health care. Leaving those with the least all on their own to try to negotiate for care is neither fair nor sustainable.

</political-rant>