Document replication: CouchDB vs. DVCS

My friend (and CouchDB committer) Chris just posted an excellent overview of the application-hosting potential of CouchDB on his blog. My first response was: okay, you’ve convinced me. Post-election, I’m porting the small Sinatra app behind Misfict to CouchDB, since the app is, at its core, really just a minimal JSON storage engine.

My second reaction was amusement at seeing E4X make an appearance in this day and age. I had assumed that, like most XML-centric tech, it had been all but killed off by the arrival of JSON and YAML, at least among early-adopter web developers. I guess it just goes to show that everything old is new again, especially in the fast-moving world of web development tools.

Regardless, perhaps the most compelling picture Chris paints in his post is the idea of capitalizing on the off-line replication features of CouchDB to allow groups of people to separately work on a collection of documents, then merge their changes together at some point in the future. He leans heavily on a classroom metaphor, but I think the real potential may be more in the area of groupware and collaborative editing. Knowledge workers have been looking for the “holy grail” tool which combines the power of Word’s “track changes” with mixed on- and off-line authoring for a long time, and I think we’re finally building the infrastructure that will make that class of application relatively easy to build.

Looking over the CouchDB documentation, though, I still think there’s one major piece missing from their replication and conflict-resolution story: automatic merging of non-conflicting edits. Unlike a DVCS such as Git, CouchDB still doesn’t (AFAIK) let multiple contributors edit different elements of a single document and then commit those changes without someone manually replaying the other contributors’ edits.

JSON is much more structured than raw text (which Git and other DVCSs handle well enough), so it seems tractable to examine potentially conflicting updates and check whether they’re confined to different child nodes of the JSON document. Furthermore, given how thoroughly CouchDB has already embraced the map/reduce model, I think the conflict-resolution algorithm could be distilled into two steps: a “map” that generates a “diff” noting the original document ID and the changed attributes or subtrees, and a “reduce” that attempts to build a new document by applying those changes to the original.
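The two-step idea above can be sketched outside CouchDB as two plain functions. This is a minimal sketch under stated assumptions, not CouchDB’s API: `diff`, `apply_diffs` and the `DELETED` marker are hypothetical names, documents are nested dicts, lists are treated as opaque values, every contributor’s edit is compared against the same original revision, and the functions only play the roles of the map and reduce steps rather than being real CouchDB view functions.

```python
from copy import deepcopy
from typing import Any

# Hypothetical marker meaning "this key was deleted"; not part of any CouchDB API.
DELETED = object()

Path = tuple[str, ...]  # e.g. ("body", "intro")


def diff(base: dict, edited: dict, prefix: Path = ()) -> dict[Path, Any]:
    """'Map' step: map each changed attribute path to its new value (or DELETED)."""
    changes: dict[Path, Any] = {}
    for key in set(base) | set(edited):
        path = prefix + (key,)
        if key not in edited:
            changes[path] = DELETED
        elif key not in base:
            changes[path] = edited[key]
        elif isinstance(base[key], dict) and isinstance(edited[key], dict):
            # Recurse so edits to different children of one subtree stay separate.
            changes.update(diff(base[key], edited[key], path))
        elif base[key] != edited[key]:
            changes[path] = edited[key]  # scalars and lists are replaced wholesale
    return changes


def apply_diffs(base: dict, diffs: list[dict[Path, Any]]) -> tuple[dict, set[Path]]:
    """'Reduce' step: apply every contributor's diff to a copy of the base document.

    Returns (merged, conflicts). A path conflicts when two diffs set it to
    different values, or when one diff touches a path and another touches one
    of its ancestors. Conflicting paths keep their base values.
    """
    proposed: dict[Path, Any] = {}
    conflicts: set[Path] = set()
    for changes in diffs:
        for path, value in changes.items():
            if path in proposed and proposed[path] != value:
                conflicts.add(path)
            proposed[path] = value
    # Flag ancestor/descendant pairs, e.g. ("body",) replaced while ("body", "intro") edited.
    for p in proposed:
        for q in proposed:
            if len(p) < len(q) and q[: len(p)] == p:
                conflicts.update({p, q})

    merged = deepcopy(base)
    for path, value in proposed.items():
        if path in conflicts:
            continue
        parent = merged
        for key in path[:-1]:
            # Intermediate nodes exist: diff only recurses into dicts present in the base.
            parent = parent[key]
        if value is DELETED:
            parent.pop(path[-1], None)
        else:
            parent[path[-1]] = deepcopy(value)
    return merged, conflicts
```

For example, if one contributor changes `body.intro` while another changes `body.outro` and `title`, all three edits land in the merged document with no conflicts; if two contributors change `title` to different values, `title` is reported as a conflict and left at its original value for the application (or a person) to resolve.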

Either way, I think it’s an interesting time to be involved in web development. The idea that you could grab just a subset of a larger data store, work with it both on- and off-line, then share your changes with a group of colleagues is a powerful one, and I applaud anyone (like Chris and the rest of the CouchDB team) working to make it possible.

4 Responses to “Document replication: CouchDB vs. DVCS”


  1. Chris Anderson

    Part of the CouchDB philosophy is to avoid RPC-style “server magic” whenever possible. If you think of documents as just resources, of course they should be manipulated by the client. Some apps may have fields with complex interdependencies (like an invoice’s line items and its total). Pushing merging back to the app is the only solution for the breadth of uses out there.

  2. Chris Anderson

    Oh and thanks for the insightful commentary. I do think that many merging use cases can be captured by just a few lines of code. Hopefully we’ll come up with a good way to reuse that code. What will be the jQuery of CouchDB?

    BTW your site stores a cookie but gives me this error when I try to post the comment without refilling my name etc.: “Error: please fill the required fields (name, email).”

  3. lennon

    I think that eschewing “magic” is commendable, and it may well be that automatic merging doesn’t belong in the CouchDB core. When I say that you need a good “story,” though, I mean precisely that: some compelling explanation for the “best practices” for multi-way merging that doesn’t simply leave it as an exercise for the reader.

    Perhaps I’ll get a chance to play with some simple implementations of automatic merging once I start digging into CouchDB a little more deeply. If so, you can bet I’ll be coming to you with questions, Chris.

  4. Tony Garnock-Jones

    Having some kind of programmable merge capability would be a great addition to CouchDB. The current system, if I understand it correctly, is a kind of two-way merge in which one branch always wins. To support a more sophisticated approach, one would need to be able to identify the historical revision that the two branches have in common in order to do a three-way merge, and then be able to supply a JavaScript function to actually do the merge. I’ve been experimenting with JavaScript DVCS, including adding history-and-merging to an experimental TiddlyWiki; it’d be wonderful to have hooks within CouchDB for programmable 3-way conflict resolution.
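The programmable three-way merge Tony describes can be sketched as a recursive merge against the common ancestor that hands genuine conflicts to an application-supplied resolver function. This is a minimal sketch under assumptions, written in Python rather than the JavaScript he mentions: `merge3`, `MISSING` and `union_tags` are hypothetical names, the common-ancestor revision is assumed to be retrievable, and nothing here implies that CouchDB offers such a hook.

```python
from typing import Any, Callable

# Hypothetical marker for a key that is absent from one of the three revisions.
MISSING = object()

# Application-supplied hook: (path, base, ours, theirs) -> merged value, or MISSING to delete.
Resolver = Callable[[tuple, Any, Any, Any], Any]


def merge3(base: Any, ours: Any, theirs: Any, resolve: Resolver, path: tuple = ()) -> Any:
    """Three-way merge of JSON-like values against their common ancestor `base`."""
    if ours == theirs:
        return ours  # both sides agree (or neither changed anything)
    if ours == base:
        return theirs  # only "theirs" changed this value
    if theirs == base:
        return ours  # only "ours" changed this value
    if all(isinstance(v, dict) for v in (base, ours, theirs)):
        # Both sides changed this object: merge it key by key.
        merged = {}
        for key in set(base) | set(ours) | set(theirs):
            value = merge3(base.get(key, MISSING), ours.get(key, MISSING),
                           theirs.get(key, MISSING), resolve, path + (key,))
            if value is not MISSING:
                merged[key] = value
        return merged
    # A genuine conflict: both sides changed the same value in different ways.
    return resolve(path, base, ours, theirs)


def union_tags(path: tuple, base: Any, ours: Any, theirs: Any) -> Any:
    """Example resolver: union conflicting tag lists and refuse anything else."""
    if path == ("tags",):
        tags = [t for side in (ours, theirs) if side is not MISSING for t in side]
        return sorted(set(tags))
    raise ValueError(f"unresolved conflict at {path}")
```

Values that only one side changed merge automatically, including deletions; the resolver runs only when both sides changed the same leaf differently. Derived fields such as the invoice total in Chris’s first comment are a poor fit for leaf-level resolution, so an application would more likely recompute them after the merge.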
