Tag Archive for 'erlang'

Document replication: CouchDB vs. DVCS

My friend (and CouchDB committer) Chris just posted an excellent overview of the application-hosting potential of CouchDB on his blog. My first response was: okay, you’ve convinced me. Post-election, I’m porting the small Sinatra app backing Misfict to CouchDB, since that app is really just a minimal JSON storage engine at its core.

My second reaction was to find it a bit funny to see E4X making an appearance in this day and age; as with most XML-centric tech, I had assumed that the coming of JSON and YAML had more or less killed it, at least amongst the web-dev early adopters. I guess it just goes to show that everything old is new again, especially in the fast-moving world of web development tools.

Regardless, perhaps the most compelling picture Chris paints in his post is the idea of capitalizing on the off-line replication features of CouchDB to allow groups of people to separately work on a collection of documents, then merge their changes together at some point in the future. He leans heavily on a classroom metaphor, but I think the real potential may be more in the area of groupware and collaborative editing. Knowledge workers have been looking for the “holy grail” tool which combines the power of Word’s “track changes” with mixed on- and off-line authoring for a long time, and I think we’re finally building the infrastructure that will make that class of application relatively easy to build.

Looking over the CouchDB documentation, though, I still think there’s one major piece missing from their replication and conflict-resolution story: automatic merging of non-conflicting edits. Unlike a DVCS like Git, CouchDB still doesn’t (AFAIK) allow multiple contributors to edit different elements of a single document, and then commit those changes, without manually replaying edits from other contributors.

Since JSON is much more structured than raw text (which Git and other DVCSs deal with handily enough), it seems tractable to examine potentially conflicting updates and to see if they’re isolated to different child nodes of the JSON document. Furthermore, given the degree to which CouchDB has already embraced the map/reduce model, I think you should be able to distill the conflict-resolution algorithm down to two steps: a “map” which generates a “diff” noting just the original document ID and the changed attribute/subtree elements, and then a “reduce” which attempts to create a new document by applying those changes to the original document.
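To make that two-step idea concrete, here is a rough sketch in plain Python. It is not CouchDB's view API or its actual replication code: the names `diff`, `overlaps`, `apply_change`, and `merge` are hypothetical, documents are assumed to be plain nested dicts, and lists are treated as atomic values for simplicity. The "map" step emits the changed paths for each contributor's copy, and the "reduce" step folds those changes back into the common ancestor, refusing only when two contributors touched overlapping subtrees with different values.

```python
from functools import reduce

MISSING = object()  # sentinel meaning "key is absent from this document"


def diff(base, edited, path=()):
    """'Map' step: emit (path, new_value) for each subtree that differs from base."""
    if isinstance(base, dict) and isinstance(edited, dict):
        changes = []
        for key in sorted(set(base) | set(edited)):
            changes += diff(base.get(key, MISSING), edited.get(key, MISSING), path + (key,))
        return changes
    return [] if base == edited else [(path, edited)]


def overlaps(p, q):
    """True when one path is equal to, or a prefix of, the other."""
    n = min(len(p), len(q))
    return p[:n] == q[:n]


def apply_change(doc, change):
    """Return a copy of doc with one (path, value) change applied."""
    path, value = change
    if not path:  # the whole document was replaced
        return value
    head, rest = path[0], path[1:]
    updated = dict(doc)  # shallow copy so the ancestor is never mutated
    if rest:
        updated[head] = apply_change(doc[head], (rest, value))
    elif value is MISSING:
        updated.pop(head, None)  # the contributor deleted this key
    else:
        updated[head] = value
    return updated


def merge(base, *edits):
    """'Reduce' step: fold all non-conflicting changes into base, or raise."""
    seen = []
    for edited in edits:
        for path, value in diff(base, edited):
            for other_path, other_value in seen:
                if overlaps(path, other_path) and (path, value) != (other_path, other_value):
                    raise ValueError("conflict at " + "/".join(map(str, path)))
            seen.append((path, value))
    return reduce(apply_change, seen, base)


base = {"title": "Draft", "body": {"intro": "Hi", "end": "Bye"}}
alice = {"title": "Final", "body": {"intro": "Hi", "end": "Bye"}}
bob = {"title": "Draft", "body": {"intro": "Hello", "end": "Bye"}, "tags": ["x"]}
merged = merge(base, alice, bob)
```

Here Alice only touches `title` while Bob touches `body/intro` and adds `tags`, so `merged` contains all three edits. Had a third contributor also changed `title` to something else, `merge` would raise rather than pick a winner, which is the point where a human (or CouchDB's existing conflict handling) would have to step in.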

Regardless, I think it’s an interesting time to be involved in web development. The idea that you could grab just a subset of a larger data store, work with it both on- and off-line, then share your changes with a group of colleagues is a powerful one, and I applaud anyone (like Chris and the rest of the CouchDB team) working to make it possible.

Erlang warts

After more than a year of complaining about the syntax, I’m forcing myself to finally sit down and learn some Erlang. Between CouchDB, ejabberd, and all the other interesting projects people are implementing in Erlang, I would be remiss as a systems engineer to not at least pick up the basics.

Unfortunately, I’m still chafing a bit at a number of little annoyances:

  • The REPL is basically crippled since you can’t define functions. Being forced to think in terms of compilation units (rather than simple expressions) pisses me off.
  • Why oh why do I need to explicitly list the module name in my file header if I’m also bound by the restriction that filenames and module names have to be the same? The old Java package/file path ties were always a big annoyance when I was stuck in that environment.
  • For a functional language, there’s an awful lot of syntactic vinegar for basic operations like map and fold. I appreciate having a concise syntax for lambdas, but writing fun my_function/2 smells a bit.
  • Records (as syntactic sugar for tuples) are a poor substitute for a real type system. Both the tutorial and the real-world Erlang code I’ve seen are basically full of tagged tuples, which means you get the verbosity of a strongly-typed language without any real type checking to catch errors at compilation time.

I want to stick with it long enough to find the real gems underneath all this noise. I mean, if I can sit through extended sessions reading and writing Perl, I should be able to find something to love about Erlang. Furthermore, it’s not as if mainstream languages are miraculously better; most of the complaints I make above are simply inapplicable to them (e.g., C and Java don’t have a REPL or lambdas, and Ruby and Perl don’t have anything resembling a traditional compiler).

I definitely think that learning a new language should make you feel a little bit uncomfortable. Unfortunately, right now Erlang leaves me feeling uncomfortable in all the wrong ways: I understand everything that’s going on with the language, and just don’t like it.

I’m going to keep plugging away for at least a little bit longer, though. Next up: reading the source to ejabberd to (hopefully) get a sense for idiomatic language use in a context where its unique features (lightweight concurrency + distributed computing) are a real advantage.