10:17 AM, Friday, January 18, 2008

Comments on Serendipitous Reuse

I just finished reading an article in IEEE Computer titled "Serendipitous Reuse" (pdf here) by Steve Vinoski. There is also some interesting discussion of this article at Steve Vinoski's blog. The central premise of the article is that uniform interfaces such as those offered by REST encourage reuse, whereas specialized interfaces such as those routinely built with distributed object/interface models like RPC, CORBA, and WSDL reduce the chances of reuse. Steve's paper is a good read, and I agree with most of his analysis of why specialized interfaces are complex to build and manage and why reuse does not happen as much as their designers intended. However, the premise that uniform interfaces promote serendipitous reuse is, in my opinion, incomplete. Uniform interfaces do not increase the chances of reuse unless the interface solves a generic problem and adheres to the hypermedia constraint of REST.

What I liked most about this article is the observation that

serendipity is the best explanation for Web mashups, in which the capabilities of unrelated Web sites are combined to create new sites that provide benefits beyond those that the original developers had intended or even considered.

Well said. If we look at the most popular interfaces today, there are two reasons for their success:

  • providing generalized interfaces, such that general-purpose client libraries or techniques could be used to create client applications, and
  • solving some complex yet generic problem.

Take, for example, the APIs for photos from Flickr, maps from Google, reviews from Yelp, and geographical data from Geonames. The interfaces for these services are not necessarily RESTful, but they are generic enough to be usable without complex client-side code, and many interesting and useful applications have been built, and are being built, using them. But genericity can only take us so far, just as XML showed several years ago. More important is that each of these services solves a problem that others cannot easily replicate quickly and efficiently. That is the key to reuse. The more specific the problem, the less likely the interface is to be reused.

In discussing why specialized interfaces inhibit reuse, one of the key conclusions the paper draws is that

If the proliferation of specialized interfaces inhibits reuse, reducing interface differentiation should increase it.

While specialized interfaces do inhibit reuse, the converse, that reducing interface differentiation should increase reuse, does not necessarily follow. Reducing interface differentiation lowers the barrier to entry for client application development; whether reuse actually happens depends on the kind of problem the interface solves.

To explain the difference between "uniform" and "as generic as possible" interfaces, the paper gives the following example:

BagOfBytes processThis(BagOfBytes);

As this paper argues, such an interface is necessarily semantically weak. So, let me rewrite this interface using HTTP verbs.

GET /this/...?...
POST /this/...?...
PUT /this/...?...
DELETE /this/...?...

This resource-oriented interface is uniform, and the operations on its resources, as indicated by the HTTP verbs, have definite meanings. Moreover, these operations are not opaque to intermediaries, so intermediaries can add value such as caching without parsing the contents of messages. Since the interface is uniform, it also lowers the barrier to entry for client application development: an application can use any well-known HTTP client library to interact with it, since the interaction protocol is well-defined, well-understood, and well-supported. However, is this interface semantically stronger than the original BagOfBytes processThis(BagOfBytes) interface? I don't think so, for a couple of reasons.
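The "lower barrier to entry" point can be seen in a small sketch: a client built only on a generic HTTP library needs no service-specific stubs, because the four uniform verbs describe every interaction. (The host example.org and the /this paths below are hypothetical placeholders, not a real service.)

```python
from urllib.request import Request

# Four interactions with a hypothetical resource, expressed entirely
# through the uniform interface; no generated stub code is required.
BASE = "http://example.org/this/orders/42"

read    = Request(BASE, method="GET")
create  = Request("http://example.org/this/orders", data=b"<order/>", method="POST")
replace = Request(BASE, data=b"<order/>", method="PUT")
delete  = Request(BASE, method="DELETE")

for r in (read, create, replace, delete):
    print(r.get_method(), r.full_url)
```

Note that the sketch says nothing about what an order *is*; that is exactly the semantic gap discussed next.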

First, while the operations in the resource-oriented interface have well-defined semantics, the semantics of the data are not immediately known. Certain well-defined media types (such as images and markup) do carry well-understood semantics, but the vast majority of data, particularly enterprise data, is too specific to have common, well-understood semantics. A "purchase-order/xml" that is meaningful in one context may be gibberish in another. So, unless the client knows the semantics of the "this" resource in the above interface, it cannot deal with that resource correctly.
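The point about data semantics can be sketched as a client that dispatches on media types: standard types it can handle generically, but a domain-specific type (the "application/purchase-order+xml" below is an invented example) remains opaque bytes unless the client already shares its semantics out of band.

```python
# Handlers for media types whose semantics are widely shared.
KNOWN_HANDLERS = {
    "text/html": lambda body: "render as markup",
    "image/png": lambda body: "render as image",
}

def handle(content_type: str, body: bytes) -> str:
    """Process a representation; unknown media types stay opaque."""
    handler = KNOWN_HANDLERS.get(content_type)
    if handler is None:
        return "opaque bytes: semantics unknown to this client"
    return handler(body)

print(handle("text/html", b"<p>hi</p>"))
print(handle("application/purchase-order+xml", b"<po/>"))
```

The uniform verbs got the bytes delivered; they did nothing to make the second response intelligible.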

Second, to make this interface semantically complete, the interface implementor must also provide information, such as hypermedia, on how to interact with it. For example, can an application DELETE the resource "this"? If so, under what circumstances? Both of these questions can be answered by providing hypermedia along with the resource. However, as I discussed in Hypermedia and REST, I am not convinced that it is always possible and pragmatic to provide machine-readable hypermedia.

Even though uniform interfaces are not semantically complete, I do agree with Steve that uniform interfaces offer a lot of value for applications. Reuse is a different matter.

Comments

Hi Subbu, thanks for your well-considered comments. I see we agree on a lot of things, but of course, I respectfully disagree with your conclusion. :-)

Consider the UNIX shell pipeline. It allows independently-developed tools conforming to its semantics to be chained together serendipitously to create new tools. This reuse is based entirely on the uniform stdin/stdout bytestream interface. Different data formats can pass through the pipeline, depending on the tools involved and the command-line options given to them. The tools, of course, have to understand the data formats they exchange. But best of all, the same power applies to more than just the shell; any UNIX app can open its own pipes and chain together whatever tools it wants to.

Does this mean all UNIX tools are reusable? Of course not. But without the simple framework that the pipe capability provides, the tools couldn't be combined in that fashion to create new tools. More importantly, the presence of the pipelining framework drives the form of the tools. Nobody would even think to develop tools that performed small tasks that could be combined into larger ones if the pipes weren't there to begin with.
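The pipe analogy can be put in miniature: if every "tool" is a function from a stream of lines to a stream of lines (the analogue of the stdin/stdout bytestream), then independently written tools compose without knowing about one another. The tool names below mimic their UNIX counterparts but are simplified stand-ins.

```python
# Each tool shares one uniform interface: lines in, lines out.
def grep(pattern):
    return lambda lines: (l for l in lines if pattern in l)

def sort(lines):
    return iter(sorted(lines))

def head(n):
    return lambda lines: (l for i, l in enumerate(lines) if i < n)

def pipeline(source, *tools):
    """Chain tools the way a shell chains processes with '|'."""
    stream = iter(source)
    for tool in tools:
        stream = tool(stream)
    return list(stream)

logs = ["b error", "a error", "c ok", "a ok"]
print(pipeline(logs, grep("error"), sort, head(1)))  # ['a error']
```

None of these tools was written with the others in mind; the uniform interface is what makes the combination possible, which is exactly the point about HTTP below.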

Now, what if each tool instead specified its own specialized interface for sending data into it or getting data out of it? No more ad hoc tool chains. You'd have to develop special applications just to interact with each tool's interface, and there's less likelihood of that because the benefits are limited.

REST as realized in HTTP has some similarities to this. The uniform interface is useful for many distributed applications (and as we both agree, is specifically tuned to deal with issues related to networking and distribution), and MIME types standardize the formats of (some of) the data such applications produce and accept. The existence of this framework drives developers to reuse it; they build their applications to fit the framework in order to gain its benefits, and the more applications that use it and conform to how it's intended to be used, the more likely they are to be usable together and reusable. But without such a framework in place, applications are forced to define their own interfaces and their own data formats, thus forcing anyone who wants to communicate with those applications to build specialized code to do so. Again, this is less likely because the benefits are fewer.

At QCon San Francisco in November 2007, I watched Pete Lacey build a RESTful expense reporting system during his talk. He had only a few minutes to do this, but first he had it producing HTML that he manipulated with his browser, and then he added the ability to handle the CSV format, after which he could use his application via Excel. This in turn meant that any tools based on Excel suddenly became capable of using his expense reporting system. Adding support for other data formats would have expanded its capabilities and reusability even further. In the span of 10 minutes, Pete illustrated with a very simple yet very realistic example the very type of serendipitous reuse that my article talks about. But if his system had had a specialized interface, neither the browser, nor Excel, nor anything else could have (re)used it.
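The mechanism behind that demo is content negotiation: the same expense data rendered according to the client's Accept header. The sketch below is illustrative of the idea, not Pete's actual code; the data and format choices are invented.

```python
import csv
import io

# The one underlying resource: a list of expense line items.
EXPENSES = [("taxi", 23.50), ("hotel", 180.00)]

def render(accept: str) -> str:
    """Render the expense report in the format the client asked for."""
    if "text/csv" in accept:
        # A spreadsheet client such as Excel asks for CSV.
        buf = io.StringIO()
        csv.writer(buf).writerows(EXPENSES)
        return buf.getvalue()
    # Default: a browser gets HTML.
    rows = "".join(f"<tr><td>{k}</td><td>{v}</td></tr>" for k, v in EXPENSES)
    return f"<table>{rows}</table>"

print(render("text/csv"))
print(render("text/html"))
```

Each new format added to `render` makes the same resource usable by a whole new class of existing clients, which is the serendipity in question.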

I don't disagree with your point that certain architectural styles are more amenable to reuse than others. What tripped me up was the discussion about semantics, with which I disagree.
