subbu.org

HTTP, REST and some Cycling

Batching – Back to Basics

with 3 comments

In response to my post on PATCHing and BATCHing, John Panzer raised a very valid question:

How can you design a resource that lets you add a contact record, then retrieve contacts 11-20 out of 100, in one HTTP round trip, assuming that you don’t know a priori whether or not your addition is going to affect the resource you want to retrieve (contacts 11-20)?

This use case is not unique. I have come across several such use cases, and have seen similar problems discussed elsewhere in the REST community. My first reaction was that this use case requires a generic batch server to process these requests on behalf of the client. James Snell has made a great start on such a design, so let’s say we adopt his solution for this use case.
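A minimal Python sketch of such a generic batch server follows. It assumes a hypothetical in-memory `ContactStore` in place of the real origin server and a hypothetical `SubRequest` type in place of any real batch wire format (James Snell's proposal and Google's batch model each define their own). The batch runs one sub-request at a time, in the order the client composed them, and makes no atomicity promise: a failing sub-request is reported and the rest still run.

```python
import bisect
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class SubRequest:
    # Hypothetical sub-request shape; not any real batch wire format.
    method: str
    path: str
    params: dict = field(default_factory=dict)
    body: Optional[dict] = None


@dataclass
class SubResponse:
    status: int
    body: object = None


class ContactStore:
    """Hypothetical origin server: contacts kept sorted by name."""

    def __init__(self, names):
        self.contacts = sorted(names)

    def handle(self, req: SubRequest) -> SubResponse:
        if req.path != "/contacts":
            return SubResponse(404)
        if req.method == "POST":
            bisect.insort(self.contacts, req.body["name"])
            return SubResponse(201, {"name": req.body["name"]})
        if req.method == "GET":
            # 1-based, inclusive range, e.g. contacts 11-20.
            start, end = req.params["start"], req.params["end"]
            return SubResponse(200, self.contacts[start - 1:end])
        return SubResponse(405)


def execute_batch(handler: Callable[[SubRequest], SubResponse],
                  batch: list[SubRequest]) -> list[SubResponse]:
    """Serial, in composition order, no atomicity: each sub-request
    succeeds or fails on its own and the batch keeps going."""
    responses = []
    for req in batch:
        try:
            responses.append(handler(req))
        except Exception as exc:  # report the failure, run the rest
            responses.append(SubResponse(500, str(exc)))
    return responses


# John Panzer's use case: add a contact, then read contacts 11-20.
store = ContactStore(f"contact-{i:03d}" for i in range(100))
batch = [
    SubRequest("POST", "/contacts", body={"name": "contact-005a"}),
    SubRequest("GET", "/contacts", params={"start": 11, "end": 20}),
]
added, page = execute_batch(store.handle, batch)
```

With the names assumed here, the new contact sorts ahead of position 11, so `page.body` runs from contact-009 to contact-018 rather than contact-010 to contact-019. That is exactly the effect the client cannot predict before sending the batch.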

Unfortunately this still leaves a number of questions unanswered.

  • Atomicity: The client is asking the server to perform some writes, but is not asking that these writes be performed atomically. Why? Why is atomicity suddenly not a big deal? Is the client developer sure that he does not care about atomicity? What about the server developer?
  • Ordering: The batch proposals by James Snell and the current batch model by Google execute the requests in the order in which they were composed into the batch. While processing the batch, what is the batch server doing? Just waiting on IO for each request to be processed in series! That is not an efficient use of CPU cycles, particularly when executing large batch requests. It does make sense to execute the requests in parallel and compose the response once all of them have returned. But parallel execution can alter the representations that the client receives, particularly in the use case John Panzer describes: if the read of contacts 11-20 completes before the new contact is added, the client sees a different page than serial execution would have produced. What is the best approach, parallel or serial execution? Does the client developer understand the implications, and account for any oddities in the client code?
  • Complexity: Is creating a batch request simpler than creating a regular HTTP request? Not necessarily; it is usually more complicated. Most likely, the server already exposes an elaborate URI space that the client programs against without batching. Switching to a batch solution comes at a cost for the client developer, not only for creating batch requests but also for handling failures, retries, etc.
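The ordering concern can be made concrete with a small Python sketch. It assumes, hypothetically, that contacts are kept sorted by name and that each sub-request runs atomically on its own. Under those assumptions a parallel batch server may complete the two sub-requests from John Panzer's use case in either order, so the sketch enumerates every order a parallel executor could produce and collects the distinct batch responses.

```python
import bisect
from itertools import permutations


def fresh_store():
    # Hypothetical starting state: 100 contacts, already sorted by name.
    return [f"contact-{i:03d}" for i in range(100)]


def apply_op(store, op):
    kind, arg = op
    if kind == "add":                # stands in for POST of a new contact
        bisect.insort(store, arg)
        return ("201 Created",)
    if kind == "page":               # stands in for GET of contacts start..end
        start, end = arg             # 1-based, inclusive
        return tuple(store[start - 1:end])
    raise ValueError(f"unknown operation {kind!r}")


def run_in_order(batch, order):
    """Execute batch[i] for i in `order`, but report responses in
    composition order, as a batch response would."""
    store = fresh_store()
    responses = [None] * len(batch)
    for i in order:
        responses[i] = apply_op(store, batch[i])
    return tuple(responses)


def possible_outcomes(batch):
    """Every distinct batch response a parallel executor could produce,
    assuming each sub-request is atomic (so any interleaving matches
    some serial order)."""
    return {run_in_order(batch, order)
            for order in permutations(range(len(batch)))}


batch = [("add", "contact-005a"), ("page", (11, 20))]
serial = run_in_order(batch, [0, 1])     # as composed: add, then read
reordered = run_in_order(batch, [1, 0])  # read completes before the add
print(serial[1][0], reordered[1][0])     # contact-009 contact-010
print(len(possible_outcomes(batch)))     # 2
```

With only two sub-requests the outcomes are easy to enumerate. The point is that the client gets one of two different pages depending on a scheduling decision it does not control.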

The bottom line is that such use cases are real but deserve a lot more scrutiny.

On this note, it is better to take a step back and think about the basics, such as better URI design and better resource modeling.

If, in the end, such use cases are deemed important and must be addressed, then, as I argued in Untangling the BATCH Hairball, it is worthwhile to consider whether they can be addressed by exposing additional URIs on the resource server, or by building new resource-serving servers that wrap existing servers. Why? Because such a middle ground could provide more precise answers to the questions above.
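One way to read the "additional URIs" option is sketched below in Python, with loudly hypothetical names: the contacts resource itself accepts a POST carrying query parameters (an invented `POST /contacts?start=11&end=20`) that adds the contact and returns the requested page in the same response. Because the resource server knows both steps touch the same list, it can hold one lock across them and give a precise answer on ordering and atomicity, which a generic batch server cannot.

```python
import bisect
import threading


class ContactsResource:
    """Hypothetical contacts resource; names and parameters are illustrative."""

    def __init__(self, names):
        self._contacts = sorted(names)
        self._lock = threading.Lock()

    def post(self, name: str, start: int, end: int):
        """Hypothetical POST /contacts?start=..&end=..
        Adds `name`, then returns contacts start..end (1-based, inclusive)
        as they look after the insert. Both steps run under one lock, so
        no other call on this resource can interleave with them."""
        with self._lock:
            bisect.insort(self._contacts, name)
            page = list(self._contacts[start - 1:end])
        return 201, {"added": name, "page": page}


resource = ContactsResource(f"contact-{i:03d}" for i in range(100))
status, body = resource.post("contact-005a", start=11, end=20)
```

The trade-off is a narrower interface: this URI serves one particular combination of operations, in exchange for answering the atomicity and ordering questions explicitly.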

To conclude, we need to raise the bar for applying batch execution as a solution.

Written by subbu

February 25th, 2008 at 9:19 am

Posted in HTTP

3 Responses to 'Batching – Back to Basics'

  1. Atomicity: This is an interesting and useful extension that could be requested/supported totally separately from batching. It’s also something which is a stumbling block if made a _requirement_ for a batching protocol (as seen in the discussions of the Atom Working Group). Batching is useful without atomicity, so deploy it first, and add atomicity as an option later if it’s useful.

    Ordering: Servers should be allowed to parallelize if they can prove the reordering doesn’t affect the results. Typically this would be if they know the semantics of the operations (GET to a known resource) and so can easily execute in parallel. Other extensions could be added later to let a client provide parallelization hints; again, this would be a stumbling block if made a requirement for the basic batching protocol.

    Complexity: Yes, it’s more complex, because it’s an optimization. However, as proposed it’s also completely optional, so it can be left out of a library or added in later without breaking clients. (Atomicity guarantees would for example make this impossible.)

    John

    25 Feb 08 at 11:43 am

  2. Addressability of Fragments

    In his Addressing fragments in REST, Simon St. Laurent says ……

    subbu.org

    25 Feb 08 at 1:56 pm

  3. >> Ordering: Servers should be allowed to parallelize if they can prove the reordering doesn’t affect the results. Typically this would be if they know the semantics of the operations (GET to a known resource) and so can easily execute in parallel. Other extensions could be added later to let a client provide parallelization hints; again, this would be a stumbling block if made a requirement for the basic batching protocol.

    I don’t think an independent batching server can determine whether reordering affects the results. It does not know whether resources are somehow related in the backend (e.g. through some database).

    Subbu Allamaraju

    27 Feb 08 at 11:35 am
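The parallelization rule debated in these comments can be sketched conservatively in Python. The policy is a hypothetical assumption, not something either commenter proposed verbatim: the batch server parallelizes only runs of consecutive safe requests (GET, HEAD, OPTIONS) and treats every other method as a barrier that runs alone in its original position. Since safe methods are not expected to change server state, this never reorders a write relative to any other request, so it stays correct even when, as Subbu notes, the server cannot see how resources are related in the backend.

```python
from concurrent.futures import ThreadPoolExecutor

SAFE_METHODS = {"GET", "HEAD", "OPTIONS"}


def plan_stages(batch):
    """Group (method, path) requests into stages. Consecutive safe requests
    share a 'parallel' stage; every other request is its own 'serial' stage."""
    stages = []
    for req in batch:
        method = req[0]
        if method in SAFE_METHODS:
            if stages and stages[-1][0] == "parallel":
                stages[-1][1].append(req)
            else:
                stages.append(("parallel", [req]))
        else:
            stages.append(("serial", [req]))
    return stages


def execute(batch, handler, max_workers=4):
    """Run stages in order. Responses come back in composition order
    because Executor.map yields results in input order."""
    responses = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for mode, reqs in plan_stages(batch):
            if mode == "parallel":
                responses.extend(pool.map(handler, reqs))
            else:
                responses.extend(handler(r) for r in reqs)
    return responses


batch = [("GET", "/a"), ("GET", "/b"), ("POST", "/c"), ("GET", "/d")]
print([(mode, len(reqs)) for mode, reqs in plan_stages(batch)])
# [('parallel', 2), ('serial', 1), ('parallel', 1)]
```

A server that knows the semantics of its own resources (John's case) could widen the parallel stages; a generic batch server, per Subbu's reply, has no basis for doing so.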