REST and Batch

by subbu on February 3, 2008

Over the past couple of weeks, I came across this question several times: how can a client submit an arbitrary set of requests to a server in a batch and get a single response back? At first, this seems like a natural use case to support on the server side. For the client, it would save a lot of work: instead of sending several requests, the client would send a single request and save on network latency. Late last year, I spent a week prototyping a server to process batch requests. It opened up some really interesting problems to think about. The batch models I have looked at have holes big enough to drive trucks through, particularly when it comes to updating resources.

The first proposal that I came across looked like this:

<batch>
<item type="GET" uri="/myresources/myrsource/123">

</item>
<item type="DELETE" uri="/myresources/myrsource/456">

</item>
<item type="PUT" uri="/myresources/myrsource/789">

</item>
</batch>

submitted via a POST to a URI such as /myresources/batch.

This model is very similar to the batch processing model in Google Data APIs. Both models have the following characteristics.

  • The request URI is a generic end point that can consume requests to process arbitrary resources. This means that the URI used to submit the request does not uniquely identify the resource(s) that the batch end point is processing.
  • The body of the request, and not the HTTP verb used to submit the request, describes the operations. This means that, while the request advertises that the operation is a POST (i.e., non-idempotent and unsafe), the actual processing may be anything. Depending on what the body is, the request could be completely safe (like a GET or OPTIONS), idempotent (like a GET, OPTIONS, PUT and DELETE), unsafe (like POST), or any combination of these. Only the client and the server would know what those operations are. That is, the operation remains completely opaque to proxies, caches and other intermediaries.
  • Depending on what was encoded in the request body, the server may have created new resources, updated some resources, deleted others, or done all of these within the same request. Some of those operations may have succeeded and some may have failed. The server has no easy way to communicate the results via HTTP response headers, and will need to encode the results in the response body. That is, the response remains completely opaque to all intermediaries.
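To make the second point concrete, here is a minimal sketch in Python that inspects a batch body and reports what the request really does. The function name `classify_batch` and the POST item's URI are illustrative assumptions; the `type` and `uri` attributes follow the proposal shown earlier.

```python
import xml.etree.ElementTree as ET

# Method properties as defined by HTTP/1.1.
SAFE = {"GET", "HEAD", "OPTIONS"}
IDEMPOTENT = SAFE | {"PUT", "DELETE"}


def classify_batch(body: str) -> dict:
    """Report what a batch body does, regardless of the outer POST."""
    root = ET.fromstring(body)
    methods = [item.get("type", "").upper() for item in root.iter("item")]
    return {
        "methods": methods,
        "safe": all(m in SAFE for m in methods),
        "idempotent": all(m in IDEMPOTENT for m in methods),
    }


# Two batches that look identical on the wire (both are POSTs to the
# batch end point) but have very different semantics.
read_only = '<batch><item type="GET" uri="/myresources/myrsource/123"/></batch>'
mixed = (
    "<batch>"
    '<item type="GET" uri="/myresources/myrsource/123"/>'
    '<item type="DELETE" uri="/myresources/myrsource/456"/>'
    '<item type="POST" uri="/myresources"/>'  # hypothetical item
    "</batch>"
)

print(classify_batch(read_only))
print(classify_batch(mixed))
```

Only a component that parses the body can tell these two requests apart. A cache or proxy that looks at the request line sees the same POST in both cases.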

In this model, since the URI is not necessarily tied to the resources a given request is operating upon, such a batch operation could just as well be implemented as a proxy that sits between clients and servers: it takes each incoming batch request, relays the individual requests to the appropriate servers, waits for the responses, packs those responses into a single response payload, and returns it to the client.
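A rough sketch of such a proxy in Python might look like the following. Everything here is an assumption made for illustration: the `relay_batch` and `fake_send` names, the `batch-response` element, and the `status` attribute are not part of any of the proposals discussed in this post.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Tuple

# Hypothetical transport: (method, uri, body) -> (status, body).
Send = Callable[[str, str, str], Tuple[int, str]]


def relay_batch(batch_xml: str, send: Send) -> str:
    """Unpack a batch, relay each item, and pack the results into one payload."""
    results = ET.Element("batch-response")
    for item in ET.fromstring(batch_xml).iter("item"):
        method, uri = item.get("type"), item.get("uri")
        status, body = send(method, uri, item.text or "")
        result = ET.SubElement(
            results, "item", type=method, uri=uri, status=str(status)
        )
        result.text = body
    return ET.tostring(results, encoding="unicode")


def fake_send(method: str, uri: str, body: str) -> Tuple[int, str]:
    # Stand-in for a real origin server; one URI fails to show partial failure.
    return (404, "") if uri.endswith("/456") else (200, "ok")


batch = (
    "<batch>"
    '<item type="GET" uri="/myresources/myrsource/123"/>'
    '<item type="DELETE" uri="/myresources/myrsource/456"/>'
    '<item type="PUT" uri="/myresources/myrsource/789">new state</item>'
    "</batch>"
)
print(relay_batch(batch, fake_send))
```

Note that the per-item status codes end up inside the response body. The outer HTTP response would most likely still be a single 200, which is exactly the opacity problem described above.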

Anyone smell SOAP? Well, this is a perfect case for SOAP/WS, and RESTafarians would quickly reject this model. In fact, such batch processing, when mixed with RESTful resource access, is more harmful than SOAP over HTTP: requests and responses would be opaque some of the time and transparent the rest of the time, and intermediaries won’t know what to make of request and response headers.

An alternative that comes to mind is pipelining the same set of requests over HTTP/1.1. With pipelining, instead of packing several requests into a single request body, the client makes several requests over a single connection without waiting for each response. However, there is a catch. If the server closes the connection in the middle of processing pipelined requests, the client is supposed to resend the requests. To make such a retry safe, the requests sent over a pipelined connection need to be idempotent, which limits pipelining to GET, HEAD, OPTIONS, PUT and DELETE. On the other hand, pipelining has the advantage of not breaking any rules. URIs continue to uniquely identify resources, operations remain transparent, and request and response headers continue to mean something. Pipelining is better than encoding those requests in a SOAP-like request body.
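As a sketch of what this means for a client, the Python below serializes a set of requests back to back and refuses anything that is not idempotent. The helper names `build_pipeline` and `to_resend` and the host `example.org` are assumptions for illustration. The retry helper relies on the HTTP/1.1 rule that responses arrive in request order, and it resends only the requests that have not yet been answered.

```python
IDEMPOTENT = {"GET", "HEAD", "OPTIONS", "PUT", "DELETE"}


def build_pipeline(host, requests):
    """Serialize (method, uri, body) tuples back to back for one connection."""
    for method, _, _ in requests:
        if method not in IDEMPOTENT:
            raise ValueError(f"{method} is not idempotent; do not pipeline it")
    out = b""
    for method, uri, body in requests:
        payload = body.encode("utf-8")
        head = f"{method} {uri} HTTP/1.1\r\nHost: {host}\r\n"
        if payload or method == "PUT":
            head += f"Content-Length: {len(payload)}\r\n"
        out += head.encode("ascii") + b"\r\n" + payload
    return out


def to_resend(requests, responses_received):
    """Requests still unanswered when the connection closed early."""
    return requests[responses_received:]


reqs = [
    ("GET", "/myresources/myrsource/123", ""),
    ("DELETE", "/myresources/myrsource/456", ""),
    ("PUT", "/myresources/myrsource/789", "<myrsource/>"),
]
wire = build_pipeline("example.org", reqs)  # bytes to write to one socket
retry = to_resend(reqs, 1)  # connection dropped after the first response
```

Because every request in the list is idempotent, replaying `retry` on a fresh connection cannot do more damage than the original attempt.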

Recently, James Snell posted a note on this topic, suggesting the new HTTP method PATCH combined with multi-part messages. The above batch request can be rewritten as:

PATCH /myresources
Content-Type: multipart/mixed; boundary=xyz

--xyz
Content-Id: 
Batch-Operation: GET uri="/myresources/myrsource/123"
Content-Type: ...

--xyz
Content-Id: 
Batch-Operation: DELETE uri="/myresources/myrsource/456"
Content-Type: ...

--xyz
Content-Id: 
Batch-Operation: PUT uri="/myresources/myrsource/789"
Content-Type: ...

--xyz--
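For illustration, a client could assemble a request of this shape with a few lines of Python. This is only a sketch: `build_batch_patch` is a made-up helper, the `Content-Id` values are a hypothetical scheme, and the `Batch-Operation` header syntax simply mirrors the example above rather than any published specification. The part delimiters (`--` plus the boundary, and a trailing `--` on the last one) follow the MIME multipart rules.

```python
def build_batch_patch(target, operations, boundary="xyz"):
    """Return (request_line, headers, body) for a multipart/mixed PATCH.

    operations is a list of (method, uri, content_type, payload) tuples.
    """
    lines = []
    for index, (method, uri, content_type, payload) in enumerate(operations, 1):
        lines.append(f"--{boundary}")
        # Hypothetical id scheme, only here to make each part addressable.
        lines.append(f"Content-Id: <op{index}@batch.example>")
        lines.append(f'Batch-Operation: {method} uri="{uri}"')
        if content_type:
            lines.append(f"Content-Type: {content_type}")
        lines.append("")  # blank line ends the part headers
        lines.append(payload)
    lines.append(f"--{boundary}--")  # closing delimiter
    body = "\r\n".join(lines) + "\r\n"
    headers = {
        "Content-Type": f"multipart/mixed; boundary={boundary}",
        "Content-Length": str(len(body.encode("utf-8"))),
    }
    return f"PATCH {target} HTTP/1.1", headers, body


request_line, headers, body = build_batch_patch(
    "/myresources",
    [
        ("GET", "/myresources/myrsource/123", "", ""),
        ("DELETE", "/myresources/myrsource/456", "", ""),
        ("PUT", "/myresources/myrsource/789", "application/xml", "<myrsource/>"),
    ],
)
```

Each part carries its own headers, so a server (or an intermediary that understands the format) can treat the parts much like individual requests.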

Despite the fact that his approach requires some extensions to HTTP, I like his approach for a few reasons:

  • The URI used for the request identifies a resource whose contents are to be operated upon as described by each part in the multipart request.
  • The request method is PATCH, and therefore indicates that the resource identified by the URI is being modified.
  • The multipart request body identifies the changes the client wants applied to the resource.
  • Each part in the request and response can have its own HTTP headers in a transparent manner.

One downside of both pipelining and PATCH + multipart/mixed requests is that the server cannot guarantee atomicity. If atomicity is necessary, the batch should be expressed differently. Here is an alternative, still based on the PATCH method.

PATCH /myresources
Content-Type: application/xml
If-Unmodified-Since: ...

<myresource>
<resource id="456" patch-type="delete"/>
<resource id="789" patch-type="put">
... representation ...
</resource>
</myresource>

The difference between this and the first approach is that the body is just a patch onto an existing resource, with the body of the request providing instructions on how to modify that resource. The server can provide atomicity if desired, since the entire patch can be applied in one step.
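One way a server could get that atomicity is to apply the patch to a working copy and commit only if every instruction succeeds. The Python below is a minimal sketch under several assumptions: the collection is an in-memory dict keyed by resource id, `apply_patch` and `PatchError` are invented names, and deleting a missing resource is treated as a failure. A real server would more likely rely on a database transaction, and would also evaluate the If-Unmodified-Since precondition before applying anything.

```python
import copy
import xml.etree.ElementTree as ET


class PatchError(Exception):
    """Raised when any single instruction in the patch cannot be applied."""


def apply_patch(store: dict, patch_xml: str) -> dict:
    """Apply every instruction to a copy; return it only if all succeed."""
    working = copy.deepcopy(store)
    for res in ET.fromstring(patch_xml).iter("resource"):
        rid, op = res.get("id"), res.get("patch-type")
        if op == "delete":
            if rid not in working:
                raise PatchError(f"cannot delete missing resource {rid}")
            del working[rid]
        elif op == "put":
            working[rid] = (res.text or "").strip()
        else:
            raise PatchError(f"unknown patch-type {op!r}")
    return working


store = {"456": "old state", "789": "old state"}
patch = (
    "<myresource>"
    '<resource id="456" patch-type="delete"/>'
    '<resource id="789" patch-type="put">new state</resource>'
    "</myresource>"
)
store = apply_patch(store, patch)  # all or nothing
```

If any instruction raises `PatchError`, the caller never receives the working copy, so the original `store` is left exactly as it was.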

2 comments

1 Brian McCallister February 3, 2008 at 8:03 pm

I believe pipelining is the orthodox restafarian answer.

-Brian

2 Subbu Allamaraju February 3, 2008 at 8:51 pm

:)

Subbu
