
Performance of RESTful Apps

A while ago I showed how chatty some well-known apps are on my iPhone. But this issue is neither new nor unique to apps on phones and similar devices. Efficient data retrieval from distributed/decentralized servers is a well-recognized problem in distributed computing. For instance, in the abstract of their November 1994 paper A Note on Distributed Computing, Jim Waldo and his coauthors note the following (emphasis mine).

We argue that objects that interact in a distributed system need to be dealt with in ways that are intrinsically different from objects that interact in a single address space. These differences are required because distributed systems require that the programmer be aware of latency, have a different model of memory access, and take into account issues of concurrency and partial failure.

Most front-end developers by now know and follow the best practices that Yahoo!’s Exceptional Performance team documented a few years ago. However, the REST community may have missed the bus and has some catching up to do. Performance of RESTful apps is not one of the most frequently discussed topics online or in print. From talking to various teams, it often seems that a great deal of time is spent on URI/representation design, schemas, use of the uniform interface for CRUD, the hypertext constraint, and so on. No doubt – these topics are all very important, but understanding and accounting for performance characteristics in the design and implementation of server and client apps is no less crucial.

Here are some techniques to help build high-performance RESTful apps.

Composites for Performance

The best of all web performance techniques is to minimize the number of HTTP requests. However, RESTful Apps rarely follow this practice. This difference stems from how each side sees resources.

Front-end vs APIs/services

On one side, front-end folks optimize their servers to serve bulk representations for CSS, JavaScript or image sprites, or even data URIs for images to reduce the number of HTTP requests and thereby latency. On the front-end, most resources are in fact composites.

On the other side, API/service developers prefer clean-looking resources and URIs. Though this can lead to chatty network usage, there is one specific advantage in offering a set of resources that are independent and less coupled with other resources – it leaves room for clients to innovate. It lets them combine data from multiple resources in numerous ways that the resource developers could not possibly anticipate.

(Figure: Many possibilities)

Loose coupling is another benefit of this approach, as clients can evolve rapidly on their own.

The expense is of course latency, particularly when those client apps are not very close to the servers. Each client may need to submit several requests to the server in order to get its job done. So, how do we go about fixing this without sacrificing the flexibility of less-coupled resources? One answer is to use composite resources. See my RESTful Web Services Cookbook for details.

With a composite, instead of sending n HTTP requests over as many as n connections, the client can open just one TCP connection and send one HTTP request to retrieve the data it needs – just like a browser getting CSS or JavaScript bundles on the front-end. A composite changes a pattern like

GET /something HTTP/1.1
Host: www.example.org

GET /something-else?params HTTP/1.1
Host: www.example.org

GET /some-other-thing-related-to-something?params HTTP/1.1
Host: www.example.org

to

GET /get-all-things-i-need-about-something?params HTTP/1.1
Host: www.example.org

Each composite can generate a projection of state required for one or more clients. These composites can be more specialized than the resources they aggregate – as each composite can cater to particular client needs.
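The aggregation behind a composite can be sketched in a few lines. The sketch below is a minimal illustration under assumed inputs, not a prescription: it treats each back-end resource as an independent zero-argument callable (in practice an HTTP GET wrapped in a function) and simply fans out in parallel, merging the results into one projection.

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_composite(fetchers):
    """Fetch each backend resource concurrently and merge the results
    into a single projection the client can retrieve in one request.

    `fetchers` maps a field name to a zero-argument callable that
    returns that resource's representation."""
    with ThreadPoolExecutor(max_workers=len(fetchers)) as pool:
        futures = {name: pool.submit(fn) for name, fn in fetchers.items()}
        # The composite's projection: one document built from many resources.
        return {name: future.result() for name, future in futures.items()}

# Hypothetical resources standing in for real HTTP calls.
composite = fetch_composite({
    "product": lambda: {"id": 42, "name": "widget"},
    "reviews": lambda: [{"stars": 5}],
})
```

A real composite server would, of course, replace the lambdas with HTTP fetches and add timeouts and error handling per back-end resource.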

(Figure: Many possibilities)

(P.S.: My usage of the term "resource" is not precise here as a "composite" is also a resource.)

This approach also shifts workloads related to concurrency (such as ordering of requests based on success or failure), CPU (for generating projections, correlating related data across representations, etc.), and I/O (for fetching back-end resource representations serially or in parallel, depending on dependencies) from the client to the server.

On the server side, the server hosting composites can also optimize its connection handling to resource servers to reduce TCP handshake and slow-start overhead. For instance, it can maintain pools of persistent, long-lasting connections (e.g., with keep-alive) between the servers hosting composites and the resource servers.

In this post, I’m not going to discuss software choices to serve composites, but you may need to account for several features:

  • multi-tenancy or isolation of code execution, configuration and deployment so that different teams can build composites
  • data or control flow for fetching representations in parallel or sequentially based on inter-dependencies
  • query languages (such as YQL) to normalize data formats and to easily create projections
  • non-blocking or asynchronous I/O to better tackle I/O workloads
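To make the data/control-flow bullet concrete, here is one way to sketch dependency-aware fetching: run everything whose prerequisites are satisfied concurrently, stage by stage. The function and its inputs are hypothetical; a real broker would add timeouts and partial-failure handling.

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_in_stages(fetchers, deps):
    """Run fetchers in dependency order: anything whose prerequisites
    are already fetched runs concurrently in the current stage.

    `fetchers`: name -> callable(results_so_far) returning a representation.
    `deps`: name -> set of prerequisite names."""
    results, remaining = {}, set(fetchers)
    with ThreadPoolExecutor() as pool:
        while remaining:
            # Everything whose prerequisites are satisfied can go now.
            ready = {n for n in remaining if deps.get(n, set()) <= results.keys()}
            if not ready:
                raise ValueError("cyclic or unsatisfiable dependencies")
            futures = {n: pool.submit(fetchers[n], dict(results)) for n in ready}
            for name, future in futures.items():
                results[name] = future.result()
            remaining -= ready
    return results
```

Here `user` fetches immediately, while `recs` waits for the user profile it depends on:

```python
results = fetch_in_stages(
    {"user": lambda r: "u1",
     "recs": lambda r: "recs for " + r["user"]},
    {"recs": {"user"}})
```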

I’m personally excited about nodejs and async I/O support in Java 7 as both would let us build small and nimble broker apps to serve composites.

Of course – the idea of a composite is to add an extra layer of indirection on the server side to offset network overhead when performance is at stake. It is not meant to replace loosely coupled resources that can be manipulated using HTTP and linked using hypertext controls like links in representations.

Better Connection Reuse

Long-lasting TCP connections help reduce connection-establishment overhead and help the TCP stack settle on an appropriate congestion window size. Reusing connections is usually trivial, and pooling is often part of client libraries. But there are a few precautions to take at the application level.

Avoid Explicit Connection Closing

# Don't do this
HTTP/1.1 200 OK
Content-Type: application/json
Content-Length: 1234
Connection: close

{ ... body ... }

The first precaution is not to add Connection: close to requests or responses by default. Carelessly adding this header will prevent connection reuse. There had better be a good reason to add it, such as a server that can’t handle too many open connections, or the need to prevent abuse.
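To see keep-alive in action with nothing but the standard library, the sketch below starts a throwaway HTTP/1.1 server and issues two requests over one client connection; because the server sends Content-Length and no Connection: close, the same socket carries both requests. The handler and paths are made up for illustration.

```python
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"   # HTTP/1.1 keeps connections open by default

    def do_GET(self):
        body = b'{"ok": true}'
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        # Note what is absent: no "Connection: close" header.
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):   # keep the demo quiet
        pass

server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = HTTPConnection("127.0.0.1", server.server_address[1])
conn.request("GET", "/something")
first = conn.getresponse().read()       # drain the body so the socket is reusable
socket_after_first = conn.sock

conn.request("GET", "/something-else")  # second request rides the same connection
second = conn.getresponse().read()
reused = conn.sock is socket_after_first
server.shutdown()
```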

Avoid Close Delimited Messages

# Avoid this
HTTP/1.1 200 OK
Content-Type: application/json

{ ... body ... }

Make sure to include Content-Length or use Transfer-Encoding: chunked so that the recipient of an HTTP message knows where one HTTP request/response message ends and the next one starts. If a response has neither of these two, the recipient will need to read the stream until the connection is closed, which means the connection cannot be reused. Close-delimited HTTP request/response messages are bad for performance.
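To make the framing concrete, here is a hand-rolled sketch of the chunked transfer coding itself: each chunk is prefixed with its size in hex, and a zero-sized chunk marks the end of the message, so the recipient never has to wait for the connection to close. HTTP libraries do this for you; this is only to show the mechanics.

```python
def to_chunked(payload, chunk_size=16):
    """Frame `payload` (bytes) using HTTP/1.1 chunked transfer coding:
    a hex size line, CRLF, the data, CRLF -- terminated by a zero-sized
    chunk."""
    out = b""
    for i in range(0, len(payload), chunk_size):
        chunk = payload[i:i + chunk_size]
        out += b"%x\r\n%s\r\n" % (len(chunk), chunk)
    return out + b"0\r\n\r\n"

def from_chunked(data):
    """Decode a chunked-coded body back into the original payload."""
    payload, pos = b"", 0
    while True:
        crlf = data.index(b"\r\n", pos)
        size = int(data[pos:crlf], 16)
        if size == 0:
            return payload
        start = crlf + 2
        payload += data[start:start + size]
        pos = start + size + 2  # skip the chunk data and its trailing CRLF
```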

Read Messages Completely

Incomplete reads will also prevent reuse of the connection. Incomplete reads usually happen when a client receives an error or a redirect response. These kinds of responses can have a body, but clients can often determine what to do by looking at just the response line and headers. For instance, a 409 response may have a body that explains why the request failed.

HTTP/1.1 409 Conflict
Content-Type: text/html
Content-Length: 1234

<html> ... </html>

But not reading the body from the connection may prevent reuse in some frameworks – Java is known for this. Also be wary of libraries that translate 4xx and 5xx response codes into exceptions – in this case, in addition to catching the exception, the client will need to read the body.
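A defensive pattern is to drain the body before acting on the status, no matter what the status is. The sketch below is illustrative only; FakeResponse is a made-up stand-in for something like http.client.HTTPResponse.

```python
def read_fully(resp):
    """Consume the body even for 4xx/5xx results so the underlying
    connection remains reusable; only then decide how to react.

    `resp` is anything with `.status` and `.read()`."""
    body = resp.read()  # drain unconditionally, *before* deciding what to do
    if resp.status >= 400:
        raise RuntimeError("HTTP %d" % resp.status)
    return body

# Stand-in for a real HTTP response object, just for illustration.
class FakeResponse:
    def __init__(self, status, body):
        self.status, self._body, self.drained = status, body, False
    def read(self):
        self.drained = True
        return self._body

conflict = FakeResponse(409, b"<html>why it conflicts</html>")
try:
    read_fully(conflict)
except RuntimeError:
    pass
# The body was consumed even though we only cared about the status line.
```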

Tune Idle Connection Timeouts

Servers and clients usually close idle connections upon a timeout to conserve resources. Settings for the idle connection timeout may be hard to find, or not exposed at all, in client/server frameworks. Where tuning is possible, check that the defaults are reasonable for your workload.

Try Proxies for Long-Haul Traffic

In some cases a configuration like the following can help:

(Figure: Proxies for short-lived apps)

  • Client apps that are short-lived (like an app on a mobile/tablet or even on a desktop), so their connections can’t be persistent. The proxy, however, can keep its connections persistent, which limits connection-establishment cost to the first leg from the client app to the nearest proxy.
  • Client apps that can’t maintain too many persistent connections – which is still the case for browsers today, though this is slowly changing.

Of course, this approach also lets the server distribute responses to caches on those proxies to further reduce network cost. Many variations of this approach are possible depending on how your servers are distributed and how far client apps are from servers. If you’re new to the idea of REST and are still wondering why HTTP’s uniform interface is such a big deal, here is why – once you implement HTTP reasonably correctly, you can reconfigure servers, proxies and caches as necessary without code changes.

Progressive Serving of Representations

Sometimes it is not the network, but generating a response is the bottleneck. This is particularly true for composite resources or resources that rely on a number of data sources to generate a response to the client. The typical flow in such cases is as follows:

  • read the request data such as the path and query string
  • decide what to fetch
  • fetch data from each dependent source in sequence or concurrently to the extent possible (which depends on dependencies)
  • prepare data for the response
  • write the data to the response

Of these steps, when the I/O for n dependent sources, each taking roughly time t, is done sequentially, the server takes at least n*t time to generate a response. If all the I/O can be done in parallel, it takes about max(t) time, i.e., the response can be no faster than the slowest source.
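The difference is easy to demonstrate with three stand-in sources that each take about 50 milliseconds; threads work here because the time is spent waiting on (simulated) I/O, not computing.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_source(delay):
    """Stand-in for a dependent data source with `delay` seconds of I/O."""
    time.sleep(delay)
    return delay

delays = [0.05, 0.05, 0.05]

start = time.monotonic()
[slow_source(d) for d in delays]          # one after another: ~sum of delays
sequential = time.monotonic() - start

start = time.monotonic()
with ThreadPoolExecutor() as pool:
    list(pool.map(slow_source, delays))   # all at once: ~max of the delays
parallel = time.monotonic() - start
```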

On the front-end side, when it takes time to generate a page, a common practice is to turn to XMLHttpRequest or iframes to split the page into fragments and defer loading of slower parts of the page. Both these techniques potentially use additional connections. In a multi-tiered setup, this causes a flood of new requests from the browser to front-end servers, and from there to backend servers and so on. This also introduces new state management and security problems as the server may need to push state first to the browser only to get it back via XMLHttpRequest immediately.

An alternative is to progressively render the page over a single connection. In this case, the flow would be

  • read the request data such as the path and query string
  • decide what to fetch
  • fetch data from fast sources
  • initiate requests for slow sources
  • serve partial page based on response from fast sources
  • as and when a slow source responds, prepare a partial response and write to the client
  • after all the sources respond (or after some timeout), write additional chunks and finally end the page

Here, by “chunk” I mean “part of a message” and not an HTTP chunk.
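The flow above can be sketched as a generator that emits each part in completion order: fast sources are served immediately, slow sources whenever they finish. The source names and timings are made up; a real server would write each yielded part to the response as it becomes available.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def serve_progressively(sources):
    """Yield (name, data) for each part of the response as its source
    finishes, fastest first, instead of waiting for everything before
    writing anything.

    `sources` maps a part name to a zero-argument callable."""
    with ThreadPoolExecutor() as pool:
        futures = {pool.submit(fn): name for name, fn in sources.items()}
        for future in as_completed(futures):
            yield futures[future], future.result()

# Hypothetical fast and slow sources for one resource.
parts = list(serve_progressively({
    "product": lambda: (time.sleep(0.01), "core product data")[1],  # fast
    "related": lambda: (time.sleep(0.1), "related products")[1],    # slow
}))
```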

The goal of this technique is to reduce user-perceived latency without using more network connections from the browser. In this flow, the browser makes an HTTP request to the front-end server, which writes snippets of markup and script over a period of time before ending the response. Since the server does not know the Content-Length of the page, it uses chunked transfer encoding, in which the end of the response is signaled by a zero-sized chunk.

This is called “progressive rendering”. This technique is well-known in front-end circles and Facebook calls this technique BigPipe. Progressive rendering depends on two things:

  • Servers being able to write chunks over long-lasting connections – asynchronous I/O based servers like nodejs are very attractive for this (see my nodejs example or Bruno’s example using continuations).
  • Clients being able to process the response as it arrives – in JavaScript-capable browsers, this capability is already present.

We can apply this technique for non-front-end resources as well, provided (a) it is possible to retrieve data from fast sources before slow sources, and (b) data from fast sources is meaningful to clients. For instance, think of a personalized product resource that includes data about a product plus IDs, links, and brief summaries of related products. In this case, product data can be looked up from storage in near-constant time (say, about 20 milliseconds), while finding related products may involve performing some computations on the user profile, past purchase history and other derived data, which can be time-consuming – say, up to 500 milliseconds. Here is an example of a progressive representation of such a product resource.

HTTP/1.1 200 OK
Content-Type: multipart/mixed; boundary=abcdef
Transfer-Encoding: chunked

--abcdef
Content-Type: application/json

{ ... product data here ... }

--abcdef
Content-Type: application/json

{ ... related products here ... }

--abcdef--

In this example I used a multipart media type as it provides a visible boundary between different portions of the representations, and the client can read the representation part by part.
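Reading such a representation part by part can be sketched as follows. This is a deliberately simplified parser, assuming single-line JSON parts and the boundary from the example above; a production client would use a real multipart parser and handle part headers properly.

```python
import io
import json

def read_multipart(stream, boundary):
    """Yield each JSON part of a multipart body as soon as its closing
    boundary line is seen, without waiting for the rest of the stream."""
    delim = b"--" + boundary
    buffered = b""
    while True:
        line = stream.readline()
        if not line:
            return
        marker = line.rstrip(b"\r\n")
        if marker in (delim, delim + b"--"):
            # Drop part headers and blank lines; what remains is the payload.
            payload = b"".join(
                l for l in buffered.splitlines()
                if l and not l.lower().startswith(b"content-")
            )
            if payload.strip():
                yield json.loads(payload)
            if marker == delim + b"--":   # final boundary: message complete
                return
            buffered = b""
        else:
            buffered += line

# The wire format from the example above, with made-up payloads.
raw = (b"--abcdef\r\nContent-Type: application/json\r\n\r\n"
       b'{"product": 42}\r\n'
       b"--abcdef\r\nContent-Type: application/json\r\n\r\n"
       b'{"related": [43, 44]}\r\n'
       b"--abcdef--\r\n")

parts = list(read_multipart(io.BytesIO(raw), b"abcdef"))
```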

If the client is a front-end app that generates an HTML product page for browsers, it can progressively render the product page as soon as it receives the first part, and then render markup for list of related products when the second part arrives.

HTTP/1.1 200 OK
Content-Type: text/html
Transfer-Encoding: chunked

<html>
  <head>...</head>
  <body>
    ... HTML for the product data ...

    <div id="related"></div>


    <!-- script for related products (some milliseconds later) -->
    <script> // update related div ...</script>
  </body>
</html>

This shows how progressive generation of arbitrary representations can be combined with progressive serving of the front-end to reduce perceived latency.

Epilogue

One of the patterns to notice from this post is that design considerations between HTML-serving front-end apps and JSON/XML/whatever-speaking RESTful apps are not entirely different. Both rely on the same set of core architectural principles such as the uniform interface, visibility, hypertext, and so on. Whatever lessons we learn on the front end are certainly applicable to so-called API servers.

Finally, it goes without saying that premature optimization is evil. My goal in this post is to point out techniques you may already have in your toolkit. Apply them based on need and experimentation.

If you find this post useful, try my book: RESTful Web Services Cookbook.

