Can Pipelining Help?
Saturday, February 5, 2011
HTTP pipelining is often suggested as a way to dramatically improve page load times, or to solve multi-GET use cases for RESTful applications. Whether pipelining achieves the intended effect depends on what gets pipelined and how the server implements pipelining.
When using pipelining, an HTTP client sends idempotent HTTP requests (such as GET) without waiting for responses to previous requests, and expects responses to arrive from the server in the same order. HTTP 1.1 says nothing about the order of processing of requests on the server side: servers can process each request in sequence or in parallel. All that matters is the order of responses. However, in the real world, pipelining is not often used, due to a number of interoperability issues. Mark Nottingham recently captured some of these issues in an internet draft:
Anecdotal evidence suggests there are a number of reasons why clients don’t use HTTP pipelining by default. Briefly, they are:
1. Server implementations may stall pipelined requests, or close their connection. This is one of the most commonly cited problems.
2. Server implementations may pipeline responses in the wrong order. Some implementations mix up the order of pipelined responses; e.g., when they hit an error state but don’t “fill” the response pipeline with a corresponding representation.
3. A few server implementations may corrupt pipelined responses. It’s been said that a very small number of implementations actually interleave pipelined responses so that part of response A appears in response B, which is both a security and interoperability problem.
4. Clients don’t have enough information about what is useful to pipeline. A given response may take an inordinate amount of time to generate, and/or be large enough to block subsequent responses. Clients who pipeline may face worse performance if they stack requests behind such an expensive request.
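To make the wire-level mechanics concrete, here is a small sketch using only Python's standard library (a toy, not production client code): it starts a local server, writes two GET requests back to back on one connection without waiting for the first response, and then reads both responses in order. The paths `/g1` and `/g2` are made up for the example.

```python
# Sketch: pipelining two GET requests on a single connection.
# Python's built-in http.server handles requests on a connection one at a
# time, so the pipelined responses come back in request order.
import socket
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # keep-alive, so one connection serves both requests

    def do_GET(self):
        body = self.path.encode()  # echo the request path back as the body
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging
        pass

server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

with socket.create_connection(server.server_address) as sock:
    # Send g1 and g2 back to back, without waiting for g1's response.
    sock.sendall(b"GET /g1 HTTP/1.1\r\nHost: localhost\r\n\r\n"
                 b"GET /g2 HTTP/1.1\r\nHost: localhost\r\n\r\n")
    data = b""
    while b"/g2" not in data:  # read until the second response body arrives
        data += sock.recv(4096)

print(data.count(b"200 OK"))  # both responses arrived on the one connection
server.shutdown()
```

Note that the client has no way to tag requests; matching responses to requests relies entirely on the server preserving order.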
Even if we fix all the interoperability issues (such as 1, 2, and 3 above), pipelining will not necessarily improve anything. Unlike with non-pipelined requests, the client needs to know a bit about the server’s implementation before deciding to pipeline requests. Here is why.
The key constraint in pipelining is that the server must send responses in order. This leads to the so-called head-of-line blocking problem.
Assume that the client opens a connection and sends three GET requests, g1, g2, and g3. Of these, let’s say that g1 takes longer to process than g2 and g3. But the server is still required to return responses in the sequence of g1, g2, and g3. Here is one possible implementation in a multi-threaded server.
- Server receives a connection, and it gives the associated channel/stream to a thread t0
- Server starts parsing the data in t0
- Server finds g1, and hands it off to an application handler h1 in thread t1
- Server finds g2, and hands it off to an application handler h2 in thread t2
- Server finds g3, and hands it off to an application handler h3 in thread t3
- h2 finishes first and wants to write its response; the server blocks it since h1 has not finished yet
- h3 finishes next and wants to write its response; the server blocks it too since h1 has not finished yet
- h1 wants to write its response; since g1 is the first request, the server lets it
- Server unblocks h2, which writes its response
- Server unblocks h3, which writes its response
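The blocking scheme in these steps can be sketched with a small write gate (an illustrative toy, not any real server’s code): handlers run in parallel, but each one is held at the gate until every earlier response has been written.

```python
# Sketch: handlers complete in any order, but a condition variable forces
# responses onto the "wire" in request order (head-of-line blocking).
import threading
import time

class WriteGate:
    def __init__(self):
        self.next_seq = 0
        self.cond = threading.Condition()

    def write(self, seq, response, out):
        with self.cond:
            # Block until it is this response's turn.
            self.cond.wait_for(lambda: self.next_seq == seq)
            out.append(response)
            self.next_seq += 1
            self.cond.notify_all()  # wake any handler waiting its turn

gate = WriteGate()
responses = []

def handler(seq, name, work_secs):
    time.sleep(work_secs)  # simulate application work
    gate.write(seq, name, responses)

# g1 is slow; g2 and g3 finish first but must wait for g1.
threads = [threading.Thread(target=handler, args=args)
           for args in [(0, "g1", 0.2), (1, "g2", 0.0), (2, "g3", 0.1)]]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(responses)  # ['g1', 'g2', 'g3'] despite g2 and g3 finishing first
```

Even though g2’s handler finishes almost immediately, its thread sits blocked for the full duration of g1’s work, which is exactly the cost described above.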
In this model, the server explicitly blocks application handlers from writing responses until it is their turn. Alternative implementations are possible:
- The server can wait to read the next request (i.e., the request line, headers and any body) until the previous request is completely processed.
- The server can buffer the responses of application handlers (at least those that finish before earlier requests) and write them to the client in order.
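The second alternative, buffering, can be sketched as follows (again an illustrative toy, not any real server’s internals): completed responses are accepted in any order, out-of-order ones are held back, and only the contiguous in-order prefix is flushed to the client.

```python
# Sketch: buffer responses that complete out of order, and flush to the
# client only the prefix that is ready, preserving request order.
class ResponseBuffer:
    def __init__(self):
        self.pending = {}   # seq -> response that finished out of order
        self.next_seq = 0   # next sequence number owed to the client
        self.wire = []      # stands in for the client connection

    def complete(self, seq, response):
        self.pending[seq] = response
        # Flush every response that is now unblocked.
        while self.next_seq in self.pending:
            self.wire.append(self.pending.pop(self.next_seq))
            self.next_seq += 1

buf = ResponseBuffer()
buf.complete(1, "g2")  # g2 finishes first: buffered, nothing sent yet
buf.complete(2, "g3")  # g3 next: still waiting on g1
buf.complete(0, "g1")  # g1 done: all three flush in order
print(buf.wire)        # ['g1', 'g2', 'g3']
```

The trade-off is memory: a slow g1 forces the server to hold the complete responses for g2 and g3 (however large) until g1 finishes.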
Over some limited tests during the weekend, I found that both Netty and Tomcat follow the first approach, while Node.js follows the second. Both approaches have their limitations, in particular when a request early in the pipeline takes a long time to complete. In such cases, the client is better off sending g1 over one connection and pipelining g2 and g3 on a second connection, which reduces the serialization window on the server. However, to make such a choice, the client needs some prior idea of the workload involved in processing each request. When such information is hard to come by (e.g., for a browser sending requests to arbitrary servers), connection reuse via keep-alive is a safer bet than pipelining. In any case, it is better to test before enabling pipelining in clients.