HTTP Caching

Over the weekend I was searching a well-known electronics manufacturer’s site for an LCD monitor. After finding the product page I was looking for, I bookmarked it and went on to browse something else. A few minutes later, I came back and reloaded the bookmarked page. I expected my browser to load the page immediately from its cache, but it did not. I was curious to find out why. It turned out that the server did not send any cache-related HTTP headers, forcing my browser to request the page again. What a waste of bandwidth and CPU cycles!

The URL for this product looked like

http://.../page4.do?dau22.oid=5199&UserCtxParam=0&GroupCtxParam=0&dctx1=25&ctx1=US&crc=712082047	  

I guess that this server retrieved the product info from a database and generated the page dynamically using the
query parameters. If implemented using Struts, the following snippet could generate the product info page.

    public ActionForward execute(ActionMapping mapping, ActionForm form,
                                 HttpServletRequest request,
                                 HttpServletResponse response) throws Exception
    {
        // Get the product info

        // Forward to product_info.jsp
        return mapping.findForward("success");
    }

Not so surprisingly, whoever developed or configured this server assumed that all content generated dynamically should not be cached. Knowingly or unknowingly, most web apps are made cache-unfriendly. In fact, there is more info and awareness on preventing browsers from caching pages than on improving cacheability of dynamically generated pages.

(I also noticed that this server sends a "Connection: close" response header, forcing the browser to open a new connection for every file it downloads. The server says that it speaks HTTP/1.1, but apparently does not support persistent connections.)

Most web apps use some form of caching on the server side. There are a few open-source cache solutions that can be used to cache HTML markup fragments, database query results, and pretty much all kinds of objects. When used carefully, these caching solutions can help improve performance and scalability. However, when it comes to HTTP caching, most dynamic content generated by web apps is not cache-friendly. Dynamic content does not mean that it should not be cached! For instance, this product info is fairly static, and this server could have taken advantage of HTTP caching.

Unlike server-side caching, HTTP caching is cheap, as it pushes the responsibility of caching to caching proxy servers and browsers. As I discuss in this post, it is not difficult to make dynamic pages friendly to HTTP caches. HTTP/1.1 supports two forms of caching: one is based on an expiry interval (called expiration caching), and the other is based on conditional requests (called validation caching). The choice between the two depends on the nature of the content or data being displayed.

Expiration Caching

HTTP/1.1 specifies certain response headers that help clients (browsers, proxy servers, etc.) decide whether to cache the response and, if so, for how long. We can change the above Struts action to add these headers and allow expiration caching:

    public ActionForward execute(ActionMapping mapping, ActionForm form,
                                 HttpServletRequest request,
                                 HttpServletResponse response) throws Exception
    {
        // Get the product info

        // Add headers
        long current = System.currentTimeMillis();
        long expires = current + 86400000; // 24 hours in milliseconds
        response.addDateHeader("Expires", expires);
        response.addDateHeader("Last-Modified", current);

        // Forward to product_info.jsp
        return mapping.findForward("success");
    }

In this example, I assumed that the product info will not change for the next 24 hours. This is rather arbitrary.

When this action is run, the client receives the following additional response headers:

Expires: Wed, 05 Jan 2005 03:49:18 GMT
Last-Modified: Tue, 04 Jan 2005 03:49:18 GMT

This informs the client when this response was last modified and when it expires, so that the browser can cache the response body till it expires. Browsers and clients typically use the request URL (with all query parameters) as the cache key.
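To make the arithmetic behind these two headers concrete, here is a minimal standalone sketch that formats a generation time and a 24-hour lifetime as HTTP dates. The class and method names (`ExpirationHeaders`, `httpDate`, `expires`) are mine, not part of Struts or the servlet API; in a servlet, `addDateHeader` does this formatting for you.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical helper, for illustration only.
class ExpirationHeaders {
    // HTTP date layout: two-digit day, English names, always GMT.
    static final DateTimeFormatter HTTP_DATE =
            DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US)
                    .withZone(ZoneOffset.UTC);

    // Formats an instant the way it appears in Expires and Last-Modified.
    static String httpDate(Instant instant) {
        return HTTP_DATE.format(instant);
    }

    // The expiry time is simply the generation time plus the chosen lifetime.
    static String expires(Instant generatedAt, Duration lifetime) {
        return httpDate(generatedAt.plus(lifetime));
    }

    public static void main(String[] args) {
        Instant generatedAt = Instant.parse("2005-01-04T03:49:18Z");
        // Prints "Last-Modified: Tue, 04 Jan 2005 03:49:18 GMT"
        System.out.println("Last-Modified: " + httpDate(generatedAt));
        // Prints "Expires: Wed, 05 Jan 2005 03:49:18 GMT"
        System.out.println("Expires: " + expires(generatedAt, Duration.ofHours(24)));
    }
}
```

The 24-hour lifetime is the same arbitrary assumption as in the action above; a real page would pick a lifetime that reflects how often the product data actually changes.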

The next time the user requests the same product after the cached copy has expired (or forces a reload), the browser sends back the Last-Modified value it received in the following request header:

If-Modified-Since: Tue, 04 Jan 2005 03:49:18 GMT

This header tells the server that it need not return the response body if the content has not been modified since that time. We can modify the above Struts action to check for this header before generating the response:

    public ActionForward execute(ActionMapping mapping, ActionForm form,
                                 HttpServletRequest request,
                                 HttpServletResponse response) throws Exception
    {
        // getDateHeader returns -1 when the request has no such header
        long ifModifiedSince = request.getDateHeader("If-Modified-Since");
        if (ifModifiedSince > 0) {
            // Assume the product info has not changed since then
            response.setStatus(HttpServletResponse.SC_NOT_MODIFIED);
            return null;
        }
        else {
            // Get the product info

            // Add headers
            long current = System.currentTimeMillis();
            long expires = current + 86400000; // 24 hours in milliseconds
            response.addDateHeader("Expires", expires);
            response.addDateHeader("Last-Modified", current);

            // Forward to product_info.jsp
            return mapping.findForward("success");
        }
    }

By returning null, this action tells Struts that the response is already complete, so the server returns it without a response body. In this example, I bluntly assumed that the product info has not changed. However, if you can tell whether the product info has changed (for example, by checking a timestamp field either in memory or in the database), you can do more precise validation. This modified Struts action generates the following response line:

HTTP/1.x 304 Not Modified

This response status tells the client that the response has not changed. At this time, the client can use the cached response.
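The more precise validation mentioned above could look something like the following sketch. It assumes you can obtain a last-changed timestamp for the product (the `lastModified` parameter here is hypothetical, as is the class name) and compares it with the value of the If-Modified-Since header.

```java
// Hypothetical helper, for illustration only.
class ModifiedSinceCheck {
    /**
     * @param ifModifiedSince value of request.getDateHeader("If-Modified-Since"),
     *                        which is -1 when the header is absent
     * @param lastModified    epoch milliseconds at which the product record last
     *                        changed (assumed to come from memory or the database)
     * @return true if a 304 Not Modified response is appropriate
     */
    static boolean notModified(long ifModifiedSince, long lastModified) {
        if (ifModifiedSince < 0) {
            return false; // not a conditional request
        }
        // HTTP dates have one-second resolution, so drop the milliseconds
        // before comparing; otherwise an unchanged record could look newer.
        long lastModifiedSeconds = lastModified / 1000 * 1000;
        return lastModifiedSeconds <= ifModifiedSince;
    }

    public static void main(String[] args) {
        long headerValue = 1104810558000L; // Tue, 04 Jan 2005 03:49:18 GMT
        System.out.println(notModified(headerValue, headerValue + 500));  // same second
        System.out.println(notModified(headerValue, headerValue + 5000)); // changed later
    }
}
```

In the action, you would call this check instead of testing only whether the header is present, and send the record's own timestamp (rather than the current time) as Last-Modified.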

You can further control the caching behavior by adding a Cache-Control header. For instance, you can prevent shared caches (such as caching proxies) from storing the response and serving it to other users by adding the following header:

Cache-Control: private

Validation Caching

This form of caching is more fine grained, and involves the following:

  • The server generates a tag (a string) along with the response body. The tag is called an "entity tag" and is supposed to uniquely represent the response. The server sends this tag to the client via the "ETag" response header.
  • On subsequent requests, the client returns the tag via the "If-None-Match" request header.
  • The server checks whether the tag is still valid for the request. If it is, the server returns a 304 response code and skips the response body. Otherwise it generates the complete response.

The trick in validation caching is generating the key that encapsulates staleness or freshness of the content/data used to generate the response body. You can encapsulate any kind of arbitrary data into the key. For instance, we can use a combination of the query used to retrieve the info and the language used to generate the response to generate a key.

    public ActionForward execute(ActionMapping mapping, ActionForm form,
                                 HttpServletRequest request,
                                 HttpServletResponse response) throws Exception
    {
        // Entity tag sent back by the client, or null if there is none
        String etag = request.getHeader("If-None-Match");

        // Compute the new etag
        String currentETag = ...;

        // If match
        if (currentETag.equals(etag)) {
            response.setStatus(HttpServletResponse.SC_NOT_MODIFIED);
            return null;
        }
        else {
            // Get the product info

            // Add headers
            response.addHeader("Cache-Control", "public");
            response.addHeader("ETag", currentETag);

            // Forward to product_info.jsp
            return mapping.findForward("success");
        }
    }
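The action above leaves the computation of the entity tag open. One possible approach, sketched here under my own assumptions, is to hash the inputs that determine the response. The parameters `productId`, `lastUpdatedMillis`, and `language` are hypothetical stand-ins for the query, a freshness indicator, and the response language; the class name is also mine.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, for illustration only.
class EntityTags {
    // Builds a quoted entity tag from the values that determine the response.
    static String compute(String productId, long lastUpdatedMillis, String language) {
        String source = productId + '|' + lastUpdatedMillis + '|' + language;
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(source.getBytes(StandardCharsets.UTF_8));
            // The first 8 bytes (16 hex digits) keep the header short.
            StringBuilder hex = new StringBuilder();
            for (int i = 0; i < 8; i++) {
                hex.append(String.format("%02x", digest[i]));
            }
            // Entity tags are sent as quoted strings.
            return '"' + hex.toString() + '"';
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("ETag: " + compute("5199", 1104810558000L, "US"));
    }
}
```

Any change to one of the inputs produces a different tag, so a stale cached copy fails validation. If the response is user-specific, the user's login name could be added to the hashed string in the same way.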

When this action is executed for the first time, the server returns the following response headers along with the response.

Cache-Control: public
ETag: "123456789"

The browser and/or the caching proxy stores the entity tag along with the cached response body; the request URL remains the cache key, and the tag serves as the validator for that entry.

The next time the user requests the same product, the browser sends the following request header:

If-None-Match: "123456789"

This header makes the GET request conditional, and tells the server that it need not return the response body if the current entity tag matches the one sent by the client.
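A plain string comparison works for the simple case, but an If-None-Match header may also carry several tags separated by commas, the wildcard `*`, or tags with the weak prefix `W/`. The following sketch, with a class name of my own choosing, shows one way to handle these cases. It is a simplification: it splits on commas, so it assumes the tags themselves contain none.

```java
// Hypothetical helper, for illustration only.
class IfNoneMatch {
    // Returns true if the header value matches the current entity tag,
    // meaning that a 304 Not Modified response is appropriate.
    static boolean matches(String ifNoneMatch, String currentETag) {
        if (ifNoneMatch == null || currentETag == null) {
            return false; // not a conditional request
        }
        String current = stripWeakPrefix(currentETag.trim());
        for (String candidate : ifNoneMatch.split(",")) {
            String tag = candidate.trim();
            // "*" matches any current representation of the resource.
            if (tag.equals("*") || stripWeakPrefix(tag).equals(current)) {
                return true;
            }
        }
        return false;
    }

    // If-None-Match uses weak comparison, so the W/ prefix is ignored.
    static String stripWeakPrefix(String tag) {
        return tag.startsWith("W/") ? tag.substring(2) : tag;
    }

    public static void main(String[] args) {
        System.out.println(matches("\"abc\", \"123456789\"", "\"123456789\""));
        System.out.println(matches("\"abc\"", "\"123456789\""));
    }
}
```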

If the user is authenticated and the response is meant only for that user and should not be shared, it is a good idea to include the user’s info (e.g. the user’s login name) when computing the entity tag. This way, once the user logs out, the entity tag will no longer match the expected value for the request.

Note that validation caching tags can be used in combination with expiration caching headers, and HTTP/1.1 requires that the more restrictive headers win in this case.

Legacy No-Cache Tags

For historical reasons, some people use the following HTML tag to tell browsers that the content should not be cached.

<META http-equiv="Pragma" content="no-cache">

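Browsers tend to honor this HTML tag, but I doubt that pure HTTP caching proxy servers look for it in the body of the response and refrain from caching the response. To guarantee consistent behavior, the best approach is to use equivalent HTTP caching headers. An equivalent HTTP response header, understood mostly by HTTP/1.0 clients, is

Pragma: no-cache

For HTTP/1.1 clients and proxies, send "Cache-Control: no-cache" as well (or "Cache-Control: no-store" to forbid storing the response at all). You can also return an "Expires" response header with some value in the past.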
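As a small sketch, the headers for a response that must not be cached could be collected in one place like this. The class name is hypothetical, and the Cache-Control value is my own addition for HTTP/1.1 clients; in a servlet, each entry would be passed to `response.setHeader`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, for illustration only.
class NoCacheHeaders {
    // Returns the response headers that ask browsers and proxies not to cache.
    static Map<String, String> build() {
        Map<String, String> headers = new LinkedHashMap<>();
        // HTTP/1.1: revalidate before reuse, and do not store the response.
        headers.put("Cache-Control", "no-cache, no-store");
        // Legacy header for HTTP/1.0 clients.
        headers.put("Pragma", "no-cache");
        // A date in the past marks the response as already expired.
        headers.put("Expires", "Thu, 01 Jan 1970 00:00:00 GMT");
        return headers;
    }

    public static void main(String[] args) {
        build().forEach((name, value) -> System.out.println(name + ": " + value));
    }
}
```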

When I started looking into the product info page issue, my objective was to figure out whether it is easy to make HTTP caching-related decisions in code on the server side. It turns out that the solution is not complicated. There are a few more variants of the above HTTP caching headers that I have not discussed in this post. You can use these variants to make more fine-grained caching decisions on the server side.
