HTTP Caching
Over the weekend I was searching a well-known electronics manufacturer’s site for an LCD monitor. After finding the product page I was looking for, I bookmarked it and went on to surf something else. A few minutes later, I came back and reloaded the bookmarked page. I expected my browser to load the page immediately from its cache, but it did not. I was curious to find out why. It turned out that the server did not send any cache-related HTTP headers, forcing my browser to request the page again. What a waste of bandwidth and CPU cycles!
The URL for this product looked like
http://.../page4.do?dau22.oid=5199&UserCtxParam=0&GroupCtxParam=0&dctx1=25&ctx1=US&crc=712082047
I guess that this server retrieved the product info from a database and generated the page dynamically using the query parameters. If implemented using Struts, the following snippet could generate the product info page.
public ActionForward execute(ActionMapping mapping, ActionForm form,
                             HttpServletRequest request,
                             HttpServletResponse response) throws Exception
{
    // Get the product info
    // Forwards to product_info.jsp
    return mapping.findForward("success");
}
Not so surprisingly, whoever developed or configured this server assumed that all content generated dynamically should not be cached. Knowingly or unknowingly, most web apps are made cache-unfriendly. In fact, there is more info and awareness on preventing browsers from caching pages than on improving cacheability of dynamically generated pages.
(I also noticed that this server sends a "Connection: close" response header, forcing the browser to create a new connection for every file downloaded. The server says that it speaks HTTP 1.1, but does not apparently support persistent connections.)
Most web apps use some form of caching on the server side. There are a few open-source cache solutions that can be used to cache HTML markup fragments, database query results, and pretty much all kinds of objects. When used carefully, these caching solutions can help improve performance and scalability. However, when it comes to HTTP caching, most dynamic content generated by web apps is not cache-friendly. Dynamic content does not mean that it should not be cached! For instance, this product info is fairly static, and this server could have taken advantage of HTTP caching.
Unlike server-side caching, HTTP caching is cheap, as it pushes the responsibility of caching to caching proxy servers and browsers. As I discuss in this post, it is not difficult to make dynamic pages HTTP cache-friendly. HTTP/1.1 supports two forms of caching: one is based on an expiry interval (called expiration caching), and the other is based on conditional requests (called validation caching). The choice between the two depends on the nature of the content/data being displayed.
Expiration Caching
HTTP/1.1 specifies certain response headers to help clients (browsers, proxy servers etc.) decide whether to cache the response, and if so, for how long. We can change the above Struts action to add these headers to allow expiration caching:
public ActionForward execute(ActionMapping mapping, ActionForm form,
                             HttpServletRequest request,
                             HttpServletResponse response) throws Exception
{
    // Get the product info
    // Add headers
    long current = System.currentTimeMillis();
    long expires = current + 86400000; // 24 hours in milliseconds
    response.addDateHeader("Expires", expires);
    response.addDateHeader("Last-Modified", current);
    // Forwards to product_info.jsp
    return mapping.findForward("success");
}
In this example, I assumed that the product info will not change for the next 24 hours. This is rather arbitrary.
When this action is run, the client receives the following additional response headers:
Expires: Wed, 05 Jan 2005 03:49:18 GMT
Last-Modified: Tue, 04 Jan 2005 03:49:18 GMT
This informs the client when this response was last modified and when it expires, so that the browser can cache the response body till it expires. Browsers and clients typically use the request URL (with all query parameters) as the cache key.
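To make the relationship between the two header values concrete, here is a minimal sketch that formats them outside a servlet container. In a servlet, addDateHeader does this formatting for you; the ExpirationHeaders class and its method names are my own, and the sketch assumes Java 8 or later for java.time.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

class ExpirationHeaders {
    // 24 hours in milliseconds, the same arbitrary lifetime used in the action above
    static final long ONE_DAY_MILLIS = 24L * 60 * 60 * 1000;

    // HTTP dates are expressed in GMT with English day and month names
    private static final DateTimeFormatter HTTP_DATE = DateTimeFormatter
            .ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US)
            .withZone(ZoneOffset.UTC);

    // Formats a timestamp (milliseconds since the epoch) as an HTTP date
    static String httpDate(long epochMillis) {
        return HTTP_DATE.format(Instant.ofEpochMilli(epochMillis));
    }

    public static void main(String[] args) {
        long current = System.currentTimeMillis();
        System.out.println("Last-Modified: " + httpDate(current));
        System.out.println("Expires: " + httpDate(current + ONE_DAY_MILLIS));
    }
}
```

The Expires value is simply the Last-Modified value plus the chosen lifetime, which is why the two headers above differ by exactly one day.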
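The next time the user requests the same product after the cached copy has expired (or explicitly reloads the page), the browser sends the following request header, echoing the Last-Modified value it received earlier:
If-Modified-Since: Tue, 04 Jan 2005 03:49:18 GMT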
This header tells the server that it need not return the response body if it has not been modified since last-modified time. We can modify the above Struts action to check for this header before generating the response:
public ActionForward execute(ActionMapping mapping, ActionForm form,
                             HttpServletRequest request,
                             HttpServletResponse response) throws Exception
{
    // getDateHeader returns -1 when the request has no If-Modified-Since header
    long header = request.getDateHeader("If-Modified-Since");
    if (header > 0) {
        // Blunt assumption: the product info has not changed since the client cached it
        response.setStatus(HttpServletResponse.SC_NOT_MODIFIED);
        return null;
    }
    else {
        // Get the product info
        // Add headers
        long current = System.currentTimeMillis();
        long expires = current + 86400000; // 24 hours in milliseconds
        response.addDateHeader("Expires", expires);
        response.addDateHeader("Last-Modified", current);
        // Forwards to product_info.jsp
        return mapping.findForward("success");
    }
}
By returning null, this action tells Struts that the response is already complete, so the server sends no response body. In this example, I bluntly assumed that the product info has not changed. However, if you can find out when the product info last changed (for example by querying some timestamp field either in-memory or in the database), you can compare that time with the header value and do more precise validation. This modified Struts action generates the following response line:
HTTP/1.x 304 Not Modified
This response status tells the client that the response has not changed. At this time, the client can use the cached response.
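The more precise validation mentioned above could look roughly like the following sketch. It assumes you can obtain the product's last-modified time in milliseconds from somewhere (the lastModified parameter is hypothetical), and the class and method names are mine. The ifModifiedSince argument is what request.getDateHeader("If-Modified-Since") returns, which is -1 when the header is absent.

```java
class ModifiedSinceCheck {
    /**
     * Decides whether a 304 Not Modified response is appropriate.
     *
     * @param ifModifiedSince value of the If-Modified-Since header in epoch
     *                        milliseconds, or -1 if the client did not send it
     * @param lastModified    time the product info last changed, in epoch
     *                        milliseconds (hypothetical, e.g. a database timestamp)
     */
    static boolean notModified(long ifModifiedSince, long lastModified) {
        if (ifModifiedSince < 0) {
            return false; // no validator sent, so a full response is needed
        }
        // HTTP dates have one-second resolution, so compare whole seconds only
        return lastModified / 1000 <= ifModifiedSince / 1000;
    }
}
```

In the action, you would return the 304 status only when notModified(...) is true, and otherwise generate the page with a Last-Modified header set to the product's actual timestamp rather than the current time.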
You can further control the caching behavior by adding a Cache-Control header. For instance, you can force caching proxies not to share the cached response with other users by adding the following header:
Cache-Control: private
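Cache-Control can also carry the lifetime itself: in HTTP/1.1, a max-age directive expresses the freshness lifetime in seconds and takes precedence over Expires. As a small sketch (the CacheControl class and its value method are names I made up for illustration), the header value can be assembled like this:

```java
class CacheControl {
    /**
     * Builds a Cache-Control header value.
     *
     * @param shared        true if shared caches (proxies) may store the response
     * @param maxAgeSeconds how long the response stays fresh, in seconds
     */
    static String value(boolean shared, long maxAgeSeconds) {
        String scope = shared ? "public" : "private";
        return scope + ", max-age=" + maxAgeSeconds;
    }

    public static void main(String[] args) {
        // One day of freshness, cacheable by the user's browser only
        System.out.println("Cache-Control: " + value(false, 86400));
    }
}
```

In a servlet or Struts action, the result would be passed to response.setHeader("Cache-Control", ...).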
Validation Caching
This form of caching is more fine grained, and involves the following:
- The server generates a tag (a string) along with the response body. The tag is called an "entity tag" and is supposed to uniquely represent the response. The server sends this tag to the client via the "ETag" response header.
- On subsequent requests, the client returns the tag via the "If-None-Match" request header.
- The server checks whether the tag is still valid for the request. If found valid, the server returns a 304 response code and skips the response body. Otherwise it generates complete response.
The trick in validation caching is generating the key that encapsulates staleness or freshness of the content/data used to generate the response body. You can encapsulate any kind of arbitrary data into the key. For instance, we can use a combination of the query used to retrieve the info and the language used to generate the response to generate a key.
public ActionForward execute(ActionMapping mapping, ActionForm form,
                             HttpServletRequest request,
                             HttpServletResponse response) throws Exception
{
    // The client returns the entity tag it cached in the If-None-Match header
    String etag = request.getHeader("If-None-Match");
    // Compute the new etag
    String currentETag = ...;
    // If match
    if (currentETag.equals(etag)) {
        response.setStatus(HttpServletResponse.SC_NOT_MODIFIED);
        return null;
    }
    else {
        // Get the product info
        // Add headers
        response.addHeader("Cache-Control", "public");
        response.addHeader("ETag", currentETag);
        // Forwards to product_info.jsp
        return mapping.findForward("success");
    }
}
When this action is executed for the first time, the server returns the following response headers along with the response.
Cache-Control: public
ETag: "123456789"
The browser and/or the caching proxy stores the ETag value alongside the cached response body, so that it can present the tag when it later revalidates the response.
The next time the user requests the same product, the browser sends the following request header:
If-None-Match: "123456789"
This header makes the GET request conditional, and tells the server that it need not return the response body if the entity tag still matches the current one.
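A plain string comparison works for the simple case, but an If-None-Match header may also carry a comma-separated list of entity tags, tags with a weak prefix (W/), or a lone asterisk. Here is a minimal sketch of a more tolerant comparison; the IfNoneMatch class is my own helper, not part of the servlet API, and it assumes entity tags do not themselves contain commas.

```java
class IfNoneMatch {
    /**
     * Returns true if the If-None-Match header value matches the current tag.
     * Weak and strong tags are treated alike, which is acceptable for GET.
     */
    static boolean matches(String headerValue, String currentETag) {
        if (headerValue == null || currentETag == null) {
            return false;
        }
        String trimmed = headerValue.trim();
        if (trimmed.equals("*")) {
            return true; // "*" matches any current representation
        }
        String current = stripWeakPrefix(currentETag.trim());
        // Simplification: assumes no commas inside the quoted tags
        for (String candidate : trimmed.split(",")) {
            if (stripWeakPrefix(candidate.trim()).equals(current)) {
                return true;
            }
        }
        return false;
    }

    // Drops the W/ marker so that weak and strong tags compare equal
    private static String stripWeakPrefix(String tag) {
        return tag.startsWith("W/") ? tag.substring(2) : tag;
    }
}
```

In the action, matches(request.getHeader("If-None-Match"), currentETag) would stand in for the equals check.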
If the user is authenticated and if the response is meant for the user and should not be shared, it is a good idea to include the user’s info (e.g. the user’s login name) to compute the entity tag. This way, when the user logs out, the entity tag will not match the expected value for the request.
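One way to compute such a tag is to hash everything that influences the response. The sketch below is only an illustration under my own assumptions: the ETags class, the parameter names, and in particular dataVersion (some version number or last-update timestamp of the underlying data) are hypothetical, and truncating a SHA-256 digest to 8 bytes is an arbitrary choice to keep the tag short.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class ETags {
    /**
     * Computes a quoted entity tag from the inputs that shape the response.
     *
     * @param query       the query used to retrieve the product info
     * @param language    the language used to render the response
     * @param user        login name for user-specific responses, or null
     * @param dataVersion hypothetical version or timestamp of the underlying data
     */
    static String compute(String query, String language, String user, long dataVersion) {
        String material = query + "\n" + language + "\n"
                + (user == null ? "" : user) + "\n" + dataVersion;
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(material.getBytes(StandardCharsets.UTF_8));
            // Entity tags are quoted strings; use the first 8 bytes as hex
            StringBuilder tag = new StringBuilder("\"");
            for (int i = 0; i < 8; i++) {
                tag.append(String.format("%02x", hash[i] & 0xff));
            }
            return tag.append('"').toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be available on every Java platform
            throw new IllegalStateException(e);
        }
    }
}
```

Because the user's login name is part of the hashed material, the same URL yields a different tag for a different user or for an anonymous request.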
Note that validation caching tags can be used in combination with expiration caching headers, and HTTP/1.1 requires that the more restrictive headers win in this case.
Legacy No-Cache Tags
For historical reasons, some people use the following HTML tag to tell browsers that the content should not be cached.
<META http-equiv="Pragma" content="no-cache">
Browsers tend to honor this HTML tag, but I doubt that pure HTTP caching proxy servers look for this tag in the body of the response and refrain from caching it. To guarantee consistent behavior, the best approach is to use equivalent HTTP caching headers. An equivalent HTTP response header is
Pragma: no-cache
You can also return an "Expires" response header with some value in the past.
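Putting these together, a response that should never be served from a cache could carry the set of headers sketched below. Note that "Cache-Control: no-cache" is my addition here: it is the HTTP/1.1 counterpart of the Pragma header, which dates from HTTP/1.0. The NoCacheHeaders class is a made-up name for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class NoCacheHeaders {
    // Returns response headers that ask browsers and proxies not to reuse the response
    static Map<String, String> build() {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Cache-Control", "no-cache");                // for HTTP/1.1 caches
        headers.put("Pragma", "no-cache");                       // for HTTP/1.0 caches
        headers.put("Expires", "Thu, 01 Jan 1970 00:00:00 GMT"); // a date in the past
        return headers;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> header : build().entrySet()) {
            System.out.println(header.getKey() + ": " + header.getValue());
        }
    }
}
```

In a servlet or Struts action, each entry would be applied with response.setHeader(name, value).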
When I started looking into the product info page issue, my objective was to figure out whether it is easy to make HTTP caching-related decisions in code on the server side. It turns out that the solution is not complicated. There are a few more variants of the above HTTP caching headers that I have not discussed in this post. You can use these variants to make more fine-grained caching decisions on the server side.



Great post! I currently don’t have caching enabled on my blog, but it’s for a legacy technical reason I’ve been too lazy to deal with (it is unnecessarily difficult to tell whether the result would be ‘modified’). I’m currently not in a position where my server or bandwidth is tried enough to matter anyway - but it’s great to see someone cover the ground so thoroughly.
Anyway, just wanted to compliment the post.
Cheers - R.J.
Thank you so much. It was indeed a great knowledge transfer for me. I have been ignoring these features in my project.
regards,
Sekar
Often, the perceived speed can be greatly increased by adding the response header
Cache-Control: public, max-age=3600
to all static resources (*.js, *.css, *.gif), so they will be cached for an hour.
Especially when using https, most browsers will by default never cache results (since they are assumed to be confidential). In this way, you can let the browser know that it’s OK to cache utils.js, main.css and blank.gif.
This greatly improves the user experience, since the browser doesn’t have to wait for the roundtrip to a 304 Not Modified response before rendering the page.
It also takes some load off your server (though serving static resources is normally very cheap).
This is a good piece on HTTP caching. I was trying to implement caching in our product and found it very useful.
sridhar
Hi!
Nice entry. I am wondering whether, and if so how, browser settings (IE: Check for newer versions of stored pages, FF: browser.cache.check_doc_frequency) influence the hints you posted.
./alex
–
.w( the_mindstorm )p.
—
(http://themindstorms.blogspot.com)
Hi
Nice blog. Really helpful and inspiring. I have a question; can you help me?
In my Struts application I have to implement a search feature that searches files on different hosts (different machines). How do I implement this feature? Please suggest any plug-ins/web services that are available. Also, kindly let me know the development procedure.
Thanks a lot
I read all the help on your site, but I found a problem in my ASP program.
In Firefox, the back button does not work when we use FCKeditor (an editor),
so please help me……..
I need to get rid of HTTP 304 messages in our network traffic. I am using Fiddler (an HTTP debugging proxy) to inspect the traffic. Around 40% of our traffic has a 304 status code. I used the HTTP headers “Expires” and “Last-Modified”, but that did not help. We have a lot of static elements such as images/GIFs, .js files, and CSS. We are using jakarta-tomcat-4.1.31. Can you help me out? Is there a way I can configure caching of static elements directly in the Tomcat server rather than changing the code? If not, can you please let me know how to fix this in my code?
How about static images that Tomcat or other containers serve? How do you specify an Expires or Last-Modified for these?
I’ve not looked at what Tomcat does, but I would assume that it would set a Last-Modified header based on the file’s modified date.
If Knuth is right that premature optimization is the root of all evil, then it’s a Good Thing that there’s more information on preventing browsers from caching than on improving cacheability. The user experience penalty for failing to cache is merely slowness (possibly imperceptible). The user experience penalty for caching incorrectly is getting the wrong data.
Given the general bugginess and unreliability of software, I would place worrying about trying to cache dynamically generated product catalog pages very low on the list of things to do (how many minutes, hours, days, or weeks of incorrectness do you want the user to see when you make a product change?)
The low-hanging cache fruit is in resources that are never fetched directly (from a user perspective), but only via on-page references: images, javascript, css, etc.
Ideally, all CMSs would make it easy to generate unique URLs for each revision of each of these resources (and automatically regenerate all resources containing references to them), which could then be marked as “cache forever”.
This kind of caching is easy to do without fear of ever presenting the user with stale/incorrect data, and it’s currently used by very, very few websites in the world.
Ron,
>> If Knuth is right that premature optimization is the root of all evil, then it’s a Good Thing that there’s more information on preventing browsers from caching than on improving cacheability. The user experience penalty for failing to cache is merely slowness (possibly imperceptible). The user experience penalty for caching incorrectly is getting the wrong data.
You are right about user experience. However, I would argue that not setting cache headers correctly would provide a deteriorated user experience. Being universal HTTP-clients, browsers do expect certain clues about how to treat a given markup, and caching headers do provide one set of clues.
>> Given the general bugginess and unreliability of software, I would place worrying about trying to cache dynamically generated product catalog pages very low on the list of things to do (how many minutes, hours, days, or weeks of incorrectness do you want the user to see when you make a product change?)
It depends on what kinds of traffic you are expecting and how much resources you have on the server side. As Mark Nottingham once remarked (http://www.mnot.net/blog/2005/11/26/caching), caching is a way of distributing your application.
>> The low-hanging cache fruit is in resources that are never fetched directly (from a user perspective), but only via on-page references: images, javascript, css, etc. Ideally, all CMSs would make it easy to generate unique URLs for each revision of each of these resources (and automatically regenerate all resources containing references to them), which could then be marked as “cache forever”. This kind of caching is easy to do without fear of ever presenting the user with stale/incorrect data, and it’s currently used by very, very few websites in the world.
On the contrary, caching of static resources is the first step, and all major web sites spend quite a bit of effort in this area. In addition to setting long cache expiry times, these sites also offload these resources to CDNs.
Subbu
Hi friends,
How can I cache a web page in the client machine’s Temporary Internet Files folder using a Struts app?
Try setting the Last-Modified and Cache-Control headers. For example:
Last-Modified: Mon, 18 Feb 2008 17:57:32 GMT
Cache-Control: public, max-age=864000
Please be aware that “public” means that any intermediate cache can store the response. Set this to “private” if the response is user-specific and should not be cached in public caches.