URI Escaping and java.net.URLEncoder
In Java land, it is common to use java.net.URLEncoder to encode reserved characters in URIs. However, java.net.URLEncoder applies application/x-www-form-urlencoded encoding, which is not the same as escaping reserved characters the RFC 3986 way.
Here is what HTML 4 says about application/x-www-form-urlencoded:
Forms submitted with this content type must be encoded as follows:
- Control names and values are escaped. Space characters are replaced by `+', and then reserved characters are escaped as described in [RFC1738], section 2.2: Non-alphanumeric characters are replaced by `%HH', a percent sign and two hexadecimal digits representing the ASCII code of the character. Line breaks are represented as "CR LF" pairs (i.e., `%0D%0A').
- The control names/values are listed in the order they appear in the document. The name is separated from the value by `=' and name/value pairs are separated from each other by `&'.
On the other hand, escaping in URIs is governed by Sec 2.1 of RFC 3986, which requires reserved characters to be percent-encoded (as %HH) whenever they appear as data rather than as delimiters.
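To make the RFC 3986 rule concrete, here is a minimal sketch of a percent-encoder written from scratch. The class name PercentEncoder is made up for illustration, and the sketch assumes the value is a single URI component (such as one query parameter value) encoded as UTF-8, so every character outside the unreserved set of Sec 2.3 is escaped.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the JDK.
final class PercentEncoder {
    // Unreserved characters per RFC 3986, Sec 2.3; these never need escaping.
    private static final String UNRESERVED =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~";

    /** Percent-encodes every UTF-8 byte of value that is not an unreserved character. */
    static String encode(String value) {
        StringBuilder out = new StringBuilder();
        for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
            char c = (char) (b & 0xFF);
            if (UNRESERVED.indexOf(c) >= 0) {
                out.append(c);
            } else {
                // A percent sign followed by two uppercase hexadecimal digits.
                out.append(String.format("%%%02X", b & 0xFF));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("a b&c=d"));
    }
}
```

Note that a space comes out as %20 here, never as +.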
These rules are almost the same, except for the treatment of the space character. Per HTML 4, a space should be encoded as +, whereas RFC 3986 requires it to be encoded as %20.
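The difference is easy to see with java.net.URLEncoder itself. The sketch below is illustrative only: the class and method names are made up, and it assumes Java 10 or later for the URLEncoder.encode(String, Charset) overload. The second method is a common workaround rather than an official API: it post-processes the form-encoded output so that it matches RFC 3986. Besides the space, it also patches two smaller quirks of URLEncoder, which leaves * unescaped and escapes ~.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the JDK.
final class SpaceEncodingDemo {
    /** application/x-www-form-urlencoded, as produced by java.net.URLEncoder. */
    static String formEncode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    /** Workaround: turn URLEncoder output into RFC 3986 percent-encoding. */
    static String rfc3986Encode(String value) {
        return formEncode(value)
            .replace("+", "%20")   // a '+' in the output can only be an encoded space
            .replace("*", "%2A")   // '*' is reserved in RFC 3986 but left alone by URLEncoder
            .replace("%7E", "~");  // '~' is unreserved in RFC 3986 but escaped by URLEncoder
    }

    public static void main(String[] args) {
        System.out.println(formEncode("hello world"));    // hello+world
        System.out.println(rfc3986Encode("hello world")); // hello%20world
    }
}
```

The first replacement is safe because a literal + in the input has already been escaped to %2B by URLEncoder.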
The odd wrinkle is forms using method GET. If you submit a form with method GET, the browser uses the HTML rules to build the query string, encoding spaces in parameters as +. This behavior in browsers is incorrect, since the HTML encoding rules apply only when the request body is encoded using application/x-www-form-urlencoded. Since only POST and PUT requests carry a body, these HTML rules should apply only to those types of requests, but browsers apply them to GET requests as well, thus contradicting the URI escaping rules in RFC 3986.
The only rationale I can think of is that it retains parity between GET and POST, which many web developers are happy to treat as equivalent. But the issue does not go away: the server developer has to know which rules to use to unescape parameters in URIs, which introduces stronger coupling between the client-side code that generates and submits requests and the server-side code that processes them.
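The decoding side of this coupling can be sketched as follows. The class and method names are made up for illustration, and the code assumes Java 10 or later for the URLDecoder.decode(String, Charset) overload. The same raw query value decodes differently depending on which rule set the server assumes the client used.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the JDK.
final class QueryDecodingDemo {
    /** HTML form rules: '+' means a space, and %HH is a percent escape. */
    static String formDecode(String raw) {
        return URLDecoder.decode(raw, StandardCharsets.UTF_8);
    }

    /** RFC 3986 rules: only %HH is an escape, so a literal '+' must survive. */
    static String uriDecode(String raw) {
        // Protect literal '+' characters, then reuse URLDecoder for the %HH handling.
        return URLDecoder.decode(raw.replace("+", "%2B"), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String raw = "C++%20primer";
        System.out.println("[" + formDecode(raw) + "]"); // [C   primer]
        System.out.println("[" + uriDecode(raw) + "]");  // [C++ primer]
    }
}
```

A client that escapes per RFC 3986 may send a literal + in a query string, and a server that decodes with java.net.URLDecoder will silently turn it into a space.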
Something to watch out for if you are using java.net.URLEncoder and java.net.URLDecoder for escaping special characters in URIs!



Have just been bitten by this one myself. Some more rationale for java.net.URLEncoder can be found in this bug report - http://java.sun.com/j2se/1.4.2/docs/api/java/net/URLEncoder.html
I ended up switching to the alternative implementation found on Google Data APIs here -
http://code.google.com/apis/gdata/javadoc/com/google/gdata/util/httputil/FastURLEncoder.html