Resource Identity and Cool URIs

Tuesday, October 28, 2008

In response to my InfoQ article on Describing RESTful Applications, some of the comments I have received so far deal with resource identity. When I sent a draft to Stefan in late October, he was curious to know why I used ID elements to capture a unique identifier for each resource.

Here is a snippet from one of the examples I used in that article.

<accounts xmlns="urn:org:bank:accounts">  
  <account>  
      <id>AZA12093</id>  
      <link href="http://bank.org/account/AZA12093" rel="self"/>  
      ...  
  </account>  
  <account>  
      <id>ADK31242</id>  
      <link href="http://bank.org/account/ADK31242" rel="self"/>  
      ...  
  </account>  
</accounts>
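To make the role of the `id` element concrete, here is a minimal Python sketch of how a client might read the snippet above, pulling out each account's identifier alongside its `self` link. It assumes the elided `...` content is simply absent and uses only the standard library's ElementTree; the helper name `account_ids` is hypothetical.

```python
import xml.etree.ElementTree as ET

# The accounts snippet, with the elided "..." content left out.
ACCOUNTS_XML = """
<accounts xmlns="urn:org:bank:accounts">
  <account>
    <id>AZA12093</id>
    <link href="http://bank.org/account/AZA12093" rel="self"/>
  </account>
  <account>
    <id>ADK31242</id>
    <link href="http://bank.org/account/ADK31242" rel="self"/>
  </account>
</accounts>
"""

# All elements live in the default namespace declared on <accounts>.
NS = {"a": "urn:org:bank:accounts"}


def account_ids(xml_text):
    """Return (id, self-link) pairs for each account (hypothetical helper)."""
    root = ET.fromstring(xml_text.strip())
    pairs = []
    for account in root.findall("a:account", NS):
        account_id = account.findtext("a:id", namespaces=NS)
        self_link = None
        for link in account.findall("a:link", NS):
            if link.get("rel") == "self":
                self_link = link.get("href")
        pairs.append((account_id, self_link))
    return pairs


for account_id, href in account_ids(ACCOUNTS_XML):
    print(account_id, href)
```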

JJ, in his latest post on the same article, makes a similar comment:

Interestingly Subbu also defined a proprietary ID mechanism reinforcing the idea that a URI is not generally used for identity purposes. I would have preferred a “link” to a unique resource identifier.

A similar thought was expressed by Nick Gall in a thread on rest-discuss, where he says:

So ultimately, I’d prefer to see all identifiers as URLs (not just URIs) and have such URLs be permanent.

Since URIs are supposed to be permanent (cool URIs don't change), we should be able to use URIs to identify a given resource, and ideally there should be no need for proprietary identifiers. In practice, however, URIs are unreliable substitutes for the identifiers that client applications need to rely upon. Allow me to elaborate.

Consider the following HTTP GET requests and their responses.

GET /person/abc  
Host: www.example.org

200 OK  
Content-Type: ...

<person>  
  <link href="http://www.example.org/person/abc" rel="self"/>  
  <link href="http://www.example.org/person/abc?include=addressbook"  
    rel="http://www.example.org/rels/person-with-addressbook"/>  
  <first-name>Subbu</first-name>  
  <last-name>Allamaraju</last-name>  
  <email>subbu@nospam.com</email>  
  ...  
</person>

GET /person/abc?include=addressbook  
Host: www.example.org

200 OK  
Content-Type: ...

<person>  
  <link href="http://www.example.org/person/abc?include=addressbook" rel="self"/>  
  <first-name>Subbu</first-name>  
  <last-name>Allamaraju</last-name>  
  <addresses>  
    <address>  
      ...  
    </address>  
    ...  
  </addresses>      
</person>

GET /people?like=subbu  
Host: www.example.org

200 OK  
Content-Type: ...

<people>      
  <link href="http://www.example.org/people?like=subbu" rel="self"/>  
  <person>  
    <link href="http://www.example.org/person/abc?view=mini" rel="self"/>  
    <first-name>Subbu</first-name>  
    <last-name>Allamaraju</last-name>  
  </person>  
  <person>  
    <link href="http://www.example.org/person/def?view=mini" rel="self"/>  
    <first-name>Subbu</first-name>  
    <last-name>Somebody</last-name>  
  </person>
</people>

In each response, the client receives information about the same person. In the first case, it receives the person's first name, last name, and email address; in the second, it receives the first name, last name, and address book; and in the third, it finds the same person through a search.

Now, consider an online game review site that uses the server at http://www.example.org for all user data.

Here are some possible user scenarios.

I log into the game review site, and upon login, it greets me with my first name and last name.
I click on a link to view my address book.
One of my friends logs into the site, types "subbu" into a search box, finds my name in the search results, and then clicks a link to view reviews posted by me and by everyone in my address book.

To implement these scenarios, the client needs to be able to (a) determine that all of these responses refer to the same user, and (b) store additional data in its own database, using the user's identity as a foreign key. What can the client rely upon?
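Requirement (b) can be sketched with Python's built-in `sqlite3` module. The table and column names below are hypothetical; the point is that the review site's own tables need one stable key per person, and the value stored in `person_key` must be the same no matter which of the three responses the client happened to see. The sketch deliberately leaves that value as a placeholder, since choosing it is exactly the open question.

```python
import sqlite3

# In-memory database standing in for the review site's own storage.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("CREATE TABLE person (person_key TEXT PRIMARY KEY, display_name TEXT)")
conn.execute(
    """CREATE TABLE review (
           review_id INTEGER PRIMARY KEY,
           person_key TEXT NOT NULL REFERENCES person(person_key),
           game TEXT,
           body TEXT)"""
)


def add_review(conn, person_key, game, body):
    """Store a review against the person's identity (hypothetical helper)."""
    conn.execute(
        "INSERT INTO review (person_key, game, body) VALUES (?, ?, ?)",
        (person_key, game, body),
    )


def reviews_by(conn, person_key):
    """Return the games reviewed by one person, in insertion order."""
    rows = conn.execute(
        "SELECT game FROM review WHERE person_key = ? ORDER BY review_id",
        (person_key,),
    )
    return [game for (game,) in rows]


# Placeholder key: which value belongs here is the question the post asks.
KEY = "person-key-placeholder"
conn.execute("INSERT INTO person VALUES (?, ?)", (KEY, "Subbu Allamaraju"))
add_review(conn, KEY, "Game A", "Fun.")
add_review(conn, KEY, "Game B", "Too short.")
print(reviews_by(conn, KEY))
```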

Let me start with the "self" links. The person has a self link in each response, but the links are all different. The client cannot determine that the Subbu Allamaraju found in the search results is the same person as the one in the first or the second response. So self links are of no use for implementing these scenarios.
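A short sketch makes the problem visible. It parses trimmed-down copies of the three responses, with each `person` element properly closed and the elided content dropped, and collects the `self` link of every person named Subbu Allamaraju. Matching on the name is only a convenience for picking out the records in this illustration (names are not identifiers either), and `self_links_for` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Trimmed copies of the three responses; elided content is omitted.
RESPONSES = [
    """<person>
         <link href="http://www.example.org/person/abc" rel="self"/>
         <first-name>Subbu</first-name><last-name>Allamaraju</last-name>
       </person>""",
    """<person>
         <link href="http://www.example.org/person/abc?include=addressbook" rel="self"/>
         <first-name>Subbu</first-name><last-name>Allamaraju</last-name>
       </person>""",
    """<people>
         <link href="http://www.example.org/people?like=subbu" rel="self"/>
         <person>
           <link href="http://www.example.org/person/abc?view=mini" rel="self"/>
           <first-name>Subbu</first-name><last-name>Allamaraju</last-name>
         </person>
         <person>
           <link href="http://www.example.org/person/def?view=mini" rel="self"/>
           <first-name>Subbu</first-name><last-name>Somebody</last-name>
         </person>
       </people>""",
]


def self_links_for(responses, first, last):
    """Collect the self link of every person matching the given name."""
    links = []
    for text in responses:
        root = ET.fromstring(text)
        # iter() also visits the root, so both response shapes are covered.
        for person in root.iter("person"):
            if (person.findtext("first-name") == first
                    and person.findtext("last-name") == last):
                for link in person.findall("link"):
                    if link.get("rel") == "self":
                        links.append(link.get("href"))
    return links


links = self_links_for(RESPONSES, "Subbu", "Allamaraju")
print(links)
print(len(set(links)))  # three distinct URIs for one person
```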

There are three possible ways to fix this problem:

Let the client guess that they all refer to the same person by trying to parse the URI.
Introduce another link with a relation value of, say, http://www.example.org/rels/identity and a URI that uniquely identifies the entity in question.
Introduce an identifier in each representation that uniquely identifies the thing in question.
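To compare the second and third options, here is one possible shape for each, sketched in Python. The identity relation is the one named in the second option, spelled with the www.example.org domain; the `<id>` element name follows the accounts example at the top of this post; and the value `urn:example:person:abc` is a made-up placeholder. Both helpers are hypothetical; each simply reads the identifier out of a representation.

```python
import xml.etree.ElementTree as ET

IDENTITY_REL = "http://www.example.org/rels/identity"

# Option 2: an extra link whose relation marks it as the identity URI.
WITH_IDENTITY_LINK = """<person>
  <link href="http://www.example.org/person/abc?view=mini" rel="self"/>
  <link href="urn:example:person:abc" rel="http://www.example.org/rels/identity"/>
  <first-name>Subbu</first-name>
</person>"""

# Option 3: a dedicated identifier element in the representation.
WITH_ID_ELEMENT = """<person>
  <id>urn:example:person:abc</id>
  <link href="http://www.example.org/person/abc?view=mini" rel="self"/>
  <first-name>Subbu</first-name>
</person>"""


def identity_from_link(xml_text):
    """Option 2: return the href of the identity link, or None."""
    for link in ET.fromstring(xml_text).findall("link"):
        if link.get("rel") == IDENTITY_REL:
            return link.get("href")
    return None


def identity_from_element(xml_text):
    """Option 3: return the text of the <id> element, or None."""
    return ET.fromstring(xml_text).findtext("id")


print(identity_from_link(WITH_IDENTITY_LINK))
print(identity_from_element(WITH_ID_ELEMENT))
```

Either way the client ends up with the same string; the third option just gets there without a link-relation lookup.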

The first is an obvious no-no since it breaks URI opacity.

Of the remaining two options, I prefer the third one since what the client application needs is an identifier that uniquely identifies the entity, although the second option will work as well.

The key point is this: URIs uniquely identify resources, but a URI used to fetch something is not always a good candidate to serve as a unique identifier in client applications. As the example above shows, there can be several URIs for fetching different kinds of information about the same entity. As far as HTTP is concerned, there are three resources in the above example, each with a different URI. But as far as the client and server applications are concerned, we are talking about the same entity: a person. The URIs used to fetch these representations do not tell the client that they describe the same entity. We need identifiers for that.
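Assuming each representation carries such an identifier, the client can correlate responses by grouping on it rather than on the URI it fetched. In this sketch the `<id>` values are hypothetical placeholders added to trimmed copies of the three responses, and `group_by_identity` is an illustrative helper, not part of any library.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# The three responses again, now assumed to carry an <id> per person.
RESPONSES = [
    """<person><id>urn:example:person:abc</id>
         <link href="http://www.example.org/person/abc" rel="self"/></person>""",
    """<person><id>urn:example:person:abc</id>
         <link href="http://www.example.org/person/abc?include=addressbook" rel="self"/></person>""",
    """<people>
         <person><id>urn:example:person:abc</id>
           <link href="http://www.example.org/person/abc?view=mini" rel="self"/></person>
         <person><id>urn:example:person:def</id>
           <link href="http://www.example.org/person/def?view=mini" rel="self"/></person>
       </people>""",
]


def group_by_identity(responses):
    """Map each identifier to the self links seen for that entity."""
    groups = defaultdict(list)
    for text in responses:
        # iter() visits the root too, so single-person responses are included.
        for person in ET.fromstring(text).iter("person"):
            ident = person.findtext("id")
            for link in person.findall("link"):
                if link.get("rel") == "self":
                    groups[ident].append(link.get("href"))
    return dict(groups)


for ident, hrefs in group_by_identity(RESPONSES).items():
    print(ident, len(hrefs))
```

Three different URIs collapse onto one identifier, which is the relationship the self links alone could not express.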

My design choice therefore is to include an identifier in every representation to uniquely identify the entity in question. I prefer using a URN as the value of these identifiers, since URNs are intended to serve as “persistent, location-independent, resource identifiers”.
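If identifiers are URNs, a client may want a cheap sanity check and a consistent comparison key before storing one in its database. The sketch below is a simplified approximation of URN syntax: the `urn:` scheme, a namespace identifier (NID) of 2 to 32 letters, digits, or hyphens that starts and ends with a letter or digit, and a non-empty namespace-specific string (NSS). It is not a full validator (it ignores percent-encoding and the optional URN components), the function name `normalize_urn` is hypothetical, and `urn:example:person:abc` is a placeholder value.

```python
import re

# Simplified URN shape: "urn:" NID ":" NSS. The NSS check is deliberately
# loose (non-empty, no whitespace); this is not a complete URN validator.
URN_PATTERN = re.compile(
    r"^urn:([A-Za-z0-9][A-Za-z0-9-]{0,30}[A-Za-z0-9]):(\S+)$",
    re.IGNORECASE,
)


def normalize_urn(value):
    """Return a comparison key for a URN, or None if it does not look like one.

    The scheme and NID are case-insensitive, so both are lower-cased;
    the namespace-specific string is kept exactly as given.
    """
    match = URN_PATTERN.match(value.strip())
    if match is None:
        return None
    nid, nss = match.groups()
    return f"urn:{nid.lower()}:{nss}"


print(normalize_urn("URN:Example:person:abc"))
print(normalize_urn("http://www.example.org/person/abc"))
```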
