Resource Identity and Cool URIs

In response to my InfoQ article on Describing RESTful Applications, some of the comments I have received so far dealt with resource identity. When I sent a draft to Stefan in late October, he was curious to see why I used ID elements to capture a unique identifier for each resource.

Here is a snippet from one of the examples I used in that article.

<accounts xmlns="urn:org:bank:accounts">
  <account>
      <id>AZA12093</id>
      <link href="http://bank.org/account/AZA12093" rel="self"/>
      ...
  </account>
  <account>
      <id>ADK31242</id>
      <link href="http://bank.org/account/ADK31242" rel="self"/>
      ...
  </account>
</accounts>

JJ, in his latest post on the same article, makes a similar comment.

Interestingly Subbu also defined a proprietary ID mechanism reinforcing the idea that a URI is not generally used for identity purposes. I would have preferred a “link” to a unique resource identifier.

A similar thought was expressed by Nick Gall in a thread on rest-discuss where he says

So ultimately, I’d prefer to see all identifiers as URLs (not just URIs) and have such URLs be permanent.

Since URIs are supposed to be permanent, i.e., since cool URIs don't change, we should be able to use URIs to identify a given resource, and ideally there should be no need for proprietary identifiers. However, in reality, URIs are unreliable substitutes for the identifiers that client applications need to rely upon. Allow me to elaborate.

Look at the following HTTP GET requests.

GET /person/abc
Host: www.example.org

200 OK
Content-Type: ...

<person>
  <link href="http://www.example.org/person/abc" rel="self"/>
  <link href="http://www.example.org/person/abc?include=addressbook"
    rel="http://www.example.org/rels/person-with-addressbook"/>
  <first-name>Subbu</first-name>
  <last-name>Allamaraju</last-name>
  <email>subbu@nospam.com</email>
  ...
</person>

GET /myapp/person/abc?include=addressbook
Host: www.example.org

200 OK
Content-Type: ...

<person>
  <link href="http://www.example.org/person/abc?include=addressbook" rel="self"/>
  <first-name>Subbu</first-name>
  <last-name>Allamaraju</last-name>
  <addresses>
    <address>
      ...
    </address>
    ...
  </addresses>    
</person>

GET /myapp/people?like=subbu
Host: www.example.org

200 OK
Content-Type: ...

<people>    
  <link href="http://www.example.org/people?like=subbu" rel="self"/>
  <person>
    <link href="http://www.example.org/person/abc?view=mini" rel="self"/>
    <first-name>Subbu</first-name>
    <last-name>Allamaraju</last-name>
  </person>
  <person>
    <link href="http://www.example.org/person/def?view=mini" rel="self"/>
    <first-name>Subbu</first-name>
    <last-name>Somebody</last-name>
  </person>
</people>

In each response, the client is receiving information about the same person. In the first case, it is receiving the first name, last name, and email address; in the second case, it is receiving the first name, last name, and the person's address book; and in the third case, it is finding the same person through a search.

Now, let us think of an on-line game review site that uses the server at http://www.example.org for all user data.

Here are some possible user scenarios.

  1. I log into the game review site, and upon login, it greets me with my first name and last name.
  2. I click on a link to view my address book.
  3. One of my friends logs into this site, types in "subbu" in some search box, finds my name in the search results, and then clicks on a link to view reviews posted by me and all the contacts in my address book.

To implement these scenarios, the client needs to be able to (a) recognize that all the responses refer to the same user, and (b) store additional data in its own database using the user's identity as a foreign key. What can the client rely upon?

Let me start with the "self" links. The person has a self link in each case, but they are all different. The client cannot determine that the person named Subbu Allamaraju found in the search results is the same as the one in the first or the second response. So, self links are useless for implementing these scenarios.
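To make this concrete, here is a minimal Python sketch that pulls the self link out of trimmed copies of the three responses above and compares them. The helper name `self_link` and the trimmed documents are my own illustration, not part of any real client.

```python
import xml.etree.ElementTree as ET

# Trimmed copies of the three representations shown above.
FIRST = """<person>
  <link href="http://www.example.org/person/abc" rel="self"/>
  <first-name>Subbu</first-name>
  <last-name>Allamaraju</last-name>
</person>"""

SECOND = """<person>
  <link href="http://www.example.org/person/abc?include=addressbook" rel="self"/>
  <first-name>Subbu</first-name>
  <last-name>Allamaraju</last-name>
</person>"""

SEARCH = """<people>
  <link href="http://www.example.org/people?like=subbu" rel="self"/>
  <person>
    <link href="http://www.example.org/person/abc?view=mini" rel="self"/>
    <first-name>Subbu</first-name>
    <last-name>Allamaraju</last-name>
  </person>
</people>"""


def self_link(element):
    """Return the href of the rel="self" link that is a direct child of element."""
    for link in element.findall("link"):
        if link.get("rel") == "self":
            return link.get("href")
    return None


first = self_link(ET.fromstring(FIRST))
second = self_link(ET.fromstring(SECOND))
found = self_link(ET.fromstring(SEARCH).find("person"))

# Same person every time, yet the client sees three distinct strings.
print(len({first, second, found}))  # prints 3
```

A client that keys its own data on the self link would end up with three unrelated records for one person.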

There are three possible solutions to fix this problem.

  • Let the client guess that they all refer to the same person by trying to parse the URI.
  • Introduce another link with a relation value of, say, http://www.example.org/rels/identity and a URI that uniquely identifies the entity in question.
  • Introduce an identifier in each representation that uniquely identifies the thing in question.

The first is an obvious no-no since it breaks URI opacity.
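As a rough illustration of why guessing is fragile, the Python sketch below strips the query string and treats the path as the identity. The function `guess_entity_key` and the "moved" URI are hypothetical; the point is only that the guess depends on a URI layout the client does not control.

```python
from urllib.parse import urlsplit


def guess_entity_key(uri):
    """Naive guess: treat the URI path, without the query string, as the identity."""
    return urlsplit(uri).path


uris = [
    "http://www.example.org/person/abc",
    "http://www.example.org/person/abc?include=addressbook",
    "http://www.example.org/person/abc?view=mini",
]

# With today's URI layout, the guess happens to collapse all three to one key.
keys = {guess_entity_key(u) for u in uris}

# Hypothetical: the server later restructures its URIs. It is free to do so,
# since clients are supposed to follow links, not construct or dissect them.
moved = guess_entity_key("http://www.example.org/people/abc/profile")

# The client's stored keys no longer match anything the server hands out.
print(moved in keys)
```

Nothing in the representations promised the client that the path was meaningful, so the server has broken no contract by changing it.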

Of the remaining two options, I prefer the third one since what the client application needs is an identifier that uniquely identifies the entity, although the second option will work as well.
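Here is a minimal sketch of how a client might read identity under either of the remaining two options. The `<id>` value `abc`, the sample documents, and the two helper functions are assumptions made up for illustration; the relation value is the one suggested in the list above.

```python
import xml.etree.ElementTree as ET

# Relation value suggested in the list above.
IDENTITY_REL = "http://www.example.org/rels/identity"

# Hypothetical representations carrying BOTH an identity link (option 2)
# and an id element (option 3), so the two approaches can be compared.
FULL = """<person>
  <id>abc</id>
  <link href="http://www.example.org/person/abc" rel="self"/>
  <link href="http://www.example.org/person/abc"
    rel="http://www.example.org/rels/identity"/>
  <first-name>Subbu</first-name>
</person>"""

MINI = """<person>
  <id>abc</id>
  <link href="http://www.example.org/person/abc?view=mini" rel="self"/>
  <link href="http://www.example.org/person/abc"
    rel="http://www.example.org/rels/identity"/>
  <first-name>Subbu</first-name>
</person>"""


def identity_from_link(person):
    """Option 2: use the href of the identity link."""
    for link in person.findall("link"):
        if link.get("rel") == IDENTITY_REL:
            return link.get("href")
    return None


def identity_from_id(person):
    """Option 3: use the text of the id element."""
    return person.findtext("id")


full = ET.fromstring(FULL)
mini = ET.fromstring(MINI)

# The self links differ, but either identity mechanism matches across both.
print(identity_from_link(full) == identity_from_link(mini))
print(identity_from_id(full) == identity_from_id(mini))
```

Either way, the client compares a value the server has declared to be an identity, instead of inferring one from the URI it happened to fetch.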

The key point is this. URIs uniquely identify resources, but a URI used to fetch something is not always a good candidate to serve as a unique identifier in client applications. As I showed in the above example, there can be several URIs to fetch different kinds of information about the same entity. As far as HTTP is concerned, for the above example, there are three resources, each with a different URI. But as far as the client and server applications are concerned, we are talking about the same entity, which is a person. The URIs used to fetch these representations do not tell the client that they describe the same entity. We need identifiers for that.

My design choice therefore is to include an identifier in every representation to uniquely identify the entity in question. I prefer using a URN as the value of these identifiers, since URNs are intended to serve as "persistent, location-independent, resource identifiers".
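As a sketch of how the game review site could use such an identifier as a foreign key, the Python example below stores a review against the `<id>` value taken from one representation and looks it up via another. The URN `urn:example:person:abc`, the `reviews` table, and the helper functions are all hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical representations: same person, different self links, same URN id.
PROFILE = """<person>
  <id>urn:example:person:abc</id>
  <link href="http://www.example.org/person/abc" rel="self"/>
  <first-name>Subbu</first-name>
  <last-name>Allamaraju</last-name>
</person>"""

SEARCH_HIT = """<person>
  <id>urn:example:person:abc</id>
  <link href="http://www.example.org/person/abc?view=mini" rel="self"/>
  <first-name>Subbu</first-name>
  <last-name>Allamaraju</last-name>
</person>"""

# The review site's own store; person_id is the foreign key into the user service.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE reviews (person_id TEXT NOT NULL, game TEXT, body TEXT)")


def person_id(xml_text):
    """Read the identifier element from a person representation."""
    return ET.fromstring(xml_text).findtext("id")


def add_review(xml_text, game, body):
    """Store a review keyed on the person's identifier, not on any URI."""
    db.execute(
        "INSERT INTO reviews VALUES (?, ?, ?)",
        (person_id(xml_text), game, body),
    )


def reviews_for(xml_text):
    """Look up reviews for whichever representation of the person we hold."""
    rows = db.execute(
        "SELECT game, body FROM reviews WHERE person_id = ? ORDER BY game",
        (person_id(xml_text),),
    )
    return rows.fetchall()


# Scenario 1: I log in and post a review from my profile view.
add_review(PROFILE, "Some Game", "Fun.")

# Scenario 3: a friend finds me through search and sees the same review.
hits = reviews_for(SEARCH_HIT)
print(hits)
```

The key stays stable no matter which URI the representation was fetched from.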

Comment

  1. Hi Subbu,

    Great blog, after reading in detail a few of your posts I immediately subscribed. :)

    Can you give some more examples of how you see this problem being solved, specifically with URNs? (I’m still a bit new to REST, and rather ignorant of URNs at this point.)

    I understand you’re saying that these three URLs are not the same, so the client can’t easily know they refer to the same entity:

    http://www.example.org/person/abc
    http://www.example.org/person/abc?view=mini
    http://www.example.org/person/abc?include=addressbook

    btw, I have to say I do like your second suggestion, e.g.:

    (or http://www.example.org/rels/identity)

    - Alex

  2. This makes great sense, except that I wonder whether it works in practice. Obviously, the IDs need to be sufficiently unique (for some value of sufficiently). It is probably enough to have an ID unique within a single media type. That is, unless you have a compelling reason for the same “thing” to be returned using different media types. You would have no way to connect them and you would lose any opportunity at polymorphism. This is probably very much an edge case.

    More compelling to me is the case where you have multiple domains all using the same type. You have no reason to expect that the numbering scheme in each domain will be compatible with the others. At that point you’re down to tying it to source domain and media type, and you’re probably just better off (unless you can guarantee that you’ll always have control over how your API is used).

    • It is much simpler than that. Usually, most entities are stored in some backend DB, i.e., have some identity, which we generally encode into URIs. I am suggesting returning the same as an identifier in representations. So, yes, it will be unique across all media types.

      When there are multiple domains dealing with the same set of entities, depending how they interact, they may need to use a common scheme for identifiers. This is already a common practice in a number of application domains.

      • Subbu, I think you’re demonstrably wrong here.

        You and I are both exposing REST APIs for Blogs. We have blog, post and comment resources. And we are both using the same standard media types for our representations. We know nothing about each other, nor about who is using our APIs.

        Jim has an aggregation application that is capable of consuming resources from both of us. When he requests a resource from both of us, it’s going to come from different domains, but they’re going to have the same media type, and because databases issue IDs sequentially, our two resources have the same database ID. So now we have a legitimate collision that is only going to be resolved if Jim segregates those IDs by originating domain, in which case he has origin domain + database ID, which seems to me to be very much like a URI.

        Or let’s take the case where we have two distinct resources (maintained in different database tables) that share a common media type. Let’s use the idea of a banking API where we have accounts, but we also have wire transfers and bill payments. For whatever reason, there is an agreement that the end points of a transfer resource (that is, the originating and destination accounts) must be able to respond with a common media type. Now my accounts, bill payment vendors, and SWIFT account descriptions all have the same media type but are independent of each other and can conceivably have colliding database IDs.

        Or how do you reconcile the case when you have multiple representations of the same entity?

        I would submit that dealing with the first example (I hope the most common) makes it impossible to deal with the second, or with the example you allude to in your comment where multiple domains have something to say about the same entity.

        Identity is an issue that is certainly not new; Topic Maps have been dealing with this issue for a long time. One of the things they do is use a Published Subject Indicator (which is a URI) so that if two maps are merged and there exist one or more topics with a common PSI, they refer to the same subject.

        We can do the same thing here:


        GET /myapp/person/abc?include=addressbook
        Host: www.example.org

        200 OK
        Content-Type: ...

        <person>
        <link href="http://www.example.org/person/abc?include=addressbook" rel="self"/>
        <link href="urn:org:example:person:subbu" rel="PublishedResourceIndicator"/>
        <first-name>Subbu</first-name>
        <last-name>Allamaraju</last-name>
        <addresses>
        <address>
        ...
        </address>
        ...
        </addresses>
        </person>

        Here I’ve coined the term Published Resource Indicator (which should be shortened to PRI) such that we can say that any two representations with identical PRIs (following normal URI interpretation) refer to the same resource, regardless of the origin of resource and its media type. Because some representations (say an image) lack hyperlink semantics, you would want to add a header to HTTP to contain the PRI.

        I think this nicely resolves the issue. But there is one added benefit. If one were to use a network-resolvable URI, then the question becomes: what do you get if you resolve that URI? Instead of getting a representation of the resource, I would send a human-readable description (if appropriate), a list of media types known to represent the resource, and a list of URIs known to return canonical representations of the resource.

        So the bottleneck isn’t with the use of a URI, but rather with the meaning of “self”.

        • @Little Fyr:

          On the first point, those apps should come up with a naming scheme for identifiers to avoid collisions. I am suggesting using raw DB IDs, but you could mint URNs based on those raw IDs.

          The idea of using a link in place of ID is fine. However, borrowing from Atom, you can accomplish the same with an id element/property.

          In any case, the key point of this post was to show that URIs used to access a representation of a resource cannot always be used to determine resource identity. How an app chooses to include identity data in representations is an implementation choice.

  3. I apologise for that question comment there :) I thought I submitted a comment yesterday but I must have made a mistake somehow.

    Subbu, great blog btw!

    Can you talk a bit more about URNs and give some examples? Personally solution #2 above appeals to me, but I don’t know much about URNs and would love to learn more.

    Thanks,

    - Alex
