Media Types, Plumbing and Democracy

One of the recurring debates in the REST community concerns the use of media types. Two opinions dominate.

  • Opinion 1: Web services must use standard media types to be RESTful.
  • Opinion 2: Custom media types are necessary to keep interactions visible, and to serve as contracts.

Of these, the first one is based on a literal interpretation of the following from Roy Fielding’s thesis (emphasis mine).

REST enables intermediate processing by constraining messages to be self-descriptive: interaction is stateless between requests, standard methods and media types are used to indicate semantics and exchange information, and responses explicitly indicate cacheability.

Per this opinion, use of media types such as application/vnd.example.myway+xml is not RESTful. Period. This is an extreme opinion. I don’t know if Roy ever meant to say that, but even if he did, it needs a more rational explanation than a slap on the wrist that use of custom media types is not RESTful. Being RESTful should never be the end goal in itself. Attempting to settle such arguments by saying that "your design is not RESTful" because "Roy said so" is no different from using "my priest said so" to settle moral and ethical questions. What is important is to understand the impact of such media type usage in the real world.

The second opinion is based on visibility. In RESTful applications, messages ought to be visible at the protocol level, and one of the attributes that helps maintain visibility of messages is the media type. For instance, how can anyone know if a representation that uses the application/xml media type describes a purchase order or a photo album? If the web service uses media types like application/vnd.example.po and application/vnd.example.album, then anyone can interpret the semantics of the representation without parsing the body of the representation. Per this line of thinking, a media type is an identifier for message semantics, and message recipients use media types to trigger processing code.

    if ("application/vnd.example.po".equals(response.getMediaType())) {
       // process purchase orders
    }
    else if ("application/vnd.example.album".equals(response.getMediaType())) {
       // process albums
    }

This approach is palatable for people looking for "RESTful contracts". Any client that understands the semantics of application/vnd.example.po can process a message with no out-of-band knowledge. Extending this model further, there are attempts to attach version identifiers and schema references to media types. The end goal of these attempts is to let arbitrary clients and servers determine how to process (not just parse) a message by simply looking at the media type.

All the examples of the second approach work well for XML formatted messages, thanks to RFC 3023. Introducing new media types for other existing formats breaks interoperability with existing software, and designing interoperable new formats takes effort. It also requires registration as per RFC 4288.
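
As a rough illustration of why the +xml convention keeps generic software working, the check below treats any media type ending in +xml as something a generic XML processor can parse. The class and method names are hypothetical, and the sketch ignores edge cases such as malformed header values.

```java
import java.util.Locale;

class StructuredSuffix {
    // Returns true when a Content-Type value names generic XML or an
    // XML-based type that follows the "+xml" suffix convention.
    static boolean isXml(String contentType) {
        // Drop parameters such as "; charset=utf-8" and normalize case.
        String base = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        return base.equals("application/xml")
                || base.equals("text/xml")
                || base.endsWith("+xml");
    }

    public static void main(String[] args) {
        System.out.println(isXml("application/vnd.example.po+xml"));
        System.out.println(isXml("application/vnd.example.po"));
    }
}
```

A type such as application/vnd.example.po, with no suffix, gives generic tools nothing to go on, which is the interoperability cost mentioned above.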

So what is the right thing to do? Here is my democratic approach.

  • If the sender is formatting representations using standard extensible formats such as XML or JSON, use standard media types such as application/xml and application/json.
  • Mint new media types when you invent new formats.
  • If you are just looking for a way to communicate application level semantics for XML and JSON messages, use something else (e.g. XML namespaces and conventions).
  • If the goal is versioning, use version identifiers in URIs.

(For those not clear about the difference between a format and a media type, a media type is an identifier for a format.)

My rationale is simple. Media types such as application/xml and application/json are good enough for XML and JSON message processing in code. On the other hand, most of the HTTP plumbing (e.g. proxies and firewalls) and HTTP plumbers (e.g. admins) care more about URI patterns and much less about media types. In fact, your admin may give you a "you must be kidding" look if you ask him/her to set up routing rules or security policies based on media types. URI based approaches are guaranteed to work across the stack. Ignoring real-world interoperability for the sake of "architectural purity" or "RESTful contracts" may eventually backfire. Your applications may not require such interoperability today, but you never know. Your admins will eventually hate you if they are asked to support, using their current tool stack, 1001 media types used by 101 applications.
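
To make the URI-based approach concrete, here is a minimal sketch of how plumbing or application code might read a version identifier from a request path. The /v2/... path layout and the class and method names are assumptions made for illustration; nothing above prescribes a particular URI scheme.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UriVersion {
    // Hypothetical convention: the first path segment is "v" plus 1 to 4 digits.
    private static final Pattern VERSION_SEGMENT =
            Pattern.compile("^/v(\\d{1,4})(/.*)?$");

    // Returns the version encoded in the path, or empty if there is none.
    static OptionalInt versionOf(String path) {
        Matcher m = VERSION_SEGMENT.matcher(path);
        return m.matches()
                ? OptionalInt.of(Integer.parseInt(m.group(1)))
                : OptionalInt.empty();
    }

    public static void main(String[] args) {
        System.out.println(versionOf("/v2/orders/123"));
        System.out.println(versionOf("/orders/123"));
    }
}
```

The same prefix match can be expressed as a routing rule in most proxies, which is the point of keeping the version where the plumbing already looks.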

But how would clients know the semantics of messages? There are other ways.

    if (poQName.equals(responseDoc.getRootElement().getQName())) {
       // process purchase orders
    }
    else if (albumQName.equals(responseDoc.getRootElement().getQName())) {
       // process albums
    }

Making such checks does weaken visibility. But my flip-flop is based on concerns about real-world interoperability with the HTTP plumbing. Off-the-shelf software is less adaptable than custom application code.
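
For readers who want something runnable, the same root-element check can be sketched with the DOM parser that ships with the JDK. The namespace URIs, element names, and class name below are made up for illustration; a real service would use its own vocabulary.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class RootElementDispatch {
    // Hypothetical qualified names, for illustration only.
    static final QName PO = new QName("http://example.org/po", "purchaseOrder");
    static final QName ALBUM = new QName("http://example.org/album", "album");

    // Parses the document and returns the qualified name of its root element.
    static QName rootName(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for namespace URIs and local names
            Element root = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            return new QName(root.getNamespaceURI(), root.getLocalName());
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    // Dispatches on the root element instead of on the media type.
    static String kindOf(String xml) {
        QName name = rootName(xml);
        if (PO.equals(name)) return "purchase order";
        if (ALBUM.equals(name)) return "album";
        return "unknown";
    }

    public static void main(String[] args) {
        System.out.println(kindOf("<album xmlns=\"http://example.org/album\"/>"));
    }
}
```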

28 Comments

  1. When using namespaces to determine the semantics of the message how do you cope with an evolving set of application semantics? For example, consider the situation where the server needs to introduce some new semantics that are incompatible with existing PO format. How would you go about providing the new format to those clients that can handle it while simultaneously providing older clients with the previous version?

    The best approach I can come up with is to shove a version identifier in the URI. By doing so you are demanding that many clients (particularly any that persist URIs) support every version of the format/semantics. (For what it’s worth, most of the machine clients I have written have needed to persist URIs.) For such a client there is no good way to transition from one set of semantics to another if the identifier is in the URI. This requirement does not just increase complexity in the client. Such clients also will never stop using the obsolescent variants, so the server can never decommission them without damaging the client base.

    Is there another approach that I have missed? Or are you saying this penalty is less costly than the penalty imposed by the plumbing on custom media types?

    • Hi Peter,

      I gave namespace as an example for code to figure out a way to learn about the XML. For versioning, using version identifiers in URIs is a safe bet.

      If the key motivation is to communicate application semantics, then there are cheaper ways. Sometimes, the client "just knows".

      • Version ids in URIs is a safe bet, but not a cheap one. It’s hard to implement on the server side and, at least for some types of clients, hard to implement on the client side. It makes it more difficult to evolve applications and to decommission obsolete formats.

        You didn’t really answer the final questions in my previous comment. I see the appeal of staying on the beaten path, particularly if I am overestimating that path’s cost.

        • Versioning is not cheap. In any case, as we debated in the past, versioning by media types does not fly for non-XML types.

          Regarding your question, the penalty of using custom media types for the sake of communicating semantics to custom application code is high. It goes up with the size of the service.

  2. Subbu:

    great to see you are blogging again ! Pete, good to see you are still working on the versioning problem. That’s a hard one to crack.

    This discussion sounds a bit heretic to me and I am afraid Roy is going to slap both of you on the wrist.

    The real RESTful way to handle interactions is “bookmarks”. Bookmarks can have versions in their URI. You are entering an interaction from the bookmark perspective. You should never enter an interaction from a URI that was stored from a previous interaction. Of course, if you do it that way, I would argue that a bookmark is pretty much equivalent to a Web Service “endpoint”.

    In the end, there is an incompressible way of expressing some semantics (versioning, contracts, identity, actions…) and whether you encode it in a “RESTful” way or not, these semantics will be isomorphic. An endpoint is an endpoint, a message format is a message format, a contract is a contract, an action is an action and wrapped literal is wrapped literal

  3. Subbu,

    I don’t think Dr. Fielding intended for the first option to be the “correct” understanding of how to use mime types. In his blog post of last year titled REST APIs must be hypertext-driven he writes:

    A REST API should spend almost all of its descriptive effort in defining the media type(s) used for representing resources and driving application state, or in defining extended relation names and/or hypertext-enabled mark-up for existing standard media types. Any effort spent describing what methods to use on what URIs of interest should be entirely defined within the scope of the processing rules for a media type (and, in most cases, already defined by existing media types). [Failure here implies that out-of-band information is driving interaction instead of hypertext.]

    I think his assertion that A REST API should spend almost all of its descriptive effort in defining the media type(s) used for representing resources argues for the use of custom mime types. Otherwise it would have been more correct to say “choosing the media types”. Mind you, you could disagree with my attempt to equate media types with mime types.

    The question is “What is a standard mime type?” That is followed closely by, “Between whom must the standard be recognized?” If the answer to the second question is, “Those who care about the API”, then the answer to the first is, “What ever those who care about the API say it is.”

    Nevertheless, I disagree that the “cost” of custom mime types is (necessarily) high.

    First of all, all you have to do is say, “documents that conform to this schema/design/format are known as text/com.example.coolness+json” and you’re done.

    While registration may be desirable if you have a sufficiently large audience for your application, it’s hardly a requirement (and hardly a likely situation to exist at its inception). If you use something like application/com.example.coolness+json or application/com.example.coolness+xml then you should reasonably expect a generic JSON or XML parser to correctly parse the representation, if that is necessary.

    Second, let’s think about all the costs. What do you think would be the development cost of web browsers if we didn’t have text/html but rather the more “standard” application/sgml? I imagine it would make everything a lot less pleasant for the same reasons that simply using application/xml would… you would have to parse the entire document just to find out if you understood it or not. Similarly for clients that consume your API, knowing what it’s getting without having to parse everything will be hugely beneficial, especially if it needs to perform different behaviour based on receiving different types.

    It will never be the case that any user agent requesting your resource will know nothing about the structure of your resources; that would make it completely useless. Even if you parsed a schema and knew that this thing was a URI and that thing was a string, you could not know how to use them. As such it is reasonable to assume that whatever user agent is requesting your resource will be a client designed to handle your mime type(s). Even if you’re building a web application, one can (and should IMHO) consider the browser and the XHR object it provides as part of the transport layer and the JavaScript code you write to deal with your resources as the “real” user agent.

    Finally, there is no need to be stingy with your handling of mime types; indeed you probably should not be. The request should include an Accept header. If you find application/com.example.coolness+json then by all means, send your resource using that mime type, but if all you get is application/json then send that mime type. You may even elect to include schema information for the generic mime type that would be unnecessary for the custom type.
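
    The fallback described above might look roughly like the following sketch. It only checks whether the custom type is listed in the Accept header and deliberately ignores q-values and wildcards, so it is not a full implementation of HTTP content negotiation. The class and method names are hypothetical.

    ```java
    import java.util.Arrays;
    import java.util.List;
    import java.util.Locale;
    import java.util.stream.Collectors;

    class AcceptFallback {
        static final String CUSTOM = "application/com.example.coolness+json";
        static final String GENERIC = "application/json";

        // Lists the media ranges in an Accept header, with parameters removed.
        static List<String> mediaRanges(String acceptHeader) {
            return Arrays.stream(acceptHeader.split(","))
                    .map(range -> range.split(";", 2)[0].trim().toLowerCase(Locale.ROOT))
                    .filter(range -> !range.isEmpty())
                    .collect(Collectors.toList());
        }

        // Picks the custom type only when the client explicitly lists it.
        static String chooseResponseType(String acceptHeader) {
            return mediaRanges(acceptHeader).contains(CUSTOM) ? CUSTOM : GENERIC;
        }

        public static void main(String[] args) {
            System.out.println(chooseResponseType("application/json"));
        }
    }
    ```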

    So in conclusion, my opinion (for what little it’s worth), is that you should create as many mime types as you feel are necessary (although try to use or extend existing mime types if you can). If your application is only used by a handful of people, then there is not even any reason to register your mime type as those who care about it will already know what to do with it. If, however, your application becomes wildly popular, or more generally useful, then by all means please register the mime type.

    They are, after all, just strings. They’re not going to hurt anyone. ;)

    • Interestingly, there are several who would disagree with your interpretation. Having watched/participated in this debate a few times, I remain agnostic of either interpretation.

      It is better to go beyond the terminology and interpretations and look at the problem being solved and its viability. If you’re convinced that your XML needs a new media type, just make sure that there are no use cases or requirements for your plumbing to differentiate requests or responses based on media types. Some areas to watch out for include (a) logging, (b) proxy level routing, (c) rate limiting, and (d) metrics. By the way, some of these will be problematic when you extend media types for versioning. That’s why I would be more careful about betting heavily on custom media types for versioning.

  4. Hi, Thanks for another great post.
    My understanding is Adam’s, and I arrived via the same quote of Fielding’s. So I’d appreciate your thoughts on his comments/observations.

    On the version issue: I’d come to the point of thinking it is so endemic it should be dealt with at the entry point. Taking Roy’s HATEOAS literally, I thought to make versioning such that if a client could process the entry point it could process what followed. If you’re going to entertain having 2 version numbers you might as well entertain having 2,000,000. That makes the server client seem fragile, which it is, and so brought me back to Roy’s point that Adam raised. Spend your time on the media types: “Any effort spent describing what methods to use on what URIs of interest should be entirely defined within the scope of the processing rules for a media type”
    Essentially I figured if client A expects field K then send that even to new clients that can handle field L. If the bloat gets too much, then define a new entry point (version) for new clients that get the new data, possibly using the old media type since the methods to use on the new URI are defined within the scope of the media type.

    Not sure if that makes sense?

  5. A small variation I’ve had success with is to stick with standard media types but to add link headers with further type (or at least type-like) information.

    In described_routes (on hold unfortunately while the URI Template standard gets redrafted), the type identifiers come in the form of URIs that point to URI Template-based metadata, making the application nicely self-describing and facilitating the discovery of similar and related objects.

  6. Nice post detailing the two options. I’m curious, when you say…

    If the goal is versioning, use version identifiers in URIs.

    … how do you envision this as a solution when you’re trying to decouple the client and server using hypermedia and conneg?

    What are your thoughts on defining standard media types with parameters for versioning? Perhaps something like application/xml; version=4 and, to take it to the next level, passing explicit semantics of the media type in the parameters, like, say, application/xml; type=po;version=4
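
    For illustration, reading such parameters off a Content-Type value could be sketched as below. Note that the type and version parameters are only the proposal made in this comment (the application/xml registration does not define them, as far as I know), and the sketch does not handle quoted parameter values. The class and method names are hypothetical.

    ```java
    import java.util.LinkedHashMap;
    import java.util.Locale;
    import java.util.Map;

    class MediaTypeParams {
        // Returns the type/subtype part, e.g. "application/xml".
        static String baseType(String contentType) {
            return contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        }

        // Returns the parameters that follow the type, in the order they appear.
        static Map<String, String> parameters(String contentType) {
            Map<String, String> params = new LinkedHashMap<>();
            String[] parts = contentType.split(";");
            for (int i = 1; i < parts.length; i++) {
                String[] kv = parts[i].split("=", 2);
                if (kv.length == 2) {
                    // Parameter names are case-insensitive; values are kept as sent.
                    params.put(kv[0].trim().toLowerCase(Locale.ROOT), kv[1].trim());
                }
            }
            return params;
        }

        public static void main(String[] args) {
            System.out.println(parameters("application/xml; type=po;version=4"));
        }
    }
    ```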

    • Hypermedia and conneg play well when changes are compatible. But when the changes are incompatible, most bets are off. Regarding version identifiers in media types, there are two reasons why I prefer not to tack version identifiers onto media types:

      - Media types should best be left for protocol visibility purposes. As far as the protocol (i.e. the uniform interface) is concerned, there are no versions of resources. There are just representations of resources.

      - The best practice on the web is to use conneg when representations are equivalent and differ only in the encoding format used. In the case of versioning, representations of multiple versions of a resource are not equivalent.

      In other words, conneg != versioning.

      • > The best practice on the web is to use conneg when representations
        > are equivalent and differ only in the encoding format used. In the case
        > of versioning, representations of multiple versions of a resource
        > are not equivalent.

        I don’t think it follows that different versions of a resource are not equivalent. I think that all responses to requests for a resource using the same method are pretty close to equivalent regardless of the media type, almost by definition.

        Your argument would seem to imply that rather than Atom defining its own mime type it should have instead required that all Atom feeds have a URI distinct from the RSS and HTML representations. I see no difference between the transition from RSS to Atom syndication formats and the transitions required when introducing incompatible changes to proprietary service formats. In both cases incompatible changes to the representations of existing resources are required to support the desired functionality of the system.

        Versioning using media types seems pretty prevalent in the public web, e.g. RSS to Atom, and it seems to work rather well.

        • The objective of this post was to point out the proliferation of media types, but looks like I am now compelled :)

          How about this – why not submit an I-D to formalize a versioning mechanism based on media types, and see how it goes? That way, there will be a better chance of getting the tool stack to support it, and also of getting people experienced with the HTTP plumbing to weigh in.

          (BTW – RSS dealt with versions by using namespace URIs, and conneg is rarely used for feeds on the web.)

      • I tend to agree with Peter to an extent …

        Versioning using media types seems pretty prevalent in the public web, e.g. RSS to Atom, and it seems to work rather well.

        … in the sense that I see a version of a resource as yet another representation of the resource. Like in the RSS or Atom representation example, both are equivalent but might not be compatible. Granted that passing version and message semantics in media types via the use of parametrized media types might not work as intuitively as one might like. By that I mean that versioning complicates the conneg between the client and the server to present the most appropriate version of the resource.
        Now, we can alleviate this problem by using new media types for each representation/version but that poses its own challenges.

        • >> I see a version of a resource as a yet another representation of the resource

          Which version are you talking about? You have PO/123/v1 and PO/123/v2, i.e. the version of the PO instance, and then you have the version of the business logic involved in “accessing” PO/123. I don’t see how “access” can be equivalent in any way. You have the notion of forwards compatibility, but there is almost by definition no “equivalence”.

          Guys, let’s face it, REST couples access and identity, and that problem, IMHO, cannot be resolved.

          • “version of the business logic involved in “accessing” PO/123″

            Why would a client care what version of the business logic (code?) was used?

      • “As far as the protocol (i.e. the uniform interface) is concerned, there are no versions of resources. There are just representations of resources.”

        I would use the exact same argument to show that versioning should be in the media type (which refers to the representation) rather than in the URL (which identifies the resource). You can have a v1 representation of /marques/bentley or a v2 representation. The resource in either case is the same (the car maker), and thus so should be the URL. You are conflating the representation with the resource, saying that a v1 and a v2 format are different resources. I argue that they are not, only different representations of a conceptual Bentley, one for older clients and one for newer clients. I realise that what resource a URL represents is defined by the API developer. I am just trying to help developers make good decisions :)

        “The best practice on the web is to use conneg when representations are equivalent and differ only in the encoding format used. In the case of versioning, representations of multiple versions of a resource are not equivalent”

        Not true. Content negotiation is negotiation between representations of the conceptual resource. Those representations DO NOT have to be equivalent. You say they should only differ by encoding. I presume you are thinking of e.g. identical text encoded in EUC-J and UTF-16. What about lossless bitmap, lossy bitmap, high-res bitmap, and vector versions of an image? I presume they are not “equivalent” to you. To me, they REPRESENT the same resource (e.g. the background star for my “buy now” button) and the client can use whichever it would prefer to render.

  7. Subbu

    You said:

    The objective of this post was to point out the proliferation of media types, but looks like I am now compelled :)

    I hope you mean you’re convinced and not that someone has put a gun to your head and said “you will make many media types” ;)

    I’m a little lot out of touch with all the acronyms… what is I-D? I get your point though. If you make a public proposal on how to encode a version in a media type then a lot of people will have an opinion on how to do it, not just us nutters with a thing for REST.

    However, as an intermediate step, perhaps it would be useful to produce a “Media Types Design Best Practices” document(/blog/forum ???) where we can hash out how to be a good media type citizen.

    It strikes me that this is a start to such a discussion:

    0) Before creating a new media type, check for existing well supported media types that accomplish your needs exactly (say vCard for addresses) or that have adequate extension mechanisms that you can add the stuff you need (say by adding a custom namespace to RSS). Either justify why you’re not using those media types, or provide them as alternate representations if you go on to create your own.
    1) Create a new media type for every representation you design
    2) The schema/spec for your representation should provide adequate mechanisms for expansion (ie namespaces on XML, ignore things you don’t understand).
    3) For significantly similar versions, the representation itself should communicate the version and must be backwards compatible (ie fit within the bounds of point 2). The representation should communicate the version as early as possible in the representation when the version changes how entities of the previous version should be handled.
    4) Create a new media type only when the differences between versions are such that clients that can consume version x would not be able to understand version x+1

    That, I think, strikes a good balance between over populating the media type space and versioning…. at least as a starting point

    • “However, as an intermediate step, perhaps it would be useful to produce a “Media Types Design Best Practices” document(/blog/forum ???) where we can hash out how to be a good media type citizen.”

      Great idea, did it go anywhere?

  8. I think his assertion that “A REST API should spend almost all of its descriptive effort in defining the media type(s) used for representing resources” argues for the use of custom mime types. Otherwise it would have been more correct to say “choosing the media types”.

    It is not more correct to say “choosing” as this would preclude creation of new media types. Neither is it correct to interpret “defining” to mean “creation” when Roy really means “documenting”. REST APIs should be described by documenting the media types used, whether they’re defined by the API or chosen.

    Any confusion on this point is cleared up by the canonical text in Roy’s thesis, that REST relies on the shared understanding of an evolving set of standard media types, or by reading through Roy’s clarifications of this very point in the comment thread of his referenced weblog post (excerpted here: http://tech.groups.yahoo.com/group/rest-discuss/message/14388). REST works because out-of-band knowledge, like the meaning of text/html, is common knowledge. Only through common knowledge is decoupling achieved.

    When you mint a new media type, its initial implementation is based on uncommon out-of-band knowledge, i.e. the definition of your media type. The REST style allows for, and encourages, the growth of the Web while discouraging rampant proliferation of media types. It does this by encapsulating out-of-band knowledge in readily-standardizable form, i.e. media type definitions.

    A system based on undocumented custom media types can’t be RESTful. A system based on documented custom media types may or may not become RESTful over time, depending on whether that media type is adopted by enough implementations that its definition becomes common knowledge (through a standardization effort). Documenting your media type, in and of itself, isn’t going to attract sufficient re-use unless external developers feel they’re going to be participating in the definition of the media type, through some sort of standardization effort.

    Dictating the media type definition to external developers goes against the REST style — avoiding the standardization of your custom media type limits its chances of being accepted widely enough to become common knowledge. Hence, consumers of your media type will always be coupled to the server.

    Do you see the difference? Encoding knowledge within clients and servers of the other side’s implementation mechanism is what we are trying to avoid.

    Getting stuck in just this trap is what leads to the notion of media-type versioning, which is why REST frowns on such a notion. It’s an inherent encoding of knowledge within clients of the server’s implementation mechanism.

    • Eric,

      I think you’ve made some good points, and I’ll be modifying my position on mime types.

      However, I think you’ve made a few errors:

      Dictating the media type definition to external developers goes against the REST style — avoiding the standardization of your custom media type limits its chances of being accepted widely enough to become common knowledge. Hence, consumers of your media type will always be coupled to the server.

      RFC 3023 provides the following rationale for the +xml suffix (which i think applies to our discussion):

      Appendix A. Why Use the ‘+xml’ Suffix for XML-Based MIME Types?

      Although the use of a suffix was not considered as part of the original MIME architecture, this choice is considered to provide the most functionality with the least potential for interoperability problems or lack of future extensibility. The alternatives to the ‘+xml’ suffix and the reason for its selection are described below.

      A.1 Why not just use text/xml or application/xml and let the XML processor dispatch to the correct application based on the referenced DTD?

      text/xml and application/xml remain useful in many situations, especially for document-oriented applications that involve combining XML with a stylesheet in order to present the data. However, XML is also used to define entirely new data types, and an XML-based format such as image/svg+xml fits the definition of a MIME media type exactly as well as image/png[PNG] does. (Note that image/svg+xml is not yet registered.) Although extra functionality is available for MIME processors that are also XML processors, XML-based media types — even when treated as opaque, non-XML media types — are just as useful as any other media type and should be treated as such.

      Since MIME dispatchers work off of the MIME type, use of text/xml or application/xml to label discrete media types will hinder correct dispatching and general interoperability. Finally, many XML documents use neither DTDs nor namespaces, yet are perfectly legal XML.

      I would argue that this reasoning applies equally well to any representation (that uses a common, standard format of encoding), and would argue, IMHO, for a similar RFC to be published for specifying a +json suffix with similar semantics.

      You also say:

      When you mint a new media type, its initial implementation is based on uncommon out-of-band knowledge, i.e. the definition of your media type. The REST style allows for, and encourages, the growth of the Web while discouraging rampant proliferation of media types. It does this by encapsulating out-of-band knowledge in readily-standardizable form, i.e. media type definitions.

      Simply using standard media types doesn’t solve this. You still have to document the vocabulary. That out-of-band communication runs into precisely the same roadblock. The use of a standard suffix (the +xml and the hypothetical +json), I would argue, eliminates that objection entirely.

      Further, the reliance on standard media types doesn’t address a key element of REST: hypertext. text/xml doesn’t tell you which tags and/or attributes represent a link to a URI. The XLink vocabulary does, so leveraging that in your vocabulary will help. Even then it doesn’t tell you what the link is for, let alone what the meaning of the tag is. Now you might be able to describe some of that in a schema, but there is no requirement to use a schema with XML and many reasons not to (notably that parsing a document as generic XML, and then applying its schema, can be a LONG process).

      You still have to communicate all the meaningful information out of band. All you’ve done is changed the label you apply to the thing you’re communicating out of band.

      I would agree if the majority of representations provided by REST APIs were binary. It makes little sense to supply image/my_cool_graphs when image/png works just fine. But most representations are text based, and the vast majority use XML and/or JSON.

      I think that the reasons provided in the appendix to RFC 3023 form a compelling argument for using custom media types for, at a minimum, XML representations and ideally for JSON representations (provided we assume +json is standard).

  9. I don’t think your PO/Album example is a good one. A better one would be an Oracle Financials PO vs. a SAP PO. Or a Salesforce.com contact vs. a Thunderbird contact. If a client receives a URL, how is it supposed to ask for, or find out whether, a specific XML format exists? Conneg + custom media types is one way. Links are another. One causes an explosion of media types, the other an explosion of links. No? I don’t have time now, but I’ll blog more about what I mean by this.

    As for versioning, I was all for using a version tag within a media type along with conneg. Now, I’m not so sure, but not because of this blog post. In Java land, with the current tools available, it would be very hard to implement a common URL scheme that served up different versions of the same application using conneg. The problem being that you’ll probably have different versions of the same Java class that process this data. Using HATEOAS pretty much can hide any complicated URL scheme that proliferates because of application versioning.

    • In Java land, with the current tools available, it would be very hard to implement a common URL scheme that served up different versions of the same application using conneg.

      This is exactly the problem I was referring to in my comment about routing rules.

      • JAX-RS can route based on media type. That is not the problem. The problem resides in the fact that most (all?) JAX-RS implementations are built on top of a servlet container. Each web deployment is its own classloader and thus you can’t version classes within it. I don’t know of any servlet implementations that allow you to bind different web deployments to the same URL scheme.

        • Once the message gets into the app runtime (e.g. JAX-RS) routing is not a big problem, but I am talking about the message routing from the time it reaches some front-end proxy till the time it is fed into a runtime. This gets more interesting when you consider homegrown/commercial CDNs set up with origin servers half-way across the globe. When developers make choices about URIs and media types, they must not ignore these situations. These are more real, and in some cases, more important than application level considerations.