03:00 PM, Sunday, December 16, 2007

A RESTful version of Amazon's SimpleDB

Dimitri Glazkov posted a quick comment on the rest-discuss mailing list stating that the design of Amazon's SimpleDB REST API is just a big welcome to 1999. The technology powering SimpleDB is definitely impressive - no doubt about it. It is a simple-to-use web service for managing structured data reliably, hosted in Amazon's data centers. It does have limitations such as lack of immediate consistency (the "C" of ACID), and I am sure Amazon will solve such problems eventually. However, as a REST API, it is a disappointment. The API fails (a) to identify resources, and (b) to specify operations on resources in a RESTful way. It uses a single verb, GET, to create, delete, update, or get data from the store. As Julian Reschke commented, this is worse than SOAPy REST, or using POST for everything as is done with SOAP over HTTP. The SimpleDB API is neither resource oriented nor HTTP friendly.

Having said that, how should such an API be designed in a resource-oriented manner? Here is my take, a version-0.1 of a RESTful SimpleDB.

In the design below, I tried to keep the semantics of this version as close as possible to the official SimpleDB API, but please comment in case you see errors or omissions. The design below uses URI templates in a loose manner, and I have made no attempt to formalize the syntax.

Resources

The first step is to define what the resources are. For SimpleDB, the resources are domains, items and attributes. A domain is similar to a table containing rows of items. Each domain has a unique name. Each item has a unique identifier (its name), and an arbitrary set of attributes. Each attribute has a unique identifier (its name), and one or more values.

We can map these resources into a hierarchical resource tree for domains, items, and attributes, so that one can traverse from the root of the space to a given domain, from a given domain to a given item, and a given item to its attributes, as shown below.

Resource hierarchy for SimpleDB resources
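
To make the hierarchy concrete, here is a minimal Python sketch that builds the path for each level of the tree. The helper names are my own, made up for illustration, and are not part of SimpleDB; the path layout simply follows the URI templates used in the rest of this post.

```python
from urllib.parse import quote


def _join(*segments):
    # Percent-encode every segment so that a name containing "/" or a space
    # cannot spill into a neighbouring level of the hierarchy.
    return "/" + "/".join(quote(str(s), safe="") for s in segments)


def domains_path(access_key):
    """Root of the space: all domains of one account."""
    return _join(access_key, "domains")


def domain_path(access_key, domain):
    """A single domain."""
    return _join(access_key, "domains", domain)


def attributes_path(access_key, domain, item):
    """All attributes of one item."""
    return _join(access_key, "domains", domain, "items", item, "attributes")


def attribute_path(access_key, domain, item, attribute):
    """One named attribute of one item."""
    return _join(access_key, "domains", domain, "items", item, "attributes", attribute)
```

Each function only appends segments to its parent's path, which is what lets a client walk from the root to a domain, an item, and its attributes.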

Here is the API.

Create Domain

In Amazon's SimpleDB API, the CreateDomain operation is non-idempotent only the first time. Subsequent invocations of this operation do not create new domains. Instead, they return a reference to a domain already created. Here is my version.

POST /{AWSAccessKeyId}/domains

DomainName={DomainName}

where {AWSAccessKeyId} is the access key to Amazon Web Services (AWS), and the body of the request contains the name of the domain {DomainName} as an application/x-www-form-urlencoded string.
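
As a rough sketch, this request could be assembled in Python as follows. The host name is a placeholder I made up, request signing is left out entirely, and the request is only built, not sent.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE_URL = "https://sdb.example.com"  # hypothetical host, not a real endpoint


def create_domain_request(access_key, domain_name):
    """Build (but do not send) the POST that creates a domain."""
    # The domain name travels in the form-encoded body, not in the URI.
    body = urlencode({"DomainName": domain_name}).encode("ascii")
    return Request(
        f"{BASE_URL}/{quote(access_key, safe='')}/domains",
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
```

Passing the returned object to urllib.request.urlopen would send it, assuming a server that implements this design existed.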

Delete Domain

In SimpleDB, the DeleteDomain operation deletes a domain.

DELETE /{AWSAccessKeyId}/domains/{DomainName}

where {DomainName} is the name of the domain.

List Domains

The ListDomains operation in SimpleDB lists all domains associated with a given AWS account. The caller can optionally specify a maximum number of domains as well as paginate through sets of domains.

GET /{AWSAccessKeyId}/domains({MaxNumberOfDomains},{NextToken})

where {MaxNumberOfDomains} is the maximum number of domains to be returned, and {NextToken} is a token previously returned by SimpleDB in case of pagination. Both of these are optional.

Note that I am making up this syntax for specifying a subset of domains under the domains resource. Other alternatives are possible.
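
One such alternative is to carry both values in the query string. The sketch below assumes that variant, not the parenthesized syntax above; the function name is hypothetical, and the parameter names are borrowed from the template.

```python
from urllib.parse import quote, urlencode


def list_domains_uri(access_key, max_domains=None, next_token=None):
    """Build the URI for listing domains, omitting parameters that are not set."""
    params = {}
    if max_domains is not None:
        params["MaxNumberOfDomains"] = max_domains
    if next_token is not None:
        params["NextToken"] = next_token
    path = "/" + quote(access_key, safe="") + "/domains"
    # Only add "?" when there is at least one parameter to send.
    return path + ("?" + urlencode(params) if params else "")
```

With no optional values the result is the plain collection URI, so a first request and a paginated follow-up differ only in the query string.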

Put Attributes

Although an item is a valid resource in SimpleDB, there are no explicit operations to create, list, or delete an item. Items are created and deleted implicitly, and so I am following the same pattern.

The PutAttributes operation creates or replaces attributes in an item. If the item does not exist, the operation creates it.

POST /{AWSAccessKeyId}/domains/{DomainName}/items/{ItemName}/attributes

Attribute.{x}.Name={AttributeName}&Attribute.{x}.Value={AttributeValue}&Attribute.{x}.Replace={Replace}&...

where the request body is application/x-www-form-urlencoded containing the name of the attribute, value of the attribute, and an optional boolean to indicate whether the current value should be replaced or a new value should be added. The parameter {x} is a sequence number.
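
Here is a small sketch of how a client might produce that body. The function name is made up, and starting the sequence number {x} at 0 is my assumption; check the SimpleDB documentation for the numbering it expects.

```python
from urllib.parse import urlencode


def encode_put_attributes(attributes):
    """Encode (name, value, replace) tuples; replace may be None to omit the flag."""
    pairs = []
    for x, (name, value, replace) in enumerate(attributes):
        pairs.append((f"Attribute.{x}.Name", name))
        pairs.append((f"Attribute.{x}.Value", value))
        if replace is not None:
            # The optional boolean is sent only when the caller set it.
            pairs.append((f"Attribute.{x}.Replace", "true" if replace else "false"))
    return urlencode(pairs)
```

For example, encode_put_attributes([("color", "red", True), ("size", "L", None)]) yields a body with a Replace flag for the first attribute only.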

Instead of using the Replace parameter, a better approach is to model attribute updates via a PUT as in

PUT /{AWSAccessKeyId}/domains/{DomainName}/items/{ItemName}/attributes

Attribute.{x}.Name={AttributeName}&Attribute.{x}.Value={AttributeValue}&...

to replace several attribute values in bulk, or several PUT requests to replace each attribute as in

PUT /{AWSAccessKeyId}/domains/{DomainName}/items/{ItemName}/attributes/{AttributeName}

AttributeValue={AttributeValue}&...

The latter approach is more intuitive since it follows the resource hierarchy more naturally.
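
The per-attribute variant can be sketched like this. The host is a placeholder, the function name is hypothetical, and repeating AttributeValue for a multi-valued attribute is my reading of the trailing "&..." in the template above. The requests are built but not sent.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE_URL = "https://sdb.example.com"  # hypothetical host, for illustration only


def put_attribute_requests(access_key, domain, item, attributes):
    """Build one PUT per attribute; attributes maps a name to a list of values."""
    requests = []
    for name, values in attributes.items():
        segments = (access_key, "domains", domain, "items", item, "attributes", name)
        path = "/".join(quote(s, safe="") for s in segments)
        # Repeating AttributeValue carries every value of a multi-valued attribute.
        body = urlencode([("AttributeValue", v) for v in values]).encode("ascii")
        requests.append(
            Request(
                f"{BASE_URL}/{path}",
                data=body,
                method="PUT",
                headers={"Content-Type": "application/x-www-form-urlencoded"},
            )
        )
    return requests
```

Because each attribute has its own URI, the attribute name never needs to appear in the body.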

Delete Attributes

The DeleteAttributes operation in SimpleDB can be used to delete one or more attributes associated with an item. This can be done via the following:

DELETE /{AWSAccessKeyId}/domains/{DomainName}/items/{ItemName}/attributes

to delete an item including all the attributes in that item,

DELETE /{AWSAccessKeyId}/domains/{DomainName}/items/{ItemName}/attributes/{AttributeName}

to delete a specific attribute with the given {AttributeName}, and

DELETE /{AWSAccessKeyId}/domains/{DomainName}/items/{ItemName}/attributes/{AttributeName}/{AttributeValue}

to delete a specific value of a given attribute.
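
The three levels of deletion differ only in how deep the target URI goes, as this sketch shows. The function name is my own, and it returns only the path that a DELETE request would be sent to.

```python
from urllib.parse import quote


def delete_target(access_key, domain, item, attribute=None, value=None):
    """Return the path for deleting an item, one attribute, or one value."""
    if value is not None and attribute is None:
        raise ValueError("a value can only be deleted under a named attribute")
    segments = [access_key, "domains", domain, "items", item, "attributes"]
    if attribute is not None:
        segments.append(attribute)
    if value is not None:
        segments.append(value)
    # Encode each segment so values containing "/" stay in a single segment.
    return "/" + "/".join(quote(s, safe="") for s in segments)
```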

Get Attributes

The GetAttributes operation returns all or specific attributes associated with a given item. This operation can be modeled as follows:

GET /{AWSAccessKeyId}/domains/{DomainName}/items/{ItemName}/attributes

to return all the attributes of a given item {ItemName}, and

GET /{AWSAccessKeyId}/domains/{DomainName}/items/{ItemName}/attributes/{AttributeName}

to return the values of a specific attribute {AttributeName} of the given item.

Query Items

The Query operation returns a set of items that match the given query expression. This can be modeled as follows:

GET /{AWSAccessKeyId}/domains/{DomainName}/items({QueryExpression})

where the optional token {QueryExpression} specifies an expression to filter items as described in the SimpleDB documentation.
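
A query expression will usually contain characters that are not allowed in a URI path, so it has to be percent-encoded before it goes between the parentheses. A minimal sketch, with a hypothetical function name and an expression that is only illustrative:

```python
from urllib.parse import quote


def query_items_path(access_key, domain, query_expression=None):
    """Build the path for listing or filtering the items of a domain."""
    segments = (access_key, "domains", domain, "items")
    base = "/" + "/".join(quote(s, safe="") for s in segments)
    if query_expression is None:
        return base
    # safe="" also encodes "(" and ")" inside the expression, so they
    # cannot be confused with the delimiters around it.
    return f"{base}({quote(query_expression, safe='')})"
```

For instance, query_items_path("KEY", "books", "['color' = 'red']") encodes the brackets, quotes, spaces, and equals sign, while omitting the expression returns the plain items URI.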

In this exercise, my starting point was Amazon's definition of the REST API, which I refactored into a RESTful version without breaking the usage pattern. Several variations of this approach are possible, but the key points are (a) to identify what the resources are, and (b) to map the various operations onto known HTTP verbs so that the API is RESTful, without losing focus on the net benefits of building an API over HTTP. This is not hard.
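
To summarize the mapping, here it is as a small Python table. The dictionary and the helper are my own illustration; the safe and idempotent classifications are the ones HTTP defines for these methods.

```python
# Operation -> (HTTP method, URI template), as proposed in this post.
OPERATIONS = {
    "CreateDomain": ("POST", "/{AWSAccessKeyId}/domains"),
    "DeleteDomain": ("DELETE", "/{AWSAccessKeyId}/domains/{DomainName}"),
    "ListDomains": ("GET", "/{AWSAccessKeyId}/domains"),
    "PutAttributes": ("POST", "/{AWSAccessKeyId}/domains/{DomainName}/items/{ItemName}/attributes"),
    "DeleteAttributes": ("DELETE", "/{AWSAccessKeyId}/domains/{DomainName}/items/{ItemName}/attributes"),
    "GetAttributes": ("GET", "/{AWSAccessKeyId}/domains/{DomainName}/items/{ItemName}/attributes"),
    "Query": ("GET", "/{AWSAccessKeyId}/domains/{DomainName}/items"),
}

# Of the methods used here, HTTP defines GET as safe, and GET, PUT and
# DELETE as idempotent. POST is neither.
SAFE_METHODS = {"GET"}
IDEMPOTENT_METHODS = {"GET", "PUT", "DELETE"}


def describe(operation):
    """Return the method, URI template, and HTTP properties of an operation."""
    method, template = OPERATIONS[operation]
    return {
        "method": method,
        "uri": template,
        "safe": method in SAFE_METHODS,
        "idempotent": method in IDEMPOTENT_METHODS,
    }
```

In this table only the read operations use GET, which is exactly the property the original API gives up by sending every operation through GET.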

Comments

MikeD said:

Thanks - this is a good example of how straightforward defining a RESTful approach can be. There probably are plenty of things that would evolve with this, but it's good to start with the right approach.
One comment - the ListDomains operation probably would benefit from the server returning a full URL to the 'next' block in the list, rather than a token that the client would have to insert into an undefined URL.

Noah Slater said:

Have you taken a look at CouchDB, a free software implementation of a document-oriented database? CouchDB provides a fully RESTful interface much like what you have discussed here.

http://couchdb.org/

This seems more shy of query string parameters than necessary. For instance, GET /{AWSAccessKeyId}/domains({MaxNumberOfDomains},{NextToken}) is just screaming to use query string parameters.

Several methods involve multiple return values, like GET .../attributes/ -- maybe urlencoding the results would be good, and then you can use PUT to the same address to update multiple attributes at once. Lacking consistency, I'm not even sure if you can do optimistic locking. If so, I guess you could calculate a composite etag and most-recent last-modified, and do a batch update against that. But without consistency, what isn't a conflict now could be a conflict when things sync up again.

Having multiple locations where data can get updated or fetched does mean that cache consistency is a little harder; just because a particular URL hasn't been PUT to doesn't mean it hasn't changed. But I don't think that's a usable feature for this case anyway. I don't think such strict restfulness is as important/useful as atomicity of multiple updates.

There's no reasonable restful solution I see to doing batch updates without a single representative URL for all the updated items. Except perhaps if you really used the query string much more and defined a clear container format, so you could do something like fetch /items?item1&item2&item3, or PUT to that same URL. I guess you could use ; instead of ?/& to placate people who have some deep aversion to those characters (maybe an ampersand killed their parents or something). These sorts of resources can still really be resources, but they are very much created on demand, and relate to other resources in a way that's not machine-obvious.

"This seems more shy of query string parameters than necessary. For instance, GET /{AWSAccessKeyId}/domains({MaxNumberOfDomains},{NextToken}) is just screaming to use query string parameters"

I can see your point. Both MaxNumberOfDomains and NextToken can as well be encoded as part of the query string.

"Having multiple locations where data can get updated or fetched does mean that cache consistency is a little harder; just because a particular URL hasn't been PUT to doesn't mean it hasn't changed. But I don't think that's a usable feature for this case anyway. I don't think such strict restfulness is as important/useful as atomicity of multiple updates."

Caching can be addressed via etags.

"URL for all the updated items. Except perhaps if you really used the query string much more and defined a clear container format, so you could do something like fetch /items?item1&item2&item3, or PUT to that same URL. I guess you could use ; instead of ?/& to placate people who have some ..."

I agree.

Daniel Yokomizo said:

That's great, but why no love for put? Create Domain should use put too, after all the caller should be in control of the final URI. Also putting attribute can be made simpler: instead of using Attribute.{x}.Name={AttributeName}&Attribute.{x}.Value={AttributeValue}& it can just be AttributeName=AttributeValue&AttributeName=AttributeValue, etc., as the order can be implied or even unnecessary.

It's interesting to see that a good actual RESTful API can be so natural to define and yet be so wrongly designed by people who don't understand it.

P.S.: It would be much better for your blog usability to have a captcha that didn't require javascript. I got an error on my submission because I had javascript disabled and had to enable it just for this.

"That's great, but why no love for put? Create Domain should use put too, after all the caller should be in control of the final URI."

I agree.

"Also putting attribute can be made simpler: instead of using Attribute.{x}.Name={AttributeName}&Attribute.{x}.Value={AttributeValue}& it can just be AttributeName=AttributeValue&AttributeName=AttributeValue, etc., as the order can be implied or even unnecessary."

Yes, of course.

Nice article. I wanted to print it for serious reading later, but printing doesn't work in either IE7 or Firefox. Try using the print preview; you'll see what I mean.

"Nice article. I wanted to print it for serious reading later, but printing doesn't work in either IE7 or Firefox. Try using the print preview; you'll see what I mean."

Sorry about that. I just fixed the CSS.

Baz said:

the CreateDomain operation is non-idempotent only the first time.

That doesn't make sense. An operation being 'idempotent' doesn't tell you anything about the first invocation, only that the result of subsequent calls will be the same as the first.

I think you meant to say that the CreateDomain operation is idempotent but is not safe (in the senses of those words used in the http spec).

You are right - it does not make sense, but that is how it is currently defined by Amazon.

pwb said:

Agree with the above that querystrings seem like a better approach for some of these things.

Also, are you really suggesting:
POST /{AWSAccessKeyId}/domains/{DomainName}/items/{ItemName}/attributes ?

Or do you mean:
POST /{AWSAccessKeyId}/{DomainName}/{ItemName}/attributes ?

"Agree with the above that querystrings seem like a better approach for some of these things."

IMO, query strings should only be used for those things that can not be mapped into URIs. The URI path segments identify resource structure (as trees or graphs) more meaningfully.


"Also, are you really suggesting:
POST /{AWSAccessKeyId}/domains/{DomainName}/items/{ItemName}/attributes ?

Or do you mean:
POST /{AWSAccessKeyId}/{DomainName}/{ItemName}/attributes ?"

For this example, either should be fine. I inserted "/domains", "/items" etc. for extensibility reasons. In future, you may insert other kinds of sub-resources.

pik said:

+1 TO ianbicking.org.

PS: as a slightly colorblind person, the captcha is hell.

PS2: this blog does not survive the "back button" test; comments get lost. Is this RESTful?

Chris Jolley said:

+1

I'd like to see the ability to handle multiple Items in a single request, especially when returning sets of data.

{ItemName} => {*} {1,2,4} or {1-10}

Tim Olsen said:

If you know what the URI will be ahead of time, you can use PUT to create a resource.

So instead of:

POST /{AWSAccessKeyId}/domains

DomainName={DomainName}

Just do:

PUT /{AWSAccessKeyId}/domains/DomainName

Now it's idempotent. :-)

Agreed :)
