Is REST === ROA?

I’ve had this feeling of unease around REST for a while, and I’m coming to the conclusion that there’s a central disconnect in the abstraction we’re using for the RESTful model.

REST is “Representational State Transfer”, so at least nominally it centers on the ideas of “state” and “representation”. But the state in a RESTful communication is tied to the representation, and only indirectly to the underlying resource.

That is, if I request a data resource using a GET, I get data back that represents the state of something in the system. But that data, whether it’s XML or JSON or tab-delimited lines, is just a convenient way to represent the state, rather than a pure serialization of the underlying database rows or whatnot.

For example, I might throw some things into the representation, like calculated values, or maybe hyperlinks, that don’t exist in the underlying object. Or I might leave some stuff out that the client has no business seeing.

What makes it worse is that the “pure” data objects in my system might have several representations. Or else I might be able to GET resources that combine several underlying data objects. Or a lot of underlying data objects. Worse, the GET might return a representation of a calculated result, for which there’s no persistent underlying data object at all.
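To make that concrete, here is a minimal Python sketch of two representations that are not pure serializations of stored data: one adds a calculated value and a hyperlink while hiding a field, and one (a search) has no persistent row behind it at all. The row, its fields, and the `/users/...` URL scheme are all hypothetical.

```python
import json

# Hypothetical stored row: what the database actually holds.
user_row = {
    "id": 42,
    "first_name": "Ada",
    "last_name": "Lovelace",
    "password_hash": "not-for-clients",
}

def user_representation(row):
    """Build a representation of one user that differs from the stored row."""
    return {
        "id": row["id"],
        # Calculated value: no such column exists.
        "full_name": f"{row['first_name']} {row['last_name']}",
        # Hyperlink added for the client; also not stored anywhere.
        "self": f"/users/{row['id']}",
        # password_hash is deliberately left out.
    }

def search_representation(rows, term):
    """A search result: a representation with no persistent object behind it."""
    matches = [r for r in rows if term.lower() in r["last_name"].lower()]
    return {
        "query": term,
        "count": len(matches),
        "results": [f"/users/{r['id']}" for r in matches],
    }

print(json.dumps(user_representation(user_row)))
print(json.dumps(search_representation([user_row], "love")))
```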

There is an idea, one I am fond of, that RESTful services are the same as Resource-oriented Architecture: that if you expose a service, follow all the REST rules, and diligently follow the GET/PUT/POST/DELETE model, then you have an ROA system.

But trying to claim that REST is ROA puts us in a hard place when we look at questions like, “What about searches?” Search is one of those hard problems in the REST world, because it clearly belongs there, and yet there isn’t a persistent underlying resource that maps to a search result.

Ultimately, I think it comes down to a shared misconception that RESTful communications are Resource-oriented. But I don’t think that’s right — they are Representation-oriented.

I’m still kicking the idea around, but in the end I think we’re going to have to get rid of the idea that REST is ROA. They are very compatible, but still not the same.

There are already standards around for communicating data structure as well as data, but I think we’re going to have to rely on those to provide our ROA. REST is a useful model for shaping communications to remote services, but there’s still a big disconnect with what we’d really expect from a true ROA.

EDIT: I just did a little more reading, and it sounds like it boils down to this: “resource” doesn’t have a single definition. Lots of specs mean different things when they say “resource”, so that is an area of emerging clarity. In essence, saying you’re doing “ROA design” is like saying you’re doing “?OA design”. I guess I’m too practical, so it’s easy for me to discount the “resource” part and focus on the practical “representation” side. 😉

Filed under REST

REST and HTTP: A Visceral Guide to Status Codes

I was in an argument … er … discussion here, and we couldn’t agree on the usage of a couple of the 40x HTTP error codes. So I spun out a quick email putting a more colloquial spin on the codes in question. It did help move the conversation along.

One of the tenets of REST is to use a uniform interface. So if you are putting a REST service on top of HTTP, it’s really important that you *use* HTTP, and use it *correctly*. So the HTTP status codes become very important, and you shouldn’t neglect them like I did in the first 20 years of my programming career.

So here’s a visceral guide to the HTTP status codes, also known as The Dude’s Guide to HTTP status codes. Or Dudette’s.

I marked the ones you mostly have to worry about with a “(!)”.

10x The spec defines codes in this range, and then tells you not to use them. Thanks.

The 200 codes are used to make the client feel warm, safe and happy. Only use these codes if the operation succeeded. Or if it’s your boss and reviews are coming up.

(!) 200 OK Like, yeah!

(!) 201 Created Yeah I made it for you.

202 Accepted Yeah, Riiiiiiiiiiiiight. You bet. Mmm-hmm.

-or- 202 Accepted No problem, I’m on it.

204 No Content (silence is consent)

-or- 204 No Content Yeah, OK, here it is but there’s nothing in it.

The 300 codes start to get into scary territory, not because anything is specifically wrong, but because it’s not immediately clear it’s all right. Also, the 300 codes sometimes mean more work for the client, which could upset them because of all the work they’ve done to get here already.

300 Multiple Choices Well, there are several places you can find that. Let me make you a list.

(!) 301 Moved Permanently Nope. Try over here. And don’t come back. Ever.

302 Moved Temporarily Yeah, look over there. But it might come back, so check here later.

302 er… BTW don’t use 302. This is good for web pages and overly-complicated protocols, but for “resources”, if it’s moved, it’s probably gone for good.

304 Not Modified Yeah, no changes. Keep using the one you have.

400 codes are fraught with peril, because now you’re just telling the client they are wrong, or have messed up. That’s always scary. But be brave, because these codes mean someone else has failed.

400 Bad Request What? I mean… really, what?

(!) 401 Unauthorized Nope, not until I know who you are.

-or- 401 Unauthorized I don’t see you on the list.

402 Payment Required What happened to 402? Poor 402.

(!) 403 Forbidden No way, not now, not ever. And I don’t care who your daddy is.

(!) 404 Not Found What? There’s nothing like that here.

Now the territory gets treacherous. The 500 codes are areas where we start admitting fault. And you know what admitting fault leads to — lawsuits.

(!) 500 Internal Server Error Wow. That really didn’t work.

501 Not Implemented Um, let me see. No, we haven’t done that one yet.

502 Bad Gateway I’d like to have an answer for you, but you know what? I just don’t. Maybe the guy down the line will have an answer next time you ask.

503 Service Unavailable Wow, we are super-duper busy right now. Mind checking back later? Well, check back later anyway.
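For reads, the codes marked (!) mostly come down to a few questions asked in order. Here is a minimal Python sketch, assuming the service can answer three yes/no questions about each request; the function and parameter names are hypothetical, while `http.HTTPStatus` is from Python's standard library.

```python
from http import HTTPStatus

def status_for_get(caller_known: bool, caller_allowed: bool, exists: bool) -> HTTPStatus:
    """Choose a status code for a GET request (a sketch; the checks are hypothetical)."""
    if not caller_known:
        return HTTPStatus.UNAUTHORIZED   # 401: not until I know who you are
    if not caller_allowed:
        return HTTPStatus.FORBIDDEN      # 403: no way, not now, not ever
    if not exists:
        return HTTPStatus.NOT_FOUND      # 404: there's nothing like that here
    return HTTPStatus.OK                 # 200: like, yeah!

for args in [(False, True, True), (True, False, True), (True, True, False), (True, True, True)]:
    status = status_for_get(*args)
    print(int(status), status.phrase)
```

Checking identity, then permission, then existence is one reasonable ordering; some services deliberately answer 404 instead of 403 so they don’t reveal that a resource exists.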

Filed under REST

Representation … Resource … and …?

OK I’m trying to get my brain wrapped around the terminology in REST. I think the important thing to bear in mind is that REST is a client-server, client-pull-message (or client-message-push) architecture. So it’s about how to identify and move information initiated by a client.

The main things that REST boils down to are:

  • Identifier: some arbitrary string which points at an instance of data from an abstract class of data
  • Representation: just some way to bundle up information for transmission
  • Resource: a conceptual idea of some information you want to either get, or put in place
  • Static resource: a resource that is backed by some data that’s going to stick around for a while, so you expect to get the same information back for multiple retrievals over time, more or less
  • Dynamic resource: a resource that is more ephemeral, like a calculation done on the fly, or an aggregation of other data that might be changing rapidly
  • …? : behind the resource, there will be some data storage, or processing, that results in a bundle of data we can call a resource and roll up into a representation and point to with an identifier

I see the last bit called an “entity” frequently, but really the REST architecture definitions, including Roy Fielding’s, mostly stop at the resource level and leave the rest to our imagination.

Personally, my main stumbling block is at the use of the term “resource”. I think of that word as something static, or even worse, a specific instance that I can put my hands on. So when I say, “a resource”, I usually think “the blue mouse in the top drawer of my desk.” In common usage, that blue mouse is a resource. It’s something I can use. But in the REST world, the Resource is “computer mouse”, and the Identifier adds “/desk/drawer?which=top”. So what I normally think of as “a resource” in common usage is really a Resource + Identifier in REST parlance.

In the OO world, an abstract class is a Resource, and an instance is Resource + Identifier. More or less.
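The mouse example can be sketched in Python under that analogy: a class stands in for the Resource, a URI-style Identifier picks out one instance, and JSON is one possible Representation of it. The class, the inventory lookup, and the way the identifier is parsed are all made up for illustration.

```python
import json
from dataclasses import dataclass, asdict
from urllib.parse import urlsplit, parse_qs

@dataclass
class ComputerMouse:
    """The Resource: the abstract idea of 'computer mouse'."""
    color: str
    location: str

# Hypothetical backing data: the actual blue mouse in the top drawer.
inventory = {
    ("drawer", "top"): ComputerMouse(color="blue", location="top drawer of my desk"),
}

def resolve(identifier):
    """Resource + Identifier: turn '/desk/drawer?which=top' into one specific instance."""
    parts = urlsplit(identifier)
    kind = parts.path.rstrip("/").split("/")[-1]   # 'drawer'
    which = parse_qs(parts.query)["which"][0]      # 'top'
    return inventory[(kind, which)]

def represent(mouse):
    """A Representation: one convenient way to bundle the instance for transmission."""
    return json.dumps(asdict(mouse))

print(represent(resolve("/desk/drawer?which=top")))
```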

I think one of the main reasons for all the blurriness is that what we call REST is really a collision of several worlds:

1) The REST architecture itself, as laid out by Roy Fielding. Which is really more about ways to string together client-server systems in a uniform way, granting interoperability and scalability at the expense of efficiency.

2) The HTTP protocol, which is the primary protocol people use to implement REST. It’s not at all correct to say “REST is HTTP”, but if you are doing the REST thing, and putting it on top of HTTP, then REST commands you, “Use HTTP strictly! Don’t use your own personalized variation of HTTP!”

A personalized variation of HTTP is essentially what 99% of the industry uses today.

After all, despite its pretensions, the software industry enjoys freedom from the tyranny of rational thought.

3) The ROA, or resource-oriented architecture crowd, which tends to take the basic terminology of REST and wrap it around a design based heavily on “static resources” or “nouns”. I say noun-based and not object-based, because the strict ROA guys require hyperlinks, but don’t allow intermixed data and methods like you’d see in OO.

4) All the personal preferences, biases, superstitions, agendas and personality disorders of everyone who’s involved with developing software for the Internets. Which, taken together, somehow fails to obliterate points 1-3. Most of the time.

Based on the conversations I’ve seen where I work, the primary religious conflict is between the static resource and dynamic resource crowd. Let’s face it, the Internets were built not just on a series of tubes, but on the GET/POST verbs. Chopping RPC out of REST flies in the face of what Fielding was trying to do in his paper — to capture what made the Internets work.

For example, some guys in my shop who were brave enough to dive into the Flex realm have been horrified to discover that Flex doesn’t support even the basic set of functionality required to implement a static-resource-over-strict-HTTP design. It basically just supports the GET/POST model, and no more. For some reason, the Flash guys thought that was all they needed.

All that said, I fall heavily on the “noun” side of that religious war, if only because it’s more challenging and fun.

So the upshot is, from my readings, I think that REST is a very open-ended idea that almost every web developer has been using for the past 10 years anyway, just because that’s the way browsers and web servers work. Now we’re getting strict about the HTTP protocol — which is a really good thing — and raising the awareness of the power and simplicity of noun-based design. Which is also a good thing.

But I’m going to try to keep the terminology straight, because I think the core ideas of REST are worth keeping in mind, and it’s important to keep the terminology clear in discussion.

Not that I’m good at that, especially where it doesn’t help my argument.

Filed under computer architecture, REST

The HTTP ETag header and optimistic locking in REST

Optimistic locking is one of those really powerful techniques that is often overlooked, even though it can make life a lot simpler. It doesn’t block. It can span multiple systems, even ones that don’t have good old-fashioned transactions. In the world of REST architectures, optimistic locking is particularly powerful.

The HTTP protocol, which is the primary protocol for implementing REST architectures, provides nice mechanisms that can support optimistic locking with some special headers.

REST

REST stands for REpresentational State Transfer, which is an architecture style centered around resources instead of commands. You identify resources in your problem space, tie each to a URI, or Uniform Resource Identifier, and then build web services around those representations and URIs. A representation is just a very simple rendering of the object, usually as XML or JSON. And URIs are familiar to us all as URLs, the locator strings we type in at the top of our browsers.

A resource can be many things: a file on disk, a record in memory, a row in a database table, or even the entire table. A resource might be a log file that we can append to but never pull back in full. Generally, it’s OK to think of a resource as a “row” or “object”, since that’s the most common case.

Optimistic Locking

Optimistic locking is a technique for managing concurrent access to a resource. Pessimistic locking is the usual kind, and means you’re wrapping transactions and locks around your operations. It’s pessimistic, because you assume there will be contention for the resource while you work with it. In optimistic locking, you assume there won’t be contention, but the scheme will tell you if there is.

The mechanism in optimistic locking is simple. For each instance of a resource, whether it’s a row in a database or a file or whatever, I keep a version number. When I get that resource from the system, I also get the version number. Note that I don’t “check out” the resource, or lock it, or block anyone else from grabbing it. Then I modify the object. When I save it, I send back both the representation and the version number I got originally. The system is responsible for checking to make sure the version number is still the same. If it is, I win: the system saves my data. Then it adds one to the version number.

However, if someone else updates the instance of that resource in the meantime, the version will differ from the one I have. So when I go to save it, the system can tell I have an out-of-date version of the object, report the error, and I’ll have to start over.
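Here is a minimal Python sketch of that mechanism, assuming an in-memory dictionary stands in for the real storage; `VersionedStore` and `VersionConflict` are hypothetical names. The lock is held only for the compare-and-write inside `save`, never between a read and a save.

```python
import threading

class VersionConflict(Exception):
    """Raised when the caller's version no longer matches the stored version."""

class VersionedStore:
    """Minimal in-memory store that keeps a version number per resource."""

    def __init__(self):
        self._data = {}                  # resource id -> (representation, version)
        self._lock = threading.Lock()    # held only for the brief check-and-write

    def create(self, rid, representation):
        with self._lock:
            self._data[rid] = (dict(representation), 1)

    def get(self, rid):
        # No check-out, no lock kept: the caller just gets a copy plus its version.
        representation, version = self._data[rid]
        return dict(representation), version

    def save(self, rid, representation, expected_version):
        with self._lock:
            _, current = self._data[rid]
            if current != expected_version:
                raise VersionConflict(
                    f"{rid}: client has v{expected_version}, store is at v{current}")
            self._data[rid] = (dict(representation), current + 1)
            return current + 1

store = VersionedStore()
store.create("order-1", {"qty": 1})

mine, my_version = store.get("order-1")        # I read version 1
theirs, their_version = store.get("order-1")   # so does someone else

theirs["qty"] = 5
store.save("order-1", theirs, their_version)   # they save first: store moves to version 2

mine["qty"] = 2
try:
    store.save("order-1", mine, my_version)    # my version 1 is out of date
except VersionConflict as err:
    print("start over:", err)
```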

A collision is not the end of the world: usually it means notifying the user that the resource was updated in the meantime, fetching the fresh data, and asking whether they still want to make the change. In general, you want to reserve optimistic locking for resources where the chance of a collision is really low, just because it’s a hassle repeating the update cycle. So optimistic locking isn’t well suited for resources where there’s a lot of contention, or where starting over is very difficult.
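The fetch-fresh-data-and-ask-again cycle might look like the following Python sketch. The four callables are hypothetical hooks a client would supply, and the stub `record` in the usage simulates exactly one collision from another client.

```python
def update_with_retry(fetch, save, change, confirm, max_attempts=3):
    """Apply `change` with optimistic locking, starting over after a collision.

    fetch()             -> (representation, version)
    save(rep, version)  -> True if saved, False on a version mismatch
    change(rep)         -> a modified copy of rep
    confirm(fresh_rep)  -> True if the user still wants the change
    """
    for attempt in range(max_attempts):
        current, version = fetch()
        if attempt > 0 and not confirm(current):
            return False          # the user saw the fresh data and backed out
        if save(change(current), version):
            return True
    return False                  # too much contention; give up

# Hypothetical stand-in for a remote resource, used only to exercise the loop.
record = {"data": {"title": "draft"}, "version": 1}
interference = ["their edit"]     # another client will sneak in one update

def fetch():
    return dict(record["data"]), record["version"]

def save(rep, version):
    if interference:              # simulate the other client saving first
        record["data"] = {"title": interference.pop()}
        record["version"] += 1
    if version != record["version"]:
        return False
    record["data"], record["version"] = rep, version + 1
    return True

seen = []

def confirm(fresh):
    seen.append(fresh["title"])   # in a real client: show the user, then ask
    return True

result = update_with_retry(fetch, save, lambda rep: {**rep, "title": "my edit"}, confirm)
print(result, record, seen)
```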

Even with that caveat, in the REST world, optimistic locking can work really well because it’s low-overhead, and you don’t face the problem of distributed transactions when you access a variety of resources.

HTTP Help

The HTTP headers that are useful for optimistic locking are spelled out in the HTTP/1.1 specification.

When the client receives a copy of a resource, the server can provide:

  • ETag header: the current value of the “entity tag”. You can supply a resource version here for optimistic locking.

When you send an HTTP request, you can include:

  • If-Match header : means only perform the operation if the entity tag value matches the resource’s current value. That is, only if the version hasn’t changed in the meantime.
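The two headers can be handled with a couple of small helpers. This Python sketch assumes the entity tag is just the resource’s version number in double quotes, so tags never contain commas and a naive split of the If-Match list is safe; the function names are hypothetical.

```python
from typing import Optional

def make_etag(version: int) -> str:
    """Render a version number as a strong entity tag: 7 -> '"7"'."""
    return f'"{version}"'

def if_match_ok(if_match: Optional[str], current_version: int) -> bool:
    """Return True if an If-Match header value allows the update to proceed."""
    if if_match is None:
        return True   # policy choice in this sketch: no precondition sent
    value = if_match.strip()
    if value == "*":
        return True   # "*" matches any current version (assumes the resource exists)
    current = make_etag(current_version)
    for tag in value.split(","):
        tag = tag.strip()
        if tag.startswith("W/"):
            continue  # If-Match uses strong comparison, so weak tags never match
        if tag == current:
            return True
    return False

print(make_etag(7))
print(if_match_ok('"7"', 7), if_match_ok('"6"', 7), if_match_ok('W/"7"', 7))
```

Treating a missing If-Match as permission is only one option; a service that insists on optimistic locking could reject such requests instead.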

Note that HTTP also specifies the “Last-Modified” and “If-Unmodified-Since” headers, which can be used the same way. However, computer operations are so fast now that a modification time isn’t precise enough: HTTP dates are only accurate to the second, and many operations might happen in that time. It’s much, much safer to use a version number, so you know exactly what version of the resource you have.
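Python’s standard `email.utils.formatdate` produces the same date format HTTP uses for `Last-Modified`, which makes the one-second resolution easy to see. The two timestamps are arbitrary examples half a second apart.

```python
from email.utils import formatdate

# Two hypothetical writes half a second apart (seconds since the Unix epoch).
first_write = 1_000_000_000.2
second_write = 1_000_000_000.7

# The HTTP date format carries whole seconds only, so both render identically.
print(formatdate(first_write, usegmt=True))    # Sun, 09 Sep 2001 01:46:40 GMT
print(formatdate(second_write, usegmt=True))   # Sun, 09 Sep 2001 01:46:40 GMT
print(formatdate(first_write, usegmt=True) == formatdate(second_write, usegmt=True))  # True
```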

In order to write your service to provide an optimistic locking version in the ETag header, you first have to track the version number of the resource. Usually, this means adding another column to the database to hold this value. Then the client/server interaction goes like this:

  • Client: GET the resource
  • Server: return the representation of the resource, and the ETag header with the current version number for optimistic locking.
  • Client: PUT or POST the modified resource, and the If-Match header with the same version number.
  • Server: Check the database to see if the resource they want to change has that version number.
    • If so, save the resource.
    • If not, return a 412 “Precondition Failed” response, letting the client know that it couldn’t perform the update, and the client will have to start over.
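Here is a minimal Python sketch of that exchange, with the HTTP layer stripped down to plain functions returning (status, headers, body). The in-memory `resources` table, the function names, and the plain-dict headers are assumptions for illustration; a real server would treat header names case-insensitively, parse If-Match more fully, and make the check-and-save atomic.

```python
from http import HTTPStatus

# Hypothetical in-memory table: resource id -> stored body plus version number.
resources = {"42": {"body": {"name": "widget"}, "version": 3}}

def etag_for(row):
    return f'"{row["version"]}"'

def handle_get(rid):
    """GET: return the representation, with its version in the ETag header."""
    row = resources.get(rid)
    if row is None:
        return HTTPStatus.NOT_FOUND, {}, None
    return HTTPStatus.OK, {"ETag": etag_for(row)}, dict(row["body"])

def handle_put(rid, headers, body):
    """PUT: save only if If-Match carries the current version, else 412."""
    row = resources.get(rid)
    if row is None:
        return HTTPStatus.NOT_FOUND, {}, None
    # Exact string comparison is a simplification of If-Match handling.
    if headers.get("If-Match") != etag_for(row):
        return HTTPStatus.PRECONDITION_FAILED, {}, None
    row["body"] = dict(body)
    row["version"] += 1
    return HTTPStatus.OK, {"ETag": etag_for(row)}, dict(body)

status, headers, body = handle_get("42")            # 200 with ETag "3"
body["name"] = "gadget"
first = handle_put("42", {"If-Match": headers["ETag"]}, body)
second = handle_put("42", {"If-Match": headers["ETag"]}, body)   # same, now stale, tag
print(int(first[0]), first[1])    # 200 {'ETag': '"4"'}
print(int(second[0]))             # 412
```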

If possible, the check and save should be atomic. In the database world, you can accomplish that by taking advantage of the atomic nature of UPDATE:

  • UPDATE … WHERE id = the_id AND current_version = etag_value

If the row count is 0, it means that the identified resource is not at the right version.
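The single-statement check-and-save can be tried with Python’s built-in `sqlite3` module. The table name and the `body` column are hypothetical, with `current_version` borrowed from the pseudocode; the same UPDATE shape should carry over to other SQL databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE resources (id INTEGER PRIMARY KEY, body TEXT, current_version INTEGER NOT NULL)")
conn.execute("INSERT INTO resources (id, body, current_version) VALUES (1, 'original', 1)")
conn.commit()

def save_if_current(conn, rid, new_body, etag_version):
    """Check and save in one UPDATE; return False if the version has moved on."""
    cur = conn.execute(
        "UPDATE resources SET body = ?, current_version = current_version + 1 "
        "WHERE id = ? AND current_version = ?",
        (new_body, rid, etag_version),
    )
    conn.commit()
    # rowcount 0: no row had this id AND this version.
    return cur.rowcount == 1

print(save_if_current(conn, 1, "first edit", 1))   # True: version was 1, now 2
print(save_if_current(conn, 1, "stale edit", 1))   # False: version is 2, not 1
```

A row count of 0 can also mean the id doesn’t exist at all, so a service that wants to tell 404 apart from 412 would need a follow-up existence check.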

Or you can always just wrap the thing in a transaction, and do a SELECT and UPDATE in the same transaction context, locking the record in the meantime. But note that the lock only lasts as long as the update, not as long as the entire GET / PUT / SELECT / UPDATE cycle.

Upside

Some of the benefits of optimistic locking are:

  • If you save a resource and don’t hit a concurrency error, then you can be sure that the version of the resource you modified is exactly the same one you started with. No one else slipped in any changes while you were working.
  • In a very busy system, optimistic locking can provide a huge performance boost, because it allows you to process a lot of operations free of the overhead of long transactions. Row locking should be very brief, because sometimes databases will lock more than just the row you’re after. Some databases will lock on a SELECT statement by default. So the less time you spend locking a row to do an update, the better.
  • Some systems don’t support a locking mechanism. For example, if your resource is a file on disk, like a JPG image or a text file. There’s no way to lock those resources, unless you go to the trouble to put them into some sort of system, like a version control system, to provide check-out and locking.

Downside

The warnings are:

  • If the resource is hotly contested, like records in an online booking system, then the chance of a concurrency collision is much higher, and you’d probably be better off using a regular relational database, at least to save the hotly contested resources.
  • If the cost of repeating the modification is high, you might rethink your design. For example, if modifying the resource requires the user to do a lot of typing, then you might think about a different scheme.

Non-Blocking

The basic idea behind locking and commit strategies is how to hold on to temporary data. There is the old version of the data, and there is the new version that we want to save. In common relational databases, the database takes care of setting aside a pristine copy of the row you’re trying to update, updating the row, and waiting for the “commit” to happen. If it gets a “rollback” instead, then it restores the pristine version of the row. It also maintains a lock to keep anyone else from coming in and getting at the row in the meantime.

In optimistic locking, the client is taking responsibility for holding on to the temporary data, instead of the server. If the client needs to roll back, it can hold on to the pristine copy of the record and its version number, and save that back to the service in order to restore it to where it was before.
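Here is a minimal Python sketch of that client-side “rollback”, assuming a hypothetical `Service` class standing in for the remote REST service. The client keeps the pristine copy and its version; undoing the change is just another guarded PUT.

```python
class StaleVersion(Exception):
    """Raised by the stand-in service when the supplied version is out of date."""

class Service:
    """Hypothetical stand-in for a remote REST service with optimistic locking."""

    def __init__(self, body):
        self.body, self.version = dict(body), 1

    def get(self):
        return dict(self.body), self.version

    def put(self, body, if_match):
        if if_match != self.version:
            raise StaleVersion(f"expected v{if_match}, service is at v{self.version}")
        self.body, self.version = dict(body), self.version + 1
        return self.version

service = Service({"price": 10})

# The client holds the pristine copy and its version; the service keeps nothing extra.
pristine, pristine_version = service.get()
modified = dict(pristine, price=12)
new_version = service.put(modified, if_match=pristine_version)

# Roll back: send the pristine copy, guarded by the version our own change created.
service.put(pristine, if_match=new_version)
print(service.get())   # ({'price': 10}, 3)
```

The rollback moves the version forward to 3 rather than back to 1, and if someone else had updated the resource in between, the guarded PUT would fail with `StaleVersion` instead of clobbering their change.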

So…

So really, optimistic locking as a non-blocking concurrency strategy is all about who holds on to the pristine and the modified copy of the data until the operation is complete.

If you don’t expect contention for the resource or resources you want to update, then using optimistic locking can provide a good speed boost and simplify things greatly.

Filed under REST