The HTTP ETag header and optimistic locking in REST

Optimistic locking is one of those really powerful techniques that is often overlooked, even though it can make life a lot simpler. It doesn’t block. It can span multiple systems, even ones that don’t have good old-fashioned transactions. In the world of REST architectures, optimistic locking is particularly powerful.

The HTTP protocol, which is the primary protocol for implementing REST architectures, provides nice mechanisms that can support optimistic locking with some special headers.

REST

REST stands for REpresentational State Transfer, an architectural style centered on resources instead of commands. You identify resources in your problem space, tie each to a URI (Uniform Resource Identifier), and then build web services around those representations and URIs. A representation is just a very simple rendering of the object, usually as XML or JSON. And URIs are familiar to us all as URLs, the locator strings we type in at the top of our browsers.

A resource can be many things: a file on disk, a record in memory, a row in a database table, or even the entire table. A resource might be a log file that we can append to but never pull back in full. Generally, it’s OK to think of a resource as a “row” or “object”, since that’s the most common case.

Optimistic Locking

Optimistic locking is a technique for managing concurrent access to a resource. Pessimistic locking is the usual kind, and means you’re wrapping transactions and locks around your operations. It’s pessimistic, because you assume there will be contention for the resource while you work with it. In optimistic locking, you assume there won’t be contention, but the scheme will tell you if there is.

The mechanism in optimistic locking is simple. For each instance of a resource, whether it’s a row in a database or a file or whatever, I keep a version number. When I get that resource from the system, I also get the version number. Note that I don’t “check out” the resource, or lock it, or block anyone else from grabbing it. Then I modify the object. When I save it, I send back both the representation and the version number I got originally. The system is responsible for checking to make sure the version number is still the same. If it is, I win: the system saves my data. Then it adds one to the version number.

However, if someone else updates the instance of that resource in the meantime, the version will differ from the one I have. So when I go to save it, the system can tell I have an out-of-date version of the object and report the error, and I’ll have to start over.
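Here is a minimal sketch of that mechanism as a toy in-memory store. The names `VersionedStore` and `VersionConflict` are made up for illustration, and the sketch is single-threaded: a real system would need the check and the save to happen atomically.

```python
class VersionConflict(Exception):
    """Raised when a save carries a stale version number."""


class VersionedStore:
    """Toy in-memory store that keeps a version number per resource."""

    def __init__(self):
        self._rows = {}  # resource key -> (version, data)

    def create(self, key, data):
        self._rows[key] = (1, data)
        return 1

    def get(self, key):
        # Reading takes no lock; the caller just learns the current version.
        return self._rows[key]

    def save(self, key, data, expected_version):
        current_version, _ = self._rows[key]
        if current_version != expected_version:
            # Someone else saved in the meantime: report it, change nothing.
            raise VersionConflict(
                f"expected version {expected_version}, found {current_version}"
            )
        new_version = current_version + 1
        self._rows[key] = (new_version, data)
        return new_version
```

If two callers both `get` version 1 and then both `save`, the first save succeeds and bumps the version to 2, and the second raises `VersionConflict` because it still carries version 1.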

A collision is not the end of the world. Usually it means notifying the user that the particular resource was updated in the meantime, getting the fresh data, and asking if they still want to make the change. Usually you want to reserve optimistic locking for resources where the chance of a collision is really low, just because it’s a hassle repeating the update cycle. So optimistic locking isn’t well suited for resources where there’s a lot of contention, or where starting over is very difficult.

Even with that caveat, in the REST world, optimistic locking can work really well because it’s low-overhead, and you don’t face the problem of distributed transactions when you access a variety of resources.

HTTP Help

The HTTP headers which are useful for optimistic locking are spelled out in the HTTP 1.1 specification.

When a client receives a copy of a resource, you can write the server to provide:

  • ETag header : the current value of the “entity tag”. You can supply a resource version here for optimistic locking.

When you send an HTTP request, you can include:

  • If-Match header : means only perform the operation if the entity tag value matches the resource’s current value. That is, only if the version hasn’t changed in the meantime.

Note that HTTP also specifies the “Last-Modified” and “If-Unmodified-Since” headers, which can be used the same way. However, computer operations are so fast now that even a last modified time accurate to the millisecond is too coarse, and HTTP dates only carry one-second resolution, so many operations might have happened in that time. It’s much, much safer to use a version number, so you know exactly what version of the resource you have.
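To make the timestamp problem concrete, this small sketch formats two write times the way an HTTP date header is formatted. The two timestamps are invented for illustration; the point is only that they are 400 milliseconds apart.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

# Two hypothetical writes to the same resource, 400 ms apart.
first_write = datetime(2024, 1, 1, 12, 0, 0, 100000, tzinfo=timezone.utc)
second_write = first_write + timedelta(milliseconds=400)

# HTTP dates have one-second resolution, so the sub-second part is dropped.
first_header = format_datetime(first_write, usegmt=True)
second_header = format_datetime(second_write, usegmt=True)

# The writes are different moments, but the header values are identical.
same_header = first_header == second_header
```

Both header values come out as the same string, so a client holding the first one could overwrite the second write without any precondition failing. A version number changes on every write, so it doesn’t have this blind spot.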

In order to write your service to provide an optimistic locking version in the ETag header, you first have to track the version number of the resource. Usually, this means adding another column to the database to hold this value. Then the client/server interaction goes like this:

  • Client: GET the resource
  • Server: return the representation of the resource, and the ETag header with the current version number for optimistic locking.
  • Client: PUT or POST the modified resource, and the If-Match header with the same version number.
  • Server: Check the database to see if the resource they want to change has that version number.
    • If so, save the resource.
    • If not, return a 412 “Precondition Failed” response, letting the client know that it couldn’t perform the update, and the client will have to start over.
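The exchange above can be sketched as two plain functions, without any web framework. Everything here is an assumption for illustration: the `resources` dictionary stands in for the database, the function names are made up, and returning 428 “Precondition Required” when If-Match is missing is just one possible policy. Note that entity tags travel as quoted strings, so version 3 goes out as `"3"`.

```python
def make_etag(version):
    # Entity tags are quoted strings on the wire, e.g. "3".
    return f'"{version}"'


def handle_get(resources, path):
    # Missing resources (404) are left out to keep the sketch short.
    resource = resources[path]
    headers = {"ETag": make_etag(resource["version"])}
    return 200, headers, resource["body"]


def handle_put(resources, path, request_headers, new_body):
    resource = resources[path]
    if_match = request_headers.get("If-Match")
    if if_match is None:
        # Policy choice for this sketch: refuse unconditional updates.
        return 428, {}, "If-Match header required"
    if if_match != make_etag(resource["version"]):
        # The client's copy is stale; it has to GET again and start over.
        return 412, {}, "Precondition Failed"
    resource["version"] += 1
    resource["body"] = new_body
    return 200, {"ETag": make_etag(resource["version"])}, new_body
```

A client that does a GET, then a PUT with the ETag it received, gets a 200 and a new ETag. A second PUT that reuses the old ETag gets a 412. The check and the save in `handle_put` are not atomic here, so this only shows the protocol, not the concurrency control.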

If possible, the check and save should be atomic. In the database world, you can accomplish that by taking advantage of the atomic nature of UPDATE:

  • UPDATE … WHERE id = the_id AND current_version = etag_value

If the row count is 0, it means that the identified resource is not at the right version.
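As a rough sketch of that UPDATE, here it is against an in-memory SQLite database. The table name `resource`, the `body` column, and the function name are assumptions for the example; only the `id` and `current_version` columns come from the statement above.

```python
import sqlite3


def save_if_current(conn, resource_id, new_body, etag_value):
    """Return True if the row was at version etag_value and was updated."""
    cursor = conn.execute(
        "UPDATE resource "
        "SET body = ?, current_version = current_version + 1 "
        "WHERE id = ? AND current_version = ?",
        (new_body, resource_id, etag_value),
    )
    conn.commit()
    # rowcount is 1 when the version matched, 0 when it did not.
    return cursor.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE resource ("
    "id INTEGER PRIMARY KEY, body TEXT, current_version INTEGER)"
)
conn.execute("INSERT INTO resource VALUES (1, 'original', 1)")
conn.commit()
```

Calling `save_if_current(conn, 1, "first edit", 1)` succeeds and bumps the row to version 2, and a second call that still passes version 1 returns False and leaves the row alone. A row count of 0 also happens when the id doesn’t exist at all, so a service may want to tell those two cases apart before answering 412.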

Or you can always just wrap the thing in a transaction, and do a SELECT and UPDATE in the same transaction context, locking the record in the meantime. But note that the lock only lasts as long as the update, not as long as the entire GET / PUT / SELECT / UPDATE cycle.
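The transaction variant might look something like this, again using SQLite and the same made-up `resource` table. One caveat: SQLite has no row locks, so `BEGIN IMMEDIATE` takes a write lock on the whole database. Databases with row-level locking typically offer something like `SELECT … FOR UPDATE` instead.

```python
import sqlite3


def save_in_transaction(conn, resource_id, new_body, etag_value):
    """Check the version and update inside one short transaction."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        row = conn.execute(
            "SELECT current_version FROM resource WHERE id = ?",
            (resource_id,),
        ).fetchone()
        if row is None or row[0] != etag_value:
            conn.execute("ROLLBACK")
            return False
        conn.execute(
            "UPDATE resource SET body = ?, current_version = ? WHERE id = ?",
            (new_body, etag_value + 1, resource_id),
        )
        conn.execute("COMMIT")
        return True
    except Exception:
        conn.execute("ROLLBACK")
        raise


# isolation_level=None lets us issue BEGIN / COMMIT / ROLLBACK ourselves.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE resource ("
    "id INTEGER PRIMARY KEY, body TEXT, current_version INTEGER)"
)
conn.execute("INSERT INTO resource VALUES (1, 'original', 1)")
```

The lock is held only for the SELECT and UPDATE inside the function, which is the point made above: the client’s thinking time between GET and PUT is never inside a transaction.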

Upside

Some of the benefits of optimistic locking are:

  • If you save a resource and don’t hit a concurrency error, then you can be sure that the version of the resource you modified is exactly the same one you started with. No one else slipped in any changes while you were working.
  • In a very busy system, optimistic locking can provide a huge performance boost, because it allows you to process a lot of operations free of the overhead of long transactions. Row locking should be very brief, because sometimes databases will lock more than just the row you’re after. Some databases will lock on a SELECT statement by default. So the less time you spend locking a row to do an update, the better.
  • Some systems don’t support a locking mechanism, for example when your resource is a file on disk, like a JPG image or a text file. There’s no way to lock those resources unless you go to the trouble of putting them into some sort of system, like a version control system, that provides check-out and locking.

Downside

The warnings are:

  • If the resource is hotly contested, like records in an online booking system, then the chance of a concurrency collision is much higher, and you’d probably be better off using the regular transactions and locks of a relational database, at least to save the hotly contested resources.
  • If the cost of repeating the modification is high, you might rethink your design. For example, if modifying the resource requires the user to do a lot of typing, then you might think about a different scheme.

Non-Blocking

The basic idea behind locking and commit strategies is how to hold on to temporary data. There is the old version of the data, and there is the new version that we want to save. In common relational databases, the database takes care of setting aside a pristine copy of the row you’re trying to update, updating the row, and waiting for the “commit” to happen. If it gets a “rollback” instead, then it restores the pristine version of the row. It also maintains a lock to keep anyone else from coming in and getting at the row in the meantime.

In optimistic locking, the client is taking responsibility for holding on to the temporary data, instead of the server. If the client needs to roll back, it can hold on to the pristine copy of the record and its version number, and save that back to the service in order to restore it to where it was before.
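A client-side rollback could be sketched like this. The `service` dictionary and the `save` function are stand-ins for a real REST service, and the data is invented. The detail worth noticing is that the restore has to be sent with the latest version number the client knows about, not the pristine one, because the client’s own save already bumped the version.

```python
def save(service, key, data, expected_version):
    """Hypothetical service call: save only if the version still matches."""
    version, _ = service[key]
    if version != expected_version:
        return None  # collision: caller must re-read and decide again
    service[key] = (version + 1, data)
    return version + 1


service = {"profile": (7, {"name": "Ann"})}

# The client keeps the pristine copy itself; the server holds no lock.
pristine_version, pristine_data = service["profile"]

# Apply a change, remembering the version the service hands back.
new_version = save(service, "profile", {"name": "Anne"}, pristine_version)

# Roll back by saving the pristine data, guarded by the latest known version.
restored_version = save(service, "profile", pristine_data, new_version)
```

The rollback is just another optimistic save, so it produces a new version rather than rewinding the counter, and it can collide like any other update if someone else changed the resource in between.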

So…

So really, optimistic locking as a non-blocking concurrency strategy is all about who holds on to the pristine and the modified copy of the data until the operation is complete.

If you don’t expect contention for the resource or resources you want to update, then using optimistic locking can provide a good speed boost and simplify things greatly.
