[Note: since writing this, my understanding of the “real” REST terminology is getting better, so I’m not sure a lot of this discussion is on mark. I would delete the post altogether and start over, except that the comments are actually better than the post. So I appreciate the comments, and I’ll let the post stand, to my shame to honor those comments …just heads-up.]
We’re in the process of creating a big data aggregator, which is going to be a massively distributed and parallelized data dump, where all our business units can just dump their data, and then that data will be available for them to run whatever map-reduce jobs they want. The problem I’m thinking about now is how to make that work, when the form of that data will vary from source to source, and even for a given source, it will vary over time.
Also, as our shop starts to put together all these disparate REST systems, every team is putting their own spin on the problems of URI definition and resource representation. It would be cool to pretend that we could just draft a standard, and it would be clear and perfect, and everyone would code to it strictly. But whoops — I work in the software industry, not in Nirvana. So that’s not going to happen.
Or to use another metaphor which is appropriately vaguely insulting, in this zoo, I work in the monkey house. At least, not anywhere that reason dominates.
So I’m starting to think about representation. We have transformation technology lying around that will help us get there, at the expense of CPU cycles, but it means laying out all the resources that span our business space in a sort of directed graph, and filling in the links with transformations. Do-able, but people internally are going to whine a lot. Which will be a side benefit.
So I started thinking about REST — representational state transfer. In kicking ideas around, I’m wondering whether it’s correct to call REST resource-based, when really it’s representation-based.
If I come to a REST service, it’s because I want to GET or PUT some data (ignoring the bad-boy POST and the troublesome DELETE for now). And I’m going to GET or PUT the data I’m interested in a particular representation.
That data will probably map to some underlying resource — but it might not. The data might map to something ephemeral, like a search result. A search result isn’t really a resource in the system, it’s the *representation* of some *work* that I just did. I *might* create a key for that search, shove it in a data store, and treat it as persistent, but odds are I won’t.
A resource might be mapped with multiple URLs, and it might have multiple representations. A resource might be included in the representation of another related resource.
Even if I’m grabbing data from another source — let’s say a table of user accounts — the service is going to decide how deep to drill to add related records to the representation. My representation might be a very deep rendering of the data, or it might be shallow, with links that tell me where to get the rest of it.
It might come back as XML with a strict XSD schema, or it might just come back as a JSON bundle and you take-what-you-get. It might be a raw binary stream.
There’s a many-to-many relationship between representations and resources — so why are we equating them?
The point is, I think that calling a REST service resource-based is misleading. In nice, simple systems serving up static resources like JPG images or HTML files, sure the representation maps directly and easily to the actual data in the data store. But if you step away from that and spend any CPU cycles at all “preparing” the data — even rendering columns in a table into JSON — then your interface is not resource-based anymore, it’s representation-based.
I haven’t thought this all the way through — obviously — but I’m having the nagging feeling that this is related to my problem with the big data dump, and all these disparate REST systems we’re building. I’m having the feeling that asking or even wishing for uniform representations is wrong-headed. That we should be thinking of these systems as a federation of *representation* services, not a federation of resource services.
Right now, I have the feeling that making that fine distinction will be liberating and make things easier, but I have to noodle on it some more.