Category Archives: computer architecture

Anti-pattern: Framework Superclasses

Michael Feathers called out the anti-pattern of frameworks that require extension of classes in the framework. Hooray!

http://michaelfeathers.typepad.com/michael_feathers_blog/2013/01/the-framework-superclass-anti-pattern.html

Thank you for saying this out loud! Frameworks based on class extension are extremely appealing at first, particularly to the control freaks who want tight coupling in their applications (unbelievably, I work with quite a few of those). Over time, the loss of design flexibility and the shift of control from the “application” or the “problem domain” to the framework lead to a lot of stale code that is hard to test and extend. In particular, I’ve seen systems with useful business logic that can’t be reused because it’s locked into framework hierarchies.

This is one of those anti-patterns that once you see it, it’s hard to unsee it.

Essentially, when you’re doing class design, object hierarchies should follow the problem domain. So in an accounting model, you should have classes like “Account”, “Creditor”, “Ledger” and whatnot. When building an application, you end up modelling the application space, which is not as clean, but hard to get away from. So you end up modelling artifacts like “…Handler”, “…Request”, “…Response”, “…Manager”.

A lot of engineers I work with see the problem domain as dumb data objects you pass to Handler or Manager classes, rather than modelling them as functional classes directly. But oh well.

What the superclass anti-pattern calls out is that the framework is taking control of the modelling effort and forcing application artifacts into its own hierarchy. That really moves the modelling effort away from the problem domain and locks it into the framework’s domain. A lot of engineers don’t see why that’s bad, because they just want the framework to tell them what to do. But the system as a whole pays the price for it.

That’s why POJO-oriented technologies have become so popular. It’s the recognition that I should be able to take my class, as-is, in my own problem domain and hand it to a framework for handling. There’s growing recognition of the power that provides.
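Here is a rough sketch of what I mean, in Python rather than Java to keep it short. Everything in it is made up for illustration: `Account` is a plain domain class, `TinyFramework` is a hypothetical framework that accepts plain callables instead of demanding a subclass, and `make_deposit_handler` is the thin adapter between the two.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Account:
    """Plain domain class: no framework base class, no framework imports."""
    owner: str
    balance: int = 0  # in cents

    def deposit(self, amount: int) -> None:
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self.balance += amount


class TinyFramework:
    """Hypothetical framework that takes plain callables, not subclasses."""

    def __init__(self) -> None:
        self._routes: Dict[str, Callable[[dict], dict]] = {}

    def register(self, name: str, handler: Callable[[dict], dict]) -> None:
        self._routes[name] = handler

    def dispatch(self, name: str, request: dict) -> dict:
        return self._routes[name](request)


def make_deposit_handler(account: Account) -> Callable[[dict], dict]:
    """Thin adapter: the only code that knows about both sides."""
    def handle(request: dict) -> dict:
        account.deposit(request["amount"])
        return {"owner": account.owner, "balance": account.balance}
    return handle
```

The point of the shape is that `Account` can be unit tested and reused with no framework in sight, and swapping frameworks only means rewriting the adapter.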

Filed under computer architecture, design patterns, opinionizing

Fewer Data Scientists, More Big Data?

Here’s an interesting article breaking down some aspects of Big Data:

http://gigaom.com/2012/12/22/we-dont-need-more-data-scientists-just-simpler-ways-to-use-big-data/

He talks about data architecture, machine learning, and analytics as the gateways to a company harnessing big data without a big team of number crunchers and Hadoop experts.

While we’re at it, here’s an article from a guy who transformed himself into a data scientist:

Filed under bigdata, computer architecture

REST and Shared State Between Client and Server

Every now and then the question comes up of where to draw the line on stateless communications as mandated in RESTful services. I remember one conversation at another job where a group said they couldn’t do such-and-such because it required shared state between the client and server. And that was that.

At the time, I knew that wasn’t right, but I’ve been wrestling with exactly why. Then last night it clicked. We share state between client and server all the time. Shopping carts, authentication and authorization status, what page a user is on, a step that a user just took and now we’re verifying, etc. There are tons of examples of where the client and server have to share state.

But it is bad when *A* client shares state with *A* server. If server #432 in your server farm blanks out, and you lose the state for a client, then, yes, you have server affinity. Or stickiness. And that is a bad thing.

But if your client comes to your server bank, and you share the client state either through some clustering, synchronization, or pushing the state into a distributed datastore or cache, then you’re OK. The acid test is, can I walk over to my server farm and pull the plug on a machine, or even a rack, and not lose track of what the clients are doing? If so, you’re OK.

That allows both reliability and scalability, because if a server goes away, there are plenty of others to take up the load. And since no single server is remembering anything in particular, you can just add more servers and they’ll just join the mix.
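A toy version of the pull-the-plug test, as a sketch only. `SharedStore` is a stand-in for whatever clustering, distributed datastore, or cache you actually use (here it is just an in-memory dict), and `Server`, the session ids, and the server names are all made up.

```python
from typing import Dict, List


class SharedStore:
    """Stand-in for a distributed cache or datastore (an in-memory dict here)."""

    def __init__(self) -> None:
        self._data: Dict[str, List[str]] = {}

    def get_cart(self, session_id: str) -> List[str]:
        return list(self._data.get(session_id, []))

    def put_cart(self, session_id: str, cart: List[str]) -> None:
        self._data[session_id] = list(cart)


class Server:
    """Keeps no client state of its own; everything goes through the store."""

    def __init__(self, name: str, store: SharedStore) -> None:
        self.name = name
        self.store = store

    def add_to_cart(self, session_id: str, item: str) -> List[str]:
        cart = self.store.get_cart(session_id)
        cart.append(item)
        self.store.put_cart(session_id, cart)
        return cart


store = SharedStore()
farm = [Server("server-431", store), Server("server-432", store)]

farm[1].add_to_cart("client-a", "book")         # first request lands on server-432
farm.pop(1)                                     # pull the plug on server-432
cart = farm[0].add_to_cart("client-a", "lamp")  # next request lands elsewhere
```

The client's cart survives because no `Server` instance ever held it. If the dict lived inside `Server` instead, that would be the server affinity problem.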

So the pushback from that group was off-target. The problem isn’t with shared state in general — that’s the basis of REST. Rather, the prohibition is against *A* particular client sharing state with *A* particular server.

I just wanted to clear the air on that. Thank you for your time.

Filed under computer architecture, computer scaling, open standards, opinionizing, REST

Article: REST vs. SOAP

This article has a ton of information. I’m going to be moving into a shop that is SOAP-heavy but REST-curious. So I need to be conversant on the differences, benefits, and costs between the two.

This goes way deep into the background, so I’m going to be drawing heavily on it.

http://www.prescod.net/rest/rest_vs_soap_overview

Filed under REST, computer architecture, open standards

JSON Hypermedia

I’m sure I read this before, but I want to make a note of this page. I think it’s going to be important soon.

http://www.amundsen.com/blog/archives/1054

Filed under computer architecture, open standards

Scaling?

lol I posted this reply on a message board thread about which language is better for scaling an application, PHP or Java, and wanted to copy it here before the message board and time washed my wisdom away.

Whoops! No one reads this blog, either …so mission failed. Oh, well. ;)

Ahem.

It has always struck me that scalability has several dimensions, and we tend to concentrate on the “scaling to load” dimension. But there’s also the “scaling to complexity” dimension. As a system matures, it will gain more features and complexity, and my feeling is that Java has an edge there, because of features like “programming to interfaces” and type-safety: the compiler will enforce code contracts and help you manage the system as it scales in complexity.

Another dimension that @Desmond mentions is scaling the size of your team — my feeling is also that it’s easier to have a larger team working in a statically-typed language because of the compiler guarantees. …That said, you can write bad code in any language.

On the “scaling to load” question, as long as you’re adhering to good architectural principles like avoiding server affinity and allowing replication and horizontal scaling, the choice of language is more neutral. Then it’s just a matter of how many dollars it takes to build out hardware, or buy software licenses for supporting infrastructure, and I admit I don’t have a good feel for the numbers on that.

My experience is that scaling your persistence layer for load is a much harder problem than scaling your application layer.

Filed under computer architecture, opinionizing

Link: Rollout Degrade Metrics and Capacity

This is a pretty fascinating article. While very high level, it talks about things I’ve been interested in (on a much smaller scale) for a while.

http://www.bigfastblog.com/rollout-degrade-metrics-and-capacity

I’ve hand-coded some solutions that do these sorts of things, but not to this scale or completeness. The article surprised me with the nature of some of the technologies they were using; for example, I’d always thought of Ganglia as a monitoring solution, not for metrics gathering.

Filed under computer architecture

Scaling MySQL (link)

These days, it’s all about scaling. And the toughest part of the system to scale (correct me if I’m wrong) is persistence. Traditional databases give nice reliability and consistency guarantees, but at the cost of horizontal scaling. It’s possible to replicate in order to spread the data out and provide failover, but I’ve actually seen few shops do that in a way that doesn’t require manual intervention when something bad happens.

So here’s an interesting article about how Facebook scales out MySQL. Something I need to read and digest.

http://gigaom.com/cloud/facebook-shares-some-secrets-on-making-mysql-scale

Filed under computer architecture, computer scaling

What Rough Beast?

Does anyone else have the same impression that I do, that the entire computing industry is just slouching and degrading?

I have this sense that even five years ago, I was able to have interesting conversations about abstract topics like patterns and OO design. I felt like I was scoring points for just being interested in those topics. And, as a team, we could look at patterns or design, and it shaped the work we did.

Today, I feel like there’s a growing contempt for a deeper understanding of what we do. It’s like the goal these days is just to have the loudest voice.

Here’s an example. The REST movement. REST has a lot of very interesting ideas that can influence REST-ful and non-REST-ful design. But even though Fielding wrote his dissertation twelve years ago, industry is still at the point of chanting “HATEOAS!” and then ignoring hypermedia and getting into deep arguments about URL design.

When I asked about REST here, one of my co-workers bragged, “We even got to the level of whether to use plural or singular in our URLs!” I could only blink mutely. The whole point of REST is that hypermedia makes the URLs opaque, so that detail is insignificant; instead of plurals, you could just as well pass a UUID as the path.
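To make the opaque-URL point concrete, here is a tiny sketch. The document shape (a `links` dict keyed by relation name) is my own invention for illustration, not any particular hypermedia standard, and `build_site` and `follow` are hypothetical helpers. The server mints a UUID for the orders path, and the client still finds it, because it follows the link relation and never builds a URL itself.

```python
import uuid
from typing import Dict


def build_site() -> Dict[str, dict]:
    """Fake server: maps URLs to documents. The orders path is a random UUID."""
    orders_url = "/" + str(uuid.uuid4())
    return {
        "/": {"links": {"orders": orders_url}},
        orders_url: {"items": ["order-1"], "links": {"home": "/"}},
    }


def follow(site: Dict[str, dict], document: dict, rel: str) -> dict:
    """Client side: look up the link by relation name, never by URL shape."""
    return site[document["links"][rel]]


site = build_site()
root = site["/"]
orders = follow(site, root, "orders")
```

The only URL the client knows up front is the entry point. Plural, singular, or UUID makes no difference to it.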

Come to think of it, in my initial interview, they asked me a question about REST, and I drew a deep breath and went into the reasons why you would differentiate identifiers in the query portion of the URL vs. identifiers in the path. And then I used that to explain the basis for my answer. My current team lead (blarg!) sneered and said, “OK as long as you don’t let that slow you down when you’re working” or something to that effect.

The last couple of years I watched the department I was working for implode, while the Technical Leaders of that shop set themselves up as industry experts. I jumped ship, and now here I am. Whenever I try to find a deeper reason for justifying a design, the team just rudely and dismissively cuts me off. These days it seems like the race is to master dynamically-typed scripting languages, intricacies of this or that database, some Apache library, or else to put “closures” on your resume. As soon as you hit Factory and Singleton, it’s time to stop — that’s enough theory, thank you.

(lol the code I work with now is just peppered with Singleton Factories. Their chests must have swelled with pride when they put that one together. It makes the code almost impossible to unit test — there are some serious shortcomings to the Singleton pattern — but never mind. I tried to explain to a co-worker some of the simple patterns you can use to break open Singleton and make it unit-testable. His reaction was like I was proposing we install gravity absorption pods or hyperspace warp beams.)
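For the record, one of those simple patterns looks roughly like this. It is a sketch with made-up names (`RateService`, `InvoiceCalculator`, `StubRates`), not the code from that shop: the singleton stays, but the class that uses it accepts the collaborator as an optional constructor argument, so a test can hand in a stub.

```python
from typing import Optional


class RateService:
    """Hypothetical collaborator normally reached through a singleton."""
    _instance: Optional["RateService"] = None

    @classmethod
    def instance(cls) -> "RateService":
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance

    def rate_for(self, currency: str) -> float:
        raise RuntimeError("would call a remote system in production")


class InvoiceCalculator:
    def __init__(self, rates: Optional[RateService] = None) -> None:
        # Production code passes nothing and gets the singleton;
        # a test passes a stub and never touches the global.
        self._rates = rates if rates is not None else RateService.instance()

    def total(self, amount: float, currency: str) -> float:
        return amount * self._rates.rate_for(currency)


class StubRates(RateService):
    """Test double with a fixed, predictable rate."""

    def rate_for(self, currency: str) -> float:
        return 2.0
```

Existing callers don't change at all, which is usually what makes this an easy sell.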

For a long time, I’ve noted a difference between people who master APIs vs. people who master principles. I’m more of a principle guy myself — memorizing minute details about APIs has always been really boring for me. Some people take great pride in their ability to recite all kinds of background about the APIs they’ve mastered. The industry really needs both types of specialization. But for the last few years it seems like all the technical discussions I’ve been in have been bantering over the trivia surrounding different APIs. People just seem to treat the principles that drive those APIs as fussy distractions.

Maybe I’ve just had the bad luck to plant myself in shops with a heavy focus on maintenance and IT, and I need to get out and talk to some people doing original development. I don’t know. It just feels like I have to work harder to make simple points about design lately, because no one really wants to know how their shiny new technical toys work.

And they want to put the word “closures” on their resumes.

EDIT: Something a co-worker said comes to mind. He was noting that everyone these days wants to be an architect. Dev managers want to be architects. Program managers want to be architects. Even architects want to be architects all of a sudden, instead of being the impressive-looking guy who attends meetings but doesn’t contribute anything useful. However, when they talk about architecture, they seem to focus on infrastructure architecture: how many boxes, where they’re installed, what OS they run, physical vs. virtual, what network appliances to buy, even how to lay out the cages in the Data Center. The practice of software architecture is an annoying distraction from installing physical infrastructure. Maybe that’s an exaggeration ;) but maybe that’s part of the feeling I’m getting.

Filed under computer architecture, opinionizing

More Service Feature Checklist

After a long hiatus, I’ve added some more items to the service feature checklist. A couple of them are still very blurry, so I might pin them down better later.

Monitoring

I added some items around monitoring. It starts with the basic “alive” monitor that just acts like a hello world link. Note that the version link serves nicely for this. Then the monitoring can get deeper into internal processes, to make sure all the necessary queues, threads, and such are up and running correctly.

The next one might bear further discussion. The idea is to provide monitors for remote systems our system is dependent on. For services that we own ourselves, we only need to provide a connection check. The idea is, we should be setting up the same sort of monitoring for all our services, so if one starts misbehaving, its monitors should speak up. If a queue dies or somesuch, I don’t need my entire infrastructure to light up and start flashing alarms.

Now, if I have a dependency on a remote system that I don’t own and don’t monitor, then it might make sense to have a little deeper check. But hey, the checklist items are prescriptions, not proscriptions.
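The layers described above (alive check, internal checks, connection checks on dependencies) could be sketched like this. The check names and the three example checks are hypothetical placeholders; a real service would inspect its actual queues and open real connections.

```python
from typing import Callable, Dict


def run_checks(checks: Dict[str, Callable[[], bool]]) -> Dict[str, str]:
    """Run each named check; a check that raises is reported as failed."""
    report: Dict[str, str] = {}
    for name, check in checks.items():
        try:
            report[name] = "ok" if check() else "failed"
        except Exception as exc:
            report[name] = f"failed: {exc}"
    return report


def queue_worker_running() -> bool:
    return True  # placeholder for a real thread/queue inspection


def billing_service_reachable() -> bool:
    # Placeholder connection check that simulates an outage.
    raise ConnectionError("connection refused")


checks = {
    "alive": lambda: True,                            # the hello-world monitor
    "internal.queue_worker": queue_worker_running,    # deeper internal check
    "dependency.billing": billing_service_reachable,  # connection check only
}
report = run_checks(checks)
```

One failing dependency shows up as one failed entry in the report. It doesn't take the alive check down with it.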

A really good service should be able to report on error conditions that it’s seeing. You shouldn’t have to scrape a log file to find out if something’s going wrong. The error reports could be a part of monitoring. Or they could be for human consumption, as a part of the troubleshooting process. Keeping some sort of rolling log in memory would help reveal some history of the problems.
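The rolling in-memory log is cheap to build. A minimal sketch, assuming a bounded `collections.deque` is good enough; `RollingErrorLog` and its method names are made up.

```python
import time
from collections import deque
from typing import Deque, List, Tuple


class RollingErrorLog:
    """Keeps only the most recent errors in memory."""

    def __init__(self, capacity: int = 100) -> None:
        # A deque with maxlen silently drops the oldest entry when full.
        self._entries: Deque[Tuple[float, str]] = deque(maxlen=capacity)

    def record(self, message: str) -> None:
        self._entries.append((time.time(), message))

    def recent(self) -> List[str]:
        """Messages from oldest to newest, ready for a status page."""
        return [message for _, message in self._entries]
```

Expose `recent()` through a monitoring link and nobody has to scrape a log file to see the last few failures.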

Consider enabling your system to push metrics and alarms to a remote system. That way, in a huge bank of servers, you don’t have to do brutal configuration of monitoring for hundreds of machines.

Metrics

Whether the numbers are gathered inside the application, or dumped to an external system (like a database) and crunched, you really need a way to see the performance of the system over time. So a service should report numbers about what it’s doing. The two basic numbers it should provide are counts and timings. Counts are just the number of times something happened: e.g. how many POST calls we made to a remote system. Timings are how long it took to do it.

For timing, it used to be that milliseconds were just fine. But in the modern world, things happen much faster. For example, we had a hash-based authorization system that I timed to run at 250-400 *micro*seconds.

Also, watch out for CPUs. If the CPUs on your box report different clock values (and that can happen), then you can actually get negative elapsed times if your process jumps from one CPU to another between the start and end readings.
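A sketch of counts and timings together, recorded in microseconds. The `Metrics` class is hypothetical. It reads `time.perf_counter_ns()`, Python's clock for measuring short intervals, rather than the wall clock, which is my choice for this example and not something from the systems described above.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from typing import DefaultDict, Iterator, List


class Metrics:
    """Collects a count and a list of timings (in microseconds) per name."""

    def __init__(self) -> None:
        self.counts: DefaultDict[str, int] = defaultdict(int)
        self.timings_us: DefaultDict[str, List[float]] = defaultdict(list)

    @contextmanager
    def timed(self, name: str) -> Iterator[None]:
        start = time.perf_counter_ns()
        try:
            yield
        finally:
            # Recorded even if the timed block raises.
            elapsed_ns = time.perf_counter_ns() - start
            self.counts[name] += 1
            self.timings_us[name].append(elapsed_ns / 1_000)


metrics = Metrics()
for _ in range(2):
    with metrics.timed("remote.post"):
        sum(range(1000))  # stand-in for the real call
```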

I found another measure to be really useful: a “count per second” measure. For extremely fast transactions, it can be useful to count the number of transactions that happened in the past second, and then add that number to your metrics store. It’s interesting to see the “count per second” numbers jump up or down.
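The count-per-second measure can be sketched as a counter that buckets hits by whole second and flushes the finished bucket. `PerSecondCounter` is a made-up name, the `history` dict stands in for a real metrics store, and the clock is injectable only so the example can be driven by hand.

```python
import time
from typing import Callable, Dict


class PerSecondCounter:
    """Counts hits per whole second and flushes each finished second."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._current_second = int(clock())
        self._current_count = 0
        self.history: Dict[int, int] = {}  # stand-in for the metrics store

    def hit(self) -> None:
        second = int(self._clock())
        if second != self._current_second:
            # The previous second is over: store its total, start a new bucket.
            # Seconds with no hits at all are simply never written.
            self.history[self._current_second] = self._current_count
            self._current_second = second
            self._current_count = 0
        self._current_count += 1


now = [10.0]  # fake clock so the example is deterministic
counter = PerSecondCounter(clock=lambda: now[0])
counter.hit()
counter.hit()
now[0] = 10.9
counter.hit()
now[0] = 11.2
counter.hit()
```

After the last call, second 10 has been flushed with a total of 3, and second 11 is still accumulating.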

While most metrics can be aggregated outside the service, aggregating some critical metrics right in memory and making them available for viewing is a really handy thing to do. So even if you’re going to dump your numbers into a data warehouse and crunch them there, you can get some really important statistics by tallying the numbers right in memory and have them current to within microseconds.

LifeCycle Management

Over time, your service is going to pick up a lot of mechanisms. Some of these mechanisms need to start up in an orderly way, and need to be closed down in an orderly way. For example, execution thread pools need to have a time to do a graceful shutdown so that items still in their queue remain in an acceptable state.

A more subtle aspect of this is remote systems that know about your service. For example, your service might tell your monitoring service that it’s going down for maintenance, and save everyone some trouble and grief. Especially as your number of services increases.
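Here is a small sketch of both ideas: ordered startup and shutdown, plus a heads-up to monitoring. `Lifecycle`, `WorkerPool`, and `MaintenanceNotice` are hypothetical names, and the notice just appends to a list where a real one would call the monitoring system.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Optional


class Lifecycle:
    """Starts components in registration order, stops them in reverse."""

    def __init__(self) -> None:
        self._components: list = []

    def add(self, component) -> None:
        self._components.append(component)

    def start(self) -> None:
        for component in self._components:
            component.start()

    def stop(self) -> None:
        for component in reversed(self._components):
            component.stop()


class WorkerPool:
    def __init__(self) -> None:
        self.done: List[int] = []
        self._pool: Optional[ThreadPoolExecutor] = None

    def start(self) -> None:
        self._pool = ThreadPoolExecutor(max_workers=2)

    def submit(self, item: int) -> None:
        self._pool.submit(self.done.append, item)

    def stop(self) -> None:
        # wait=True blocks until already-queued work has finished.
        self._pool.shutdown(wait=True)


class MaintenanceNotice:
    """Stand-in for telling the monitoring service about our state."""

    def __init__(self, events: List[str]) -> None:
        self.events = events

    def start(self) -> None:
        self.events.append("up")

    def stop(self) -> None:
        self.events.append("going down for maintenance")


events: List[str] = []
pool = WorkerPool()
lifecycle = Lifecycle()
lifecycle.add(pool)                       # started first, stopped last
lifecycle.add(MaintenanceNotice(events))  # started last, stopped first
lifecycle.start()
for i in range(20):
    pool.submit(i)
lifecycle.stop()
```

Because stopping runs in reverse, monitoring hears about the maintenance before the pool starts draining, and every queued item still gets processed.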

Async and Caching

I’m surprised how often the need for these items comes up in my work, and it seems like when the need for asynchronous processing, or data caching comes up, it almost instantly sparks a whole series of religious wars. The NoSql guys and the Sql guys go at it. The Stateful vs. Stateless war rises in flames. And above the din of battle, the “We don’t need that!” crowd can always be heard.

Come on. A mature system needs asynchronous processing. And a mature system needs a way to maintain lightweight state, without jamming it into a full-blown relational database. So figure it out early, and then use it a lot. My experience is that once a shop makes that leap, then suddenly things become a lot quieter. Tough issues can just be handled matter-of-fact, because the service already has the infrastructure it needs.

Note that both asynchronous processing and distributed caching do add overhead. You have to write code for it, monitor it, and build your data center for it. Nothing is free. But once it’s in place, then you can use it for all kinds of things.

That’s it for now. I might come back and refine the async and caching bits.

Filed under computer architecture, opinionizing