Advanced Distributed Systems Design with SOA, part 1 of N

A couple of weeks ago, I had the opportunity to take part in Udi Dahan's excellent ADSD course in London. For my own sake (to better absorb the course through repetition), for yours (to, perhaps, learn something new), and to toss some credit Udi's way, what follows is a collection of notes that I took during the course.

No, I won't share the slides. If you find the notes interesting, book the course through Skills Matter or through Particular Software. It's well worth your time ;-)

Meta

Udi comes across as very calm, collected and professional in the way he teaches his class. With well-placed breaks, he keeps his students attentive throughout all five days of the course. I was very impressed by his teaching skill alone.

Systems are not Applications

Historically, we haven't really had to deal with connectivity as a first-class concern. It is not something that is baked into the mainstream programming languages. Similarly, the pattern books rarely deal with connectivity, even though almost everything we create today is connected. In the mid-to-late 90s, the fallacies of distributed computing were written down by Peter Deutsch and James Gosling (the father of Java). Since then, Ted Neward has added an additional three fallacies - dubbed the fallacies of enterprise computing - to the collection:

  • The system is atomic / monolithic
  • The system is finished
  • Business logic can and should be centralized


Fallacy 1: The Network is Reliable

Why does our distributed logic sometimes fail? Part of the reason for failures when writing distributed systems is that it's very easy to write fragile code:

// Looks like an ordinary local call, but service.Process crosses the network.
var service = new ServiceProxy();
var result = service.Process(data);

It's so easy to call a remote object! And when the sun's shining and things are well, it just works. However, what happens when it doesn't work? What about timeouts? What do we do with them? Well, first, we need to figure out when the timeout happened: On our call to Process, or on the service's response with our data? Does the server have the data we posted? How do we handle these scenarios? Should we retry the call? What if the server received the data; will the request then be processed twice? What about other servers down the line, which were called as a result of our call to Process? These are all very difficult questions ... so we often decide it's best not to bring them up at all.
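
To make that concrete, here is a minimal sketch of what "just retry it" actually commits us to. The ServiceProxy and Process names come from the snippet above; the three-attempt policy and the TimeoutException are my assumptions:

var service = new ServiceProxy();
var succeeded = false;
for (var attempt = 0; attempt < 3 && !succeeded; attempt++)
{
    try
    {
        var result = service.Process(data); // may have succeeded server-side even when we time out
        succeeded = true;
    }
    catch (TimeoutException)
    {
        // Was it the request or the response that was lost? We can't tell
        // from here: retrying risks processing the data twice, while giving
        // up risks losing the work entirely.
    }
}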

Reliable messaging infrastructures include Azure Service Bus, ActiveMQ and RabbitMQ. Be wary of ZeroMQ, however, which does not guarantee delivery. You could build a reliability story on top of it, but it's not a complete product in itself as compared to the other solutions available.
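
For contrast, here is a rough sketch of what handing the message to a durable broker looks like. This one uses the Azure.Messaging.ServiceBus client; the connection string and queue name are placeholders, and the JSON serialization is an assumption:

using Azure.Messaging.ServiceBus;

// Hand the message to a durable broker instead of calling the service
// directly. Once SendMessageAsync completes, the broker owns delivery:
// persistence and redelivery become its responsibility, not ours.
await using var client = new ServiceBusClient(connectionString); // placeholder connection string
ServiceBusSender sender = client.CreateSender("process-requests"); // queue name is an assumption
await sender.SendMessageAsync(new ServiceBusMessage(BinaryData.FromObjectAsJson(data)));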

The roll-your-own scenario (as a means to compensate for the fact that the network isn't reliable) often comes about organically: someone on your team decides to "add some reliability" late during integration testing (or worse, when you are putting the system into production). Then you add fixes to these additions and, before you know it, you have created your own messaging infrastructure that you need to maintain.

This messaging model is the hard part - we have to change our style of development to embrace asynchrony, rather than just sprinkle fixes onto our RPC-style code. Much like adjusting to driving on the left in the UK after driving on the right everywhere else, this is not a smooth transition.
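
As a sketch of what that change in style means in code, here is a message handler using NServiceBus conventions (fitting, since the course comes from that ecosystem); the ProcessData and DataProcessed message types are made up for illustration:

using System;
using System.Threading.Tasks;
using NServiceBus;

// The message type replaces the RPC parameter list.
public class ProcessData : ICommand
{
    public Guid DataId { get; set; }
}

// Instead of blocking on service.Process(data), a handler reacts whenever
// the message arrives - milliseconds or hours later. There is no return
// value to wait on; any reply is just another message.
public class ProcessDataHandler : IHandleMessages<ProcessData>
{
    public Task Handle(ProcessData message, IMessageHandlerContext context)
    {
        // ... do the actual work here, then announce the outcome ...
        return context.Publish(new DataProcessed { DataId = message.DataId });
    }
}

public class DataProcessed : IEvent
{
    public Guid DataId { get; set; }
}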

Next: Latency isn't a problem

