Should be aware of possible confusion with http://en.wikipedia.org/wiki/Pastry_(...

paddyforan · on Oct 20, 2012

Confusion is intentional. This is just an implementation of that algorithm, with a few tweaks (that are explained, along with their reasoning, in the README). I believe I even link to that Wikipedia page in the blog post and the README.

stavros · on Oct 20, 2012

I didn't see anything in either the Wikipedia page or the code, but how do you achieve redundancy? Wikipedia says nodes can die with minimal or no loss of data, but I see everywhere that each message only goes to one node. Which is the case, and how is this achieved?

paddyforan · on Oct 20, 2012

They both are the case. Well, sort of. The messages can go to any number of nodes as part of the routing, but they're only considered "delivered" at one node. However, nodes can die with minimal loss of data.

The problem is you're confusing the messages for data. As far as Pastry is concerned, your messages are not data that it should retain. They are one-off communications. The only data that Pastry is concerned with storing is the state of the cluster; that is to say, Pastry only considers "data" the information about enough nodes to reliably be able to route a message to any node in the cluster. Your messages, by default, are lost the moment they are sent, forwarded, or delivered. The node stops caring about them at that point.

However, on top of this framework you can build applications that do care about the data in the messages. Before a message is forwarded, it fires off a callback to any application that cares to listen for it. When a message is delivered, it fires off a callback to any application that cares to listen to it. You can register callbacks that store this information, then build your own redundancy frameworks on top of that to store the data you'd like to retain.

If you're interested in this, I'd recommend paper on PAST. It's a high availability storage system built on Pastry: http://research.microsoft.com/en-us/um/people/antr/PAST/hoto...

stavros · on Oct 20, 2012

Ah, I must have confused this for a storage algorithm, rather than a message passing algorithm. Doesn't "distributed hash table" refer to, well, a distributed hash table? One that stores key/value pairs? What good is it if the values are lost right after going in?

paddyforan · on Oct 20, 2012

That is what a distributed has table is. And the values and keys aren't lost--there's just some confusion over what the keys and values are.

The keys and values, at least in Pastry's case, are the NodeID and the metadata about the Node, respectively. According to my understanding, at least. I had to learn a lot to implement this, so I would not claim to be an expert on distributed hash tables by any stretch of the imagination.

stavros · on Oct 20, 2012

Hmm, so is the use case a distributed task queue? If so, why not use a consistent hash ring (http://www.martinbroadhurst.com/Consistent-Hash-Ring.html), or just pick a server at random? I'm afraid I'm missing the point entirely :/ Is it about maintaining a consistent list of member workers?

paddyforan · on Oct 21, 2012

A consistent hash ring is a similar concept, I think. We're quickly approaching the out-of-my-depth line, though.

It's about maintaining the property that you can route information from many different places and have that information end in a consistent place, no matter where it's sent from. And, assuming the consistent place doesn't go offline, a large number of servers can fall over without warning before that property is changed. It's about making discovery of nodes in your cluster reliable, and passing information in between them.

The use cases for that are many, and each has their own nuances, which is why this is so confusing and hard to talk about.

stavros · on Oct 21, 2012

Right. It sounds like one of those problems that can be solved in many ways, and a DHT is designed to deal with a specific constraint. Thanks for the details!