?

Log in

No account? Create an account

Protocol design question - Journal of Omnifarious

Jan. 27th, 2007

11:58 am - Protocol design question

Previous Entry Share Next Entry

Comments:

From:hattifattener
Date:January 27th, 2007 10:40 pm (UTC)
(Link)
It seems to me that there's another desirable property, of allowing intermediate nodes to fragment a chunk without losing integrity checking of the newly-subdivided chunks. That seems really difficult. I bet there's been some academic research into it, though.

One approach would be to just not allow coalescing of MACs, even if you allow coalescing of chunks. The originator would generate a MAC for every N bytes of the message, with the MAC being computed over the data plus a header (start-offset and length of this chunk). This means the message could end up with arbitrarily many MACs, but the overhead can be kept below a fixed fraction of the total message size, so it still scales well. This is kind of how Bittorrent does it.

Intermediaries can still coalesce chunks, as long as the coalesced chunk still carries along with it the multiple MACs associated with it. Intermediaries can subdivide chunks, but the subdivided chunk is not checkable until the rest of that MAC's data has been collected.

For whole-message authentication, the originator could generate a final MAC over the whole message, or over the collection of all the message's MACs (or more generally over a pyramidal tree of M(AC)^n s), or the MACs could be chained (the implicit header hashed into each MAC would include the previous chunk's MAC as well as a flag indicating whether this is the last chunk). The tradeoff here is between allowing parallelization of message generation (maybe the message is a response from a distributed data store, and it would be nice to be able to generate each chunk independently, followed by computation of the final MAC) or allowing message streaming with O(1) storage (my CAKE-enabled webcam, for example, may not have an arbitrary amount of storage available to it).

On the third hand, maybe the right answer is to try to disentangle fragmentation-and-reassembly from the CAKE layer entirely (or at least from the integrity-protection mechanisms). CAKE messages could be nonfragmentable and limited to less than 2^32 bytes, but CAKE could define a standard protocol for communicating longer messages, much like IP datagrams are limited to 2^16 bytes but almost everybody uses TCP to handle longer messages.

(Reply) (Thread)
[User Picture]
From:omnifarious' OpenID account [omnifarious.org]
Date:January 28th, 2007 02:36 am (UTC)
(Link)

Allowing messages to be transported in a bittorrenty sort of way is an interesting idea. I think the operative construct here would be a Merkle Hash Tree (http://open-content.net/specs/draft-jchapweske-thex-02.html#anchor2). This allows to you pick a fixed small block size and to allow arbitrary coalescing and splitting of message data on block-size boundaries. I believe that what a .torrent file largely consists of is all the hash data for just such a construct for a 64kibibyte blocksize.

It will be possible for recipients to advertise a maximum message size to the world. So forcing messages to be small and then be split up isn't really necessary. A recipient can make this happen by advertising a maximum message size that's fairly small.

But your ideas give me some other interesting ideas, which I'll try to detail in another reply. I want to catch a bus that is coming soon though.

(Reply) (Parent) (Thread)
From:hattifattener
Date:January 28th, 2007 03:27 am (UTC)
(Link)
Yeah, that's what I was referring to with my comment about pyramidal tree of MACs.

Bittorrent doesn't use a hash tree. The file is divided into chunks (of arbitrary size, but it has to be uniform for any given .torrent) and the hashes of all the chunks are in the .torrent file.
(Reply) (Parent) (Thread)
[User Picture]
From:omnifarious
Date:January 28th, 2007 11:42 pm (UTC)
(Link)

Oh, it doesn't? I'm mildly disappointed. Those seemed so nice. :-)

There is a way to handle a hash tree where only the top level hash needs to be handed out centrally. Then any other node can send you data for a block plus at most logb n hashes where b is the block size and n is the total file size in order for you to verify the block it sent. And usually it's a lot less.

(Reply) (Parent) (Thread)
[User Picture]
From:omnifarious
Date:January 28th, 2007 02:39 am (UTC)

OpenID and links

(Link)

I hate what LJ does to links posted by OpenID users.

Merkle Hash Trees

(Reply) (Parent) (Thread)
[User Picture]
From:omnifarious
Date:January 28th, 2007 02:41 am (UTC)

Even better link

(Link)

Hash tree from Wikipedia.

(Reply) (Parent) (Thread)
[User Picture]
From:omnifarious
Date:January 29th, 2007 12:41 am (UTC)
(Link)

My other idea that's related is to implement certain services in CAKE using a peculiar underlying model.

Anytime static data is sent back, it's never sent back within the CAKE protocol that's requested. All that's sent back is the hash of the data that was requested. Then a separate step is done to fetch the data the hash represents. This allows all kinds of arbitrary caching mechanisms to be created to handle things. It solves a couple of what I consider a major problem with the web. First that a server always has to be reachable. Secondly, that server has to have the bandwidth to send everybody the data they're asking for.

Always using the root hash of a Merkle tree hash allows a LOT more flexibility in how caching schemes might be implemented.

(Reply) (Parent) (Thread)