
Protocol design question - Journal of Omnifarious

Jan. 27th, 2007

11:58 am - Protocol design question


Comments:

From:srlamb
Date:January 27th, 2007 10:01 pm (UTC)

What's the goal of coalescing chunks?

It sounds like by allowing intermediaries to coalesce chunks, you're hoping to save bandwidth and/or storage costs. If that's true, let's determine the overhead of the worst case - infinite message size (so unchunked overhead approaches zero), generously large headers (say, 5 bytes offset (good for the full tebibyte at one-byte granularity), 5 bytes length (likewise), and SHA-512 means 64 bytes MAC), and stingy chunk sizes (say, 4,096 bytes including headers). 1-(4096-5-5-64)/4096 = 1.8% overhead. That's not so bad.
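A quick sanity check of that arithmetic in Python (the header sizes here are just the generous assumptions above — 5-byte offset, 5-byte length, 64-byte SHA-512 MAC — not anything a protocol mandates):

```python
def chunk_overhead(chunk_size, offset_bytes=5, length_bytes=5, mac_bytes=64):
    """Fraction of each chunk consumed by its header fields."""
    header = offset_bytes + length_bytes + mac_bytes
    return header / chunk_size

# 4,096-byte chunks including headers, as in the worst case above
print(round(chunk_overhead(4096) * 100, 1))  # → 1.8 (percent)
```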

If it's not good enough...what's a typical MAC, 8 bytes? And if you said that all non-final chunks had to be 16,384 bytes, then you could do something like using 5 bytes for both the offset and the length - mask off the top 26 bits for the offset, mask off the bottom 14 bits for the length. I guess that's not quite right, as you'd have trouble terminating messages that are a multiple of the chunk size; you might have to steal another bit from the MAC for a 'final' flag (leaving a 63-bit MAC). That's 13 bytes of headers in a 16,384 byte chunk. 0.08% overhead.
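That packing might look like the following sketch (a hypothetical layout: the offset is stored in whole-chunk units in the top 26 bits, the length in the bottom 14; since 16,384 itself doesn't fit in 14 bits, a stored length of 0 is read back as a full chunk, which is exactly the ambiguity the 'final' flag would have to resolve):

```python
CHUNK = 16_384  # fixed size of every non-final chunk, as assumed above

def pack_header(offset, length):
    """Pack a chunk offset and length into 5 bytes (26 + 14 bits)."""
    assert offset % CHUNK == 0 and offset < CHUNK << 26
    assert 0 < length <= CHUNK
    word = ((offset // CHUNK) << 14) | (length % CHUNK)
    return word.to_bytes(5, "big")

def unpack_header(data):
    word = int.from_bytes(data, "big")
    length = word & 0x3FFF
    return (word >> 14) * CHUNK, length if length else CHUNK

print(unpack_header(pack_header(3 * CHUNK, 100)))  # → (49152, 100)
```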

Actually, you could do even more if you always check MACs serially. Have it be a running total, not just of the one chunk. 64-bit MAC, 15-bit length, 1-bit terminal flag means 10 bytes overhead. At maximum chunk size of 32,768 bytes, that's 0.03%.

You could have stingy MAC on each chunk and a really strong one at the end.
From:omnifarious
Date:January 28th, 2007 02:26 am (UTC)

Re: What's the goal of coalescing chunks?


I'm using the count type, so the per-message overhead for the length field is very small. The MAC, OTOH, is much larger, but your other idea of using a much smaller MAC for the intermediate values is a really interesting one. That would solve the "The MAC can be used to make it seem that the message ends early" problem, since the final MAC would be required to be a full 256 bits.

It isn't exactly to allow intermediaries to save overhead. It's more to allow intermediaries more flexibility in dealing with messages in whatever way is efficient for them.

And to answer your other question, my main concern is a DoS attack on the recipient. But really, nothing I can do will make this even a tiny bit harder. It's just better for a recipient to advertise a maximum message size that it can deal with without using too many resources.

From:srlamb
Date:January 28th, 2007 02:49 am (UTC)

Re: What's the goal of coalescing chunks?

Hmm, I still don't understand what you mean by flexibility. Without changing the structure of the chunks, it seems like they should be able to store uncoalesced chunks concatenated together, multiple headers and all.
From:omnifarious' OpenID account [omnifarious.org]
Date:January 29th, 2007 12:17 am (UTC)

Re: What's the goal of coalescing chunks?


Yeah, you're right. *thumps head* Sometimes you just get caught in stupid thought traps.

From:omnifarious
Date:January 30th, 2007 11:27 pm (UTC)

Re: What's the goal of coalescing chunks?


Oh, I just remembered the real reason... :-) I don't want people who use the protocol to be able to make assumptions about chunk lengths being preserved. If the chunk lengths are forced to be preserved by the nature of the protocol, then it would be very tempting to use that fact.

I think I will be using your solution with short MACs. I'll just use a 16-bit MAC as a health check along the way, and a full MAC at the end. The MACs will not include message lengths, but truncation will be prevented because an intermediary will not be able to extend a short MAC into a full MAC to use at the end.
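A sketch of that scheme, assuming HMAC-SHA-256 and a made-up shared key (CAKE's actual MAC construction isn't specified here): each intermediate chunk carries only the first two bytes of a MAC over the message so far, and the final chunk carries the full 256-bit digest, which an intermediary cannot reconstruct from the truncated values.

```python
import hashlib
import hmac

KEY = b"hypothetical shared key"  # placeholder, not part of the protocol

def health_check_mac(message_so_far: bytes) -> bytes:
    """16-bit truncated MAC attached to each intermediate chunk."""
    return hmac.new(KEY, message_so_far, hashlib.sha256).digest()[:2]

def final_mac(message: bytes) -> bytes:
    """Full 256-bit MAC on the last chunk; prevents early termination."""
    return hmac.new(KEY, message, hashlib.sha256).digest()

print(len(health_check_mac(b"chunk1")), len(final_mac(b"chunk1chunk2")))  # → 2 32
```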

I don't know whether or not the intermediate health checks are useful. But I feel better putting them in, and they shouldn't be that big an efficiency burden given that the message header will typically be more than a hundred bytes anyway.

From:srlamb
Date:January 28th, 2007 07:46 am (UTC)

Re: What's the goal of coalescing chunks?

By the way, why your variable-length non-negative integer encoding over the elegant one used by Scarab's binary format, or the older, widely used ASN.1 BER? If I understand them all, the numbers of bytes used go something like this:

input range    Scarab BER bytes   ASN.1 BER bytes   Cake count bytes
[0, 2^7)       1                  1                 1
[2^7, 222)     2                  3                 2
[222, 8415)    2                  3                 2
[8415, 2^14)   2                  3                 3
[2^14, 2^16)   3                  3                 3
[2^16, 2^21)   3                  4                 4
[2^21, 2^24)   4                  4                 4
[2^24, 2^28)   4                  5                 5
[2^28, 2^32)   5                  5                 5
[2^32, 2^35)   5                  6                 6
[2^35, 2^40)   6                  6                 6
[2^40, 2^42)   6                  7                 7
[2^42, 2^48)   7                  7                 7
[2^48, 2^49)   7                  8                 8
[2^49, 2^56)   8                  8                 8
[2^56, 2^63)   9                  9                 9
[2^63, 2^64)   10                 9                 9


I'm not sure which is more efficient. Seems like if you came up with an idea of the distribution of numbers used as input, you could calculate the expected value of the length. My coding theory is not good, but Benford's law suggests maybe a power-law distribution like the Pareto?

Efficiency aside, I like Scarab's elegance (simpler, no length limit, no values with two representations), and it's apparently common enough that Perl's pack and unpack support it natively. All three formats screw over the alignment that makes simple and fast hardware possible, but I guess that's just the nature of the game.
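For reference, the encoding Perl's pack 'w' template implements (a BER compressed integer) is a big-endian base-128 scheme with a continuation bit; a sketch:

```python
def ber_encode(n: int) -> bytes:
    """BER compressed integer: 7 bits per byte, high bit set on all but the last."""
    if n < 0:
        raise ValueError("non-negative integers only")
    out = [n & 0x7F]
    n >>= 7
    while n:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    return bytes(reversed(out))

def ber_decode(data: bytes) -> int:
    n = 0
    for b in data:
        n = (n << 7) | (b & 0x7F)
    return n

# Byte counts match the Scarab BER column in the table above
print([len(ber_encode(v)) for v in (0, 2**7, 2**14, 2**63)])  # → [1, 2, 3, 10]
```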

Apparently there's also ASN.1 PER, supposedly more efficient and complex, with an option to not even align to byte boundaries.
From:srlamb
Date:January 28th, 2007 09:23 pm (UTC)

Re: What's the goal of coalescing chunks?

Hmm, typo in the second row, the Cake value should be 1.

A quick Python program calculates the expected lengths for some distributions I found interesting:

$ ./lengthtest.py 
Expected lengths (calculated with sample size 50000):
Distribution                    Scarab  BER     Cake
paretovariate(alpha=1)          1.01    2.00    1.00
paretovariate(alpha=1/2)        1.10    2.07    1.08
paretovariate(alpha=1/5)        1.61    2.49    1.66
paretovariate(alpha=1/10)       2.59    3.34    2.75
expovariate(lambd=1)            1.00    2.00    1.00
expovariate(lambd=1/10)         1.00    2.00    1.00
expovariate(lambd=1/100)        1.28    2.08    1.11
expovariate(lambd=1/1000)       1.88    2.77    1.80
expovariate(lambd=1/10000)      2.18    2.98    2.41
lognormvariate(mu=0,sigma=1)    1.00    2.00    1.00
lognormvariate(mu=0,sigma=10)   1.59    2.49    1.67
lognormvariate(mu=0,sigma=100)  9.01    8.98    8.44
lognormvariate(mu=1,sigma=1)    1.00    2.00    1.00
lognormvariate(mu=1,sigma=10)   1.68    2.56    1.78
lognormvariate(mu=1,sigma=100)  9.11    9.06    8.53
lognormvariate(mu=2,sigma=1)    1.00    2.00    1.00
lognormvariate(mu=2,sigma=10)   1.76    2.64    1.88
lognormvariate(mu=2,sigma=100)  9.19    9.13    8.61
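The program itself isn't shown; a sketch of how the Scarab column of such a table could be computed (Cake's encoding isn't given here, so only the base-128 varint length is modeled):

```python
import random

def scarab_len(n: int) -> int:
    """Bytes a base-128 continuation-bit varint needs to hold n."""
    length = 1
    while n >= 128:
        n >>= 7
        length += 1
    return length

random.seed(1)  # deterministic for the example
samples = [int(random.paretovariate(1)) for _ in range(50_000)]
avg = sum(scarab_len(s) for s in samples) / len(samples)
print(f"paretovariate(alpha=1)  Scarab: {avg:.2f}")
```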


From:omnifarious
Date:January 29th, 2007 12:12 am (UTC)

Re: What's the goal of coalescing chunks?


I'm not overly concerned with efficiency beyond a certain point. And it looks like a CAKE count has a slight advantage there in most situations anyway.

ASN.1 has a reputation for being horribly obscure and painful to use. I don't know enough about Scarab to make a judgement. I would like to use something that was relatively nice and commonly accepted. Especially if lots of parsers existed for it already. Thinking about it, this makes D-BUS vaguely attractive, though that seems to have more other baggage than I want.

OTOH, I don't need the data format to be self-describing. Having type information encoded in the data isn't a negative if it doesn't impact efficiency, but I think it's largely superfluous for CAKE.

From:omnifarious
Date:January 28th, 2007 02:27 am (UTC)

Re: What's the goal of coalescing chunks?


Also, your two responses were really helpful. :-) How did you find this post? How did you find my journal at all? :-)

From:srlamb
Date:January 28th, 2007 02:46 am (UTC)

Re: What's the goal of coalescing chunks?

Glad I could help. I saw your post in advogato's recentlog. I'm slamb there.