I'm using the count type, so the per-message overhead for the length field is very small. The MAC OTOH is much larger, but your other idea of using a much smaller MAC for the intermediate values is a really interesting one. That would solve the "the MAC can be used to make it seem that the message ends early" problem, since the final MAC would be required to be a full 256 bits.
It isn't exactly to allow intermediaries to save overhead. It's more to give intermediaries the flexibility to deal with messages in whatever way is efficient for them.
And to answer your other question, my main concern is a DoS attack on the recipient. But really, nothing I can do will make this even a tiny bit harder. It's just better for a recipient to advertise a maximum message size that it can deal with without using too many resources.
Yeah, you're right. *thumps head* Sometimes you just get caught in stupid thought traps.
Oh, I just remembered the real reason... :-) I don't want people who use the protocol to be able to make assumptions about chunk lengths being preserved. If the chunk lengths are forced to be preserved by the nature of the protocol, then it would be very tempting to use that fact.
I think I will be using your solution with short MACs. I'll just use a 16-bit MAC as a health check along the way, and a full 256-bit MAC at the end. The MACs will not include message lengths, but truncation will be prevented because an intermediary will not be able to extend a short MAC into a full MAC to use at the end.
I don't know whether or not the intermediate health checks are useful. But I feel better putting them in, and they shouldn't be that big an efficiency burden given that the message header will typically be more than a hundred bytes anyway.
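To make that concrete, here is a rough sketch of what I mean. The choice of HMAC-SHA-256, the placeholder key, and the framing are all assumptions for illustration, not the actual CAKE wire format:

import hmac
import hashlib

KEY = b"\x00" * 32  # placeholder; a real session key in practice

def frame_message(chunks):
    """Yield each chunk followed by a 16-bit health-check MAC over
    everything sent so far, then the full 256-bit MAC at the end.
    An intermediary sees only 16 bits of each running MAC, so it
    cannot extend a health check into a valid final MAC."""
    mac = hmac.new(KEY, digestmod=hashlib.sha256)
    for chunk in chunks:
        mac.update(chunk)
        yield chunk + mac.copy().digest()[:2]  # 16-bit health check
    yield mac.digest()                         # full 256-bit final MAC

A receiver would maintain the same running HMAC, checking each 16-bit tag as chunks arrive and rejecting the message unless the final 256-bit tag verifies over the whole thing.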
Expected encoded lengths in bytes (estimated with sample size 50000):
Distribution                    Scarab   BER    Cake
paretovariate(alpha=1)            1.01   2.00   1.00
paretovariate(alpha=1/2)          1.10   2.07   1.08
paretovariate(alpha=1/5)          1.61   2.49   1.66
paretovariate(alpha=1/10)         2.59   3.34   2.75
expovariate(lambd=1)              1.00   2.00   1.00
expovariate(lambd=1/10)           1.00   2.00   1.00
expovariate(lambd=1/100)          1.28   2.08   1.11
expovariate(lambd=1/1000)         1.88   2.77   1.80
expovariate(lambd=1/10000)        2.18   2.98   2.41
lognormvariate(mu=0,sigma=1)      1.00   2.00   1.00
lognormvariate(mu=0,sigma=10)     1.59   2.49   1.67
lognormvariate(mu=0,sigma=100)    9.01   8.98   8.44
lognormvariate(mu=1,sigma=1)      1.00   2.00   1.00
lognormvariate(mu=1,sigma=10)     1.68   2.56   1.78
lognormvariate(mu=1,sigma=100)    9.11   9.06   8.53
lognormvariate(mu=2,sigma=1)      1.00   2.00   1.00
lognormvariate(mu=2,sigma=10)     1.76   2.64   1.88
lognormvariate(mu=2,sigma=100)    9.19   9.13   8.61
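For the curious, here is a rough sketch of how numbers like these can be estimated. The actual Scarab and CAKE count encodings aren't spelled out here, so this uses stand-ins: a base-128 varint for a compact count, and a BER-style length octet plus minimal content octets for comparison:

import random

def varint_len(n):
    """Bytes needed for a base-128 varint encoding of n."""
    length = 1
    while n >= 128:
        n >>= 7
        length += 1
    return length

def ber_len(n):
    """One length octet plus minimal big-endian content octets,
    BER-INTEGER-style (hence the 2-byte floor in the table)."""
    content = max(1, (n.bit_length() + 7) // 8)
    return 1 + content

def expected_length(dist, encoder, samples=50000):
    """Monte Carlo estimate of the mean encoded size of a count."""
    return sum(encoder(int(dist())) for _ in range(samples)) / samples

print(expected_length(lambda: random.expovariate(1/1000), varint_len))
print(expected_length(lambda: random.expovariate(1/1000), ber_len))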
I'm not overly concerned with efficiency beyond a certain point. And it looks like a CAKE count has a slight advantage there in most situations anyway.
ASN.1 has a reputation for being horribly obscure and painful to use. I don't know enough about Scarab to make a judgement. I would like to use something that's relatively nice and commonly accepted, especially if lots of parsers already exist for it. Thinking about it, this makes D-BUS vaguely attractive, though that seems to carry more baggage than I want.
OTOH, I don't need the data format to be self-describing. Having type information encoded in the data isn't a negative if it doesn't impact efficiency, but I think it's largely superfluous for CAKE.
Also, your two responses were really helpful. :-) How did you find this post? How did you find my journal at all? :-)