Protocol design question - Journal of Omnifarious

Jan. 27th, 2007

11:58 am - Protocol design question


Comments:

From: srlamb
Date: January 28th, 2007 07:46 am (UTC)

Re: What's the goal of coalescing chunks?

By the way, why choose your variable-length non-negative integer encoding over the elegant one used by Scarab's binary format, or the older, widely used ASN.1 BER? If I understand them all correctly, the number of bytes used goes something like this:

Input range     Scarab BER bytes  ASN.1 BER bytes  Cake count bytes
[0, 2^7)        1                 1                1
[2^7, 222)      2                 3                2
[222, 8415)     2                 3                2
[8415, 2^14)    2                 3                3
[2^14, 2^16)    3                 3                3
[2^16, 2^21)    3                 4                4
[2^21, 2^24)    4                 4                4
[2^24, 2^28)    4                 5                5
[2^28, 2^32)    5                 5                5
[2^32, 2^35)    5                 6                6
[2^35, 2^40)    6                 6                6
[2^40, 2^42)    6                 7                7
[2^42, 2^48)    7                 7                7
[2^48, 2^49)    7                 8                8
[2^49, 2^56)    8                 8                8
[2^56, 2^63)    9                 9                9
[2^63, 2^64)    10                9                9
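
For the Scarab column, that's just ceil(bits/7), each byte carrying 7 payload bits. A quick Python check, assuming I have the format right:

def scarab_len(n):
    # Base-128 continuation-bit varint: 7 payload bits per byte.
    return max(1, -(-n.bit_length() // 7))

assert scarab_len(0) == 1
assert scarab_len(2**7 - 1) == 1
assert scarab_len(2**7) == 2
assert scarab_len(2**14) == 3
assert scarab_len(2**63) == 10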


I'm not sure which is more efficient overall. It seems like if you had an idea of the distribution of the numbers used as input, you could calculate the expected value of the encoded length. My coding theory is weak, but Benford's law suggests maybe a power-law distribution like the Pareto?

Efficiency aside, I like Scarab's elegance (simpler, no length limit, no values with two representations), and it's apparently common enough that Perl's pack and unpack support it natively. All three formats screw over the alignment that makes simple and fast hardware possible, but I guess that's just the nature of the game.
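
Assuming Scarab's integers really are the same base-128 scheme as Perl's pack("w", ...) BER compressed integer (big-endian 7-bit groups, continuation bit set on every byte but the last), the whole codec is a few lines of Python:

def encode(n):
    # Most-significant 7-bit group first; continuation bit (0x80) on
    # every byte except the last. Never emits leading 0x80 padding, so
    # each value has exactly one representation.
    if n < 0:
        raise ValueError("non-negative integers only")
    out = [n & 0x7f]
    n >>= 7
    while n:
        out.append((n & 0x7f) | 0x80)
        n >>= 7
    return bytes(reversed(out))

def decode(data):
    n = 0
    for b in data:
        n = (n << 7) | (b & 0x7f)
    return n

assert encode(193) == b"\x81\x41"        # same bytes as Perl's pack("w", 193)
assert decode(encode(2**100)) == 2**100  # no upper limit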

Apparently there's also ASN.1 PER, supposedly more efficient but more complex, with an option to not even align to byte boundaries.
From: srlamb
Date: January 28th, 2007 09:23 pm (UTC)

Re: What's the goal of coalescing chunks?

Hmm, typo in the second row: the Cake value should be 1.

A quick Python program calculates the expected lengths for some distributions I found interesting:

$ ./lengthtest.py 
Expected lengths (calculated with sample size 50000):
Distribution                    Scarab  BER     Cake
paretovariate(alpha=1)          1.01    2.00    1.00
paretovariate(alpha=1/2)        1.10    2.07    1.08
paretovariate(alpha=1/5)        1.61    2.49    1.66
paretovariate(alpha=1/10)       2.59    3.34    2.75
expovariate(lambd=1)            1.00    2.00    1.00
expovariate(lambd=1/10)         1.00    2.00    1.00
expovariate(lambd=1/100)        1.28    2.08    1.11
expovariate(lambd=1/1000)       1.88    2.77    1.80
expovariate(lambd=1/10000)      2.18    2.98    2.41
lognormvariate(mu=0,sigma=1)    1.00    2.00    1.00
lognormvariate(mu=0,sigma=10)   1.59    2.49    1.67
lognormvariate(mu=0,sigma=100)  9.01    8.98    8.44
lognormvariate(mu=1,sigma=1)    1.00    2.00    1.00
lognormvariate(mu=1,sigma=10)   1.68    2.56    1.78
lognormvariate(mu=1,sigma=100)  9.11    9.06    8.53
lognormvariate(mu=2,sigma=1)    1.00    2.00    1.00
lognormvariate(mu=2,sigma=10)   1.76    2.64    1.88
lognormvariate(mu=2,sigma=100)  9.19    9.13    8.61
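
Since I didn't paste the script itself: a minimal reconstruction of lengthtest.py would look something like the sketch below. The length rules are my own guesses (Scarab as a base-128 varint, BER as one length-of-length octet plus minimal content octets, Cake thresholds read off the table in my earlier comment, with the second-row typo corrected), and I've only included a few of the distributions:

#!/usr/bin/env python
import random

def scarab_len(n):
    # Base-128 continuation-bit varint: 7 payload bits per byte.
    return max(1, -(-n.bit_length() // 7))

def ber_len(n):
    # Definite long-form length: one header octet + minimal content octets.
    return 1 + max(1, -(-n.bit_length() // 8))

# Thresholds read off the table in the earlier comment.
CAKE_BOUNDS = [222, 8415, 2**16, 2**24, 2**32, 2**40, 2**48, 2**56, 2**64]

def cake_len(n):
    for i, bound in enumerate(CAKE_BOUNDS):
        if n < bound:
            return i + 1
    raise ValueError("out of range")

SAMPLES = 50000
DISTRIBUTIONS = [
    ("paretovariate(alpha=1)", lambda: random.paretovariate(1)),
    ("expovariate(lambd=1/100)", lambda: random.expovariate(0.01)),
    ("lognormvariate(mu=0,sigma=10)", lambda: random.lognormvariate(0, 10)),
]

print("Expected lengths (calculated with sample size %d):" % SAMPLES)
print("%-32s%-8s%-8s%s" % ("Distribution", "Scarab", "BER", "Cake"))
for name, draw in DISTRIBUTIONS:
    totals = [0.0, 0.0, 0.0]
    for _ in range(SAMPLES):
        # Truncate to an integer and clamp into the encodable range.
        n = min(int(draw()), 2**64 - 1)
        totals[0] += scarab_len(n)
        totals[1] += ber_len(n)
        totals[2] += cake_len(n)
    print("%-32s%-8.2f%-8.2f%.2f"
          % (name, totals[0] / SAMPLES, totals[1] / SAMPLES,
             totals[2] / SAMPLES))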


From: omnifarious
Date: January 29th, 2007 12:12 am (UTC)

Re: What's the goal of coalescing chunks?


I'm not overly concerned with efficiency beyond a certain point. And it looks like a CAKE count has a slight advantage there in most situations anyway.

ASN.1 has a reputation for being horribly obscure and painful to use. I don't know enough about Scarab to make a judgement. I would like to use something that's relatively nice and commonly accepted, especially if lots of parsers already exist for it. Thinking about it, that makes D-BUS vaguely attractive, though it seems to come with more baggage than I want.

OTOH, I don't need the data format to be self-describing. Having type information encoded in the data isn't a negative if it doesn't impact efficiency, but I think it's largely superfluous for CAKE.
