I've been a bit inspired, but also sick - Journal of Omnifarious
Feb. 25th, 2008
08:57 pm - I've been a bit inspired, but also sick
I've been pretty sick, but in the few productive hours I've had today and over the weekend, I've been working on this project I've been suddenly inspired to do, mostly because of the investigation required for me to write this post on Thrift, D-Bus and RPC.
I've been wanting a self-describing binary data format that's simpler and less ugly than ASN.1. I also wanted one in which an extremely fast parser could be built for fixed-length data structures known at compile time, either through the use of an IDML (which would be optional) to generate code, or through crazier techniques like template meta-programming in C++.
There are several things I would change about this parser, and a few features I would like to add.
Selected data types are currently variable length, and I would like to introduce a '<count>' syntax for annotating these types with a length, thereby making them fixed-length. I would like to add this capability to the arbitrary precision integer type, the binary blob type, the string type and the array type.
It needs a type for time that very explicitly states that the time is represented as an arbitrary-precision integer that encodes an offset (positive or negative) in seconds from some base time in UTC.
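To make the time idea concrete, here is a minimal sketch in Python. It assumes the base time is the Unix epoch and that the integer is stored as big-endian two's complement; the post fixes neither choice, and the names BASE_TIME, encode_time and decode_time are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the base time is the Unix epoch. The format itself only
# requires "some base time in UTC".
BASE_TIME = datetime(1970, 1, 1, tzinfo=timezone.utc)


def encode_time(moment):
    """Encode a timezone-aware datetime as a signed offset in whole seconds."""
    offset = (moment - BASE_TIME) // timedelta(seconds=1)  # floors to whole seconds
    # A byte count that always leaves room for the sign bit (assumed
    # big-endian two's complement; not necessarily the shortest encoding).
    length = (offset.bit_length() + 8) // 8
    return offset.to_bytes(length, "big", signed=True)


def decode_time(data):
    """Decode the signed second offset back into a UTC datetime."""
    offset = int.from_bytes(data, "big", signed=True)
    return BASE_TIME + timedelta(seconds=offset)


stamp = datetime(2008, 2, 25, 20, 57, tzinfo=timezone.utc)
assert decode_time(encode_time(stamp)) == stamp
```

Because Python integers are already arbitrary precision, negative offsets (times before the base) and very distant times need no special handling on the integer side, although datetime itself only covers years 1 through 9999.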
I would sort of like to incorporate Thrift's idea of field tags so data structures could be upgraded in a backwards compatible way.
My current idea for this is a variant of the tuple type that would require a field tag after every type element in the tuple.
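A rough sketch of why field tags help with upgrades, in Python. The wire layout here (a one-byte tag, a one-byte length, then the payload) is purely an assumption for illustration and is not the layout of this format or of Thrift; encode_fields and decode_fields are hypothetical names.

```python
def encode_fields(fields):
    """Serialize {tag: payload_bytes} as repeated (tag, length, payload) records.

    Assumed layout: tags and lengths must each fit in one byte (0-255).
    """
    out = bytearray()
    for tag, payload in fields.items():
        out += bytes([tag, len(payload)]) + payload
    return bytes(out)


def decode_fields(data, known_tags):
    """Return only the fields this reader understands, skipping the rest."""
    result, pos = {}, 0
    while pos < len(data):
        tag, length = data[pos], data[pos + 1]
        payload = data[pos + 2 : pos + 2 + length]
        if tag in known_tags:
            result[tag] = payload
        # Unknown tags are stepped over using their length, not rejected.
        pos += 2 + length
    return result


# A newer writer adds field 3; an older reader that knows only 1 and 2 still works.
newer = encode_fields({1: b"abc", 2: b"\x2a", 3: b"extra"})
assert decode_fields(newer, {1, 2}) == {1: b"abc", 2: b"\x2a"}
```

The point is only that the tag identifies the field, so readers do not depend on position and can ignore what they do not recognize.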
Also, that parser is inefficient. Ideally it would build up the parse as one or more calls to Python's struct.unpack, each of which would unpack multiple values. Right now, though struct.unpack is used fairly heavily, it only ever (well, my fancy arbitrary-precision integer parser notwithstanding) unpacks one element in any call.
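The difference between the two approaches looks roughly like this. The record layout (a 16-bit id, a 32-bit count and a 64-bit float, all big-endian) and the function names are invented for the example; only the struct module calls are real.

```python
import struct

# Hypothetical fixed-length record: unsigned 16-bit, unsigned 32-bit, double.
FIELD_FORMATS = ["H", "I", "d"]


def unpack_one_at_a_time(data):
    """One struct.unpack_from call per field, roughly the inefficient approach."""
    values, offset = [], 0
    for fmt in FIELD_FORMATS:
        (value,) = struct.unpack_from(">" + fmt, data, offset)
        values.append(value)
        offset += struct.calcsize(">" + fmt)
    return tuple(values)


# Build a single format string covering every field and compile it once.
# The ">" prefix means big-endian with no alignment padding, so the
# per-field sizes simply add up.
BATCHED = struct.Struct(">" + "".join(FIELD_FORMATS))


def unpack_batched(data):
    """A single call that unpacks all fields at once."""
    return BATCHED.unpack_from(data, 0)


record = struct.pack(">HId", 7, 1000, 2.5)
assert unpack_one_at_a_time(record) == unpack_batched(record) == (7, 1000, 2.5)
```

Variable-length elements would break a batch into several such calls, which is presumably where the "one or more calls" comes from.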
Lastly, right now, it expects the type value to be immediately preceded by a type spec. That's a design mistake. The type spec and type value should be handled separately except for the 'variant' type.
This brings me to another couple of features I think would be interesting, but very tricky. It would be nice if 'variant' types could refer to a previously used 'variant' type. Partly for efficiency reasons, and partly for better clarity since one use of the variant type is to record information present in various derived classes of some base class. It would also enable encoding recursive data structures in a saner way. Additionally it might be nice to be able to refer to previously decoded values in some way for data structures that couldn't fit into a strict tree.
One interesting thing: I think you could conceivably use this type tag system to describe IP packet layouts or other binary formats that have existed previously.
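As a stand-in for what such a description might look like, here is the fixed 20-byte IPv4 header (without options) described with a Python struct format string. This uses struct's notation rather than the type tag system from this post, and parse_ipv4_header is a hypothetical helper.

```python
import struct

# Fixed IPv4 header fields in network byte order:
# version/IHL, DSCP/ECN, total length, identification, flags/fragment offset,
# TTL, protocol, header checksum, source address, destination address.
IPV4_HEADER = struct.Struct("!BBHHHBBH4s4s")


def parse_ipv4_header(data):
    """Unpack the fixed part of an IPv4 header into a small dict."""
    (ver_ihl, _tos, total_len, _ident, _frag,
     ttl, proto, _checksum, src, dst) = IPV4_HEADER.unpack_from(data)
    return {
        "version": ver_ihl >> 4,          # high nibble
        "header_words": ver_ihl & 0x0F,   # low nibble, in 32-bit words
        "total_length": total_len,
        "ttl": ttl,
        "protocol": proto,
        "source": ".".join(str(b) for b in src),
        "destination": ".".join(str(b) for b in dst),
    }


# A hand-built sample header, not captured traffic (checksum left as 0).
sample = IPV4_HEADER.pack(0x45, 0, 40, 1, 0, 64, 6, 0,
                          bytes([192, 168, 0, 1]), bytes([10, 0, 0, 2]))
header = parse_ipv4_header(sample)
assert header["version"] == 4 and header["source"] == "192.168.0.1"
```

Note that fields narrower than a byte, like the version and header length nibbles, still have to be split by hand here; a type description meant for existing formats would need some way to express those.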