I made two new versions of the parser. At first I thought it would be "faster" to parse directly from a byte array with BitConverter and make everything immutable. The result is here. But then I realized that in most cases you would be parsing lots of data from a stream anyway, so I made another that still uses a BinaryReader.
I also got rid of my weird conversion operators and just wrote the type constraints. All this was in an effort to simplify things so that the compiled IL would be simple, like you said. Amazingly, thanks to inlining and optimization, the compiler gets rid of all the tuples and dummy values, and in the end you have something that looks as if you had written the BinaryReader code yourself! You have to compile in release mode, of course.
For example, consider the following function that just parses a tuple:
    let parseTuple bytes =
        parseBinary bytes : int32 * int16 * int16
When I disassembled this to C# with Reflector, this is what I got:
    public static Tuple<int, short, short> parseTuple(byte[] bytes)
    {
        using (BinaryReader reader = new BinaryReader(new MemoryStream(bytes)))
        {
            return new Tuple<int, short, short>(reader.ReadInt32(), reader.ReadInt16(), reader.ReadInt16());
        }
    }
Note that the tuple is only there because it's my return type. In other cases (such as a pattern match) it would be optimized away like everything else.
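For readers who have not seen the technique, here is a minimal sketch of type-directed reading with statically resolved type parameters and a dummy value. It is not the parser discussed above: `ReadHelper`, `read`, and `parseTriple` are hypothetical names, only two primitive types are covered, and tuple support is left out.

```fsharp
open System.IO

// Hypothetical helper type: one overload of ($) per supported primitive.
// The second argument is a dummy value that exists only to select the overload.
type ReadHelper = ReadHelper with
    static member ($) (ReadHelper, _: int32) = fun (r: BinaryReader) -> r.ReadInt32()
    static member ($) (ReadHelper, _: int16) = fun (r: BinaryReader) -> r.ReadInt16()

// The requested result type 'a picks the overload at each inlined call site.
let inline read (reader: BinaryReader) : 'a =
    (ReadHelper $ Unchecked.defaultof<'a>) reader

// Usage: the type annotations decide which ReadXxx method gets called.
let parseTriple (bytes: byte[]) : int32 * int16 * int16 =
    use reader = new BinaryReader(new MemoryStream(bytes))
    let a : int32 = read reader
    let b : int16 = read reader
    let c : int16 = read reader
    a, b, c
```

Because `read` is inline, each call should resolve to a specific overload at compile time, which is what gives the optimizer the chance to remove the dummy values.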
The code has some weirdness about it due to the limitations of the static type constraints. I keep trying things that I think should work, but the compiler complains, so I end up doing it some other way. I haven't found many comprehensive resources on this.
One thing that it doesn't have right now is the ability to parse an array of things (primitives or tuples), which I could probably add. I looked a little at your MD3 parser and it looks like you're parsing some fixed-length strings, which would be hard to specify with this thing since it's all based on types. In order to control the string lengths, I would have to add a tuple of lengths or something as an argument to the parseBinary function, but that would somewhat defeat the purpose by making the code much more verbose.
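To make the verbosity concrete, here is a rough sketch of reading fixed-length strings with the lengths passed explicitly. The helper `readFixedString`, the function `parseRecord`, and the record layout (4-byte tag, int32, 8-byte name) are all made up for illustration, and ASCII encoding is an assumption rather than anything taken from the MD3 format.

```fsharp
open System.IO
open System.Text

// Hypothetical helper: read exactly `length` bytes and decode them as ASCII.
let readFixedString (length: int) (reader: BinaryReader) : string =
    let bytes = reader.ReadBytes(length)
    // ReadBytes returns fewer bytes if the stream ends early.
    if bytes.Length <> length then
        raise (EndOfStreamException "stream ended inside a fixed-length string")
    Encoding.ASCII.GetString(bytes)

// Made-up layout: 4-byte tag, int32 version, 8-byte name.
let parseRecord (data: byte[]) : string * int32 * string =
    use reader = new BinaryReader(new MemoryStream(data))
    let tag = readFixedString 4 reader
    let version = reader.ReadInt32()
    let name = readFixedString 8 reader
    tag, version, name
```

Every string field needs its length spelled out at the call site, which is exactly the information a purely type-driven parser has no place to put.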
One thing that would be really cool is if I could create a type provider that allowed me to "annotate" a string type with its length, i.e. string<4>, which would create a static property on the fake "type" to allow me to retrieve the length when I was parsing it. That would seem to solve the problem, if I could figure out how to make it work. I have a feeling that type providers and statically-resolved type parameters might not play all that well together. Also, since string is sealed, I can't actually derive from it, so there might unfortunately have to be a wrapper type. I don't know if you can create fake derived types via erasure, but I'm guessing not.
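Without a type provider, the wrapper-type idea can be approximated by hand. The sketch below is only an illustration under assumptions: the marker types `Len4` and `Len8`, the wrapper `FixedString`, and the functions `readFixed` and `parseNames` are hypothetical, the markers have to be written manually for each length, and ASCII encoding is assumed.

```fsharp
open System.IO
open System.Text

// Hypothetical marker types standing in for the "<4>" in string<4>.
type Len4 = Len4 with
    static member Length = 4

type Len8 = Len8 with
    static member Length = 8

// Hypothetical wrapper: the phantom parameter 'Len carries the length in the type.
type FixedString<'Len> = FixedString of string

// The static Length property is fetched through a member constraint,
// so the requested result type decides how many bytes are read.
let inline readFixed (reader: BinaryReader) : FixedString< ^Len> =
    let n = (^Len : (static member Length : int) ())
    FixedString (Encoding.ASCII.GetString(reader.ReadBytes(n)))

// Usage: the annotations alone determine the field widths.
let parseNames (data: byte[]) =
    use reader = new BinaryReader(new MemoryStream(data))
    let tag : FixedString<Len4> = readFixed reader
    let name : FixedString<Len8> = readFixed reader
    tag, name
```

This keeps the lengths in the types, at the cost of unwrapping `FixedString` wherever the plain string is needed.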