Tordek/gist:3549370

## gistfile1.md

      
wNetStrings

A retake on netstrings.
wNetstrings are a silly format I thought of while writing a tnetstrings parser with Parsec in Haskell.
An arguable weakness of tnetstrings is that the type mark comes after the payload, so you need to hold the whole string up to the type mark in memory before you can start parsing it. This is arguably a minor cost, since you still need to have everything in memory in order to parse it properly (into a number or whatever). A slightly bigger issue I ran into while writing the parser is that it was hard to write it without backtracking, which makes it slower than it could be (mainly because of Lists and Dicts). Now, arguably, that's just because I know shit about Haskell.
Another format?

Yes, don't use it.
So, what's a wNetstring?

A wNetstring is a Wrapper Netstring: you add a typemark to the data and put the result into a netstring. Like tnetstrings, wnetstrings don't really have a "string" type so much as a "raw data" type. You're in charge of deciding what encoding you put into them.
Why?

Because.
Advantages and disadvantages


- It's parsable with a parser with a single lookahead char, unlike tnetstrings.
- It's typed, like tnetstrings.
- It's (somewhat) more human-readable than a tnetstring.
- Wnetstrings are valid netstrings, so you can use any old parser, in principle.
- It's about as human-writable as the others.
- It's 1 or 2 characters longer per field than the equivalent tnetstring.
- I came up with it, so it's probably wrong, or already done.
- Valid netstrings and tnetstrings are NOT valid wnetstrings.
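
To illustrate the "valid netstrings" point, here is a minimal sketch of a plain netstring reader in Python. The name `read_netstring` is made up for this example, and the function knows nothing about wnetstrings: it just returns the payload, with the typemark still attached for whoever consumes it.

```python
def read_netstring(data: bytes) -> bytes:
    """Read one plain netstring ("size:payload,") and return its payload."""
    size_text, sep, rest = data.partition(b":")
    if not sep or not size_text.isdigit():
        raise ValueError("expected '<digits>:' at the start")
    size = int(size_text)
    if rest[size:size + 1] != b",":
        raise ValueError("payload is not followed by ','")
    return rest[:size]


# A wnetstring goes through unchanged; the payload keeps its typemark.
print(read_netstring(b'6:#12345,'))   # b'#12345'
# The reverse does not hold: this payload has no typemark to dispatch on.
print(read_netstring(b'3:Foo,'))      # b'Foo'
```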


Format:

Obviously the EBNF is insufficient, because size depends on the contents: it's the length of the datum that follows the ':'.
wnetstring = size, ':', datum, ',' ;
size = digit, { digit } ;
datum = wnull | wbool | winteger | wfloat | wstring | wlist | wdict ;
wnull = '~' ;
wbool = '!', ('t' | 'f') ;
winteger = '#', [ '-' | '+' ], digit, { digit } ;
wfloat = '^', float ;
wstring = '"', { byte }, '"' ;
wlist = '[', { wnetstring }, ']' ;
wdict = '{', { size, ':', wstring, ',', wnetstring }, '}' ;

Implementations must support at least 32-bit integers. You're allowed to put larger numbers in, but there are no guarantees.
float is vague because I'm not sure how to specify it. Implementations must be able to handle at least single precision (IEEE 754-1985). You may put in more digits, but implementations may ignore them.
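
As a rough sketch of the grammar in practice, here is what an encoder could look like in Python. The function name `encode` is invented for this example. Two things are assumptions rather than part of the format: floats are written with `repr()` because the float syntax is left open, and dict keys are emitted as complete wnetstrings, trailing comma included.

```python
def encode(value) -> bytes:
    """Encode a Python value as a wnetstring: size ':' datum ','."""
    if value is None:
        datum = b"~"
    elif isinstance(value, bool):  # must come before int: bool subclasses int
        datum = b"!t" if value else b"!f"
    elif isinstance(value, int):
        datum = b"#" + str(value).encode("ascii")
    elif isinstance(value, float):
        # Assumption: the float syntax is unspecified, so repr() is used here.
        datum = b"^" + repr(value).encode("ascii")
    elif isinstance(value, (str, bytes)):
        # Strings are raw data; UTF-8 for str is this sketch's choice.
        raw = value.encode("utf-8") if isinstance(value, str) else value
        datum = b'"' + raw + b'"'
    elif isinstance(value, (list, tuple)):
        datum = b"[" + b"".join(encode(item) for item in value) + b"]"
    elif isinstance(value, dict):
        parts = []
        for key, item in value.items():
            if not isinstance(key, (str, bytes)):
                raise TypeError("dict keys must be strings")
            # Assumption: keys are full wnetstrings, comma included.
            parts.append(encode(key) + encode(item))
        datum = b"{" + b"".join(parts) + b"}"
    else:
        raise TypeError(f"cannot encode {type(value).__name__}")
    # size counts the bytes of the datum, typemark and closing mark included
    return str(len(datum)).encode("ascii") + b":" + datum + b","


print(encode("Foo"))      # b'5:"Foo",'
print(encode(12345))      # b'6:#12345,'
print(encode([1, 2, 3]))  # b'17:[2:#1,2:#2,2:#3,],'
```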
Samples:

"Foo" -> '5:"Foo",'
12345 -> '6:#12345,'
[1,2,3] -> '17:[2:#1,2:#2,2:#3,],'
{ 'bar': [ 1252, True, None ] } -> '33:{5:"bar",19:[5:#1252,2:!t,1:~,],},'
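
Going the other way, here is a sketch of a decoder in Python that never backtracks: it reads the size, looks at the single typemark byte, and dispatches on it. The name `decode` and the `(value, next_position)` return shape are choices made for this example. Strings come back as raw `bytes`, and dict keys are assumed to be complete wnetstrings holding a string. It works on a buffer that is already in memory, so it shows the dispatch idea rather than a streaming parser.

```python
def decode(data: bytes, pos: int = 0):
    """Decode one wnetstring starting at pos; return (value, next_pos)."""
    colon = data.index(b":", pos)
    # int() is more lenient than the grammar's digit-only rule.
    size = int(data[pos:colon].decode("ascii"))
    start, end = colon + 1, colon + 1 + size
    if size == 0 or data[end:end + 1] != b",":
        raise ValueError("bad size or missing trailing ','")
    # One byte decides the type; everything else follows from it.
    mark, body = data[start:start + 1], data[start + 1:end]
    if mark == b"~" and not body:
        value = None
    elif mark == b"!" and body in (b"t", b"f"):
        value = body == b"t"
    elif mark == b"#":
        value = int(body.decode("ascii"))
    elif mark == b"^":
        value = float(body.decode("ascii"))
    elif mark == b'"' and body.endswith(b'"'):
        value = body[:-1]
    elif mark == b"[" and body.endswith(b"]"):
        value, i = [], start + 1
        while i < end - 1:
            item, i = decode(data, i)
            value.append(item)
        if i != end - 1:
            raise ValueError("list items overrun the list size")
    elif mark == b"{" and body.endswith(b"}"):
        value, i = {}, start + 1
        while i < end - 1:
            key, i = decode(data, i)
            if not isinstance(key, bytes):
                raise ValueError("dict keys must be strings")
            value[key], i = decode(data, i)
        if i != end - 1:
            raise ValueError("dict entries overrun the dict size")
    else:
        raise ValueError(f"bad datum starting with {mark!r}")
    return value, end + 1


value, _ = decode(b'17:[2:#1,2:#2,2:#3,],')
print(value)  # [1, 2, 3]
```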

History

Revision: Yeah... the paired "", {} and [] are nice, but dupes look stupid on numbers and other things, so let's just kill 'em.
Now, arguably, the integer typemark can be gotten rid of, but that makes a special case of numbers (and '+' and '-'), which may or may not be a good idea. It does make for shorter strings like this one:
[1,2,3] -> '14:[1:1,1:2,1:3,],'
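
To compare the two spellings side by side, here is a small Python sketch. The helpers `wrap` and `encode_int_list` are made up for this example and only handle lists of integers.

```python
def wrap(datum: bytes) -> bytes:
    """Put a datum into a netstring: size ':' datum ','."""
    return str(len(datum)).encode("ascii") + b":" + datum + b","


def encode_int_list(numbers, typemark: bytes = b"#") -> bytes:
    """Encode a list of ints, with or without the integer typemark."""
    items = b"".join(wrap(typemark + str(n).encode("ascii")) for n in numbers)
    return wrap(b"[" + items + b"]")


print(encode_int_list([1, 2, 3]))                # b'17:[2:#1,2:#2,2:#3,],'
print(encode_int_list([1, 2, 3], typemark=b""))  # b'14:[1:1,1:2,1:3,],'
```

Without the typemark, a parser has to treat a leading digit, '+' or '-' as "this is an integer", which is the special case mentioned above.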