fbuchinger/gist:674212

## gistfile1.md

      
OxJs - jQuery for your Bytes

OxJs (pronounced HexaJs) is a JavaScript utility library that facilitates parsing and creating binary data. While this may sound odd at first, there are a few areas where OxJs can come in handy:


If you need to transfer huge amounts of numerical data via AJAX, binary encoding can be much more space-efficient than transferring it as JSON. The number 41239, for example, needs 5 bytes in JSON but only 2 bytes in binary encoding (unsigned short). A more practical example is the Google Maps Polyline Utility, which saves over 80% of the JSON payload by binary encoding. However, decoding and encoding binary information can be quite tricky, and this is where OxJs helps.
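The size difference is easy to check. The following minimal sketch does not use OxJs; it relies on the standard `DataView` API, and the helper name `encodeUint16` is an assumption made for illustration.

```javascript
// Hypothetical helper: encode a number as a 2-byte unsigned short (big-endian).
function encodeUint16(value) {
  var buffer = new ArrayBuffer(2);
  new DataView(buffer).setUint16(0, value, false); // false = big-endian
  return new Uint8Array(buffer);
}

var jsonSize = JSON.stringify(41239).length; // 5 characters: "41239"
var binary = encodeUint16(41239);            // 2 bytes: 0xA1 0x17
```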


For client-side file parsing: recent enhancements in JavaScript give us byte-level access to files that are about to be uploaded. This can be used to perform enhanced sanity checks before the upload happens, even if the browser doesn't understand a file format natively. Imagine a print shop that only accepts PDFs with an embedded ICC profile or at a certain resolution. By checking the file for this requirement before upload, we can warn the user before a multi-megabyte upload happens (cf. ExifTool's PDF info).
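As a much simpler stand-in for the ICC or resolution check, a pre-upload test could at least confirm that the file starts with the PDF signature `%PDF-`. This is only a sketch: the helper name `looksLikePdf` is hypothetical, and it assumes the first bytes of the file have already been read into a byte array (for example via the File API).

```javascript
// Hypothetical helper: does the byte array start with the PDF signature "%PDF-"?
function looksLikePdf(bytes) {
  var signature = [0x25, 0x50, 0x44, 0x46, 0x2D]; // "%PDF-"
  if (bytes.length < signature.length) return false;
  for (var i = 0; i < signature.length; i++) {
    if (bytes[i] !== signature[i]) return false;
  }
  return true;
}

// "%PDF-1.4" passes, a JPEG header does not.
var pdfHeader = new Uint8Array([0x25, 0x50, 0x44, 0x46, 0x2D, 0x31, 0x2E, 0x34]);
var jpegHeader = new Uint8Array([0xFF, 0xD8, 0xFF, 0xE0, 0x00, 0x10]);
var pdfOk = looksLikePdf(pdfHeader);
var jpegOk = looksLikePdf(jpegHeader);
```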


But why OxJs, aren't there enough utility functions for binary operations?

Certainly, there are already many impressive file parsers written in JavaScript, notably Jacob Seidelin's excellent Exif Parser and ID3 Reader. However, we still need a common API that eases binary operations for the average user and enables many people to write file parsers. Furthermore, it should abstract the different methods of accessing binary content in JavaScript (Binary Ajax Request, W3C File API).
The ultimate goal is to enable a port of something like ExifTool to JavaScript.
Recent Findings


Implement OxJS in CoffeeScript, but offer a pre-compiled JavaScript version.
OxJS blocks (parsers/containers of binary file segments such as Exif) could be implemented as CoffeeScript classes, using class variables for all the necessary setup (e.g. binding the block to certain parser events, specifying supported file formats/MIME types).
Use an export of ExifTool's tag database (e.g. exiftool -listx -EXIF:All > exif.xml for all Exif tags) and convert it to JSON with a Python script. This seems the most reliable way to get a cumulative index of all Exif tags. Each parser class could specify such a tag definition file as a class variable.

OxJS API

Core Functions

Ox(data,[offset],[length])

Initializes the HexaJs object with data. Optionally, you can specify an initial offset and a length. You can think of Ox as the counterpart to jQuery's popular $ shortcut.
About the data parameter:

If data is a string, we assume it's a binary string (1 char = 1 byte).
If data is another JavaScript data type (array, object), it's stored internally for later use with the pack/unpack functions.
TODO: maybe add support for string initialisation in the format Ox(string, encoding) like Ox('abdê','utf-8')

Here is an example:
 OxBytes = Ox('\x00\x01\x00\x02\x00\x00\x00\x03');
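The "1 char = 1 byte" convention can be sketched without OxJs as follows. The helper name `binaryStringToBytes` is hypothetical; masking with `0xFF` is an assumption about how characters above 255 should be treated.

```javascript
// Hypothetical helper: convert a binary string (1 char = 1 byte) to a byte array.
function binaryStringToBytes(str) {
  var bytes = new Uint8Array(str.length);
  for (var i = 0; i < str.length; i++) {
    bytes[i] = str.charCodeAt(i) & 0xFF; // keep only the low 8 bits
  }
  return bytes;
}

var bytes = binaryStringToBytes('\x00\x01\x00\x02\x00\x00\x00\x03');
// bytes now holds 0, 1, 0, 2, 0, 0, 0, 3
```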

slice(offset, length)

Returns the slice of the Ox object at the specified offset with the provided length. The slice is returned as an Ox object.
OxBytes.slice(1,2); // returns new Ox object with a buffer of '\x01\x00'

read([type])

reads one byte from the current offset and returns its integer value. The offset of the Ox object is increased by 1.
With the optional type parameter (string), the following types can be read:

'byte' read a single byte
'short' and 'ushort' (2 bytes)
'long' and 'ulong' (4 bytes)
'float' (4 bytes) (not sure if possible)
'double' (8 bytes) (not sure if possible)

The internal offset is increased by the byte length of the type that was read.
 OxBytes.read('byte') // returns 0, the first byte of OxBytes; the offset is now 1
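How the integer types could be decoded from a binary string is sketched below. This is not the OxJs implementation: `TYPE_SIZES` and `readInt` are hypothetical names, the sketch assumes big-endian byte order, and it assumes 'byte' is unsigned while 'short' and 'long' are signed.

```javascript
// Byte widths of the integer types listed above.
var TYPE_SIZES = { byte: 1, short: 2, ushort: 2, long: 4, ulong: 4 };

// Hypothetical helper: read a big-endian integer of the given type from a binary string.
function readInt(str, offset, type) {
  var size = TYPE_SIZES[type];
  var value = 0;
  for (var i = 0; i < size; i++) {
    // The first byte is the most significant one (big-endian).
    value = value * 256 + (str.charCodeAt(offset + i) & 0xFF);
  }
  // Signed types: reinterpret the top bit as the sign (two's complement).
  var signed = (type === 'short' || type === 'long');
  var limit = Math.pow(2, size * 8);
  if (signed && value >= limit / 2) value -= limit;
  return value;
}

var data = '\x00\x01\x00\x02\x00\x00\x00\x03';
var firstByte = readInt(data, 0, 'byte');    // 0
var firstShort = readInt(data, 0, 'ushort'); // 1
var lastLong = readInt(data, 4, 'ulong');    // 3
```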

length()

returns the length of the binary buffer in bytes. TODO think about possible options
 OxBytes.length() //returns 8 

endian(['big'|'little'])

Defines the endianness of the HexaJs object. Calling endian() without parameters returns the current endianness; passing 'big' or 'little' sets it. The endianness determines how multi-byte data types are interpreted.
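To illustrate why the setting matters, here is a small sketch using the standard `DataView` API (not OxJs): the same two bytes yield different values depending on the byte order.

```javascript
// The bytes 0x00 0x01 read as an unsigned 16-bit integer in both byte orders.
var view = new DataView(new Uint8Array([0x00, 0x01]).buffer);
var asBig = view.getUint16(0, false);   // big-endian: 0x0001
var asLittle = view.getUint16(0, true); // little-endian: 0x0100
```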
readAt(offset, [type])

like read, but with a specified offset
OxBytes.readAt(3,'byte') //returns 2

seek(offset)

sets the internal offset to the specified value, all read() calls and other HexaJs objects will use that offset value
OxBytes.seek(4)
tell()

returns the current offset
OxBytes.tell() //returns 4
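The interplay of read, seek and tell can be sketched with a small cursor object. `Cursor` is a hypothetical name used only for this illustration; it reads single bytes and is not the OxJs implementation.

```javascript
// Hypothetical cursor over a binary string, mimicking read/seek/tell.
function Cursor(str) {
  this.str = str;
  this.offset = 0;
}
Cursor.prototype.read = function () {
  // Return the byte at the current offset, then advance by one.
  return this.str.charCodeAt(this.offset++) & 0xFF;
};
Cursor.prototype.seek = function (offset) {
  this.offset = offset;
  return this;
};
Cursor.prototype.tell = function () {
  return this.offset;
};

var cursor = new Cursor('\x00\x01\x00\x02\x00\x00\x00\x03');
cursor.seek(4);
var position = cursor.tell(); // 4
cursor.seek(7);
var lastByte = cursor.read(); // 3, and the offset moves on to 8
```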

raw()

returns the raw binary data of the HexaJS object
OxBytes.raw() //returns '\x00\x01\x00\x02\x00\x00\x00\x03'

Structs

These functions work like C/Python's struct.pack/unpack functions, but pack returns a HexaJS object instead of a raw binary string.
pack(format)

 Ox([1,2,3]).pack('hhl'); // returns Ox object with value '\x00\x01\x00\x02\x00\x00\x00\x03'

unpack(format)

 Ox('\x00\x01\x00\x02\x00\x00\x00\x03').unpack('hhl'); // returns [1,2,3]
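A standalone sketch of the idea, outside of OxJs, might look like this. It assumes the format characters `h` (2-byte short) and `l` (4-byte long) mean the same as in Python's struct module, and it assumes big-endian byte order; `FORMAT_SIZES`, `pack` and `unpack` are hypothetical free functions, not the Ox methods.

```javascript
// Assumed format characters: h = 2-byte short, l = 4-byte long.
var FORMAT_SIZES = { h: 2, l: 4 };

// Hypothetical pack: array of integers -> big-endian binary string.
function pack(format, values) {
  var out = '';
  for (var i = 0; i < format.length; i++) {
    var size = FORMAT_SIZES[format.charAt(i)];
    // Emit the most significant byte first.
    for (var shift = (size - 1) * 8; shift >= 0; shift -= 8) {
      out += String.fromCharCode((values[i] >> shift) & 0xFF);
    }
  }
  return out;
}

// Hypothetical unpack: big-endian binary string -> array of signed integers.
function unpack(format, str) {
  var values = [];
  var offset = 0;
  for (var i = 0; i < format.length; i++) {
    var size = FORMAT_SIZES[format.charAt(i)];
    var value = 0;
    for (var j = 0; j < size; j++) {
      value = value * 256 + (str.charCodeAt(offset + j) & 0xFF);
    }
    var limit = Math.pow(2, size * 8);
    if (value >= limit / 2) value -= limit; // two's complement sign
    values.push(value);
    offset += size;
  }
  return values;
}

var packed = pack('hhl', [1, 2, 3]);
var unpacked = unpack('hhl', packed);
```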

Binary AJAX

Ox.get(url, data, callback)

simple wrapper for Jacob Seidelin's Binary Ajax request.
Ox.getChunked(url, data, chunkSize, callback)

Wrapper that fetches a file in chunks, so that, for example, the whole file need not be downloaded just to parse metadata in the file header. The callback signature is chunkCallback(completeChunkData, currentChunk), where completeChunkData is an Ox object with all chunks transferred so far and currentChunk is the chunk currently being transferred. Requires that the web server supports HTTP range requests (advertised via the Accept-Ranges header); otherwise it simply delegates to Ox.get().
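The chunking itself comes down to sending one `Range` request header per chunk. The sketch below only computes the header values; `chunkRanges` is a hypothetical helper, and it assumes the total file size is already known (in practice it would have to be learned from the server first).

```javascript
// Hypothetical helper: build HTTP Range header values for fetching a file in chunks.
function chunkRanges(totalSize, chunkSize) {
  var ranges = [];
  for (var start = 0; start < totalSize; start += chunkSize) {
    // Range ends are inclusive, so the last byte index is one below the next start.
    var end = Math.min(start + chunkSize, totalSize) - 1;
    ranges.push('bytes=' + start + '-' + end);
  }
  return ranges;
}

var ranges = chunkRanges(2500, 1024);
// ['bytes=0-1023', 'bytes=1024-2047', 'bytes=2048-2499']
```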
Ox.binaryAjax (?)

fully-fledged binaryAjax function that is entirely configurable. Not yet sure about all features.
Blocks

One aim of OxJs is to make file parsing drastically easier. You should be able to extract exactly the info you want, with a minimum of overhead. For this purpose, OxJs introduces the concept of blocks. Just as a house consists of different components assembled together, a file format is made of different data blocks. The sub-entries of a block are items. An item consists of an id/index, a name, a datatype and a value.
OxJs blocks understand a certain data structure and can decode the items stored in it. An OxJs block is not a complete file parser, but it is, for example, able to decode a JPEG file into its different segments and some basic parameters like resolution and comment. If you then wanted to decode the Exif info in the JPEG, you'd have to throw the buffer of the APP1 segment into an Exif block. If you wanted to decode the Makernote contained in the Exif.Makernote item, you'd have to throw it into an Exif.CanonMakernote block.
This modular approach gives you the freedom to extract exactly the info you want. You don't have to sacrifice performance by extracting information you don't need.
A datablock in Ox is defined like so:
 OxJs.blocks.extend ({
    type: 'jfif',
    containers: 'file', // file = top-level container
    identify: function (OxFirst1024b){
        // A JPEG stream starts with the SOI marker, bytes 0xFF 0xD8.
        var firstByte = OxFirst1024b.read();
        var secondByte = OxFirst1024b.read();
        return (firstByte === 0xFF && secondByte === 0xD8);
    },
    parse: function (OxObj){
        var SEG_NUMS = {
           0xE0: "APP0", 
           0xE1: "APP1", 
           0xE2: "APP2", 
           0xE3: "APP3", 
           0xE4: "APP4", 
           0xE5: "APP5", 
           0xE6: "APP6", 
           0xE7: "APP7", 
           0xE8: "APP8", 
           0xE9: "APP9", 
           0xEA: "APP10",
           0xEB: "APP11",
           0xEC: "APP12",
           0xED: "APP13",
           0xEE: "APP14",
           0xEF: "APP15",
           0xDB: "DQT",
           0xC0: "SOF0", 
           0xC1: "SOF1", 
           0xC2: "SOF2", 
           0xC3: "SOF3", 
           0xC4: "DHT",  
           0xDD: "DRI",  
           0xDA: "SOS",  
           0xFE: "COM",  
           0xD9: "EOI"  
        };
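        var block = this;
        // Walk the file marker by marker. Sketch: assumes big-endian reads and
        // that each segment has a 2-byte length that includes the length field.
        OxObj.seek(2); // skip the SOI marker
        while (OxObj.tell() + 4 <= OxObj.length()) {
            var prefix = OxObj.read('byte');
            var marker = OxObj.read('byte');
            if (prefix !== 0xFF || !SEG_NUMS[marker]) break;
            var segLength = OxObj.read('ushort') - 2; // payload size
            var segOffset = OxObj.tell();
            block.contains('segment:' + SEG_NUMS[marker], OxObj.slice(segOffset, segLength));
            if (marker === 0xDA) break; // compressed image data follows SOS
            OxObj.seek(segOffset + segLength);
        }
    }
});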


OxJs.blocks.extend ({
    type: 'exif',
    containers: [
        'jfif:marker:app1' //exif data is in the app1 marker of jfif format
    ],
    identify: function (OxFirst1024b){
        //perform a truth test for the presence of exif data
    },
    parse: function (OxObj){
        //parse the data - todo - specify a callback for the results
    },
});
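Independent of the block API, the segment walk that a 'jfif' block performs can be sketched over a plain byte array. `listJpegSegments` is a hypothetical helper, not part of OxJs. It is deliberately simplified: it stops at the SOS marker and does not handle fill bytes or markers without a length field.

```javascript
// Hypothetical helper: list the segments of a JPEG byte array up to the SOS marker.
function listJpegSegments(bytes) {
  var segments = [];
  // A JPEG stream starts with the SOI marker 0xFF 0xD8.
  if (bytes[0] !== 0xFF || bytes[1] !== 0xD8) return segments;
  var offset = 2;
  while (offset + 4 <= bytes.length && bytes[offset] === 0xFF) {
    var marker = bytes[offset + 1];
    // The 2-byte big-endian length counts itself but not the marker bytes.
    var length = (bytes[offset + 2] << 8) | bytes[offset + 3];
    segments.push({ marker: marker, offset: offset + 4, length: length - 2 });
    if (marker === 0xDA) break; // SOS: compressed image data follows
    offset += 2 + length;
  }
  return segments;
}

// Tiny synthetic stream: SOI, APP0 (2 payload bytes), COM (1 payload byte), SOS.
var sample = new Uint8Array([
  0xFF, 0xD8,
  0xFF, 0xE0, 0x00, 0x04, 0xAA, 0xBB,
  0xFF, 0xFE, 0x00, 0x03, 0x41,
  0xFF, 0xDA, 0x00, 0x02,
  0x12, 0xFF, 0xD9
]);
var segments = listJpegSegments(sample);
```

Each entry's `offset` and `length` describe the payload only, which is what one would hand on to a nested block such as 'exif'.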