Skip to content

Instantly share code, notes, and snippets.

@borgar
borgar / Tiny JavaScript tokenizer.js
Created June 24, 2010 12:33
A compact tokenizer written in JavaScript.
/*
* Tiny tokenizer
*
* - Accepts a subject string and an object of regular expressions for parsing
* - Returns an array of token objects
*
* tokenize('this is text.', { word:/\w+/, whitespace:/\s+/, punctuation:/[^\w\s]/ }, 'invalid');
* result => [{ token="this", type="word" },{ token=" ", type="whitespace" }, Object { token="is", type="word" }, ... ]
*
*/