laughinghan/MathSON.md

## MathSON.md

      
    MathSON

Status: Draft 1 In Progress. This document is undergoing its first
revision. Initial implementation has begun alongside editing Draft 1.
Your feedback is hoped and dreamed of.
Mathematical Structured Object Notation is a JSON-based representation
for most of the common subset of what LaTeX and Presentation MathML can
represent, but distilled down to the essential content and structure.
It can also represent diffs between two formulas in a similar fashion to
ottypes/text and ottypes/rich-text, and the associated cursor
positions are a better fit for editing math than the DOM Ranges
associated with MathML.
Just show me some code already

Okay.
// the quadratic formula
> MathSON.fromLatex('x = \\frac{ -b \\pm \\sqrt{ b^2 - 4ac } }{ 2a }').ops
['x=',
 { numer: ['-b±', { sqrt: ['b', { sup: ['2'] }, '-4ac'] }],
   denom: ['2a'] }]

// diff: simplifying the derivative of square root
// \frac{1}{2} x^{ - \frac{1}{2} }
> var a = MathSON([{numer: ['1'], denom: ['2']}, 'x', {sup: ['-', {numer: ['1'], denom: ['2']}]}]);
// \frac{ 1 }{ 2 \sqrt{x} }
> var b = MathSON([{numer: ['1'], denom: ['2', {sqrt: ['x']}]}]);

> a.diff(b).ops
[{ _denom: [1, { sqrt: ['x'] }] },
 { delete_: ['x', { sup: ['-', { numer: ['1'], denom: ['2'] }] }] }]
When editing the quadratic formula, '2.numer.3.sqrt.5' represents a cursor
inside the square root, between the a and the c.

Rationale

Currently, the viable Web formats for math formulae are:

- AMS-LaTeX subset (as rendered by MathJax and KaTeX) is a compromise between
  human- and machine-readability, plus legacy/compat concerns.
- MathML is a compromise between people who like XML and people who are
  reasonable...who am I kidding, MathML is even further unnecessarily complex
  than just due to XML (see "What do you have against MathML?")
- obscure formats like AsciiMath that optimize even more for human-readability
  than TeX & friends, at the expense of machine-readability and simplicity

MathSON is intended to fill the niche of being easily parseable (unlike TeX)
into a tree structure that's easy to understand and use (unlike MathML).
My primary motivation is MathQuill, my formula editor whose API uses an
AMS-LaTeX subset to represent math. It's impractical to use this API to
implement stuff like what typing a slash / does in MathQuill, which is to
scan backwards until a + or similar and move the group into the numerator of
a fraction (so typing 1+1/x yields 1+\frac{1}{x}). You'd need to parse the
LaTeX into an AST, scan & modify the AST, then serialize the AST back to LaTeX.
Which sucks, because MathQuill already has a perfectly good (well, not
perfectly, but still) internal AST that it parses the LaTeX to, so that'd be
so much wasted, duplicated parsing & serialization. (More on why MathQuill
needs MathSON)
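To illustrate why a tree format helps here, below is a rough sketch of the slash behavior written directly against MathSON arrays, with no LaTeX parsing or serialization involved. Everything in it is an assumption made for illustration: the names typeSlash, toAtoms, fromAtoms and BREAKERS are hypothetical, the set of symbols that stop the backwards scan is a guess at what "a + or similar" covers, and inline_* blocks are ignored.

```javascript
// Hypothetical helper: split a block into one atom per symbol or command.
function toAtoms(block) {
  const atoms = [];
  for (const item of block) {
    if (typeof item === 'string') atoms.push(...item); // one atom per symbol
    else atoms.push(item);                             // a command is a single atom
  }
  return atoms;
}

// Hypothetical helper: merge adjacent symbols back into strings.
function fromAtoms(atoms) {
  const block = [];
  for (const atom of atoms) {
    const last = block.length - 1;
    if (typeof atom === 'string' && typeof block[last] === 'string') block[last] += atom;
    else block.push(atom);
  }
  return block;
}

// Assumed set of symbols that end the backwards scan.
const BREAKERS = new Set(['+', '-', '±', '=', '<', '>']);

// Simulate typing "/" at a cursor index inside a single block.
function typeSlash(block, cursor) {
  const atoms = toAtoms(block);
  let start = cursor;
  // scan backwards until a breaker symbol or the start of the block
  while (start > 0 && !BREAKERS.has(atoms[start - 1])) start--;
  const fraction = { numer: fromAtoms(atoms.slice(start, cursor)), denom: [] };
  return {
    block: fromAtoms([...atoms.slice(0, start), fraction, ...atoms.slice(cursor)]),
    cursor: start + '.denom.0' // the cursor lands in the empty denominator
  };
}

console.log(JSON.stringify(typeSlash(['1+1'], 3)));
```

Typing x next would fill the empty denominator, which corresponds to 1+\frac{1}{x}.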
Completely unintentionally, this format turns out to be surprisingly useful for
accessibility: for most math its tree structure is isomorphic to the
corresponding MathSpeak, the speech protocol used by mathematician Abe Nemeth
(inventor of the widely-used Nemeth Braille Code for Math). Notably, whereas
MathML has lots of extraneous information that'd be ignored when converting to
MathSpeak (like <mo> vs <mi> vs <mn>), that's all implicit in MathSON just
like in MathSpeak; in other words, virtually anything explicit in MathSON is also
explicit in MathSpeak. Even the cursor positions represent closely what a screen
reader would read when navigating an editing interface for MathSpeak.
(It wasn't until we started work on just such an accessible math editing
interface that I noticed this.)
This suggests that MathSON does in fact perfectly extract the essential
content and structure of any given math.
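As a rough illustration of that correspondence, here is a sketch of a tree walk that turns MathSON into MathSpeak-like output. It only covers fractions, square roots and super/subscripts; the function name toSpeech is hypothetical, the spoken tokens approximate verbose MathSpeak and may not match the real protocol exactly, and symbols are emitted as-is rather than being given spoken names.

```javascript
// Hypothetical sketch: each command maps to a Start/End pair around its blocks.
function toSpeech(block) {
  const words = [];
  for (const item of block) {
    if (typeof item === 'string') {
      words.push(...item); // one token per symbol, left unnamed
    } else if (item.numer && item.denom) {
      words.push('StartFraction', toSpeech(item.numer), 'Over', toSpeech(item.denom), 'EndFraction');
    } else if (item.sqrt) {
      words.push('StartRoot', toSpeech(item.sqrt), 'EndRoot');
    } else if (item.sub || item.sup) {
      if (item.sub) words.push('Subscript', toSpeech(item.sub));
      if (item.sup) words.push('Superscript', toSpeech(item.sup));
      words.push('Baseline');
    } else {
      throw new Error('Unsupported command: ' + JSON.stringify(item));
    }
  }
  // drop the empty strings produced by empty blocks
  return words.filter(function (word) { return word !== ''; }).join(' ');
}

console.log(toSpeech([{ numer: ['1'], denom: ['2'] }]));
// prints "StartFraction 1 Over 2 EndFraction"
```

Note that the walk never has to decide whether a symbol is an operator, identifier or number, which is the sense in which nothing explicit in MathSON is thrown away.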
Details

Proposed examples of the subset that is just math, not math diffs:

> MathSON.fromLatex('x = \\frac{ -b \\pm \\sqrt{ b^2 - 4ac } }{ 2a }')
['x=',
 { numer: ['-b±', { sqrt: ['b', { sup: ['2'] }, '-4ac'] }],
   denom: ['2a'] }]
> MathSON.fromLatex('\\frac{ \\sin x }{ x }')
[{ numer: [{ inline_op: ['sin'] }, 'x'], denom: ['x'] }]
> MathSON.fromLatex('\\sin\\left( \\frac{1}{x} \\right)')
[{inline_op: ['sin']}, {$left: '(', group: [{numer: ['1'], denom: ['x']}], $right: ')'}]

// KaTeX's homepage example:
> MathSON.fromLatex('f(x) = \\int_{-\\infty}^\\infty \\hat f(\\xi) e^{2 \\pi i \\xi x} d\\xi')
[
    "f(x)=∫",
    { "sub": ["-∞"], "sup": ["∞"] },
    { "hat": ["f"] },
    "(ξ)e",
    { "sup": ["2πiξx"] },
    "dξ"
]
Notes:

- The top-level MathSON object is always an array. Arrays represent
  snippets of math known as "blocks". Arrays contain strings, which
  represent math symbols, and objects, which represent "commands",
  i.e. complex math notation like fractions and paren groups.
- Command objects' keys will usually be letters-only (not even _
  allowed), in which case the value must be a math block (an array
  of strings and objects, as described above; in the future we may
  allow arrays of arrays to support LaTeX's cases and matrix).
- Command objects' keys can also be any string starting with a dollar
  sign $; these can have any JSON value (these are "attributes"
  rather than "content", basically).
- What do the special inline_* keys do? They're for blocks of math
  that don't have a boundary or border that the cursor has to cross.
  At the edge of a normal block of math, like a square root or paren
  group, the cursor can cross between inside and outside; but at the
  left edge of sin, there's no inside or outside, the s, i,
  and n are "inline" in the containing block. At most one inline_* key
  is ever allowed per command, and if present, all other keys must
  start with $ (i.e. be "attribute" keys, so that cursor position can
  make sense, see below).
- This just uses Unicode rather than TeX and friends' backslash names
  for fancy math symbols, which strikes me as both simpler (it's just
  text!) and better specced (Unicode is a mess but at least has a
  standards body, nobody likes hunting through Plain TeX, LaTeX,
  AMS-LaTeX, nonstandard MathJax commands and more for the right
  backslash name for every symbol).
  - We do still have to restrict to a subset of Unicode and ban stuff
    like Unicode subscripts and superscripts.
  - For ASCII-only environments, JSON has a built-in Unicode escape
    sequence (e.g. \u2264 for ≤).
- Uniqueness: any given MathSON value is represented by exactly one
  JSON value. The serialization of a JSON value isn't unique, of course
  (e.g. whitespace insensitivity), but this means that deep comparison
  of JSON values is all you need to tell whether two MathSON values
  are the same.
  - For MathML, what if two trees are equivalent except for an id
    attribute? What about different lspace or minsize/maxsize
    attributes? Who knows?
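
The structural rules in these notes are small enough to check mechanically. The following is a minimal sketch of such a check, not part of any existing implementation: the names isBlock and isCommand are hypothetical, and it assumes inline_* keys have the form inline_ followed by letters. It does not check the Unicode subset or any list of accepted commands.

```javascript
// A block is an array of strings (symbols) and command objects.
function isBlock(value) {
  return Array.isArray(value) && value.every(function (item) {
    return typeof item === 'string' || isCommand(item);
  });
}

// A command is a plain object whose keys are content keys (letters only),
// attribute keys (starting with $), or a single inline_* key.
function isCommand(value) {
  if (value === null || typeof value !== 'object' || Array.isArray(value)) return false;
  const keys = Object.keys(value);
  const inlineKeys = keys.filter(function (key) { return /^inline_[a-z]+$/i.test(key); });
  const contentKeys = keys.filter(function (key) { return /^[a-z]+$/i.test(key); });
  const attributeKeys = keys.filter(function (key) { return key.charAt(0) === '$'; });
  // every key must fall into one of the three (mutually exclusive) categories
  if (inlineKeys.length + contentKeys.length + attributeKeys.length !== keys.length) return false;
  // at most one inline_* key, and then all other keys must be attributes
  if (inlineKeys.length > 1) return false;
  if (inlineKeys.length === 1 && contentKeys.length > 0) return false;
  // content and inline values must themselves be blocks; attributes can be anything
  return inlineKeys.concat(contentKeys).every(function (key) { return isBlock(value[key]); });
}

console.log(isBlock([{ inline_op: ['sin'] }, { $left: '(', group: ['x'], $right: ')' }])); // true
console.log(isBlock([{ numer: '1', denom: '2' }])); // false, the values are not blocks
```

Diff operations such as {delete_: [...]} are deliberately rejected here, since this only covers the subset that is just math.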

Now, extend that to a math diff:

// \frac{1}{2} x^{ - \frac{1}{2} }
var a = MathSON([{numer: ['1'], denom: ['2']}, 'x', {sup: ['-', {numer: ['1'], denom: ['2']}]}]);
// \frac{ 1 }{ 2 \sqrt{x} }
var b = MathSON([{numer: ['1'], denom: ['2', {sqrt: ['x']}]}]);

a.diff(b) // => [{_denom: [1, {sqrt: ['x']}]}, {delete_: ['x', {sup: ['-', {numer: ['1'],
          //                                                                denom: ['2']}]}]}]
Note that just like in ottypes/text, an insert of a piece of MathSON is
represented by "itself", a retain/skip is represented by a raw number (different
from a numeral string), and a delete is represented by an object with a special
key (but it's an invertible delete because, c'mon, diffs should be invertible).
One new thing is a syntax to mutate an existing thing, using keys prefixed with
_ or insert_ or delete_:
// \frac{1}{2} + \frac{1}{2} + x_1 + x_1 + x_2
var c = MathSON([{numer: ['1'], denom: ['2']}, '+', {numer: ['1'], denom: ['2']}, '+x', {sub: ['1']},
                 '+x', {sub: ['1']}, '+x', {sub: ['2']}]);

// \frac{x}{y}\frac{1}{2} + \frac{x1}{y2} + x_1^2 + x^2{}_1 + x^2
//   (the second-to-last one can be typed in MathQuill by typing x^2 y_1 and backspacing the y)
var d = MathSON([{numer: ['x'], denom: ['y']}, {numer: ['1'], denom: ['2']}, '+', {numer: ['x1'], denom: ['y2']},
                 '+x', {sub: ['1'], sup: ['2']}, '+x', {sup: ['2']}, {sub: ['1']}, '+x', {sup: ['2']}]);

c.diff(d) // => [{numer: ['x'], denom: ['y']}, 2, {_numer: ['x'], _denom: ['y']}, 2, {insert_sup: ['2']},
          //     2, {sup: ['2']}, 3, {delete_sub: ['2'], insert_sup: ['2']}]
(Alternative: ottypes/text is deliberately noninvertible, we could do
the same to slightly simplify our syntax:
a.diff(b) // => [{_denom: [1, {sqrt: ['x']}]}, {_delete_: 2}]
c.diff(d) // => [{numer: ['x'], denom: ['y']}, 2, {_numer: ['x'], _denom: ['y']}, 2,
          //     {insert_sup: ['2']}, 2, {sup: ['2']}, 3, {_delete_: 'sub', insert_sup: ['2']}]
)
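To make the invertible syntax concrete, here is a sketch of what applying such a diff to a block might look like. It is an illustration under stated assumptions rather than the intended implementation: the names applyDiff, mutate, toAtoms and fromAtoms are hypothetical, anything left over after the last operation is assumed to be retained (as in ottypes/text), deleted content is trusted rather than verified against the document, and inline_* blocks are ignored.

```javascript
// Split a block into one atom per symbol or command.
function toAtoms(block) {
  const atoms = [];
  for (const item of block) {
    if (typeof item === 'string') atoms.push(...item);
    else atoms.push(item);
  }
  return atoms;
}

// Merge adjacent symbols back into strings.
function fromAtoms(atoms) {
  const block = [];
  for (const atom of atoms) {
    const last = block.length - 1;
    if (typeof atom === 'string' && typeof block[last] === 'string') block[last] += atom;
    else block.push(atom);
  }
  return block;
}

function isMutationKey(key) {
  return /^(_|insert_|delete_)/.test(key);
}

// Apply the keys of a mutation op to a copy of an existing command.
function mutate(command, op) {
  const result = Object.assign({}, command);
  for (const key of Object.keys(op)) {
    if (key.startsWith('insert_')) result[key.slice(7)] = op[key];      // add a child block
    else if (key.startsWith('delete_')) delete result[key.slice(7)];    // remove a child block
    else if (key.startsWith('_')) result[key.slice(1)] = applyDiff(command[key.slice(1)], op[key]);
  }
  return result;
}

function applyDiff(block, ops) {
  const input = toAtoms(block);
  const output = [];
  let pos = 0; // how much of the input has been consumed
  for (const op of ops) {
    if (typeof op === 'number') {                 // retain
      output.push(...input.slice(pos, pos + op));
      pos += op;
    } else if (typeof op === 'string') {          // insert symbols
      output.push(...op);
    } else if ('delete_' in op) {                 // invertible delete
      pos += toAtoms(op.delete_).length;
    } else if (Object.keys(op).some(isMutationKey)) {
      output.push(mutate(input[pos], op));        // mutate the command at pos
      pos += 1;
    } else {                                      // insert a command
      output.push(op);
    }
  }
  output.push(...input.slice(pos));               // assumed implicit trailing retain
  return fromAtoms(output);
}

const a = [{ numer: ['1'], denom: ['2'] }, 'x', { sup: ['-', { numer: ['1'], denom: ['2'] }] }];
const diff = [{ _denom: [1, { sqrt: ['x'] }] },
              { delete_: ['x', { sup: ['-', { numer: ['1'], denom: ['2'] }] }] }];
console.log(JSON.stringify(applyDiff(a, diff)));
```

Because the delete lists what it removes, an inverse diff could be built by swapping deletes and inserts; that part is left out of this sketch.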
Finally, cursor positions:

A cursor position is just a sequence of indices and keys (typically alternating,
but cases and matrix may change that), always starting and ending with an index.
For example, consider:


[{numer: ['1'], denom: ['2']}, 'x', {sup: ['-', {numer: ['1'], denom: ['2']}]}]
To get to the cursor position, we start in the root block, go to its 2nd item
(0-indexed) which is the superscript, go into its sup block, go to its 1st item
which is the fraction, go into its denominator, and go to slice index 1 (slicing
from index 0 would slice from before the 2). In JavaScript this could be
mathObj[2].sup[1].denom[1]; for simplicity, in MathSON this is represented by
the string '2.sup.1.denom.1'.
Note that these indices aren't quite array indices, since strings can span a range
of indices. Consider:


['ax', {sup: ['2']}, '+by', {sub: ['2']}]
The cursor is in the middle of the string '+by', which is the 2nd item in the
array, but in MathSON there are cursor positions between adjacent symbols, so the
cursor is at index 4.
There is one special case, inline_* blocks. Whereas normal commands only count
for one index increment, inline_* blocks are like strings, they can span a range
of indices. Consider:


[{numer: [{inline_op: ['sin']}, 'x+', {inline_op: ['cos']}, 'x'],
  denom: ['x']}]
In this case, the cursor position is '0.numer.7', there isn't a step in the
cursor position where we go "into" the cos. This makes sense if you consider what
happens between the cos and the x: there is no going "into" or coming "out of"
the cos, from the cursor's perspective the c, o, and s are at the same
"level" as the x.
This is also why inline_* commands may only have "attributes" but no "content" child
blocks. If the cos block had a child: ['y'], what would be the cursor position
of a cursor next to the y? The c is index 5, the o is index 6, the s is
index 7, but what index is the command with a .child?
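The indexing rules above can be sketched in a few lines. This is only an illustration: the names slots and resolveCursor are hypothetical, and the returned shape (the things immediately left and right of the cursor) is just one convenient way to inspect a position.

```javascript
// Flatten one block into index "slots": one per symbol, one per normal
// command, and one per slot of the contents of an inline_* block.
function slots(block) {
  const out = [];
  for (const item of block) {
    if (typeof item === 'string') { out.push(...item); continue; }
    const inlineKey = Object.keys(item).find(function (key) { return key.startsWith('inline_'); });
    if (inlineKey) out.push(...slots(item[inlineKey])); // inline blocks span a range
    else out.push(item);                                // normal commands count once
  }
  return out;
}

// Walk a path like '2.sup.1.denom.1' and report what sits on either side.
function resolveCursor(root, path) {
  const steps = path.split('.');
  let block = root;
  for (let i = 0; i + 1 < steps.length; i += 2) {
    const command = slots(block)[Number(steps[i])];
    const child = command && typeof command === 'object' ? command[steps[i + 1]] : undefined;
    if (!Array.isArray(child)) throw new Error('Bad cursor path: ' + path);
    block = child;
  }
  const siblings = slots(block);
  const index = Number(steps[steps.length - 1]);
  return { left: siblings[index - 1], right: siblings[index] };
}

console.log(resolveCursor(['ax', { sup: ['2'] }, '+by', { sub: ['2'] }], '4'));
// prints { left: '+', right: 'b' }
```

With the sin/cos example, slots() of the numerator yields s, i, n, x, +, c, o, s, x, so the c, o and s land on indices 5, 6 and 7 without any extra path step.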
Did you notice the extensibility?

Nothing about diffs or cursor positions is math-specific. We could use this for
rich text:
['This sentence has both ', {inline_text: ['bold'], $bold: true}, ' and ',
 {inline_text: ['italic'], $italic: true}, ' words.']
and the diff and cursor position definitions (and hopefully, implementations)
would work equally well.
I'm not sure what to call that—MathSON Level 0, Base MathSON, Core MathSON,
DocSON, EditSON, EdSON—but I think it's very important. MathQuill's edit
tree and associated cursor and selection model was originally designed by Jeanine
and then haphazardly evolved by me basically just to generalize the cases we
thought of at the time (fractions, square roots, and paren groups I
guess—we also had supsubs but the tree model already didn't generalize well
to them—hence the double-layered tree where blocks have a variable number
of commands, each with some fixed number of blocks).
We didn't, couldn't, and can't think about all the other math notation supported
by TeX and friends that we want to eventually support. There will be continuing
work to add commands to MathQuill, and that needs to be possible without having
to change the underlying tree and cursor model that everything else relies on.
In fact, a safe tree and cursor model opens up entirely new API possibilities,
since the lack of safety is a key reason that MathQuill's tree and cursor are super
hidden away from the API (see "For MathQuill, this is more than just a notation.").
By the by, this is why inline_* needs to be a block (array) and not just a
string, even though the only immediate use-case is operator names like sin
whose contents are only ever strings. MathSON Level 0 shouldn't know about only
being strings, it only knows about cursor position semantics. And, I can totally
imagine use cases that aren't just strings, like exotic sup/sub,
or like if in the rich text example above, a bold region of text had some math
in it:
{$bold: true, inline_text: ['7⋅10', {sup: ['2']}, ' weight bold']}
Separate from MathSON Level 0, there of course needs to be a spec for MathSON
Level 1 listing the kinds of commands accepted, {numer, denom} for fractions,
{$left, group, $right} for paren/bracket/brace groups, {sup, sub}, etc.
"Isn't this just the 'Extensible Markup Language' + MathML with a different syntax?"

First of all, syntax matters. Syntax is a UI, and shapes every interaction that
people have with something.
Secondly, syntax isn't even that big a part of MathSON Level 0, as described.
I've talked enough about semantics that it's more like XML + DOM (including
DOM Ranges, kinda analogous to cursor positions).
Thirdly, by relegating e.g. Unicode escaping and most well-formedness concerns
(matching braces etc) to the "lower level" JSON spec, JSON + MathSON Level 0 are
better organized than XML which deals with all of that in one monolithic spec.
Finally, you know what's crazy? Even with all that, JSON + MathSON Level 0
combined is still simpler than XML alone. There are no intricate whitespace
semantics, no Text vs CharacterData vs Comments vs Processing Instructions,
no custom character entity references, no self-closing tags. Hell, the only
consideration we have to make that XML doesn't (that isn't because XML is
missing a feature we need), as far as I can think of, is that JSON inherited
JS's UTF-16 surrogate pairs for "astral plane" Unicode characters, and I dunno
how our indices should treat those.
(See also "Wait so, what do you have against XML?")
Open Questions


- should commands have a type? Seems unnecessary to me
- full words (numerator, subscript) or abbreviations (numer, sub)?
- should the format be even more minimal? Currently arrays are required in
  more places than are strictly necessary for Level 0 to be unambiguous, for
  example one-half (1/2) is [{numer: ['1'], denom: ['2']}] when it could
  be {numer: '1', denom: '2'} instead. I prefer arrays because I think it
  makes it clearer why the cursor position rightward of the 2 is 0.denom.1,
  for instance, whereas without the outer array making the root block explicit
  it seems like it should just be denom.1 or something
- should there be a "noncanonical" variant where prohibited Unicode characters
  like subscript and superscript characters are allowed, and a canonicalization
  that'll convert them into "proper" {sub: ...} objects?
  (Folding them into nearby ones as necessary)
  - what about the goddamned Mathematical Alphanumeric Symbols? Bold,
    italic, serif/sans-serif/monospace clearly need to be canonicalized as a
    font style thing, but what about calligraphic, fraktur, and double-struck?
    Do we have to use a different font? MathQuill doesn't; then again,
    MathQuill's font, Symbola, doesn't support that full range, only the subset
    that's actually in the Letterlike Symbols block.
  - the "noncanonical" variant could also feature unmerged consecutive strings
    (i.e. canonicalize(['ab', 'cd']) => ['abcd'])
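
If the "noncanonical" variant were adopted, canonicalization might look roughly like the sketch below. This is purely hypothetical, since the question is still open: it only handles Unicode superscript and subscript digits, only folds runs of those characters into each other (not into commands that were already there), and does not recurse into the blocks of existing commands.

```javascript
// Unicode superscript/subscript digits and the plain digits they stand for.
const SUP = { '⁰': '0', '¹': '1', '²': '2', '³': '3', '⁴': '4', '⁵': '5', '⁶': '6', '⁷': '7', '⁸': '8', '⁹': '9' };
const SUB = { '₀': '0', '₁': '1', '₂': '2', '₃': '3', '₄': '4', '₅': '5', '₆': '6', '₇': '7', '₈': '8', '₉': '9' };

function canonicalize(block) {
  const out = [];
  let open = null; // the {sup}/{sub} command currently being built, if any
  for (const item of block) {
    if (typeof item !== 'string') { out.push(item); open = null; continue; }
    for (const ch of item) {
      const key = SUP[ch] !== undefined ? 'sup' : SUB[ch] !== undefined ? 'sub' : null;
      if (key === null) {
        // ordinary symbol: merge into the previous string if there is one
        if (typeof out[out.length - 1] === 'string') out[out.length - 1] += ch;
        else out.push(ch);
        open = null;
      } else {
        const digit = key === 'sup' ? SUP[ch] : SUB[ch];
        if (open !== null && open[key]) open[key][0] += digit;  // x¹² becomes one sup
        else if (open !== null) open[key] = [digit];            // x₁² shares one command
        else { open = { [key]: [digit] }; out.push(open); }
      }
    }
  }
  return out;
}

console.log(JSON.stringify(canonicalize(['ab', 'cd']))); // prints ["abcd"]
```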


Asides

For MathQuill, this is more than just a notation.

This is a way of life.
Really though, I'm so excited about this as an API to manipulate MathQuill's
tree structure, even by internal code. MathQuill's internal tree manipulation API
is so prone to becoming ill-formed if you sneeze at it that there are 89 tests
spanning 750 lines for paren typing behavior, to make sure that the tree
and cursor don't become ill-formed in the course of the manipulation in all the
different cases. There are intrinsically a lot of cases, don't get me wrong...
but for any given case, there are 4 or 8 tests checking the same paren typing
behavior in similar tree shapes. That's not cool.
One major source of bugs in particular has been that the cursor position
is represented by pointers to nodes in the tree, and that can easily
become ill-formed due to simple modifications to the tree. (#429 is an
example of this class of bugs that was fixed not that long ago.) This is
actually kind of a blocker for exposing the tree and cursor to manipulation
by external API calls: how do we ensure well-formedness without the API
feeling like moving piles of rice around with tweezers (like if all you had
was cursor.moveLeft() and cursor.moveRight() or something)? Well, how
come flat text fields don't have this problem? The answer is in the
data model.
In flat text fields, a cursor position is an index, so even if it is
ill-formed (i.e. out of bounds), the right way to normalize it is obvious,
just clamp it to the nearest bound. By contrast, in MathQuill's current
representation where the cursor position is pointers to tree nodes, if the
cursor's parent is a detached node, there's no obvious way to normalize that
into where the cursor "should" be. However, if the cursor position is a path
through the ancestors like proposed here, normalization is obvious, put the
cursor in the deepest ancestor that still exists.
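A minimal sketch of that normalization, assuming the path-string cursor format described under "Finally, cursor positions". The names normalizeCursor and slots are hypothetical, and the choice to clamp the index within the deepest surviving block (mirroring the flat text field case) is an assumption.

```javascript
// One slot per symbol, one per normal command; inline_* contents are spliced in.
function slots(block) {
  const out = [];
  for (const item of block) {
    if (typeof item === 'string') { out.push(...item); continue; }
    const inlineKey = Object.keys(item).find(function (key) { return key.startsWith('inline_'); });
    if (inlineKey) out.push(...slots(item[inlineKey]));
    else out.push(item);
  }
  return out;
}

// Follow the path as far as the tree still allows, then stop there.
function normalizeCursor(root, path) {
  const steps = path.split('.');
  const kept = [];
  let block = root;
  for (let i = 0; i < steps.length; i += 2) {
    const siblings = slots(block);
    const wanted = Number(steps[i]);
    const index = Math.min(Math.max(wanted, 0), siblings.length); // clamp like a text field
    kept.push(index);
    const key = steps[i + 1];
    const command = siblings[index];
    const canDescend = key !== undefined && index === wanted &&
      typeof command === 'object' && command !== null && Array.isArray(command[key]);
    if (!canDescend) break; // the rest of the path no longer exists
    kept.push(key);
    block = command[key];
  }
  return kept.join('.');
}

// The fraction that the cursor was inside has since been replaced by '12':
console.log(normalizeCursor(['12'], '0.denom.1')); // prints "0"
```

A path that is still valid comes back unchanged, so this can be run after every edit without special-casing.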
And externally, of course, the LaTeX imported and exported by MathQuill isn't
meant to be human-edited (the point of MathQuill is to edit math visually,
which is more human-readable than a text format could ever be), so LaTeX
compromising machine-readability for human-readability doesn't really serve
MathQuill well. MathQuill needs a format where the overriding concern is being
dead simple for machines to read, possibly at the expense of human-readability.
"Why don't super/subscripts have a base?"

"...like they do in both MathML and KaTeX's AST?"
Because that lets you do stuff like {\frac{ \frac{1}{2} }{3} + 1 + 2}^2:


What the hell is that? How do you edit that? How do you show whether the
cursor is inside or outside the base of the super/subscript? There's no
analogue when writing math on a whiteboard.
Note that something like it is still possible with e.g.:
{inline_base: [{numer: ['1'], denom: ['2']}, '+1', {sup: ['2']}]}
and a special relationship between the containing thingy and the sup node.
"What do you have against MathML?"

Okay so, (Presentation) MathML is supposed to, more or less, represent the
same data (structure and content) as the relevant AMS-LaTeX subset, but more
machine-readable and amenable to the horrifying existing ecosystem of
XML tools, right? What are the other reasons people think everything should
be in XML? Tim B-L talked about "the fruits of well-formed systems" but
like, TeX and friends don't suffer from the rapidly evolving incompatibilities
that HTML had, nor the ill-formedness problems inherent to SGML descendants
like <b><i>LOL</b></i>, it's not like influential TeX tools are forgiving of
unmatched braces and handling them in undocumented, ill-understood ways.
Okay so great, MathML lets you leverage existing XML tools for parsing and
stuff, maximizing your synergy for win-win solutions, etc. Which is great if
you necessarily need a format that makes parsing and stuff hard. But wouldn't
it be even better if you could use a format so trivially simple that parsing
and stuff is easier to do by hand than it would be to configure and use giant
heavyweight XML parsing tools?
Even beyond the whole XML thing, MathML is unnecessarily complex, encompassing
aspects of semantics or presentation that fundamentally are neither structure
nor content. Especially having to specify <mo> vs <mi> vs <mn>, whereas
in LaTeX that's implicit in the normal case, yet no one worries that LaTeX
isn't expressive enough compared to MathML.
This is even more apparent contrasting with MathSpeak. <mo> vs <mi> vs
<mn>? Not even representable, should belong to Content MathML or OpenMath.
Attributes like form or lspace or stretchy? Ignored, belongs solidly in
the domain of visual display styling. <mrow>? Is that meaningful to anyone?
LaTeX lets you put braces {} anywhere, which leads to shitty situations
with super/subscripts that aren't representable in either MathSpeak or MathSON.
"Wait so, what do you have against XML?"

Well, I could attempt thoughtful, balanced reasoning of why XML's tradeoffs are
a poor fit for math, but if I can't do better than this HN commenter, is it
really worth it? Instead I shall present a more visceral argument.
"Is parsing XML really that hard and heavyweight?" Look, parsing XML isn't
hard like parsing HTML is hard, but just look at this JSON:
[
  {
    numer: ['1'],
    denom: ['2']
  },
  'x',
  {
    sup: [
      '-',
      {
        numer: ['1'],
        denom: ['2']
      }
    ]
  }
]
In MathML, that'd be, what:
<math>
  <mfrac>
    <mn>1</mn>
    <mn>2</mn>
  </mfrac>
  <msup>
    <mi>x</mi>
    <mrow>
      <mo>-</mo>
      <mfrac>
        <mn>1</mn>
        <mn>2</mn>
      </mfrac>
    </mrow>
  </msup>
</math>

Don't worry if that's not valid MathML, this is about XML. My point is, here's how
to get to the 2 in the exponent in MathSON:
mathObj[2].sup[1].denom[0]
(That's JS but it'd be similarly straightforward in Python or Ruby or whatever.)
By comparison, in MathML:
mathTree.children[1].children[1].children[1].children[1]
That gets you the <mn>, by the way, not the Text node containing the string
'2'. There's a difference. Now, which would you rather deal with? These generic
tree node things, or plain old dictionaries and arrays?
Credits

This gimmick:

Just show me some code already

Okay.

was blatantly stolen from @jneen's literary masterpiece.

  
## show-up-after-.md-MathSON.js
this.MathSON = (function () {
  var unicodeToLatex = {
    '±': 'pm'
  };
  var keyToCommand = {
    numer: Fraction,
    denom: Fraction,
    sqrt: SquareRoot,
    sup: SupSub,
    sub: SupSub
  };
  function Fraction(cmd) { this.cmd = cmd; }
  Fraction.prototype.toLatex = function () {
    return [
      '\\frac{',
      this.cmd.numer.toLatex(),
      '}{',
      this.cmd.denom.toLatex(),
      '}'
    ].join('');
  };
  function SquareRoot(cmd) { this.cmd = cmd; }
  SquareRoot.prototype.toLatex = function () {
    return [
      '\\sqrt{',
      this.cmd.sqrt.toLatex(),
      '}'
    ].join('');
  };
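  function SupSub(cmd) { this.cmd = cmd; }
  SupSub.prototype.toLatex = function () {
    // a supsub may have a subscript, a superscript, or both
    var latex = '';
    if (this.cmd.sub) latex += '_{' + this.cmd.sub.toLatex() + '}';
    if (this.cmd.sup) latex += '^{' + this.cmd.sup.toLatex() + '}';
    return latex;
  };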

  function MathSON(ops) {
    if (!Array.isArray(ops)) throw 'Need Array, got ' + JSON.stringify(ops);
    if (!(this instanceof MathSON)) return new MathSON(ops);
    this.ops = ops;
  }
  MathSON.prototype.toLatex = function () {
    return this.ops.map(function (op, i, ops) {
      // for strings, translate Unicode chars to LaTeX, like ± to \pm
      if (typeof op === 'string') {
        return op.split('').map(function (ch) {
          if (ch in unicodeToLatex) {
            return '\\' + unicodeToLatex[ch] + ' ';
          }
          return ch;
        }).join('');
      }
      // for objects,
      if (typeof op === 'object' && op !== null) {
        var Command = keyToCommand[Object.keys(op)[0]];
        if (Command) {
          var cmd = {};
          Object.keys(op).forEach(function (key) {
            if (/^[a-z]+$/i.test(key)) cmd[key] = MathSON(op[key]);
            else if (key.charAt(0) === '$') cmd[key] = op[key];
            else throw 'Unexpected key \'' + key + "'";
          });
          return new Command(cmd).toLatex();
        }
      }
      throw 'Unexpected ' + JSON.stringify(op);
    }).join('')
    .replace(/ (?![a-z])/ig, '');
  };
  MathSON.toLatex = function (ops) {
    return MathSON(ops).toLatex();
  };
  return MathSON;
}());

// tests
console.assert(MathSON([{numer: ['1'], denom: ['2']}]).toLatex() === '\\frac{1}{2}');
console.assert(MathSON.toLatex(['x=',
 { numer: ['-b±', { sqrt: ['b', { sup: ['2'] }, '-4ac'] }],
   denom: ['2a'] }]) === "x=\\frac{-b\\pm\\sqrt{b^{2}-4ac}}{2a}");