paulfitz/Makefile

## Makefile
default:
	asciidoc tdiff_spec_draft.txt

## tdiff_spec_draft.txt
tDiff Format Specification
==========================
:Author: Paul Fitzpatrick, Joe Panico
:Email: paulfitz@alum.mit.edu, diffkit-user@googlegroups.com
:Date: 2011-01-12
:Revision: 0.2

[abstract]

Purpose
-------
This document defines the tDiff format; a text encoding for describing
the differences between two tables of data.

tDiff output is intended for two audiences: end-users who are looking
for an easily readable report that describes the differences between
two tables; and tools that will interpret tDiff output in order to
perform further processing or analysis (e.g. "patch" tools).

Scope
-----
tDiff should be able to describe the differences between *any types*
of data tables. That includes: tables that have an identifiable unique
key (e.g. RDBMS tables with primary key) and tables that don't (RDBMS
tables without primary key, or heaps); tables where row ordering is
meaningful, whether explicit or implied, such as spreadsheets or CSV
files; and tables where row ordering has no inherent meaning (RDBMS
tables). Further, tDiff should be able to represent the differences
between two different *types* of table, e.g. an RDBMS table versus a
spreadsheet table.

Limitations
-----------
tDiff is designed to described the differences in *content* between
two tables-- it does not have any facilities for describing the
differences in *structure* between two tables. A future revision will
expand the specification to include structure or schema information.

tDiff is designed to described text (character) content. It has no
facilities to describe differences in content that does not have a
natural default character representation. A future revision may expand
the specification to include arbitrary binary data.

tDiff can represent the differences between exactly *two* tables. It
cannot represent an n-way compare where n > 2.  It also cannot
represent the differences between collections of tables, such
as multiple-sheet spreadsheets.

Representation
--------------
tDiff documents use the UTF-8 character encoding.


Top-level structure
-------------------
tDiff documents comprise any number of <<Comment,comment blocks>>,
<<Control,control blocks>>, and
<<Hunk,diff hunks>>, interleaved in any order.
Each diff hunk describes a related set of differences
between two tables. Each hunk could stand on its own as
an independent tDiff document.  When there is a choice
in how to decompose differences between two tables as
a sequence of hunks,
generators are encouraged to choose a decomposition that
minimizes ordering effects between hunks.


.Document example
----
# tdiff version 0.2

/*
 * this comment could describe the whole document
 */

/*
 * this comment could describe the following hunk
 */
* |bridge=Brooklyn|designer:'J.A.Roebling'|length:1595|
= |bridge=Williamsburg|designer:'D.D.Duck'->'L.L.Buck'|length:1600|
* |bridge=Queensborough|designer:'Palmer & Hornbostel'|length:1182|

/*
 * this comment could describe the following hunk, even
 * though it's not immediately adjacent to the hunk (there is
 * an intervening newline).
 */

- |bridge=Spamspan|designer:'S.Spamington'|length:10000|
+ |bridge=Manhattan|designer:'G.Lindenthal'|length:1470|

/*
 * final, wrap up, comment
 */

----


[[Comment]]
Comment blocks
--------------
Comment blocks are delimited using: /* */ (C style). Any content can
occur within a comment block.

.Example comment
----
/* This is an example single-line comment */
/* This is an
   example multi-line
   comment */
----

[[Control]]
Control blocks
--------------
Control blocks begin with "# ", and are delimited by a newline or
linefeed.  Control blocks may hold meta information about diffs,
or environmental information that might be useful to an interpreter.
Apart from the special <<Header,header control block>>, they
lie outside of the scope of this specification.

[[Header]]
Header
------
A tdiff document should begin with a special
<<Control, control block>> called the header.
The header begins with the characters "# tdiff".  It is there
to aid in rapid identification of tdiff documents.  Documents
*without* this block should be accepted by interpreters, but
this block should always be included by generators.

.Example header
----
# tdiff version 0.2
----

[[Hunk]]
Hunk
----
A hunk is a series of one or more adjacent <<Line,diff lines>>,
optionally preceeded by a <<Column,column line>>, where each
diff line represents the
differences between the source tables for a single row. The
lines within a hunk should be separated by only the newline
characters that terminate each diff line, so that they all appear as
adjacent lines within a text editor. Within a tDiff document,
hunks are delimited from each other via intermediate filler
or comment blocks.

.Example hunk
----
- |bridge=Spamspan|designer:'S.Spamington'|length:10000|
= |bridge=Williamsburg|designer:'D.D.Duck'->'L.L.Buck'|length:1600|
+ |bridge=Manhattan|designer:'G.Lindenthal'|length:1470|
----

[[Line]]
Diff line
---------
A diff line describes differences in a single row of the two tables that
were compared. One table is designated the left or local table (called *L*)
and the other table is designated the right or remote table (called *R*).

There are three types of diff lines:

- MISSING line: describes a row that is present in *L* but absent in *R*.
- EXTRA line: describes a row that is absent in *L* but present in *R*.
- CHANGE line: describes a row that is present in both tables, but differs
in some specific column values.

Each diff line occupies its own line in the document, and begins with
one of three characters. These three characters are called "line type"
characters:

- MISSING lines begin with plus: _'+'_.  In order to make *R* look like *L* we would have to add the missing row to *R*.
- EXTRA lines begin with minus: _'-'_. In order to make *R* look like *L* we would remove the extra row from *R*.
- CHANGE lines begin with equals: _'='_. In order to make *R* look like *L* we would update some of its column values.

The line type character can be left or right padded with any amount of
whitespace, for readability. The line type character is followed by
any number of name-value pairs, where the names represent column
names, and the values are the values for the corresponding column name
in that particular row. The name is separated from the value by an
equals ('=') sign for identifying columns (usually part of the primary
key, but see <<Identity,Keys versus identity>>)
or a colon (':') sign for all other columns.
The name-value pairs, as well as the line type
character, are delimited with whitespace and/or a pipe _'|'_ character.

.Example diff line
----
= |bridge=Williamsburg|designer:'D.D.Duck'->'L.L.Buck'|length:1600|
----

[[Column]]
Column line
-----------

Optionally, key names can be removed from diff lines. To do this, a
special column line may be given within a hunk.  This lists
column names, followed by "=" for identifying columns.

.Column line example
----
@ |bridge=|designer|length|
----

This establishes bridge as an identifying column that appears first,
followed by designer and length columns.  We can now rewrite this:

.before
----
= |bridge=Williamsburg|designer:'D.D.Duck'->'L.L.Buck'|length:1600|
----

as this:

.after
----
= |Williamsburg|'D.D.Duck'->'L.L.Buck'|1600|
----

The effect of column lines is limited to within a single hunk.

ROW Pseudo Column
-----------------
Some sources of table data have a meaningful notion of row or line
number (e.g. spreadsheets, CSV files) but others do not (RDBMS
tables). The representation of the row number may be
external to the table data itself. In order to accommodate these
sources, document generators can make use of the ROW pseudo
column.

.pseudo column examples
----
+ |ROW=3|name3:value3|name4:value4|
- |ROW=4|
= |ROW=5|name3:old-value9->new-value9|name4:old-value10->new-value10 |
----

Note that a diff that uses ROW numbers may be less useful for applying to
RDBMS tables, and alternate row identity choices may be considered by the
generator (or specified to the generator by the user).  See
<<Identity,Keys versus identity>>.

Column Numbers
--------------
Some sources of table data do not use named columns-- for instance,
CSV files that have not designated a column header row. In those
cases, column names are replaced with column number designators. The
designators follow this pattern:

1, 2, ...

N.B. The ordinal numbers embedded within column number designators are
1's based, not 2's based. In other words, counting starts at 1, not at
zero.

.column number example
----
= |1=value1|2=value2|3:old-value3->new-value3|
----

If column names are omitted, and no <<Column,column line>> is given,
then ordinal numbers are assumed.  All columns are assumed to be
identifying, unless they are modified.
The previous example
can therefore be equivalently expressed as:

.column number example (rewritten)
----
= |value1|value2|old-value3->new-value3|
----


Context
-------

Though not always strictly required to fully and accurately described
differences between two tables, generators are allowed to include
extra rows and extra column values as contextual information. These
can help human readers oriented themselves within the data, and might
help machine interpreters to resolve otherwise ambiguities in applying
a tDiff document.

In the case of missing or extra row lines, extra
column data can be included within the row as name-value pairs, just
like any other name-value pairs. In the case of a change line,
extra name-value pairs can be included within the line.
These contextual name-value pairs should also
include only the old value. In other words, contextual column values should
not include "->new-value". In addition to contextual column
information, a hunk may contain any number of contextual rows. These
rows appear with the line type character _'*'_. Streaks of contextual
rows can prefix and suffix the block of actual (+, -, =) real diff
lines, but cannot be interspersed amongst them. In other words, a
hunk is:

- <streak of context lines>
- <streak of real (+ or - or =) diff lines>
- <streak of context lines>

.context example
----
* |bridge=Brooklyn|designer:'J.A.Roebling'|length:1595|
= |bridge=Williamsburg|designer:'D.D.Duck'->'L.L.Buck'|length:1600|
* |bridge=Queensborough|designer:'Palmer & Hornbostel'|length:1182|
----

Some tables are ordered, some unordered.  The context features
of tDiff make sense for ordered tables.  When a diff between
ordered tables is applied as a patch to an unordered table,
context features may be ignored.  When a diff between unordered
tables is applied as a patch to an ordered table, the resulting
order may be underspecified (but presumably does not matter).

[[Identity]]
Keys versus identity
--------------------
Determining whether a row is present in *L* and *R* requires
a judgment about row identity.  This judgment may be simple.
For example, the identity of a row may simply be the value of
its primary key.  However, it is possible that the identity
of a row is distinct from its primary key.  Consider for
example a table with an auto-incrementing integer primary key,
rather than something derived from the row data.  Comparison
of that key between separately maintained copies of that table
will be meaningless.  For meaningful comparison, an alternate
row identity would need to be constructed.

This issue lies outside the tDiff specification, but it is
important that implementors be aware that columns used for
identification may in some cases not be part of the primary key.

Appendix: Quoting
-----------------
Names or values may be quoted in a tDiff document.  Quoting is done
as follows:

- All instances of the single-quote character are duplicated into pairs.
- All control characters and the backslash character are escaped as for C literals.
- The name or value is wrapped in single-quotes

It is always safe to single-quote a name or value.  Names or values
*must* by quoted in any of the following conditions:

- A name or value conflicts with a reserved word: NULL, ROW.
- A name or value contains any character in the 7-bit ASCII range that is *not* in the following set: [A-Za-z0-9+.]
- A *name* begins with any of the characters [0-9+.].

In the case of column diffs, for each cell that was different between
*L* and *R*, both the old and new values are displayed. The
old value must come first, followed by '->' (dash greater than),
followed by the new value. For all three diff line types, the
generator may include *L* name-value pairs that are not
strictly needed, but may help with row identification.


Appendix: Grammar
-----------------

.Grammar
----
 document ::= header (block)*
 block ::= hunk | control | comment | filler

 hunk ::= (hunk_header)? (diff_line)+
 hunk_header ::= '@' (divider name)* break
 diff_line ::= ('-' | '+' | '=') (divider term)* break
 term ::= (name ('='|':'))? (value '->')? value
 break ::= divider? linebreak

 control ::= "#" (divider value)* break

 comment ::=  | ("/*" comment_body "*/")

 filler ::= (linebreak | divider)+

 header ::= '#' (divider term)* break

 divider ::= (' ' | '\t')* ('|')? (' ' | '\t')*
 linebreak ::= ('\r\n' | '\r' | '\n')

 # unspecified in grammar: name, value, comment_body
----

The linebreak non-terminal needs to be parsed greedily to work with all
common end-of-line formats, since the number of linebreaks is
significant in the grammar.  The comment_body non-terminal is
as for C.


Appendix: Complete examples
---------------------------

In example one, both tables are in an RDBMS, both tables have the same
column names, and the rows are identified using column1.

.Example 1 tables
----


L:				    R:
column1,column2,column3,column4     column1,column2,column3,column4
1,      0000,   x,      aaaa	    ----------------------------
----------------------------	    2,      1111,   x,      aaaa
3,      2222,   x,      aaaa	    3,      2222,   y,      aaaa
4,      3333,   x,      aaaa	    4,      0000,   z,      bbbb
5,      4444,   x,      aaaa	    5,      4444,   z,      bbbb
6,      5555,   x,      aaaa	    6,      5555,   u,      aaaa
----------------------------	    7,      0000,   v,      aaaa
----------------------------        8,      1111,   x,      aaaa
----


.Example 1 diff, variant 1
----
# tdiff version 0.2

/*
 * this is the tDiff document for example 1, using 1 hunk only and no context.
 * Note the "|" usage varies from previous examples in this document.
 * "|" plays the same role as spaces and tabs in the spec, so varying
 * styles are possible.
 */
- | column1=1
+ | column1=2| column2:1111| column3:x| column4:aaaa
= | column1=3| column3:x->y
= | column1=4| column2:3333->0000| column3:x->z| column4:aaaa->bbbb
= | column1=5| column3:x->z| column4:aaaa->bbbb
= | column1=6| column3:x->u
+ | column1=7| column2:0000| column3:v| column4:aaaa
+ | column1=8| column2:1111| column3:x| column4:aaaa
/*
 * end of tDiff document
 */
----

.Example 1 diff, variant 2
----
# tdiff version 0.2

/*
 * here is a tDiff document that is equivalent to the document above, except
 * that it uses 8 hunks, more comments, and adds in some context
 */
/*
 * hunk 1: notice that columns 2,3,4 are context-- not strictly necessary
 * to specify a remove
 */
- | column1=1| column2:0000| column3:x| column4:aaaa

/*
 * hunk 2: notice that the hunks are separated by standalone newline
 */
+ | column1=2| column2:1111| column3:x| column4:aaaa

/*
 * hunk 3: notice that column2 and column4 are merely context
 */
= | column1=3| column2:2222| column3:x->y| column4:aaaa

/*
 * hunk 4: notice that the column diff line is surrounded by context rows, and
 * that the context rows describe the values on the RHS.
 */
* | column1=3| column2:2222| column3:x| column4:aaaa
= | column1=4| column2:3333->0000| column3:x->z| column4:aaaa->bbbb
* | column1=5| column2:4444| column3:x| column4:aaaa

/*
 * hunk 5
 */
= | column1=5| column3:x->z| column4:aaaa->bbbb

/*
 * hunk 6
 */
= | column1=6| column3:x->u

/*
 * hunk 7
 */
+ | column1=7| column2:0000| column3:v| column4:aaaa

/*
 * hunk 8
 */
+ | column1=8| column2:1111| column3:x| column4:aaaa
----
	tDiff Format Specification
	==========================
	:Author: Paul Fitzpatrick, Joe Panico
	:Email: paulfitz@alum.mit.edu, diffkit-user@googlegroups.com
	:Date: 2011-01-12
	:Revision: 0.2

	[abstract]

	Purpose
	-------
	This document defines the tDiff format; a text encoding for describing
	the differences between two tables of data.

	tDiff output is intended for two audiences: end-users who are looking
	for an easily readable report that describes the differences between
	two tables; and tools that will interpret tDiff output in order to
	perform further processing or analysis (e.g. "patch" tools).

	Scope
	-----
	tDiff should be able to describe the differences between any types
	of data tables. That includes: tables that have an identifiable unique
	key (e.g. RDBMS tables with primary key) and tables that don't (RDBMS
	tables without primary key, or heaps); tables where row ordering is
	meaningful, whether explicit or implied, such as spreadsheets or CSV
	files; and tables where row ordering has no inherent meaning (RDBMS
	tables). Further, tDiff should be able to represent the differences
	between two different types of table, e.g. an RDBMS table versus a
	spreadsheet table.

	Limitations
	-----------
	tDiff is designed to described the differences in content between
	two tables-- it does not have any facilities for describing the
	differences in structure between two tables. A future revision will
	expand the specification to include structure or schema information.

	tDiff is designed to described text (character) content. It has no
	facilities to describe differences in content that does not have a
	natural default character representation. A future revision may expand
	the specification to include arbitrary binary data.

	tDiff can represent the differences between exactly two tables. It
	cannot represent an n-way compare where n > 2. It also cannot
	represent the differences between collections of tables, such
	as multiple-sheet spreadsheets.

	Representation
	--------------
	tDiff documents use the UTF-8 character encoding.


	Top-level structure
	-------------------
	tDiff documents comprise any number of <<Comment,comment blocks>>,
	<<Control,control blocks>>, and
	<<Hunk,diff hunks>>, interleaved in any order.
	Each diff hunk describes a related set of differences
	between two tables. Each hunk could stand on its own as
	an independent tDiff document. When there is a choice
	in how to decompose differences between two tables as
	a sequence of hunks,
	generators are encouraged to choose a decomposition that
	minimizes ordering effects between hunks.


	.Document example
	----
	# tdiff version 0.2

	/*
	* this comment could describe the whole document
	*/

	/*
	* this comment could describe the following hunk
	*/
	* \|bridge=Brooklyn\|designer:'J.A.Roebling'\|length:1595\|
	= \|bridge=Williamsburg\|designer:'D.D.Duck'->'L.L.Buck'\|length:1600\|
	* \|bridge=Queensborough\|designer:'Palmer & Hornbostel'\|length:1182\|

	/*
	* this comment could describe the following hunk, even
	* though it's not immediately adjacent to the hunk (there is
	* an intervening newline).
	*/

	- \|bridge=Spamspan\|designer:'S.Spamington'\|length:10000\|
	+ \|bridge=Manhattan\|designer:'G.Lindenthal'\|length:1470\|

	/*
	* final, wrap up, comment
	*/

	----


	[[Comment]]
	Comment blocks
	--------------
	Comment blocks are delimited using: /* */ (C style). Any content can
	occur within a comment block.

	.Example comment
	----
	/* This is an example single-line comment */
	/* This is an
	example multi-line
	comment */
	----

	[[Control]]
	Control blocks
	--------------
	Control blocks begin with "# ", and are delimited by a newline or
	linefeed. Control blocks may hold meta information about diffs,
	or environmental information that might be useful to an interpreter.
	Apart from the special <<Header,header control block>>, they
	lie outside of the scope of this specification.

	[[Header]]
	Header
	------
	A tdiff document should begin with a special
	<<Control, control block>> called the header.
	The header begins with the characters "# tdiff". It is there
	to aid in rapid identification of tdiff documents. Documents
	without this block should be accepted by interpreters, but
	this block should always be included by generators.

	.Example header
	----
	# tdiff version 0.2
	----

	[[Hunk]]
	Hunk
	----
	A hunk is a series of one or more adjacent <<Line,diff lines>>,
	optionally preceeded by a <<Column,column line>>, where each
	diff line represents the
	differences between the source tables for a single row. The
	lines within a hunk should be separated by only the newline
	characters that terminate each diff line, so that they all appear as
	adjacent lines within a text editor. Within a tDiff document,
	hunks are delimited from each other via intermediate filler
	or comment blocks.

	.Example hunk
	----
	- \|bridge=Spamspan\|designer:'S.Spamington'\|length:10000\|
	= \|bridge=Williamsburg\|designer:'D.D.Duck'->'L.L.Buck'\|length:1600\|
	+ \|bridge=Manhattan\|designer:'G.Lindenthal'\|length:1470\|
	----

	[[Line]]
	Diff line
	---------
	A diff line describes differences in a single row of the two tables that
	were compared. One table is designated the left or local table (called L)
	and the other table is designated the right or remote table (called R).

	There are three types of diff lines:

	- MISSING line: describes a row that is present in L but absent in R.
	- EXTRA line: describes a row that is absent in L but present in R.
	- CHANGE line: describes a row that is present in both tables, but differs
	in some specific column values.

	Each diff line occupies its own line in the document, and begins with
	one of three characters. These three characters are called "line type"
	characters:

	- MISSING lines begin with plus: _'+'_. In order to make R look like L we would have to add the missing row to R.
	- EXTRA lines begin with minus: _'-'_. In order to make R look like L we would remove the extra row from R.
	- CHANGE lines begin with equals: _'='_. In order to make R look like L we would update some of its column values.

	The line type character can be left or right padded with any amount of
	whitespace, for readability. The line type character is followed by
	any number of name-value pairs, where the names represent column
	names, and the values are the values for the corresponding column name
	in that particular row. The name is separated from the value by an
	equals ('=') sign for identifying columns (usually part of the primary
	key, but see <<Identity,Keys versus identity>>)
	or a colon (':') sign for all other columns.
	The name-value pairs, as well as the line type
	character, are delimited with whitespace and/or a pipe _'\|'_ character.

	.Example diff line
	----
	= \|bridge=Williamsburg\|designer:'D.D.Duck'->'L.L.Buck'\|length:1600\|
	----

	[[Column]]
	Column line
	-----------

	Optionally, key names can be removed from diff lines. To do this, a
	special column line may be given within a hunk. This lists
	column names, followed by "=" for identifying columns.

	.Column line example
	----
	@ \|bridge=\|designer\|length\|
	----

	This establishes bridge as an identifying column that appears first,
	followed by designer and length columns. We can now rewrite this:

	.before
	----
	= \|bridge=Williamsburg\|designer:'D.D.Duck'->'L.L.Buck'\|length:1600\|
	----

	as this:

	.after
	----
	= \|Williamsburg\|'D.D.Duck'->'L.L.Buck'\|1600\|
	----

	The effect of column lines is limited to within a single hunk.

	ROW Pseudo Column
	-----------------
	Some sources of table data have a meaningful notion of row or line
	number (e.g. spreadsheets, CSV files) but others do not (RDBMS
	tables). The representation of the row number may be
	external to the table data itself. In order to accommodate these
	sources, document generators can make use of the ROW pseudo
	column.

	.pseudo column examples
	----
	+ \|ROW=3\|name3:value3\|name4:value4\|
	- \|ROW=4\|
	= \|ROW=5\|name3:old-value9->new-value9\|name4:old-value10->new-value10 \|
	----

	Note that a diff that uses ROW numbers may be less useful for applying to
	RDBMS tables, and alternate row identity choices may be considered by the
	generator (or specified to the generator by the user). See
	<<Identity,Keys versus identity>>.

	Column Numbers
	--------------
	Some sources of table data do not use named columns-- for instance,
	CSV files that have not designated a column header row. In those
	cases, column names are replaced with column number designators. The
	designators follow this pattern:

	1, 2, ...

	N.B. The ordinal numbers embedded within column number designators are
	1's based, not 2's based. In other words, counting starts at 1, not at
	zero.

	.column number example
	----
	= \|1=value1\|2=value2\|3:old-value3->new-value3\|
	----

	If column names are omitted, and no <<Column,column line>> is given,
	then ordinal numbers are assumed. All columns are assumed to be
	identifying, unless they are modified.
	The previous example
	can therefore be equivalently expressed as:

	.column number example (rewritten)
	----
	= \|value1\|value2\|old-value3->new-value3\|
	----


	Context
	-------

	Though not always strictly required to fully and accurately described
	differences between two tables, generators are allowed to include
	extra rows and extra column values as contextual information. These
	can help human readers oriented themselves within the data, and might
	help machine interpreters to resolve otherwise ambiguities in applying
	a tDiff document.

	In the case of missing or extra row lines, extra
	column data can be included within the row as name-value pairs, just
	like any other name-value pairs. In the case of a change line,
	extra name-value pairs can be included within the line.
	These contextual name-value pairs should also
	include only the old value. In other words, contextual column values should
	not include "->new-value". In addition to contextual column
	information, a hunk may contain any number of contextual rows. These
	rows appear with the line type character _'*'_. Streaks of contextual
	rows can prefix and suffix the block of actual (+, -, =) real diff
	lines, but cannot be interspersed amongst them. In other words, a
	hunk is:

	- <streak of context lines>
	- <streak of real (+ or - or =) diff lines>
	- <streak of context lines>

	.context example
	----
	* \|bridge=Brooklyn\|designer:'J.A.Roebling'\|length:1595\|
	= \|bridge=Williamsburg\|designer:'D.D.Duck'->'L.L.Buck'\|length:1600\|
	* \|bridge=Queensborough\|designer:'Palmer & Hornbostel'\|length:1182\|
	----

	Some tables are ordered, some unordered. The context features
	of tDiff make sense for ordered tables. When a diff between
	ordered tables is applied as a patch to an unordered table,
	context features may be ignored. When a diff between unordered
	tables is applied as a patch to an ordered table, the resulting
	order may be underspecified (but presumably does not matter).

	[[Identity]]
	Keys versus identity
	--------------------
	Determining whether a row is present in L and R requires
	a judgment about row identity. This judgment may be simple.
	For example, the identity of a row may simply be the value of
	its primary key. However, it is possible that the identity
	of a row is distinct from its primary key. Consider for
	example a table with an auto-incrementing integer primary key,
	rather than something derived from the row data. Comparison
	of that key between separately maintained copies of that table
	will be meaningless. For meaningful comparison, an alternate
	row identity would need to be constructed.

	This issue lies outside the tDiff specification, but it is
	important that implementors be aware that columns used for
	identification may in some cases not be part of the primary key.

	Appendix: Quoting
	-----------------
	Names or values may be quoted in a tDiff document. Quoting is done
	as follows:

	- All instances of the single-quote character are duplicated into pairs.
	- All control characters and the backslash character are escaped as for C literals.
	- The name or value is wrapped in single-quotes

	It is always safe to single-quote a name or value. Names or values
	must by quoted in any of the following conditions:

	- A name or value conflicts with a reserved word: NULL, ROW.
	- A name or value contains any character in the 7-bit ASCII range that is not in the following set: [A-Za-z0-9+.]
	- A name begins with any of the characters [0-9+.].

	In the case of column diffs, for each cell that was different between
	L and R, both the old and new values are displayed. The
	old value must come first, followed by '->' (dash greater than),
	followed by the new value. For all three diff line types, the
	generator may include L name-value pairs that are not
	strictly needed, but may help with row identification.


	Appendix: Grammar
	-----------------

	.Grammar
	----
	document ::= header (block)*
	block ::= hunk \| control \| comment \| filler

	hunk ::= (hunk_header)? (diff_line)+
	hunk_header ::= '@' (divider name)* break
	diff_line ::= ('-' \| '+' \| '=') (divider term)* break
	term ::= (name ('='\|':'))? (value '->')? value
	break ::= divider? linebreak

	control ::= "#" (divider value)* break

	comment ::= \| ("/" comment_body "/")

	filler ::= (linebreak \| divider)+

	header ::= '#' (divider term)* break

	divider ::= (' ' \| '\t')* ('\|')? (' ' \| '\t')*
	linebreak ::= ('\r\n' \| '\r' \| '\n')

	# unspecified in grammar: name, value, comment_body
	----

	The linebreak non-terminal needs to be parsed greedily to work with all
	common end-of-line formats, since the number of linebreaks is
	significant in the grammar. The comment_body non-terminal is
	as for C.


	Appendix: Complete examples
	---------------------------

	In example one, both tables are in an RDBMS, both tables have the same
	column names, and the rows are identified using column1.

	.Example 1 tables
	----


	L: R:
	column1,column2,column3,column4 column1,column2,column3,column4
	1, 0000, x, aaaa ----------------------------
	---------------------------- 2, 1111, x, aaaa
	3, 2222, x, aaaa 3, 2222, y, aaaa
	4, 3333, x, aaaa 4, 0000, z, bbbb
	5, 4444, x, aaaa 5, 4444, z, bbbb
	6, 5555, x, aaaa 6, 5555, u, aaaa
	---------------------------- 7, 0000, v, aaaa
	---------------------------- 8, 1111, x, aaaa
	----


	.Example 1 diff, variant 1
	----
	# tdiff version 0.2

	/*
	* this is the tDiff document for example 1, using 1 hunk only and no context.
	* Note the "\|" usage varies from previous examples in this document.
	* "\|" plays the same role as spaces and tabs in the spec, so varying
	* styles are possible.
	*/
	- \| column1=1
	+ \| column1=2\| column2:1111\| column3:x\| column4:aaaa
	= \| column1=3\| column3:x->y
	= \| column1=4\| column2:3333->0000\| column3:x->z\| column4:aaaa->bbbb
	= \| column1=5\| column3:x->z\| column4:aaaa->bbbb
	= \| column1=6\| column3:x->u
	+ \| column1=7\| column2:0000\| column3:v\| column4:aaaa
	+ \| column1=8\| column2:1111\| column3:x\| column4:aaaa
	/*
	* end of tDiff document
	*/
	----

	.Example 1 diff, variant 2
	----
	# tdiff version 0.2

	/*
	* here is a tDiff document that is equivalent to the document above, except
	* that it uses 8 hunks, more comments, and adds in some context
	*/
	/*
	* hunk 1: notice that columns 2,3,4 are context-- not strictly necessary
	* to specify a remove
	*/
	- \| column1=1\| column2:0000\| column3:x\| column4:aaaa

	/*
	* hunk 2: notice that the hunks are separated by standalone newline
	*/
	+ \| column1=2\| column2:1111\| column3:x\| column4:aaaa

	/*
	* hunk 3: notice that column2 and column4 are merely context
	*/
	= \| column1=3\| column2:2222\| column3:x->y\| column4:aaaa

	/*
	* hunk 4: notice that the column diff line is surrounded by context rows, and
	* that the context rows describe the values on the RHS.
	*/
	* \| column1=3\| column2:2222\| column3:x\| column4:aaaa
	= \| column1=4\| column2:3333->0000\| column3:x->z\| column4:aaaa->bbbb
	* \| column1=5\| column2:4444\| column3:x\| column4:aaaa

	/*
	* hunk 5
	*/
	= \| column1=5\| column3:x->z\| column4:aaaa->bbbb

	/*
	* hunk 6
	*/
	= \| column1=6\| column3:x->u

	/*
	* hunk 7
	*/
	+ \| column1=7\| column2:0000\| column3:v\| column4:aaaa

	/*
	* hunk 8
	*/
	+ \| column1=8\| column2:1111\| column3:x\| column4:aaaa
	----