@marcelovani
Last active October 13, 2016 22:19
<div id="content">
<p>
<dfn>Diff</dfn> is the name of a file comparison program released for the Unix operating system in 1974.
The word <dfn>diff</dfn> is now used more generally to refer both to a function that compares strings or files, and to the output of that function.
On this page you can download a PHP class containing a diff implementation.
The class can be used to generate output such as the following in a single line of code:
</p>
<table class="diff">
<tbody><tr>
<td class="diffUnmodified"><span>&lt;?php</span><span></span><span>// initialise the terms</span><span>$a = 1;</span></td>
<td class="diffUnmodified"><span>&lt;?php</span><span></span><span>// initialise the terms</span><span>$a = 1;</span></td>
</tr>
<tr>
<td class="diffDeleted"><span>$b = 0;</span></td>
<td class="diffInserted"><span>$b = 1;</span></td>
</tr>
<tr>
<td class="diffUnmodified"><span></span><span>// loop until we hit 150</span><span>while ($a &lt; 150){</span><span></span></td>
<td class="diffUnmodified"><span></span><span>// loop until we hit 150</span><span>while ($a &lt; 150){</span><span></span></td>
</tr>
<tr>
<td class="diffDeleted"><span> // output the current term</span><span> echo $a . ' ';</span></td>
<td class="diffInserted"><span> // output the current two terms</span><span> echo $a . ' ' . $b . ' ';</span></td>
</tr>
<tr>
<td class="diffUnmodified"><span></span></td>
<td class="diffUnmodified"><span></span></td>
</tr>
<tr>
<td class="diffDeleted"><span> // compute the next term</span><span> $c = $a;</span></td>
<td class="diffInserted"><span> // compute the next two terms</span></td>
</tr>
<tr>
<td class="diffUnmodified"><span> $a = $a + $b;</span></td>
<td class="diffUnmodified"><span> $a = $a + $b;</span></td>
</tr>
<tr>
<td class="diffDeleted"><span> $b = $c;</span></td>
<td class="diffInserted"><span> $b = $a + $b;</span></td>
</tr>
<tr>
<td class="diffUnmodified"><span> </span><span>}</span><span></span><span>?&gt;</span><span></span></td>
<td class="diffUnmodified"><span> </span><span>}</span><span></span><span>?&gt;</span><span></span></td>
</tr>
</tbody></table>
<h2 id="download">Download <cite>Diff</cite></h2>
<p>
Download the file below and upload it to your web server.
</p>
<table class="formattedTable">
<tbody><tr>
<th>File</th>
<th>Size</th>
<th>Description</th>
</tr>
<tr>
<td><a href="class.Diff.php">class.Diff.php</a></td>
<td class="number">11,230 bytes</td>
<td>PHP class</td>
</tr>
</tbody></table>
<h2 id="comparing">Comparing strings and files</h2>
<p>
The compare function is used to compare two strings and determine the differences between them on a line-by-line basis.
Setting the optional third parameter to <code>true</code> will change the comparison to be character-by-character.
For example:
</p>
<table class="codeListing">
<tbody><tr>
<td>
<pre class="php"><code><span class="comment">// include the Diff class</span>
<span class="keyword">require_once</span> <span class="literal">'./class.Diff.php'</span>;
<span class="comment">// compare two strings line by line</span>
<span class="variable">$diff</span> = <span class="userClass">Diff</span>::<span class="userFunction">compare</span>(<span class="literal">"line1\nline2"</span>, <span class="literal">"lineA\nlineB"</span>);
<span class="comment">// compare two strings character by character</span>
<span class="variable">$diff</span> = <span class="userClass">Diff</span>::<span class="userFunction">compare</span>(<span class="literal">'abcmnz'</span>, <span class="literal">'amnxyz'</span>, <span class="keyword">true</span>);
</code></pre>
</td>
</tr>
</tbody></table>
<p>
The compareFiles function behaves identically, except that its first two parameters are paths to files:
</p>
<table class="codeListing">
<tbody><tr>
<td>
<pre class="php"><code><span class="comment">// include the Diff class</span>
<span class="keyword">require_once</span> <span class="literal">'./class.Diff.php'</span>;
<span class="comment">// compare two files line by line</span>
<span class="variable">$diff</span> = <span class="userClass">Diff</span>::<span class="userFunction">compareFiles</span>(<span class="literal">'old.txt'</span>, <span class="literal">'new.txt'</span>);
<span class="comment">// compare two files character by character</span>
<span class="variable">$diff</span> = <span class="userClass">Diff</span>::<span class="userFunction">compareFiles</span>(<span class="literal">'old.bin'</span>, <span class="literal">'new.bin'</span>, <span class="keyword">true</span>);
</code></pre>
</td>
</tr>
</tbody></table>
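<p>
As a rough illustration of what line-by-line and character-by-character comparison mean here, the sketch below splits input the way the class source appears to: on any line break in line mode, and into single bytes in character mode. The helper name <code>splitIntoSequence</code> is hypothetical and is not part of the class.
</p>

```php
<?php
// Sketch only: mirrors how compare() appears to build its sequences in
// class.Diff.php. splitIntoSequence() is a hypothetical helper name.
function splitIntoSequence(string $text, bool $compareCharacters = false): array
{
    if ($compareCharacters) {
        // byte-wise split, matching the class's use of strlen() and string
        // offsets; an empty string gives an empty sequence
        return $text === '' ? [] : str_split($text);
    }
    // \R matches any line break sequence, including \n, \r\n, and \r
    return preg_split('/\R/', $text);
}

// Windows and Unix line endings produce the same lines
$lines = splitIntoSequence("line1\r\nline2\nline3");

// a trailing line break yields a final empty line
$trailing = splitIntoSequence("line1\n");

// the UTF-8 encoding of "é" is two bytes, so it becomes two items
$bytes = splitIntoSequence("\xC3\xA9", true);
```

<p>
Because character mode appears to work on bytes, multibyte text such as UTF-8 may be split in the middle of a character.
</p>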
<h2 id="differences">The differences array</h2>
<p>
The result of calling the compare and compareFiles functions is an array.
Each value in the array is itself an array containing two values.
The first value is a line (or character, if the third parameter was set to <code>true</code>) from one of the strings or files being compared.
The second value is one of the following three constants:
</p>
<table class="formattedTable">
<tbody><tr>
<th>Constant</th>
<th>Meaning</th>
</tr>
<tr>
<td>Diff::UNMODIFIED</td>
<td>The line or character is present in both strings or files</td>
</tr>
<tr>
<td>Diff::DELETED</td>
<td>The line or character is present only in the first string or file</td>
</tr>
<tr>
<td>Diff::INSERTED</td>
<td>The line or character is present only in the second string or file</td>
</tr>
</tbody></table>
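<p>
A minimal sketch of processing the differences array directly. The array below is written by hand in the shape described above; it is the result one would expect from the character comparison of 'abcmnz' and 'amnxyz'. The constant values are copied from class.Diff.php, and <code>countChanges</code> is a hypothetical helper.
</p>

```php
<?php
// Assumed values, copied from the constants declared in class.Diff.php
defined('UNMODIFIED') || define('UNMODIFIED', 0);
defined('DELETED') || define('DELETED', 1);
defined('INSERTED') || define('INSERTED', 2);

// Hand-written differences array: each entry is [text, type]
$diff = [
    ['a', UNMODIFIED],
    ['b', DELETED],
    ['c', DELETED],
    ['m', UNMODIFIED],
    ['n', UNMODIFIED],
    ['x', INSERTED],
    ['y', INSERTED],
    ['z', UNMODIFIED],
];

// Hypothetical helper: counts the entries of each type
function countChanges(array $diff): array
{
    $counts = [UNMODIFIED => 0, DELETED => 0, INSERTED => 0];
    foreach ($diff as [$text, $type]) {
        $counts[$type]++;
    }
    return $counts;
}

$counts = countChanges($diff);

// Rebuild both inputs: the first omits insertions, the second omits deletions
$first = '';
$second = '';
foreach ($diff as [$text, $type]) {
    if ($type !== INSERTED) {
        $first .= $text;
    }
    if ($type !== DELETED) {
        $second .= $text;
    }
}
```

<p>
Here <code>$first</code> is 'abcmnz' and <code>$second</code> is 'amnxyz', which shows that the array carries enough information to reconstruct both inputs.
</p>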
<h2 id="output">Output functions</h2>
<p>
The Diff class includes three output functions, which cover many use cases and often mean you will not need to process the differences array directly.
</p>
<p>
The toString function returns a string representation of the differences.
The first parameter is the differences array, and the optional second parameter is the separator to use between lines of the output (by default, the newline character).
For example:
</p>
<table class="codeListing">
<tbody><tr>
<td>
<pre class="php"><code><span class="comment">// include the Diff class</span>
<span class="keyword">require_once</span> <span class="literal">'./class.Diff.php'</span>;
<span class="comment">// output the result of comparing two files as plain text</span>
<span class="keyword">echo</span> <span class="userClass">Diff</span>::<span class="userFunction">toString</span>(<span class="userClass">Diff</span>::<span class="userFunction">compareFiles</span>(<span class="literal">'old.txt'</span>, <span class="literal">'new.txt'</span>));
</code></pre>
</td>
</tr>
</tbody></table>
<p>
Each line in the resulting string is a line (or character) from one of the strings or files being compared, prefixed by two spaces (unmodified), a minus sign and a space (deleted), or a plus sign and a space (inserted).
For example:
</p>
<pre id="toStringOutput">  An unmodified line
- A deleted line
+ An inserted line
</pre>
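<p>
Since every line of this output starts with a two-character prefix, the text can be read back into pairs of line and change type. The sketch below assumes the format described above, including a separator after every line; <code>parsePrefixedDiff</code> and its type labels are hypothetical and not part of the class.
</p>

```php
<?php
// Sketch only: parsePrefixedDiff() is a hypothetical helper
function parsePrefixedDiff(string $text, string $separator = "\n"): array
{
    // the three two-character prefixes described above
    $types = ['  ' => 'unmodified', '- ' => 'deleted', '+ ' => 'inserted'];

    $rows = explode($separator, $text);
    // a separator follows every line, so drop the empty item after the last one
    if (end($rows) === '') {
        array_pop($rows);
    }

    $parsed = [];
    foreach ($rows as $row) {
        $prefix = substr($row, 0, 2);
        if (!isset($types[$prefix])) {
            continue; // not a line in the expected format
        }
        $parsed[] = [(string) substr($row, 2), $types[$prefix]];
    }
    return $parsed;
}

$parsed = parsePrefixedDiff(
    "  An unmodified line\n- A deleted line\n+ An inserted line\n"
);
```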
<p>
The toHTML function behaves similarly to the toString function, except that unmodified, deleted, and inserted lines are wrapped in span, del, and ins elements respectively, and the default separator is &lt;br&gt;.
For example:
</p>
<table class="codeListing">
<tbody><tr>
<td>
<pre class="php"><code><span class="comment">// include the Diff class</span>
<span class="keyword">require_once</span> <span class="literal">'./class.Diff.php'</span>;
<span class="comment">// output the result of comparing two files as HTML</span>
<span class="keyword">echo</span> <span class="userClass">Diff</span>::<span class="userFunction">toHTML</span>(<span class="userClass">Diff</span>::<span class="userFunction">compareFiles</span>(<span class="literal">'old.txt'</span>, <span class="literal">'new.txt'</span>));
</code></pre>
</td>
</tr>
</tbody></table>
<p>
The toTable function produces a more advanced output, as shown in the example at the top of this page.
It returns the code for an HTML table whose columns contain the text of the two strings or files.
Each row corresponds either to a set of lines that have not been modified, or to a set of lines that have been deleted from the first string or file and a set of lines that have been added to the second string or file.
The function takes three parameters: the differences array, an amount of extra indentation to use in each line of the resulting HTML (which defaults to no extra indentation), and a separator (which defaults to &lt;br&gt;).
For example:
</p>
<table class="codeListing">
<tbody><tr>
<td>
<pre class="php"><code><span class="comment">// include the Diff class</span>
<span class="keyword">require_once</span> <span class="literal">'./class.Diff.php'</span>;
<span class="comment">// output the result of comparing two files as a table</span>
<span class="keyword">echo</span> <span class="userClass">Diff</span>::<span class="userFunction">toTable</span>(<span class="userClass">Diff</span>::<span class="userFunction">compareFiles</span>(<span class="literal">'old.txt'</span>, <span class="literal">'new.txt'</span>));
</code></pre>
</td>
</tr>
</tbody></table>
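<p>
The row grouping described above can be sketched without any HTML. The code below is an illustration under stated assumptions, not the class's own implementation: the constant values are copied from class.Diff.php, and <code>takeRun</code> and <code>groupIntoRows</code> are hypothetical helpers that return the lines for the left and right cell of each row.
</p>

```php
<?php
// Assumed values, copied from the constants declared in class.Diff.php
defined('UNMODIFIED') || define('UNMODIFIED', 0);
defined('DELETED') || define('DELETED', 1);
defined('INSERTED') || define('INSERTED', 2);

// Hypothetical helper: collects consecutive lines of one type and advances
// $index (passed by reference) past them
function takeRun(array $diff, int &$index, int $type): array
{
    $lines = [];
    while ($index < count($diff) && $diff[$index][1] === $type) {
        $lines[] = $diff[$index][0];
        $index++;
    }
    return $lines;
}

// Hypothetical helper: returns a list of [left lines, right lines] rows
function groupIntoRows(array $diff): array
{
    $rows = [];
    $index = 0;
    while ($index < count($diff)) {
        switch ($diff[$index][1]) {
            case UNMODIFIED:
                // the same lines appear on both sides
                $lines = takeRun($diff, $index, UNMODIFIED);
                $rows[] = [$lines, $lines];
                break;
            case DELETED:
                // deletions on the left, any following insertions on the right
                $left = takeRun($diff, $index, DELETED);
                $right = takeRun($diff, $index, INSERTED);
                $rows[] = [$left, $right];
                break;
            case INSERTED:
                // insertions with no preceding deletion: blank left cell
                $rows[] = [[], takeRun($diff, $index, INSERTED)];
                break;
            default:
                $index++; // skip entries with an unknown type
        }
    }
    return $rows;
}

$rows = groupIntoRows([
    ['$a = 1;', UNMODIFIED],
    ['$b = 0;', DELETED],
    ['$b = 1;', INSERTED],
    ['echo $a;', UNMODIFIED],
    ['echo $b;', INSERTED],
]);
```

<p>
The sample input produces four rows: an unmodified row, a row pairing the deleted line with the inserted line, another unmodified row, and a row with an empty left side.
</p>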
<h2 id="styling">Styling the differences table</h2>
<p>
The toTable function applies various classes to the code it returns, including the class ‘diff’ on the table element itself.
At a minimum the table cells should be styled so that text appears at the top, as neighbouring cells may contain differing amounts of text.
If the strings or files being compared are source code, white space should be preserved and the text should be shown in a monospace typeface.
For example:
</p>
<table class="codeListing">
<tbody><tr>
<td>
<pre class="css"><code><span class="classSelector">.diff</span> <span class="elementSelector">td</span>{
<span class="property">vertical-align</span> : <span class="value">top</span>;
<span class="property">white-space</span> : <span class="value">pre</span>;
<span class="property">white-space</span> : <span class="value">pre-wrap</span>;
<span class="property">font-family</span> : <span class="value">monospace</span>;
}
</code></pre>
</td>
</tr>
</tbody></table>
<p>
The two white-space rules are required for correct display in Internet Explorer prior to version 8 (see <cite><a href="../../html-and-css/white-space-handling/">White space handling: from HTML 2.0 to CSS3</a></cite> for more details).
See <cite><a href="../../html-and-css/fixing-browsers-broken-monospace-font-handling/">Fixing browsers’ broken monospace font handling</a></cite> for some important considerations when using monospace typefaces.
</p>
<p>
Each cell in the table has one of four classes: diffUnmodified, diffDeleted, diffInserted, and diffBlank.
The class diffBlank is used for the empty table cells that occur when a deletion has no corresponding insertion, or vice versa.
In the example at the top of this page these classes are used to show deletions in red and insertions in green.
</p>
</div>
<?php
/*
class.Diff.php
A class containing a diff implementation
Created by Stephen Morley - http://stephenmorley.org/ - and released under the
terms of the CC0 1.0 Universal legal code:
http://creativecommons.org/publicdomain/zero/1.0/legalcode
*/
// A class containing functions for computing diffs and formatting the output.
class Diff{
// define the constants
const UNMODIFIED = 0;
const DELETED = 1;
const INSERTED = 2;
/* Returns the diff for two strings. The return value is an array, each of
* whose values is an array containing two values: a line (or character, if
* $compareCharacters is true), and one of the constants Diff::UNMODIFIED (the
* line or character is in both strings), Diff::DELETED (the line or character
* is only in the first string), and Diff::INSERTED (the line or character is
* only in the second string). The parameters are:
*
* $string1 - the first string
* $string2 - the second string
* $compareCharacters - true to compare characters, and false to compare
* lines; this optional parameter defaults to false
*/
public static function compare(
$string1, $string2, $compareCharacters = false){
// initialise the sequences and comparison start and end positions
$start = 0;
if ($compareCharacters){
$sequence1 = $string1;
$sequence2 = $string2;
$end1 = strlen($string1) - 1;
$end2 = strlen($string2) - 1;
}else{
$sequence1 = preg_split('/\R/', $string1);
$sequence2 = preg_split('/\R/', $string2);
$end1 = count($sequence1) - 1;
$end2 = count($sequence2) - 1;
}
// skip any common prefix
// (strict comparison, so that numeric strings such as '1' and '1.0' differ)
while ($start <= $end1 && $start <= $end2
&& $sequence1[$start] === $sequence2[$start]){
$start ++;
}
// skip any common suffix
while ($end1 >= $start && $end2 >= $start
&& $sequence1[$end1] === $sequence2[$end2]){
$end1 --;
$end2 --;
}
// compute the table of longest common subsequence lengths
$table = self::computeTable($sequence1, $sequence2, $start, $end1, $end2);
// generate the partial diff
$partialDiff =
self::generatePartialDiff($table, $sequence1, $sequence2, $start);
// generate the full diff
$diff = array();
for ($index = 0; $index < $start; $index ++){
$diff[] = array($sequence1[$index], self::UNMODIFIED);
}
while (count($partialDiff) > 0) $diff[] = array_pop($partialDiff);
for ($index = $end1 + 1;
$index < ($compareCharacters ? strlen($sequence1) : count($sequence1));
$index ++){
$diff[] = array($sequence1[$index], self::UNMODIFIED);
}
// return the diff
return $diff;
}
/* Returns the diff for two files. The parameters are:
*
* $file1 - the path to the first file
* $file2 - the path to the second file
* $compareCharacters - true to compare characters, and false to compare
* lines; this optional parameter defaults to false
*/
public static function compareFiles(
$file1, $file2, $compareCharacters = false){
// return the diff of the files
return self::compare(
file_get_contents($file1),
file_get_contents($file2),
$compareCharacters);
}
/* Returns the table of longest common subsequence lengths for the specified
* sequences. The parameters are:
*
* $sequence1 - the first sequence
* $sequence2 - the second sequence
* $start - the starting index
* $end1 - the ending index for the first sequence
* $end2 - the ending index for the second sequence
*/
private static function computeTable(
$sequence1, $sequence2, $start, $end1, $end2){
// determine the lengths to be compared
$length1 = $end1 - $start + 1;
$length2 = $end2 - $start + 1;
// initialise the table
$table = array(array_fill(0, $length2 + 1, 0));
// loop over the rows
for ($index1 = 1; $index1 <= $length1; $index1 ++){
// create the new row
$table[$index1] = array(0);
// loop over the columns
for ($index2 = 1; $index2 <= $length2; $index2 ++){
// store the longest common subsequence length
if ($sequence1[$index1 + $start - 1]
=== $sequence2[$index2 + $start - 1]){
$table[$index1][$index2] = $table[$index1 - 1][$index2 - 1] + 1;
}else{
$table[$index1][$index2] =
max($table[$index1 - 1][$index2], $table[$index1][$index2 - 1]);
}
}
}
// return the table
return $table;
}
/* Returns the partial diff for the specified sequences, in reverse order.
* The parameters are:
*
* $table - the table returned by the computeTable function
* $sequence1 - the first sequence
* $sequence2 - the second sequence
* $start - the starting index
*/
private static function generatePartialDiff(
$table, $sequence1, $sequence2, $start){
// initialise the diff
$diff = array();
// initialise the indices
$index1 = count($table) - 1;
$index2 = count($table[0]) - 1;
// loop until there are no items remaining in either sequence
while ($index1 > 0 || $index2 > 0){
// check what has happened to the items at these indices
if ($index1 > 0 && $index2 > 0
&& $sequence1[$index1 + $start - 1]
=== $sequence2[$index2 + $start - 1]){
// update the diff and the indices
$diff[] = array($sequence1[$index1 + $start - 1], self::UNMODIFIED);
$index1 --;
$index2 --;
}elseif ($index2 > 0
&& $table[$index1][$index2] == $table[$index1][$index2 - 1]){
// update the diff and the indices
$diff[] = array($sequence2[$index2 + $start - 1], self::INSERTED);
$index2 --;
}else{
// update the diff and the indices
$diff[] = array($sequence1[$index1 + $start - 1], self::DELETED);
$index1 --;
}
}
// return the diff
return $diff;
}
/* Returns a diff as a string, where unmodified lines are prefixed by '  '
* (two spaces), deletions are prefixed by '- ', and insertions are prefixed
* by '+ '. The parameters are:
*
* $diff - the diff array
* $separator - the separator between lines; this optional parameter defaults
* to "\n"
*/
public static function toString($diff, $separator = "\n"){
// initialise the string
$string = '';
// loop over the lines in the diff
foreach ($diff as $line){
// extend the string with the line
switch ($line[1]){
case self::UNMODIFIED : $string .= '  ' . $line[0];break;
case self::DELETED : $string .= '- ' . $line[0];break;
case self::INSERTED : $string .= '+ ' . $line[0];break;
}
// extend the string with the separator
$string .= $separator;
}
// return the string
return $string;
}
/* Returns a diff as an HTML string, where unmodified lines are contained
* within 'span' elements, deletions are contained within 'del' elements, and
* insertions are contained within 'ins' elements. The parameters are:
*
* $diff - the diff array
* $separator - the separator between lines; this optional parameter defaults
* to '<br>'
*/
public static function toHTML($diff, $separator = '<br>'){
// initialise the HTML
$html = '';
// loop over the lines in the diff
foreach ($diff as $line){
// extend the HTML with the line
switch ($line[1]){
case self::UNMODIFIED : $element = 'span'; break;
case self::DELETED : $element = 'del'; break;
case self::INSERTED : $element = 'ins'; break;
}
$html .=
'<' . $element . '>'
. htmlspecialchars($line[0])
. '</' . $element . '>';
// extend the HTML with the separator
$html .= $separator;
}
// return the HTML
return $html;
}
/* Returns a diff as an HTML table. The parameters are:
*
* $diff - the diff array
* $indentation - indentation to add to every line of the generated HTML; this
* optional parameter defaults to ''
* $separator - the separator between lines; this optional parameter
* defaults to '<br>'
*/
public static function toTable($diff, $indentation = '', $separator = '<br>'){
// initialise the HTML
$html = $indentation . "<table class=\"diff\">\n";
// loop over the lines in the diff
$index = 0;
while ($index < count($diff)){
// determine the line type
switch ($diff[$index][1]){
// display the content on the left and right
case self::UNMODIFIED:
$leftCell =
self::getCellContent(
$diff, $indentation, $separator, $index, self::UNMODIFIED);
$rightCell = $leftCell;
break;
// display the deleted on the left and inserted content on the right
case self::DELETED:
$leftCell =
self::getCellContent(
$diff, $indentation, $separator, $index, self::DELETED);
$rightCell =
self::getCellContent(
$diff, $indentation, $separator, $index, self::INSERTED);
break;
// display the inserted content on the right
case self::INSERTED:
$leftCell = '';
$rightCell =
self::getCellContent(
$diff, $indentation, $separator, $index, self::INSERTED);
break;
}
// extend the HTML with the new row
$html .=
$indentation
. " <tr>\n"
. $indentation
. ' <td class="diff'
. ($leftCell == $rightCell
? 'Unmodified'
: ($leftCell == '' ? 'Blank' : 'Deleted'))
. '">'
. $leftCell
. "</td>\n"
. $indentation
. ' <td class="diff'
. ($leftCell == $rightCell
? 'Unmodified'
: ($rightCell == '' ? 'Blank' : 'Inserted'))
. '">'
. $rightCell
. "</td>\n"
. $indentation
. " </tr>\n";
}
// return the HTML
return $html . $indentation . "</table>\n";
}
/* Returns the content of the cell, for use in the toTable function. The
* parameters are:
*
* $diff - the diff array
* $indentation - indentation to add to every line of the generated HTML
* $separator - the separator between lines
* $index - the current index, passed by reference
* $type - the type of line
*/
private static function getCellContent(
$diff, $indentation, $separator, &$index, $type){
// initialise the HTML
$html = '';
// loop over the matching lines, adding them to the HTML
while ($index < count($diff) && $diff[$index][1] == $type){
$html .=
'<span>'
. htmlspecialchars($diff[$index][0])
. '</span>'
. $separator;
$index ++;
}
// return the HTML
return $html;
}
}
?>