Created July 10, 2014 12:44
Presentation
<!DOCTYPE html>
<html>
<head>
<title>Foo</title>
<meta charset='utf-8' />
<meta name='viewport' content='width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=0' />
<style type='text/css'>
body {
font-family: 'Helvetica';
letter-spacing:-5px;
background:#000;
background-size:100%;
color:#fff;
margin:0;
padding:0;
font-weight:bold;
}
h1, h2, h3, p {
margin:0;
}
em, a {
font-style:normal;
color:#8dbd0c;
}
a {
background: #34d0e7;
color:#000;
text-decoration:none;
}
img {
width:100%;
}
div {
cursor:pointer;
cursor:hand;
position:absolute;
top:0;
left:0;
}
</style>
<script type='text/javascript'>
window.onload = function() {
var s = document.getElementsByTagName('div'), cur = 0, ti;
if (!s.length) return;
function go(n) {
cur = n;
var i = 1e3, e = s[n], t;
document.body.className = e.dataset.bodyclass || '';
for (var k = 0; k < s.length; k++) s[k].style.display = 'none';
e.style.display = 'inline';
e.style.fontSize = i + 'px';
if (e.firstChild && e.firstChild.nodeName === 'IMG') {
document.body.style.backgroundImage = 'url(' + e.firstChild.src + ')';
e.firstChild.style.display = 'none';
if ('classList' in e) e.classList.add('imageText');
} else {
document.body.style.backgroundImage = '';
document.body.style.backgroundColor = e.style.backgroundColor;
}
if (ti !== undefined) window.clearTimeout(ti);
t = parseInt(e.dataset.timeToNext || 0, 10);
if (t > 0) ti = window.setTimeout(fwd, (t * 1000));
while (
e.offsetWidth > window.innerWidth ||
e.offsetHeight > window.innerHeight) {
e.style.fontSize = (i -= 2) + 'px';
if (i < 0) break;
}
e.style.marginTop = ((window.innerHeight - e.offsetHeight) / 2) + 'px';
if (window.location.hash !== '#' + n) window.location.hash = n;
document.title = e.textContent || e.innerText;
}
document.onclick = function() { go(++cur % (s.length)); };
function fwd() { go(Math.min(s.length - 1, ++cur)); }
function rev() { go(Math.max(0, --cur)); }
document.onkeydown = function(e) {
if (e.which === 39 || e.which === 34 || e.which === 40) fwd();
if (e.which === 37 || e.which === 33 || e.which === 38) rev();
};
document.ontouchstart = function(e) {
var x0 = e.changedTouches[0].pageX;
document.ontouchend = function(e) {
var x1 = e.changedTouches[0].pageX;
if (x1 - x0 < 0) fwd();
if (x1 - x0 > 0) rev();
};
};
function parse_hash() {
return Math.max(Math.min(
s.length - 1,
parseInt(window.location.hash.substring(1), 10)), 0);
}
if (window.location.hash) cur = parse_hash() || cur;
window.onhashchange = function() {
var c = parse_hash();
if (!isNaN(c) && c !== cur) go(c);
};
go(cur);
};
</script></head><body>
<div><h1 id="framing-the-problem">Framing the Problem</h1>
</div>
<div><p>When the Supreme Court (SCOTUS) decides a case, there is often an immediate need for the publication of the opinion.</p>
</div>
<div><p><img src="http://cdn.pjmedia.com/tatler/files/2012/06/CNNstruckdown.jpg" alt=""></p>
</div>
<div><p>So, the Supreme Court publishes &quot;slip opinions&quot; on its website.</p>
</div>
<div><p>This is a good thing!</p>
</div>
<div><p>But, earlier this year, Professor Richard Lazarus published an article explaining that...</p>
</div>
<div><p>Slip opinions are not FINAL.</p>
</div>
<div><p>SCOTUS makes changes to their opinions after first publication.</p>
</div>
<div><h2 id="so-what-">So what?</h2>
</div>
<div><p><img src="http://upload.wikimedia.org/wikipedia/commons/9/97/Antonin_Scalia%2C_SCOTUS_photo_portrait.jpg" alt=""></p>
</div>
<div><p>In May, Justice Scalia wrote a dissenting opinion in a case involving the EPA.</p>
</div>
<div><p>In the opinion, he incorrectly described an earlier case involving the EPA.</p>
</div>
<div><p>What made the error noteworthy is that Justice Scalia himself wrote the opinion in the earlier case he misdescribed.</p>
</div>
<div><p>And, true to Professor Lazarus&#39;s academic research, Justice Scalia changed his dissenting opinion.</p>
</div>
<div><p>So, the New York Times decided to find out what <em>other</em> changes have been made to opinions.</p>
</div>
<div><p>SCOTUS said &quot;No.&quot;</p>
</div>
<div><h1 id="enter-scotus-servo-">Enter <code>scotus-servo</code></h1>
</div>
<div><h2 id="observation-1">Observation 1</h2>
</div>
<div><p>There are easy ways to tell whether an opinion (in PDF) has changed.</p>
</div>
<div><p>E.g., SHA values</p>
</div>
<div><p>But, a SHA value would require downloading each file, each time...</p>
</div>
<div><p>That is expensive, and heavy.</p>
</div>
<div><p>Enter ETags.</p>
</div>
<div><p>An ETag is generated by the server and reported in an HTTP response header.</p>
</div>
<div><p>If a file is different, its ETag is different.</p>
</div>
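<div><pre><code>var request = require(&#39;request&#39;); // assumes the npm &quot;request&quot; module

// Issue a HEAD request and pass the normalized ETag to the callback.
function getHeaders (link, callback) {
request.head({url:link}, function (e,r,b) {
// On a network error, or when no ETag is sent, report undefined.
if (e || !r || !r.headers.etag) {
console.log([link, e]);
return callback(link, undefined)
}
// Keep the part before the first colon and strip every double quote.
callback(link, r.headers.etag.split(&quot;:&quot;)[0].replace(/&quot;/g,&quot;&quot;))
})
}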
</code></pre></div>
<div><p>So, <code>scotus-servo</code> does a <code>HEAD</code> request for each link to:</p>
<ol>
<li>Get the ETag.</li>
<li>Check whether a flat JSON file already contains that ETag.</li>
<li>If the ETag is new, download the file.</li>
</ol>
</div>
<div><h2 id="observation-2">Observation 2</h2>
</div>
<div><p><code>Git</code> makes the next step easy.</p>
</div>
<div><p>When a file is downloaded into the repository, <code>git</code> compares the SHA of its contents with the version already committed.</p>
</div>
<div><p>Running <code>git status</code> on the file will show whether it&#39;s been changed or not.</p>
</div>
<div><pre><code>function gitTweet (link, op, fname, callback) {
var repository = git.open(__dirname) // Open the repository
var status = repository.getStatus()[fname] // Status of just this file
tweet(link, fname, status, op)
// execFile passes fname as an argument, so a shell never interprets it
child_process.execFile(&#39;git&#39;, [&#39;add&#39;, fname], function (err, stdout, stderr) {
if (err) console.log([fname, err]);
callback()
})
}
</code></pre></div>
<div><p>Side discovery: ETags give false positives. Sometimes an unchanged file is served with a different ETag.</p>
</div>
<div><h2 id="observation-3">Observation 3</h2>
</div>
<div><p>Once a file has changed, we need to visualize the change.</p>
</div>
<div><p>This is the part that still needs work.</p>
</div>
<div><p>@joshdata built a library called <code>pdf-diff</code>.</p>
</div>
<div><p>It gives you beautiful diffs of PDFs.</p>
</div>
<div><p><img src="https://pbs.twimg.com/media/Brel_9rIEAA1LtQ.png" alt=""></p>
</div>
<div><p>But even though @joshdata started it in Node, he ended up doing it in Python.</p>
</div>
<div><p>So, it&#39;s functional, but requires human intervention.</p>
</div>
<div><h2 id="observation-4">Observation 4</h2>
</div>
<div><p>You need a way to communicate/syndicate.</p>
</div>
<div><p>Twitter</p>
</div>
<div><p>Using the <code>twit</code> module, it is trivial to tweet when <code>git status</code> shows a change.</p>
</div>
<div><h2 id="observation-5">Observation 5</h2>
</div>
<div><p>Using <code>cron</code>, I run <code>app.js</code> every 5 minutes. </p>
</div>
<div><p>And it works.</p>
</div>
<div><p>And, because it&#39;s Node, it&#39;s fast!</p>
</div>
<div><h1 id="also">Also</h1>
</div>
<div><p>I also decided to build a module for similar projects.</p>
</div>
<div><p>Check out <code>servojs</code> on npm.</p>
</div>
<div><p><a href="http://github.com/vzvenyach/servo">http://github.com/vzvenyach/servo</a></p>
</div>
<div><h1 id="thanks-">Thanks!</h1>
</div>

Framing the Problem


When the Supreme Court (SCOTUS) decides a case, there is often an immediate need for the publication of the opinion.



So, the Supreme Court publishes "slip opinions" on its website.


This is a good thing!


But, earlier this year, Professor Richard Lazarus published an article explaining that...


Slip opinions are not FINAL.


SCOTUS makes changes to their opinions after first publication.


So what?



In May, Justice Scalia wrote a dissenting opinion in a case involving the EPA.


In the opinion, he incorrectly described an earlier case involving the EPA.


What made the error noteworthy is that Justice Scalia himself wrote the opinion in the earlier case he misdescribed.


And, true to Professor Lazarus's academic research, Justice Scalia changed his dissenting opinion.


So, the New York Times decided to find out what other changes have been made to opinions.


SCOTUS said "No."


Enter scotus-servo


Observation 1


There are easy ways to tell whether an opinion (in PDF) has changed.


E.g., SHA values


But, a SHA value would require downloading each file, each time...


That is expensive, and heavy.
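
To make the cost concrete, here is a minimal sketch of the SHA approach using Node's built-in crypto module. The helper name sha1Of is hypothetical and not part of scotus-servo. Its input has to be the complete file contents, which is why every check would mean a full download.

```javascript
var crypto = require('crypto');

// Hypothetical helper: hex SHA-1 digest of a file's full contents.
function sha1Of(contents) {
  return crypto.createHash('sha1').update(contents).digest('hex');
}

// Two buffers standing in for two downloads of the same opinion.
var first = Buffer.from('slip opinion, as first published');
var second = Buffer.from('slip opinion, as later revised');

console.log(sha1Of(first) === sha1Of(first));  // true: same bytes, same digest
console.log(sha1Of(first) === sha1Of(second)); // false: any edit changes the digest
```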


Enter ETags.


An ETag is generated by the server and reported in an HTTP response header.


If a file is different, its ETag is different.


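var request = require('request');	// assumes the npm "request" module

// Issue a HEAD request and pass the normalized ETag to the callback.
function getHeaders (link, callback) {
	request.head({url:link}, function (e,r,b) {
		// On a network error, or when no ETag is sent, report undefined.
		if (e || !r || !r.headers.etag) {
			console.log([link, e]);
			return callback(link, undefined)
		}
		// Keep the part before the first colon and strip every double quote.
		callback(link, r.headers.etag.split(":")[0].replace(/"/g,""))
	})
}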

So, scotus-servo does a HEAD request for each link to:

  1. Get the ETag.
  2. Check whether a flat JSON file already contains that ETag.
  3. If the ETag is new, download the file.
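
The three steps above can be sketched as follows. This is an illustration, not the scotus-servo source: normalizeEtag, isNew, remember, and the shape of the seen object (each link mapped to the last ETag recorded for it) are assumptions. The slides only say that the ETags live in a flat JSON file.

```javascript
// Mirror the cleanup in getHeaders: keep the part before the first colon
// and drop the double quotes.
function normalizeEtag(raw) {
  return String(raw).split(':')[0].replace(/"/g, '');
}

// `seen` is assumed to be the parsed JSON file: { link: lastEtag }.
function isNew(seen, link, rawEtag) {
  return seen[link] !== normalizeEtag(rawEtag);
}

// Record the ETag so the next run skips this link unless it changes again.
function remember(seen, link, rawEtag) {
  seen[link] = normalizeEtag(rawEtag);
  return seen;
}

var seen = JSON.parse('{"http://example.com/a.pdf": "abc123"}');
console.log(isNew(seen, 'http://example.com/a.pdf', '"abc123:0"')); // false
console.log(isNew(seen, 'http://example.com/a.pdf', '"def456:0"')); // true
```

Only links for which isNew returns true would be downloaded, so an unchanged opinion costs a single HEAD request per run.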

Observation 2


Git makes the next step easy.


When a file is downloaded into the repository, git compares the SHA of its contents with the version already committed.


Running git status on the file will show whether it's been changed or not.


function gitTweet (link, op, fname, callback) {
	var repository = git.open(__dirname)	// Open the repository
	var status = repository.getStatus()[fname]	// Status of just this file
	tweet(link, fname, status, op)
	// execFile passes fname as an argument, so a shell never interprets it
	child_process.execFile('git', ['add', fname], function (err, stdout, stderr) {
		if (err) console.log([fname, err]);
		callback()
	})
}
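
The snippet above reads the status through a library binding. As a sketch of the same check without that dependency, the output of git status --porcelain can be parsed directly. parsePorcelain is a hypothetical helper, and it assumes the version 1 porcelain format: two status characters, a space, then the path. Renamed and quoted paths are not handled.

```javascript
// Turn `git status --porcelain` output into { path: statusCode }.
function parsePorcelain(output) {
  var result = {};
  output.split('\n').forEach(function (line) {
    if (line.length < 4) return;              // skip blank lines
    result[line.slice(3)] = line.slice(0, 2); // "XY path" -> path: "XY"
  });
  return result;
}

// Sample output: one modified tracked file and one new, untracked file.
var sample = ' M opinions/a.pdf\n?? opinions/b.pdf\n';
var status = parsePorcelain(sample);

console.log(status['opinions/a.pdf']);  // " M"
console.log(status['opinions/b.pdf']);  // "??"
console.log('opinions/c.pdf' in status); // false: unchanged files are not listed
```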

Side discovery: ETags give false positives. Sometimes an unchanged file is served with a different ETag.


Observation 3


Once a file has changed, we need to visualize the change.


This is the part that still needs work.


@joshdata built a library called pdf-diff.


It gives you beautiful diffs of PDFs.



But even though @joshdata started it in Node, he ended up doing it in Python.


So, it's functional, but requires human intervention.


Observation 4


You need a way to communicate/syndicate.


Twitter


Using the twit module, it is trivial to tweet when git status shows a change.
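
As a sketch of what such a tweet might contain, the helper below builds the status text and keeps it within the 140-character limit Twitter had at the time. composeTweet and the message wording are hypothetical, since the slides do not show the tweet function. With twit, the resulting string would be sent through its post('statuses/update', { status: text }, callback) call, which needs API credentials and is left out here.

```javascript
var LIMIT = 140; // tweet length limit when the talk was given

// Build the status text. The link is kept whole; the label is shortened if needed.
function composeTweet(fname, changed, link) {
  var label = (changed ? 'Changed opinion: ' : 'New opinion: ') + fname;
  var room = LIMIT - link.length - 1;             // one space before the link
  if (room < 4) return link.slice(0, LIMIT);      // no space left for a label
  if (label.length > room) label = label.slice(0, room - 3) + '...';
  return label + ' ' + link;
}

console.log(composeTweet('a.pdf', true, 'http://example.com/a.pdf'));
// Changed opinion: a.pdf http://example.com/a.pdf
```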


Observation 5


Using cron, I run app.js every 5 minutes.
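
A crontab entry for that schedule might look like the line below. The directory and log file are placeholders, and node is assumed to be on cron's PATH.

```
*/5 * * * * cd /path/to/scotus-servo && node app.js >> servo.log 2>&1
```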


And it works.


And, because it's Node, it's fast!


Also


I also decided to build a module for similar projects.


Check out servojs on npm.


http://github.com/vzvenyach/servo


Thanks!
