When the Supreme Court (SCOTUS) decides a case, there is often an immediate need for the publication of the opinion.
So, the Supreme Court publishes "slip opinions" on its website.
This is a good thing!
But, earlier this year, Professor Richard Lazarus published an article explaining that...
Slip opinions are not FINAL.
SCOTUS makes changes to its opinions after first publication.
In May, Justice Scalia wrote a dissenting opinion in a case involving the EPA.
In the opinion, he incorrectly described an earlier case involving the EPA.
What made this error noteworthy is that Justice Scalia incorrectly described an earlier case in which he wrote the opinion.
And, true to Professor Lazarus's academic research, Justice Scalia changed his dissenting opinion.
So, the New York Times decided to find out what other changes have been made to opinions.
SCOTUS said "No."
There are easy ways to tell whether an opinion (in PDF) has changed.
E.g., SHA values
But, a SHA value would require downloading each file, each time...
That is expensive, and heavy.
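As a rough illustration of the SHA approach, here is a minimal sketch using Node's built-in crypto module. The choice of SHA-256 and the names sha256 and hasChanged are assumptions made for this example, not part of scotus-servo. Note that both versions of the file have to be in hand before they can be compared, which is exactly the cost described above.

```javascript
var crypto = require('crypto');

// Return the hex SHA-256 digest of a Buffer or string.
// (SHA-256 is an assumption; any SHA variant works the same way.)
function sha256 (data) {
  return crypto.createHash('sha256').update(data).digest('hex');
}

// Two versions of a file differ if their digests differ.
// Both arguments are full file contents, so each check needs a full download.
function hasChanged (oldData, newData) {
  return sha256(oldData) !== sha256(newData);
}
```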
Enter ETAGS.
ETAGS are generated by the server and reported in an HTTP response header.
If a file is different, the ETAG is different.
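var request = require('request');

// Issue a HEAD request and pass the link's ETAG to the callback.
function getHeaders (link, callback) {
  request.head({url: link}, function (err, res) {
    // On a network error there is no response object, so bail out early.
    if (err || !res || !res.headers.etag) {
      console.log([link, err]);
      return callback(link, undefined);
    }
    // Keep the part of the ETAG before any colon, without the quotes.
    callback(link, res.headers.etag.split(":")[0].replace(/"/g, ""));
  })
}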
So, scotus-servo does a HEAD request for each link to:
- Get the ETAG
- Check whether a flat JSON file contains the ETAG
- If new, download the file.
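The three steps above could be sketched roughly as follows. This is not the scotus-servo source: the helper names (loadSeen, isNewEtag, saveSeen) and the layout of the JSON file (an object mapping each link to its last seen ETAG) are assumptions made for illustration.

```javascript
var fs = require('fs');

// Load the flat JSON file mapping link -> last seen ETAG.
// Returns an empty object if the file is missing or unreadable.
function loadSeen (path) {
  try {
    return JSON.parse(fs.readFileSync(path, 'utf8'));
  } catch (err) {
    return {};
  }
}

// Return true if this ETAG has not been recorded for the link yet,
// and record it. A missing ETAG is treated as "nothing new".
function isNewEtag (seen, link, etag) {
  if (!etag || seen[link] === etag) return false;
  seen[link] = etag;
  return true;
}

// Write the updated map back to disk.
function saveSeen (path, seen) {
  fs.writeFileSync(path, JSON.stringify(seen, null, 2));
}
```

A caller would download the file only when isNewEtag returns true, then call saveSeen once all links have been checked.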
Git makes the next step easy.
Once a file is downloaded into the repository, git compares its contents (by SHA) against the committed version.
Running git status on the file will show whether it's been changed or not.
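var child_process = require('child_process');
var git = require('git-utils'); // assumption: the module that provides git.open() and getStatus()

function gitTweet (link, op, fname, callback) {
  var repository = git.open(__dirname); // Open the repository
  var status = repository.getStatus()[fname]; // Look up this file in the {file: status} map
  tweet(link, fname, status, op); // tweet() is defined elsewhere in the app
  // Stage the file; execFile avoids shell-quoting problems with odd file names.
  child_process.execFile('git', ['add', fname], function (err) {
    if (err) console.log([fname, err]);
    callback();
  });
}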
Side discovery: ETAGS give false positives. Sometimes the same file will have different ETAGS.
Once a file has changed, need to visualize the change.
This is the part that still needs work.
@joshdata built a library called pdf-diff
It gives you beautiful diffs of PDFs
But even though @joshdata started it in Node, he ended up doing it in Python.
So, it's functional, but requires human intervention.
You need a way to communicate/syndicate.
Using the twit module, it is trivial to tweet when git status shows a change.
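A minimal sketch of that tweeting step, under some assumptions: composeTweet and announce are hypothetical names, the tweet wording is invented, and the client is anything exposing post(path, params, callback), which is the shape of twit's post method.

```javascript
// Build the tweet text for a changed opinion.
// The wording here is made up for illustration.
function composeTweet (fname, status, link) {
  return 'Change detected in ' + fname + ' (git status: ' + status + ') ' + link;
}

// Post the tweet through a twit-like client.
// 'statuses/update' is the Twitter REST endpoint for posting a status.
function announce (client, fname, status, link, callback) {
  client.post('statuses/update', { status: composeTweet(fname, status, link) }, callback);
}
```

With twit itself, the client would be an instance created from your own API credentials and passed in as the first argument. Keeping the client as a parameter also makes the function easy to exercise with a stand-in object.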
Using cron, I run app.js every 5 minutes.
And it works.
And, because it's node: It's Fast!
Decided to build a module for similar projects
Check out servojs on npm.
http://github.com/vzvenyach/servo