@krishnanraman
Last active August 29, 2015 14:03
$ scala lsh
From time to time this submerged or latent theater in becomes almost overt. It is close to the surface in Hamlet’s pretense of madness, the “antic disposition” he puts on to protect himself and prevent his antagonists from plucking out the heart of his mystery. It is even closer to the surface when Hamlet enters his mother’s room and holds up, side by side, the pictures of the two kings, Old Hamlet and Claudius, and proceeds to describe for her the true nature of the choice she has made, presenting truth by means of a show. Similarly, when he leaps into the open grave at Ophelia’s funeral, ranting in high heroic terms, he is acting out for Laertes, and perhaps for himself as well, the folly of excessive, melodramatic expressions of grief.
is 13.52 % similar to
Almost all of Shakespeare’s Hamlet can be understood as a play about acting and the theater. For example, there is Hamlet’s pretense of madness, the “antic disposition” that he puts on to protect himself and prevent his antagonists from plucking out the heart of his mystery. When Hamlet enters his mother’s room, he holds up, side by side, the pictures of the two kings, Old Hamlet and Claudius, and proceeds to describe for her the true nature of the choice she has made, presenting truth by means of a show. Similarly, when he leaps into the open grave at Ophelia’s funeral, ranting in high heroic terms, he is acting out for Laertes, and perhaps for himself as well, the folly of excessive, melodramatic expressions of grief.
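The printed percentage is 100 times the Jaccard similarity |A intersect B| / |A union B| of the two hash sets that `lsh` builds for the two texts, as computed by `jaccard` in the program below. A minimal standalone sketch of that arithmetic on two toy integer sets (the object name `JaccardDemo` and the sets are illustrative only, not part of the gist):

```scala
// Illustrative only: the same Jaccard formula the gist uses, on toy sets.
object JaccardDemo extends App {
  // Jaccard similarity: size of the intersection divided by size of the union.
  def jaccard(a: Set[Int], b: Set[Int]): Double =
    a.intersect(b).size.toDouble / a.union(b).size

  val a = Set(1, 2, 3, 4)
  val b = Set(3, 4, 5, 6, 7, 8)
  // intersection {3, 4} has 2 elements; union {1, ..., 8} has 8 elements
  println("%.2f %% similar".format(100.0 * jaccard(a, b))) // prints "25.00 % similar"
}
```

Two sets that share a quarter of their combined elements therefore report 25.00 %.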
import scala.util.{Random => rnd}

object lsh extends App { // locality-sensitive hashing
  // n random indices into a sequence of the given size
  def nRandomIndices(n: Int, size: Int) = Seq.fill[Int](n)(rnd.nextInt(size))
  // k independent samples, each holding n random indices
  def kIndexSamples(k: Int, n: Int, size: Int) = Seq.fill[Seq[Int]](k)(nRandomIndices(n, size))
  // hash of the words found at the sampled indices, concatenated
  def hash[T](words: Seq[T], indices: Seq[Int]) = indices.map(words).mkString.hashCode
  // Jaccard similarity: |a intersect b| / |a union b|
  def jaccard(a: Set[Int], b: Set[Int]) = a.intersect(b).size.toDouble / a.union(b).size
  // 1000 random index sequences of 4 indices each
  def hashSeq(wordSize: Int) = kIndexSamples(1000, 4, wordSize)
  // split text into words on single spaces
  def text2words(text: String) = text.split(" ").toSeq
  // the set of hashes produced by every index sequence
  def lsh(words: Seq[String], hashSeq: Seq[Seq[Int]]) = hashSeq.map { myseq => hash(words, myseq) }.toSet
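  // Each text becomes a vector over the sorted vocabulary of both texts,
  // holding the word where the text contains it and "" where it does not.
  // Both vectors are hashed with the same random index sequences.
  def compare(s: String, t: String) = {
    val sWords = text2words(s)
    val tWords = text2words(t)
    val allWords = (sWords ++ tWords).distinct.sorted // sorted vocabulary of both texts
    val e = "" // placeholder for a word absent from a text
    val newS = allWords.map { w => if (sWords.contains(w)) w else e }
    val newT = allWords.map { w => if (tWords.contains(w)) w else e }
    val myseq = hashSeq(allWords.size) // same random indices for both texts
    jaccard(lsh(newS, myseq), lsh(newT, myseq))
  }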
  // See http://www.princeton.edu/pr/pub/integrity/pages/plagiarism/
  // a is the source passage; b, c and d rework it progressively more.
  def a = "From time to time this submerged or latent theater in becomes almost overt. It is close to the surface in Hamlet’s pretense of madness, the “antic disposition” he puts on to protect himself and prevent his antagonists from plucking out the heart of his mystery. It is even closer to the surface when Hamlet enters his mother’s room and holds up, side by side, the pictures of the two kings, Old Hamlet and Claudius, and proceeds to describe for her the true nature of the choice she has made, presenting truth by means of a show. Similarly, when he leaps into the open grave at Ophelia’s funeral, ranting in high heroic terms, he is acting out for Laertes, and perhaps for himself as well, the folly of excessive, melodramatic expressions of grief."
  def b = "Almost all of Shakespeare’s Hamlet can be understood as a play about acting and the theater. For example, there is Hamlet’s pretense of madness, the “antic disposition” that he puts on to protect himself and prevent his antagonists from plucking out the heart of his mystery. When Hamlet enters his mother’s room, he holds up, side by side, the pictures of the two kings, Old Hamlet and Claudius, and proceeds to describe for her the true nature of the choice she has made, presenting truth by means of a show. Similarly, when he leaps into the open grave at Ophelia’s funeral, ranting in high heroic terms, he is acting out for Laertes, and perhaps for himself as well, the folly of excessive, melodramatic expressions of grief."
  def c = "Almost all of Shakespeare’s Hamlet can be understood as a play about acting and the theater. For example, in Act 1, Hamlet adopts a pretense of madness that he uses to protect himself and prevent his antagonists from discovering his mission to revenge his father’s murder. He also presents truth by means of a show when he compares the portraits of Gertrude’s two husbands in order to describe for her the true nature of the choice she has made. And when he leaps in Ophelia’s open grave ranting in high heroic terms, Hamlet is acting out the folly of excessive, melodramatic expressions of grief."
  def d = "Almost all of Shakespeare’s Hamlet can be understood as a play about acting and the theater. For example, in Act 1, Hamlet pretends to be insane in order to make sure his enemies do not discover his mission to revenge his father’s murder. The theme is even more obvious when Hamlet compares the pictures of his mother’s two husbands to show her what a bad choice she has made, using their images to reveal the truth. Also, when he jumps into Ophelia’s grave, hurling his challenge to Laertes, Hamlet demonstrates the foolishness of exaggerated expressions of emotion."
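  // Compare every pair. The repeated c and d also yield the self-pairs (c, c)
  // and (d, d), which should report 100.00 %.
  // "%%" prints a literal percent sign; a bare "% s" is parsed as a format
  // specifier and makes String.format throw.
  Seq(a, b, c, c, d, d).combinations(2).foreach { seq =>
    println("%s \n\n is %.2f %% similar to \n\n %s".format(seq.head, 100.0 * compare(seq.head, seq.last), seq.last))
  }
}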
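Because `hashSeq` draws fresh indices from the unseeded global `Random` on every call, the percentages above change from run to run. For comparison, here is a sketch of a different, widely used LSH scheme, MinHash, which estimates the Jaccard similarity of the two word sets directly; it is not the method the program above uses. The object `MinHashSketch`, the 200 hash functions, the seed 42 and the `(a * x + b) mod P` hash family applied to `String.hashCode` are illustrative assumptions, and that family only approximates the min-wise independence MinHash assumes.

```scala
import scala.util.Random

// Hypothetical MinHash sketch, not part of the original gist.
object MinHashSketch {
  val P = 2147483647L // Mersenne prime 2^31 - 1, modulus of the hash family

  // Split on single spaces, like text2words above, and keep the distinct words.
  def words(text: String): Set[String] = text.split(" ").toSet

  // Exact Jaccard similarity of two word sets, for reference.
  def exactJaccard(s: Set[String], t: Set[String]): Double =
    s.intersect(t).size.toDouble / s.union(t).size

  // k random (a, b) pairs defining h(x) = (a * x + b) mod P; seeded so runs repeat.
  def hashFamily(k: Int, seed: Long): Seq[(Long, Long)] = {
    val r = new Random(seed)
    Seq.fill(k)((1L + r.nextInt(Int.MaxValue - 1), r.nextInt(Int.MaxValue).toLong))
  }

  // MinHash signature: for each hash function, the smallest hash over all words.
  // The mask makes hashCode nonnegative, so a * x + b stays below 2^63.
  def signature(ws: Set[String], family: Seq[(Long, Long)]): Seq[Long] =
    family.map { case (a, b) =>
      ws.iterator.map(w => (a * (w.hashCode & 0x7fffffffL) + b) % P).min
    }

  // The fraction of agreeing signature positions estimates the Jaccard similarity.
  def estimate(s: Seq[Long], t: Seq[Long]): Double =
    s.zip(t).count { case (x, y) => x == y }.toDouble / s.size

  // Explicit main (rather than App) so P is initialized even when the
  // helpers are called from other code.
  def main(args: Array[String]): Unit = {
    val s = words("the quick brown fox jumps over the lazy dog")
    val t = words("the quick brown cat jumps over the lazy dog")
    val family = hashFamily(200, seed = 42L)
    val exact = 100 * exactJaccard(s, t)
    val approx = 100 * estimate(signature(s, family), signature(t, family))
    println(f"exact $exact%.2f %%, MinHash estimate $approx%.2f %%")
  }
}
```

The two sample sentences share 7 of their 9 distinct words, so the exact figure printed is 77.78 %. The MinHash estimate should land near it; its standard error is roughly sqrt(J(1 - J)/k) for k hash functions, so raising k tightens the estimate at the cost of more hashing.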