@ZachOrr
Forked from willf/inverted_index.scala
Created January 24, 2014 03:17

import scala.io.Source

/**
 * From a file whose lines have the form
 *   doc_id<TAB>w1<TAB>w2<TAB>w3 ...
 * return an inverted index Map from each word w to the Set of doc_ids it appears in.
 *
 * @param filename path to the tab-separated input file
 * @return Map[String, Set[String]] from word to the ids of the documents containing it
 */
def invertedIndex(filename: String): Map[String, Set[String]] = {
  val source = Source.fromFile(filename)
  try {
    source.getLines()                                          // iterator over the lines of the file
      .map(_.split("\t"))                                      // split each line at tabs
      .filter(_.nonEmpty)                                      // skip lines that yield no fields at all
      .flatMap(x => x.iterator.drop(1).map(y => (y, x(0))))    // (word, doc_id) pairs; filter these if needed
      .toList                                                  // Iterator has no groupBy, so materialize the pairs
      .groupBy(_._1)                                           // group the pairs by word
      .map { case (w, pairs) => (w, pairs.map(_._2).toSet) }   // keep only the doc_ids, as a Set
  } finally source.close()                                     // release the file handle once everything is read
}
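
Because `invertedIndex` reads from a file, trying it out means writing a file first. The sketch below, assuming the same tab-separated `doc_id w1 w2 ...` format, runs the same pipeline over an in-memory iterator of lines instead. `invertedIndexFromLines` is a hypothetical name introduced here for illustration; it is not part of the original gist.

```scala
// Hypothetical helper (not in the original gist): same pipeline as
// invertedIndex, but over an in-memory iterator of lines, so the parsing
// and grouping logic can be exercised without touching the file system.
def invertedIndexFromLines(lines: Iterator[String]): Map[String, Set[String]] =
  lines
    .map(_.split("\t"))                                    // doc_id, w1, w2, ...
    .filter(_.nonEmpty)                                    // skip lines with no fields
    .flatMap(f => f.iterator.drop(1).map(w => (w, f(0))))  // (word, doc_id) pairs
    .toList                                                // Iterator has no groupBy
    .groupBy(_._1)                                         // word -> its pairs
    .map { case (w, pairs) => (w, pairs.map(_._2).toSet) } // word -> Set(doc_id)

// Usage: two tiny "documents" plus an empty line, which contributes no pairs.
val index = invertedIndexFromLines(
  Iterator("d1\tapple\tbanana", "d2\tbanana\tcherry", ""))

val bananaDocs = index.getOrElse("banana", Set.empty[String]) // both d1 and d2
val missing    = index.getOrElse("durian", Set.empty[String]) // empty: word never seen
```

Looking a word up with `getOrElse` rather than `index(word)` avoids a `NoSuchElementException` for words that appear in no document.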