@pjazdzewski1990
Created February 4, 2016 07:22
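import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Keep "Author" lines and non-blank indented lines, then normalize each one:
// trim, lowercase, and strip single and double quotes.
private def cleanData(spark: SparkContext, file: String): RDD[String] =
  spark.textFile(file)
    .filter(line => line.contains("Author") || (line.startsWith(" ") && line.trim.nonEmpty))
    .map(_.trim.toLowerCase.replaceAll("[\"']", ""))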
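
The filter and normalization rules can be tried without a Spark cluster. The following is a minimal sketch on plain Scala collections; the object `CleanDataSketch`, its helper names, and the sample lines are hypothetical and are not part of the original gist.

```scala
// Hypothetical helpers that mirror the filter and map steps of cleanData,
// applied to an in-memory Seq instead of an RDD.
object CleanDataSketch {
  // A line is kept if it mentions "Author" or is indented and non-blank.
  def keep(line: String): Boolean =
    line.contains("Author") || (line.startsWith(" ") && line.trim.nonEmpty)

  // Trim, lowercase, and remove double and single quotes.
  def normalize(line: String): String =
    line.trim.toLowerCase.replaceAll("\"", "").replaceAll("'", "")

  def clean(lines: Seq[String]): Seq[String] =
    lines.filter(keep).map(normalize)
}

// Made-up sample input: an author line, an indented line, an empty line,
// a whitespace-only line, and an unindented line.
val sample = Seq("Author: \"Jane O'Neil\"", "  Some Title", "", "   ", "unindented")
val cleaned = CleanDataSketch.clean(sample)
// cleaned == Seq("author: jane oneil", "some title")
```

Note that the "Author" check is case-sensitive and runs before lowercasing, so a line containing only "author" is kept only if it is indented.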