@JoshRosen
Created October 30, 2015 20:53
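Caching an RDD[String] via a DataFrame (more memory-efficient than caching the RDD directly if the file is highly compressible via dictionary encoding, since the DataFrame cache uses compressed columnar storage).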
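import org.apache.spark.rdd.RDD  // sc and sqlContext.implicits._ are assumed to be in scope, as in spark-shell
val fileToRead = "/path/to/my/file"
// Wrap each line in a Tuple1 so toDF applies, then cache the single-column DataFrame.
val df = sc.textFile(fileToRead).map(l => Tuple1(l)).toDF("line").cache()
// Recover the lines as an RDD[String] that reads from the cached DataFrame.
val rdd: RDD[String] = df.rdd.map(_.getString(0))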
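
The snippet above relies on a `sc` that already exists, as in spark-shell. Outside the shell, the same idea might look like the sketch below. It assumes the Spark 1.x API that was current when this gist was written (`SQLContext`), a local master, and a hypothetical object name `CacheLinesSketch`; none of these come from the original. Because `cache()` is lazy, the sketch runs one action to fill the cache before reusing it.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SQLContext

// Hypothetical standalone wrapper around the gist's three lines.
object CacheLinesSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: a local master is used here purely for illustration.
    val conf = new SparkConf().setAppName("cache-lines-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._ // provides toDF on RDDs of tuples

    val fileToRead = "/path/to/my/file" // placeholder path from the gist
    val df = sc.textFile(fileToRead).map(l => Tuple1(l)).toDF("line").cache()
    val rdd: RDD[String] = df.rdd.map(_.getString(0))

    // cache() only marks the DataFrame for caching; the first action
    // reads the file and fills the in-memory columnar cache.
    val total = rdd.count()

    // Later actions on rdd should be served from the cached DataFrame
    // instead of re-reading the file.
    val nonEmpty = rdd.filter(_.nonEmpty).count()
    println(s"lines: $total, non-empty: $nonEmpty")

    // Release the cached data once it is no longer needed.
    df.unpersist()
    sc.stop()
  }
}
```

The Storage tab of the Spark UI reports how much memory the cached DataFrame occupies, which is one way to check whether this approach actually saves space for a given file.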