marmbrus / gist:15e72f7bc22337cf6653
Created November 27, 2014 03:10
Parallel list files on S3 with Spark
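import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// One entry in a listing: its full path, whether it is a directory, and its size in bytes.
case class S3File(path: String, isDir: Boolean, size: Long) {
  // Lists the entries directly under this path (one level, not recursive).
  def children = listFiles(path)
}

// Lists the immediate children of `path` using the Hadoop FileSystem API.
def listFiles(path: String): Seq[S3File] = {
  val fs = FileSystem.get(new java.net.URI(path), new Configuration())
  // FileStatus.isDir is deprecated in Hadoop 2+ in favor of isDirectory.
  fs.listStatus(new Path(path)).map(s => S3File(s.getPath.toString, s.isDir, s.getLen))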