
@timvw
Created July 17, 2016 19:24
Scala wrapper for Hadoop RemoteIterator
case class RemoteIteratorWrapper[T](underlying: org.apache.hadoop.fs.RemoteIterator[T])
  extends scala.collection.AbstractIterator[T] with scala.collection.Iterator[T] {
  def hasNext: Boolean = underlying.hasNext
  def next(): T = underlying.next()
}

object Conversions {
  implicit def remoteIterator2ScalaIterator[T](underlying: org.apache.hadoop.fs.RemoteIterator[T]): scala.collection.Iterator[T] =
    RemoteIteratorWrapper[T](underlying)
}
@skoky

skoky commented Jan 18, 2017

How to use it?

@vsimko

vsimko commented Aug 30, 2017

What about this:

import org.apache.hadoop.fs.RemoteIterator

/**
  * Converts a Hadoop RemoteIterator to a Scala Iterator, which provides all the familiar functions such as map,
  * filter, foreach, etc.
  *
  * @param underlying The RemoteIterator that needs to be wrapped
  * @tparam T Items inside the iterator
  * @return Standard Scala Iterator
  */
implicit def convertToScalaIterator[T](underlying: RemoteIterator[T]): Iterator[T] = {
  case class Wrapper(underlying: RemoteIterator[T]) extends Iterator[T] {
    override def hasNext: Boolean = underlying.hasNext
    override def next(): T = underlying.next()
  }
  Wrapper(underlying)
}
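
For illustration, a minimal sketch of how this conversion could be used, assuming a configured Hadoop FileSystem; the path and the size computation are purely illustrative:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical setup: obtain a FileSystem from a default Configuration.
val fs = FileSystem.get(new Configuration())

// With convertToScalaIterator in implicit scope, the RemoteIterator returned by
// listFiles supports the usual Scala Iterator operations (map, filter, sum, ...).
val totalBytes = fs.listFiles(new Path("/tmp"), true)
  .map(_.getLen)
  .sum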

@timvw

timvw commented Nov 20, 2017

Here is a usage example:

    import org.apache.hadoop.fs.{FileSystem, Path}

    val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
    val hdfsPath = new Path(basePath)
    fs.mkdirs(hdfsPath)

    import Conversions._
    fs.listFiles(hdfsPath, true).foreach(x => println(s"found file ${x.getPath.toUri}"))
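
Since the converted iterator supports the full Scala Iterator API, the listing can also be filtered and collected; a small sketch building on the example above (the ".parquet" suffix is only illustrative):

    // Keep only Parquet files (illustrative suffix) and collect their paths.
    val parquetFiles = fs.listFiles(hdfsPath, true)
      .filter(_.getPath.getName.endsWith(".parquet"))
      .map(_.getPath)
      .toList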
