Skip to content

Instantly share code, notes, and snippets.

@sadache
Created June 15, 2012 23:37
Show Gist options
  • Star 15 You must be signed in to star a gist
  • Fork 1 You must be signed in to fork a gist
  • Save sadache/2939230 to your computer and use it in GitHub Desktop.
Save sadache/2939230 to your computer and use it in GitHub Desktop.
Parsing progressively a csv like file with Play2 and Iteratees

If your csv doesn't contain escaped newlines then it is pretty easy to do a progressive parsing without putting the whole file into memory. The iteratee library comes with a method search inside play.api.libs.iteratee.Parsing :

def search (needle: Array[Byte]): Enumeratee[Array[Byte], MatchInfo[Array[Byte]]]

which will partition your stream into Matched[Array[Byte]] and Unmatched[Array[Byte]]

Then you can combine a first iteratee that takes a header and another that will fold into the umatched results. This should look like the following code:

// break at each match and concat unmatches and drop the last received element (the match)
val concatLine: Iteratee[Parsing.MatchInfo[Array[Byte]],String] = 
  ( Enumeratee.breakE[Parsing.MatchInfo[Array[Byte]]](_.isMatch) ><>
    Enumeratee.collect{ case Parsing.Unmatched(bytes) => new String(bytes)} &>>
    Iteratee.consume() ).flatMap(r => Iteratee.head.map(_ => r))

// group chunks using the above iteratee and do simple csv parsing
val csvParser: Iteratee[Array[Byte], List[List[String]]] =
  Parsing.search("\n".getBytes) ><>
  Enumeratee.grouped( concatLine ) ><>
  Enumeratee.map(_.split(',').toList) &>>
  Iteratee.head.flatMap( header => Iteratee.getChunks.map(header.toList ++ _) )

// an example of a chunked simple csv file
val chunkedCsv: Enumerator[Array[Byte]] = Enumerator("""a,b,c
""","1,2,3","""
4,5,6
7,8,""","""9
""") &> Enumeratee.map(_.getBytes)

// get the result
val csvPromise: Promise[List[List[String]]] = chunkedCsv |>>> csvParser

// eventually returns List(List(a, b, c),List(1, 2, 3), List(4, 5, 6), List(7, 8, 9))

Of course you can improve the parsing. If you do, I would appreciate if you share it with the community.

So your Play2 controller would be something like:

val requestCsvBodyParser = BodyParser(rh => csvParser.map(Right(_)))

// progressively parse the big uploaded csv like file
def postCsv = Action(requestCsvBodyParser){ rq: Request[List[List[String]]] => 
  //do something with data
}
@fcroiseaux
Copy link

When I try this code.

"...Iteratee.head..."

doesn't compile with Play 2.0.2.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment