// Runs `fn` once and prints the elapsed wall-clock time in milliseconds.
def time(fn: => Unit): Unit = {
  val start = System.currentTimeMillis
  fn
  println("%d ms".format(System.currentTimeMillis - start))
}

import dispatch._
import json._
import JsHttp._

// The same request every time; only the response handler (>- or >#) changes.
time { Http(url("http://search.twitter.com/search.json?q=dispatch") >- {s => Js(s)}) } // 400-500 ms: normal
time { Http(url("http://search.twitter.com/search.json?q=dispatch") >- {s => Js(s)}) } // 400-500 ms: still normal
time { Http(url("http://search.twitter.com/search.json?q=dispatch") ># {s => s}) }    // 16000 ms: ># is far too slow!
time { Http(url("http://search.twitter.com/search.json?q=dispatch") >- {s => Js(s)}) } // 16000 ms: >- is also too slow!
time { Http(url("http://search.twitter.com/search.json?q=dispatch") >- {s => Js(s)}) } // 500 ms: >- back to normal
time { Http(url("http://search.twitter.com/search.json?q=dispatch") ># {s => s}) }    // 16000-17000 ms: ># slow again

import java.io._

// Parsing the same small document from a String and from an InputStream.
val js = """{"A":"a","B":[1,2],"C":"c"}"""
time { Js(js) }                                    // 0-1 ms
time { Js(new ByteArrayInputStream(js.getBytes)) } // 50-60 ms
I found out that not only ># but also >- sometimes becomes very slow.
It seems that calling JsonParser(CharArrayReader) and JsonParser(StreamReader) alternately is what makes parsing slow.
What I've found so far:
- In the constructor of scala.util.parsing.combinator.Parsers#NoSuccess, scala.util.parsing.input.Position instances are compared with the Position.< method:
  https://github.com/scala/scala/blob/master/src/library/scala/util/parsing/combinator/Parsers.scala#L127
- StreamReader.pos returns an anonymous instance of Position:
  https://github.com/scala/scala/blob/master/src/library/scala/util/parsing/input/StreamReader.scala#L68
- CharArrayReader.pos returns an instance of OffsetPosition:
  https://github.com/scala/scala/blob/master/src/library/scala/util/parsing/input/CharSequenceReader.scala#L53
- OffsetPosition < OffsetPosition is efficient, but OffsetPosition < Position (and vice versa) falls back to comparing line numbers, and OffsetPosition.line scans all characters of the source:
  https://github.com/scala/scala/blob/master/src/library/scala/util/parsing/input/OffsetPosition.scala#L27
- Thread dumps taken during very slow parsing:
  - JsonParser(CharArrayReader), then JsonParser(StreamReader):
      "Thread-107" daemon prio=5 tid=00000000032ba400 nid=0xb172f000 runnable [00000000b1729000]
         java.lang.Thread.State: RUNNABLE
          at scala.util.parsing.input.OffsetPosition.index(OffsetPosition.scala:27)
          at scala.util.parsing.input.OffsetPosition.line(OffsetPosition.scala:36)  // WONDER WHY an OffsetPosition instance remains here
          at scala.util.parsing.input.Position$class.$less(Position.scala:66)
          at scala.util.parsing.input.StreamReader$$anon$1.$less(StreamReader.scala:69)
          at scala.util.parsing.combinator.Parsers$NoSuccess.<init>(Parsers.scala:132)
  - JsonParser(StreamReader), then JsonParser(CharArrayReader):
      "Thread-113" daemon prio=5 tid=000000000319bc00 nid=0xb172f000 runnable [00000000b172a000]
         java.lang.Thread.State: RUNNABLE
          at scala.util.parsing.input.OffsetPosition.index(OffsetPosition.scala:27)
          at scala.util.parsing.input.OffsetPosition.line(OffsetPosition.scala:36)  // if the parameter were also an OffsetPosition, line would not be called
          at scala.util.parsing.input.OffsetPosition.$less(OffsetPosition.scala:70)
          at scala.util.parsing.combinator.Parsers$NoSuccess.<init>(Parsers.scala:132)
- I wonder why this mixture of OffsetPosition and Position happens... Its holder, the lastNoSuccess variable, should be reset to null every time phrase is called:
  https://github.com/scala/scala/blob/master/src/library/scala/util/parsing/combinator/Parsers.scala#L744
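To make the cost asymmetry concrete, here is a simplified, hypothetical model in plain Scala, with no scala-parser-combinators dependency. `Pos`, `OffsetPos`, `LineColPos` and the `scannedChars` counter are invented stand-ins for Position, OffsetPosition and StreamReader's anonymous Position; only the comparison logic is meant to mirror the behaviour described above: offset against offset is a single integer comparison, while any mixed comparison falls back to line/column and forces the offset-based side to rescan its whole source.

```scala
object PositionModel {
  // Counts characters scanned while deriving line numbers from an offset.
  var scannedChars: Long = 0L

  trait Pos {
    def line: Int
    def column: Int
    // Generic comparison by line, then column: the fallback used for mixed pairs.
    def <(that: Pos): Boolean =
      this.line < that.line || (this.line == that.line && this.column < that.column)
  }

  // Offset-based position: line/column are derived by rescanning the source.
  final case class OffsetPos(source: String, offset: Int) extends Pos {
    private def lineStarts: IndexedSeq[Int] = {
      scannedChars += source.length // every call walks the entire source
      0 +: source.indices.filter(source.charAt(_) == '\n').map(_ + 1)
    }
    def line: Int = lineStarts.lastIndexWhere(_ <= offset) + 1
    def column: Int = offset - lineStarts(line - 1) + 1

    // Fast path only when both sides are offset-based.
    override def <(that: Pos): Boolean = that match {
      case OffsetPos(_, thatOffset) => offset < thatOffset
      case _                        => super.<(that)
    }
  }

  // Position with precomputed line/column, standing in for StreamReader's Position.
  final case class LineColPos(line: Int, column: Int) extends Pos
}
```

Because NoSuccess compares positions each time a failure is recorded, a mixed pair in this model costs one full rescan per failure, which would be consistent with the jump from milliseconds to seconds seen above.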
Eclipse debugger screenshot taken while running the slow JsonParser(StreamReader) after JsonParser(CharArrayReader):
https://picasaweb.google.com/lh/photo/7XFKrIe0pRZgcKZqSjX0wg?feat=directlink
- lastNoSuccess was created from a CharSequenceReader at offset 86869
- the current StreamReader is still at offset 12504, which is smaller than lastNoSuccess's offset
- so was lastNoSuccess created during the previous call to JsonParser(CharArrayReader)?
I got it!
There are two lastNoSuccess variables involved here, in two different instances of trait Parsers.
First, look at the class hierarchy:
trait Scanners extends Parsers
  ↑
abstract class Lexical extends Scanners with Tokens
  ↑
class StdLexical extends Lexical with StdTokens
  ↑
class Lexer extends StdLexical with ImplicitConversions

trait TokenParsers extends Parsers
  ↑
trait StdTokenParsers extends TokenParsers
  ↑
object JsonParser extends StdTokenParsers with ImplicitConversions
Both dispatch.json.JsonParser and scala.util.parsing.json.Lexer (which is aliased as the Tokens type) extend trait scala.util.parsing.combinator.Parsers.
Thus, JsonParser and Lexer each have their own "var lastNoSuccess".
JsonParser#lastNoSuccess is reset to null inside JsonParser#phrase, but Lexer#lastNoSuccess is NOT.
Thus, a lastNoSuccess created during a previous parse (if it has a larger offset) still remains...
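A minimal, hypothetical model of the two independent variables. `ParsersLike`, `recordFailure` and the `resetLexerToo` flag are invented for illustration and track only the offset of the furthest failure, not real NoSuccess objects. It shows how a failure at offset 86869 from an earlier parse survives a later parse whose failures are all at smaller offsets, unless the lexer's variable is reset too (one possible fix, assumed here rather than taken from the actual patch).

```scala
object StaleFailureModel {
  // Stand-in for one Parsers instance; only the offset of the furthest failure is kept.
  final class ParsersLike {
    var lastNoSuccess: Option[Int] = None

    // Mimics the NoSuccess bookkeeping: replace the stored failure only if the new one is further along.
    def recordFailure(offset: Int): Unit =
      if (lastNoSuccess.forall(_ < offset)) lastNoSuccess = Some(offset)
  }

  val lexer  = new ParsersLike // long-lived, like the Lexer held by JsonParser
  val parser = new ParsersLike // like JsonParser itself

  // Mimics phrase: always resets the parser's own variable, the lexer's only on request.
  def phrase(lexerFailures: Seq[Int], resetLexerToo: Boolean): Option[Int] = {
    parser.lastNoSuccess = None
    if (resetLexerToo) lexer.lastNoSuccess = None
    lexerFailures.foreach(o => lexer.recordFailure(o))
    lexer.lastNoSuccess
  }
}
```

Calling `phrase(Seq(10, 86869), false)` and then `phrase(Seq(100, 12504), false)` leaves the stale `Some(86869)` in the lexer; passing `resetLexerToo = true` on the second call yields `Some(12504)` instead.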
I pulled that commit from your branch into master. Thanks for the extensive debugging!
The above patch works in a single thread, but might not with multiple threads...
Hm... glancing through the inherited traits, they don't strike me as thread-safe either. It might be better to move the lexical entirely into the apply function so that we have one instance per application.
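A sketch of the "one instance per application" idea, assuming a toy lexer: `ToyLexer`, `scan`, `applyShared` and `furthestFailure` are hypothetical names, not dispatch or Scala library API, and `furthestFailure` only plays the role of `var lastNoSuccess`. Creating the instance inside `apply` means no mutable state is carried between calls or shared between threads; the shared variant shows the leak for contrast.

```scala
object PerCallParser {
  // Toy stateful lexer; `furthestFailure` stands in for `var lastNoSuccess`.
  final class ToyLexer {
    var furthestFailure: Int = -1 // offset of the last unrecognised character, -1 if none

    // Treats any character that is neither a letter, a digit nor whitespace as a failure.
    def scan(input: String): Int = {
      for (i <- input.indices) {
        val c = input.charAt(i)
        if (!c.isLetterOrDigit && !c.isWhitespace && i > furthestFailure) furthestFailure = i
      }
      furthestFailure
    }
  }

  // One lexer per application of `apply`: nothing survives between calls or threads.
  def apply(input: String): Int = {
    val lexical = new ToyLexer
    lexical.scan(input)
  }

  // For contrast: a single shared lexer carries state from one call into the next.
  private val shared = new ToyLexer
  def applyShared(input: String): Int = shared.scan(input)
}
```

The trade-off is one extra allocation per parse, which is usually cheap compared with the parse itself.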
I should also mention that this JSON parser was not written by me, and as you can tell, I'm not deeply familiar with its internals. The spray-json library is pretty similar to it and I've been considering deprecating it in favor of binding to spray. Any thoughts on that?
Trait scala.util.parsing.combinator.Parsers itself is not thread-safe because of "var lastNoSuccess".
It is also mentioned here: http://scala-programming-language.1934581.n4.nabble.com/Scala-Parsers-are-not-thread-safe-td2243477.html
scala.util.parsing.json.JSON has a similar implementation to dispatch.json's (two subclasses of Parsers), but it only accepts CharArrayReader, so the performance issue does not occur.
Note that scala.util.parsing.json.JSON never returns error details, so another thread overwriting "var lastNoSuccess" does not seem to be a problem in practice.
If you accept the extra memory usage and change JsonParser.apply(Reader[Char]) to JsonParser.apply(CharSequenceReader), I think it's technically OK (in terms of the performance issue).
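One way to feed JsonParser a CharSequenceReader is to drain the input into memory first. The sketch below assumes the body is available as a java.io.Reader (for example, `new java.io.InputStreamReader(stream, "UTF-8")` over the response stream); `ReadFully.readAll` is a hypothetical helper. Its result could then be wrapped in scala.util.parsing.input.CharSequenceReader, which is left out here to keep the example free of the scala-parser-combinators dependency.

```scala
import java.io.Reader

object ReadFully {
  // Drains the reader completely, so the whole document is held in memory at once.
  def readAll(in: Reader): String = {
    val sb  = new java.lang.StringBuilder()
    val buf = new Array[Char](8192)
    var n = in.read(buf)
    while (n != -1) {
      sb.append(buf, 0, n) // copy only the chars actually read
      n = in.read(buf)
    }
    sb.toString
  }
}
```

Memory use grows with the size of the document, which is the cost being accepted in exchange for cheap offset-based positions.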
About spray-json: +1 for it. I've never used spray-json, but at a glance I can see it's pretty similar :)
One thing I worry about is that sjson (and maybe other libraries I don't know of) depends on dispatch-json, and changing the binding would break that dependency...
Seems to work: pomu0325/Databinder-Dispatch@66cb919
Tests: pomu0325/Databinder-Dispatch@e87a286
Yeah, this is pretty strange, thanks for looking into it. Dispatch ultimately uses an InputStream either way; with >- it has just passed through a scala.io.Source first.