Response from Brook Heisler about Boa's benchmarking

Hey, thanks for trying Criterion.rs!

My usual guess when I see this - a trivial change to source code producing a non-trivial change to performance - is that the compiler is simply optimizing something differently. It's more common than you might think; modern optimizing compilers like LLVM are extremely complex beasts full of mostly hand-rolled pattern matching optimization code, which can be very finicky. It's not all that uncommon for an apparently trivial change to the source code to cause some of these pattern matchers to miss optimizations that previously they were performing, and thus produce slower code. (If you want to know more about this, I like this paper by Regehr et al: https://blog.regehr.org/archives/1619). This is especially true in very small benchmarks, where an extra instruction or two can produce enough of a difference for Criterion.rs to detect.

In this particular case, though, there is something more going on. If you take a look at the source code for the Option type (https://doc.rust-lang.org/src/core/option.rs.html), you can see that in unwrap, the panic! macro is invoked directly inside the unwrap function, but for expect it calls out to a separate function called expect_failed that contains the panic! code. That function is marked inline(never). That is actually a pretty substantial change as far as LLVM is concerned; unwrap contains whatever the panic! macro expands to, while expect contains a static function call. I'm not surprised that it would optimize these two functions differently. I don't think you should take away the idea that expect is slow, though; Criterion.rs is sensitive enough to pick up on very small differences, but it's important to keep in mind that just because something is statistically significant doesn't mean that it's practically significant.
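For illustration, here is a minimal, self-contained sketch of the structural difference being described. It uses a stand-in enum rather than the real `core::option` source, whose exact details differ and have changed between releases:

```rust
// Sketch of the shape described above; `MyOption` is a stand-in for the
// real `Option`, not the actual standard library source.
enum MyOption<T> {
    Some(T),
    None,
}

// Kept out of line so the panic machinery is not inlined into every caller,
// mirroring the `#[inline(never)]` helper mentioned above.
#[inline(never)]
#[cold]
fn expect_failed(msg: &str) -> ! {
    panic!("{}", msg)
}

impl<T> MyOption<T> {
    // The panic! expansion sits directly inside the function body.
    fn unwrap(self) -> T {
        match self {
            MyOption::Some(val) => val,
            MyOption::None => panic!("called `unwrap()` on a `None` value"),
        }
    }

    // The failure path is a call to a separate, never-inlined function,
    // which LLVM may optimize quite differently on the happy path.
    fn expect(self, msg: &str) -> T {
        match self {
            MyOption::Some(val) => val,
            MyOption::None => expect_failed(msg),
        }
    }
}

fn main() {
    assert_eq!(MyOption::Some(1).unwrap(), 1);
    assert_eq!(MyOption::Some(2).expect("missing value"), 2);
}
```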

For what it's worth, your use of black_box looks fine to me. In typical usage of your library, the JavaScript code being executed would not be a compile-time constant; it would be runtime input (e.g. downloaded from a web server). That's inconvenient to do in a benchmark, so we use black_box to make the optimizer act as if a compile-time constant were actually runtime input. In your parser benchmark, though, you might want to move the lexing of the source code out of the benchmark, so that you're only measuring your parser and not your lexer as well. One other thing is that you might want to return values from your benchmarks; for example, your lexer benchmark drops the lexed tokens instead of returning them. That means the optimizer knows the tokens will never be used; Criterion.rs passes values returned by the benchmark through black_box to prevent exactly that. Again, in production usage the tokens wouldn't be dropped instantly, so black_boxing the returned tokens gives a better approximation of real usage. You may also want to look at using iter_batched so that the time for dropping the collection of tokens isn't included in the measurement.
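As a rough illustration of these suggestions, here is a minimal Criterion.rs sketch. The `lex` and `parse` functions, the `Token`/`Ast` types, and the source snippet are hypothetical placeholders, not Boa's actual API:

```rust
// Minimal Criterion.rs sketch of the suggestions above. `lex`, `parse`,
// `Token`, and `Ast` are hypothetical stand-ins for the real lexer/parser.
use criterion::{black_box, criterion_group, criterion_main, BatchSize, Criterion};

#[derive(Clone)]
struct Token;
struct Ast;

fn lex(src: &str) -> Vec<Token> {
    // Placeholder for the real lexer.
    src.split_whitespace().map(|_| Token).collect()
}

fn parse(tokens: Vec<Token>) -> Ast {
    // Placeholder for the real parser.
    let _ = tokens;
    Ast
}

static SRC: &str = "let a = 1 + 2;";

fn lexer_benchmark(c: &mut Criterion) {
    c.bench_function("lexer", |b| {
        // black_box the input so the compile-time constant looks like runtime
        // data, and return the tokens so the optimizer cannot discard the work
        // (Criterion passes returned values through black_box).
        b.iter(|| lex(black_box(SRC)))
    });
}

fn parser_benchmark(c: &mut Criterion) {
    // Lex once outside the benchmark so only parsing is measured. iter_batched
    // clones the tokens in an unmeasured setup step and drops each routine's
    // output outside the timed section.
    let tokens = lex(SRC);
    c.bench_function("parser", |b| {
        b.iter_batched(|| tokens.clone(), parse, BatchSize::SmallInput)
    });
}

criterion_group!(benches, lexer_benchmark, parser_benchmark);
criterion_main!(benches);
```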

Hope this helps!
