These SVGs are profiling runs of the example shown in hypothesis#914, with Hypothesis's new use_coverage feature turned off and on, respectively. The "nocov" case ran in 0.058s while "yescov" ran in 0.262s, a roughly 4.5x slowdown. In my own work, I have a non-trivial test that slows down from 13s to >50s, a ~4x slowdown, and it follows the same profile seen below, but with many more calls to lstat().
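The slowdown ratios come from dividing the with-coverage time by the baseline time. Since the larger test took *more than* 50s, its ratio is a lower bound. The helper name `slowdown` is just for illustration.

```python
def slowdown(baseline_s: float, with_coverage_s: float) -> float:
    """Return how many times slower the with-coverage run is than the baseline."""
    return with_coverage_s / baseline_s


if __name__ == "__main__":
    print(f"{slowdown(0.058, 0.262):.2f}x")  # hypothesis#914 example: 4.52x
    print(f"{slowdown(13, 50):.2f}x")        # larger test, lower bound: 3.85x
```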
You can see (you'll want to right-click and "open image in new tab") that the largest contributor to the difference between the two scenarios is the lstat() call.
This is called by coverage.Collector.save_data(), where it normalizes the paths of all files involved in the coverage trace. Hypothesis calls save_data() between examples in order to pull the coverage data out of the collector object.
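To reproduce the cost outside Hypothesis, here is a minimal sketch that profiles repeated `os.path.realpath()` calls with `cProfile`. It assumes, as the realpath() alternatives in this issue suggest, that the lstat() calls come from realpath(); on POSIX systems realpath() typically lstat()s each path component, so re-normalizing N files between every pair of examples costs on the order of N × (path depth) syscalls each time. The helper functions and the temporary file layout are hypothetical stand-ins for the traced files.

```python
import cProfile
import io
import os
import pstats
import tempfile


def normalize_paths(paths, rounds):
    """Resolve every path `rounds` times, mimicking one normalization pass per example."""
    for _ in range(rounds):
        for path in paths:
            os.path.realpath(path)


def profile_normalization(n_files=20, rounds=200):
    """Profile repeated realpath() calls and return the filtered report as text."""
    with tempfile.TemporaryDirectory() as root:
        # Nested directories so each path has several components to resolve.
        pkg_dir = os.path.join(root, "pkg", "sub", "deeper")
        os.makedirs(pkg_dir)
        paths = [os.path.join(pkg_dir, f"module_{i}.py") for i in range(n_files)]

        profiler = cProfile.Profile()
        profiler.enable()
        normalize_paths(paths, rounds)
        profiler.disable()

    report = io.StringIO()
    stats = pstats.Stats(profiler, stream=report)
    # Only print rows whose function description matches realpath or lstat.
    stats.sort_stats("tottime").print_stats("realpath|lstat")
    return report.getvalue()


if __name__ == "__main__":
    print(profile_normalization())
```

Raising `rounds` (the number of examples) scales the realpath/lstat rows linearly, which matches the shape of the "yescov" profile.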
I can think of these alternative solutions (can you think of others?):
- monkeypatch the coverage library to either cache realpath() or skip it altogether.
- send a caching patch upstream to the coverage library.
- patch the coverage library to delegate its realpath logic to the data object, which we can specialize for Hypothesis.
- avoid calling Collector.save_data somehow. Is there another API we can use to retrieve the coverage data?
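The caching and delegation alternatives can be combined: the data object owns path normalization, and a Hypothesis-specific subclass memoizes it with `functools.lru_cache`. This is a minimal sketch under stated assumptions; `PathNormalizingData`, `CachingPathData`, and `normalize` are hypothetical names, not part of coverage's API. The cache assumes symlinks under the traced files do not change during the run, otherwise a stale target would be returned.

```python
import functools
import os


class PathNormalizingData:
    """Hypothetical data object that owns path normalization (uncached default)."""

    def normalize_path(self, path):
        return os.path.realpath(path)

    def normalize(self, line_data):
        """Map raw filenames to normalized ones, merging line sets that collide."""
        merged = {}
        for filename, lines in line_data.items():
            merged.setdefault(self.normalize_path(filename), set()).update(lines)
        return merged


class CachingPathData(PathNormalizingData):
    """Hypothetical Hypothesis specialization: resolve each distinct path once per process.

    Assumes symlinks do not change while tests run.
    """

    def __init__(self):
        self._cached_realpath = functools.lru_cache(maxsize=None)(os.path.realpath)

    def normalize_path(self, path):
        return self._cached_realpath(path)

    def cache_info(self):
        return self._cached_realpath.cache_info()


if __name__ == "__main__":
    data = CachingPathData()
    for _ in range(3):  # one normalization pass per example
        data.normalize({"./a.py": {1, 2}, "b.py": {3}})
    print(data.cache_info())  # 2 misses on the first pass, hits afterwards
```

Only the first pass touches the filesystem; later examples reuse the resolved paths, which is the work the lstat() calls repeat today.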