A quick-and-dirty benchmark of VCF parsing performance on a 1000 Genomes (1KG) complete genotypes file.
To repeat, have virtualenv installed and run the run.sh script.
Plot the results with %run plot.ipy from IPython.
The following libraries are benchmarked:
- CyVCF
- PyVCF, with/without Cython
- PyVCF with lazy computation of call stats (proof of concept), with/without Cython
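As a rough illustration of what "lazy computation of call stats" can mean, here is a minimal sketch in which the genotype counts are only tallied the first time a stat is requested, and then cached. The LazyRecord class, its _counts helper and the counts_computed counter are hypothetical names invented for this example; this is not PyVCF's actual code, and it assumes genotypes are given as plain GT strings.

```python
from functools import cached_property  # Python 3.8+


class LazyRecord:
    """Hypothetical record type for illustration; not PyVCF's actual class."""

    def __init__(self, genotypes):
        # genotypes: list of GT strings such as "0|0", "0/1", "1|1", "./."
        self.genotypes = genotypes
        self.counts_computed = 0  # instrumentation for this demo only

    @cached_property
    def _counts(self):
        # Runs on first access only; the result is cached on the instance.
        self.counts_computed += 1
        counts = {"hom_ref": 0, "het": 0, "hom_alt": 0}
        for gt in self.genotypes:
            alleles = gt.replace("|", "/").split("/")
            if "." in alleles:
                continue  # uncalled genotype, not counted
            if len(set(alleles)) > 1:
                counts["het"] += 1
            elif alleles[0] == "0":
                counts["hom_ref"] += 1
            else:
                counts["hom_alt"] += 1
        return counts

    @property
    def num_hom_ref(self):
        return self._counts["hom_ref"]

    @property
    def num_het(self):
        return self._counts["het"]

    @property
    def num_hom_alt(self):
        return self._counts["hom_alt"]


rec = LazyRecord(["0|0", "0/1", "1|1", "./.", "0|1"])
print(rec.counts_computed)  # nothing has been tallied yet
print(rec.num_het, rec.num_hom_ref, rec.num_hom_alt)
print(rec.counts_computed)  # tallied once, shared by all three stats
```

With this design, the "only parsing" mode never pays for the tally, while modes that read several stats pay for it once per record.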
Three modes are benchmarked:
- Only parsing
- Parsing and getting some call stats (num_hom_ref, num_het, num_hom_alt)
- Parsing and getting all call stats (+ num_called, call_rate, num_unknown)
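The three modes can be sketched as a small timing harness. This is a simplified stand-in, not the code in run.sh: the parse, some_stats, all_stats and benchmark functions are hypothetical, the parser only splits tab-separated lines, and the stat definitions (num_called as the number of samples with a called genotype, call_rate as num_called over the number of samples, num_unknown as the remainder) are stated here as assumptions.

```python
import time


def parse(line):
    """Split a VCF-like data line and return the per-sample GT strings."""
    fields = line.rstrip("\n").split("\t")
    # Columns 0-8 are the fixed fields plus FORMAT; samples start at column 9.
    # GT is taken as the first colon-separated sample subfield.
    return [sample.split(":")[0] for sample in fields[9:]]


def some_stats(gts):
    """Return (num_hom_ref, num_het, num_hom_alt)."""
    hom_ref = het = hom_alt = 0
    for gt in gts:
        alleles = gt.replace("|", "/").split("/")
        if "." in alleles:
            continue  # uncalled genotype
        if len(set(alleles)) > 1:
            het += 1
        elif alleles[0] == "0":
            hom_ref += 1
        else:
            hom_alt += 1
    return hom_ref, het, hom_alt


def all_stats(gts):
    """Add num_called, call_rate and num_unknown (assumed definitions)."""
    hom_ref, het, hom_alt = some_stats(gts)
    num_called = hom_ref + het + hom_alt
    num_unknown = len(gts) - num_called
    call_rate = num_called / len(gts) if gts else 0.0
    return hom_ref, het, hom_alt, num_called, call_rate, num_unknown


MODES = {
    "parse": lambda line: parse(line),
    "some_stats": lambda line: some_stats(parse(line)),
    "all_stats": lambda line: all_stats(parse(line)),
}


def benchmark(lines):
    """Time each mode over the same lines; returns {mode: seconds}."""
    timings = {}
    for name, work in MODES.items():
        start = time.perf_counter()
        for line in lines:
            work(line)
        timings[name] = time.perf_counter() - start
    return timings


line = "20\t14370\trs1\tG\tA\t29\tPASS\t.\tGT\t0|0\t1|0\t1/1\t./."
print(all_stats(parse(line)))
print(benchmark([line] * 10000))
```

Running every mode over the same input keeps the parsing cost constant, so the differences between timings reflect the cost of the call stats alone.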
See the results.csv file for results.