As we were talking about FASTQ parsing on Twitter, I dug out an old experiment and redid it with the latest kseq.h and SeqAn-1.4.2. The task is simple: count the number of bases in a FASTQ file containing 2 million 100bp short reads. The file is kept on a RAM disk. The following table gives timings for a few programs/settings:
User time (s) | Sys time (s) | Command line |
---|---|---|
0.98 | 0.16 | kseq-len /dev/shm/tmp.fq |
4.24 | 0.11 | kseq-len /dev/shm/tmp.fq.gz # gzip'd |
5.53 | 0.13 | seqtk fqchk /dev/shm/tmp.fq.gz # more than counting |
12.03 | 0.12 | seqan2-len /dev/shm/tmp.fq |
14.11 | 0.28 | seqan-len /dev/shm/tmp.fq |
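
For readers who want to see exactly what is being counted, here is a minimal Python sketch of the task. It is not the source of `kseq-len` or either SeqAn program, and the function names `count_bases` and `count_bases_in_file` are made up for illustration. It also assumes strict four-line FASTQ records (no sequences wrapped across lines), which holds for typical short-read files but is a simplification that a full parser such as kseq.h does not rely on.

```python
import gzip
import io


def count_bases(handle):
    """Return (n_reads, n_bases) for a 4-line-per-record FASTQ text stream."""
    n_reads = 0
    n_bases = 0
    for i, line in enumerate(handle):
        # Assumption: strict 4-line records, so the sequence is always
        # the second line of each record (index 1, 5, 9, ...).
        if i % 4 == 1:
            n_reads += 1
            n_bases += len(line.rstrip("\r\n"))
    return n_reads, n_bases


def count_bases_in_file(path):
    """Open a plain or gzip'd FASTQ file (chosen by extension) and count bases."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_bases(handle)


if __name__ == "__main__":
    # Tiny in-memory example instead of a real file.
    demo = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nGGCCA\n+\nIIIII\n")
    print(count_bases(demo))  # two reads, nine bases
```

Pointing `count_bases_in_file` at a file like `/dev/shm/tmp.fq` or its gzip'd counterpart would perform the same count as the benchmark, though an interpreted line-by-line loop is not expected to be competitive with the C programs timed above.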