Benchmarking reads over a pair of 42GB FASTQ files (~1.5 billion lines)
# awk 3.1.7
# 1hr 1m
awk '$1 ~ /@/ {++c} END {print c}' $1
# 1hr 1m
awk '/^@/ {c++} END {print c}' $1
# 55m
awk '{if (substr($0,1,1) == "@") { ++c }} END {print c}' $1
# 9m 58s
LC_ALL=C awk '/^@/ {++c} END {print c}' $1
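# python 2.6.6
# 15m
# Setup below is assumed (not shown in the original): path taken from argv[1].
import sys

count = 0
fastq_1_fh = open(sys.argv[1])
line = fastq_1_fh.readline()
while line:
    if line[0] == '@':
        count += 1
    line = fastq_1_fh.readline()
fastq_1_fh.close()
print count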
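The 9m 58s `LC_ALL=C` result above suggests that treating the input as plain bytes is what helps. A rough Python 3 analogue, as a sketch only: it has not been benchmarked on the 42GB files, and the function name `count_at_lines` and the demo data are made up for illustration. It opens the file in binary mode, so no text decoding happens, and iterates the handle directly instead of calling `readline()` in a loop.

```python
import os
import tempfile


def count_at_lines(path):
    """Count lines whose first byte is '@', reading the file as raw bytes."""
    count = 0
    # Binary mode skips text decoding; iterating the handle uses buffered reads.
    with open(path, "rb") as handle:
        for line in handle:
            if line.startswith(b"@"):
                count += 1
    return count


if __name__ == "__main__":
    # Tiny made-up two-record FASTQ file, for illustration only.
    demo = b"@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\nIIII\n"
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(demo)
    print(count_at_lines(tmp.name))
    os.unlink(tmp.name)
```

Like the awk one-liners, this counts every line that starts with `@`. In Phred+33 FASTQ a quality line can also begin with `@`, so on real data the count may exceed the number of records.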
Why locale matters http://www.inmotionhosting.com/support/website/ssh/speed-up-grep-searches-with-lc-all
Just build from source and run from ~/bin, or does the job not run on the same node?
@iiSeymour It doesn't run on the same node, and I can't seem to request a particular node either.
Thanks for the link, I feel like that was something I should have already known!
I'll see if I can get the version of awk on the server updated (mawk doesn't appear to be installed), or whether it can otherwise be made available. The server is a cluster at the university, so it is presumably on some LTS release.