@SamStudio8
Last active August 29, 2015 14:07
Benchmarking reads over a pair of 42GB FASTQ files (~1.5 billion lines)
# awk 3.1.7
# 1hr 1m
awk '$1 ~ /@/ {++c} END {print c}' $1
# 1hr 1m
awk '/^@/ {c++} END {print c}' $1
# 55m
# awk strings are 1-indexed, so the first character is substr($0,1,1)
awk '{if (substr($0,1,1) == "@") { ++c }} END {print c}' $1
# 9m 58s
LC_ALL=C awk '/^@/ {++c} END {print c}' $1
# python 2.6.6
# 15m
import sys

count = 0
with open(sys.argv[1]) as fastq_1_fh:
    line = fastq_1_fh.readline()
    while line:
        if line[0] == '@':
            count += 1
        line = fastq_1_fh.readline()
print(count)
@iiSeymour

Why locale matters: http://www.inmotionhosting.com/support/website/ssh/speed-up-grep-searches-with-lc-all
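
A rough illustration of the idea behind that link, as a sketch rather than a description of awk's internals: under a UTF-8 locale the regex matcher likely has to account for multibyte characters, whereas `LC_ALL=C` lets it compare raw bytes. The closest Python analogue is to read the file in binary mode and test the first byte, so no decoding is done. The function names and the demo records below are made up for illustration.

```python
import os
import tempfile


def count_at_lines(path):
    """Count lines whose first byte is '@', without decoding the text."""
    count = 0
    # Binary mode keeps each line as raw bytes, so no encoding work is done.
    with open(path, "rb") as handle:
        for line in handle:
            if line.startswith(b"@"):
                count += 1
    return count


def write_demo_fastq(path):
    """Write two made-up FASTQ records (hypothetical data, illustration only)."""
    records = [b"@read1", b"ACGT", b"+", b"IIII",
               b"@read2", b"TTGA", b"+", b"IIII"]
    with open(path, "wb") as handle:
        handle.write(b"\n".join(records) + b"\n")


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "demo.fastq")
    write_demo_fastq(demo_path)
    print(count_at_lines(demo_path))
```

This has not been timed against the 42GB files, so it says nothing about how it would compare with the numbers in the gist.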

Why not just build from source and run it from ~/bin, or does the job not run on the same node?

@SamStudio8
Author

@iiSeymour Doesn't run on the same node, can't seem to request a particular node either.
Thanks for the link, I feel like that was something I should have already known!
