@gerdus
Last active December 17, 2015 04:49
PyPy-optimized version of the GC-content script from http://saml.rilspace.org/gc-content-continued-python-vs-d-vs-c-vs-free-pascal. Optimizations: reading the file in binary mode, and reading it in chunks with readlines() and a size hint.
import timeit
tfunc = timeit.default_timer
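
def main():
    # Byte values for the four nucleotides. Unpacking a bytes literal gives
    # ints on Python 3 (1-char strings on Python 2/PyPy 2), which is what
    # iterating over a line read in binary mode yields on each version.
    G, C, A, T = b"GCAT"
    gcCount = 0
    totalBaseCount = 0
    # Binary mode avoids text decoding; "with" closes the file when done.
    with open("Homo_sapiens.GRCh37.61.dna_rm.chromosome.Y.fa", "rb") as file:
        while True:
            # Read whole lines totalling roughly 10000 bytes per call.
            lines = file.readlines(10000)
            if not lines:
                break
            for line in lines:
                # Skip FASTA header lines, which start with ">".
                if line[:1] == b">":
                    continue
                for c in line:
                    if c == C or c == T or c == G or c == A:
                        totalBaseCount += 1
                        if c == G or c == C:
                            gcCount += 1
    gcFraction = float(gcCount) / float(totalBaseCount)
    return gcFraction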


if __name__ == '__main__':
    t0 = tfunc()
    gcFraction = main()
    elapsed = tfunc() - t0
    print(gcFraction * 100)
    print(elapsed)
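The counting loop above can be checked without downloading the chromosome file if it is refactored to accept any binary file-like object. The following is a minimal sketch under that assumption: `gc_fraction`, its `hint` parameter, and the sample FASTA data are hypothetical and not part of the original gist. It keeps the same binary-mode, `readlines(hint)` chunking approach and only counts uppercase A, C, G and T, so masked `N` bases and lowercase letters are ignored.

```python
import io


def gc_fraction(stream, hint=10000):
    """Return the GC fraction of the bases in a binary FASTA stream.

    Hypothetical refactoring of main() that takes any binary file-like
    object instead of a hard-coded filename.
    """
    G, C, A, T = b"GCAT"  # ints on Python 3, 1-char strings on Python 2
    gc_count = 0
    total = 0
    while True:
        # readlines(hint) returns whole lines, stopping once their combined
        # size exceeds about `hint` bytes; an empty list means end of file.
        lines = stream.readlines(hint)
        if not lines:
            break
        for line in lines:
            if line[:1] == b">":  # FASTA header line
                continue
            for c in line:
                if c == G or c == C:
                    gc_count += 1
                    total += 1
                elif c == A or c == T:
                    total += 1
    return float(gc_count) / float(total)


# Two sequence records; N and lowercase n are not counted as bases.
sample = io.BytesIO(b">chrY test\nACGTNN\nGGCCnn\n>second\nAT\n")
print(gc_fraction(sample, hint=8))  # 6 G/C out of 10 counted bases -> 0.6
```

A small `hint` such as 8 forces several `readlines()` calls on the tiny sample, which exercises the chunked-reading path the same way the 10000-byte hint does on the real file.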