@adamrp
Created June 6, 2014 17:33
Calculates the GC content across all sequences in an input FASTA file.
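In the script, "across all sequences" means a single ratio pooled over every record, not an average of per-record ratios. Let $n_{G,i}$ and $n_{C,i}$ be the case-insensitive counts of G and C in record $i$, and let $L_i$ be that record's full length. The reported value is then:

$$
\mathrm{GC} = \frac{\sum_{i} \left( n_{G,i} + n_{C,i} \right)}{\sum_{i} L_i}
$$

Ambiguous characters such as N add to $L_i$ but never to the numerator. For example, two records ACGTGG and ATAT give $(4 + 0)/(6 + 4) = 0.4$. Averaging the per-record fractions would instead give $(4/6 + 0/4)/2 \approx 0.33$.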
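```python
#!/usr/bin/env python
from __future__ import print_function  # print() below works on Python 2 and 3
from sys import argv

# parse_fasta comes from the scikit-bio releases current when this gist was
# written; later scikit-bio releases read FASTA through skbio.io instead.
from skbio.parse.sequences.fasta import parse_fasta


def calculate_gc_content(input_fasta):
    """Return the fraction of G/C bases pooled over all records in the file."""
    gc = 0
    total_length = 0
    for _, seq in parse_fasta(input_fasta):  # yields (header, sequence) pairs
        seq = seq.lower()  # count upper- and lower-case bases alike
        total_length += len(seq)
        gc += seq.count('g')
        gc += seq.count('c')
    if total_length == 0:
        raise ValueError('The input FASTA file contains no sequence data.')
    return 1.0 * gc / total_length


if __name__ == '__main__':
    if len(argv) != 2:
        raise SystemExit('usage: %s input.fasta' % argv[0])
    # 'U' mode is the default in Python 3 and was removed in 3.11.
    with open(argv[1]) as input_fasta:
        gc_content = calculate_gc_content(input_fasta)
    print('The total fraction of G/C in the input FASTA file is:', gc_content)
    print('Note that this calculation does not take degenerate bases into account.')
```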
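The same pooled calculation can be sketched without scikit-bio, which also makes it easy to check on a small in-memory example. This is a sketch under stated assumptions, not the gist's own code. `iter_fasta` is a hypothetical minimal stand-in for `parse_fasta` and does no validation. `gc_fraction_resolved` is one assumed convention for the degeneracies the script ignores. It counts the IUPAC code S (G or C) as GC and W (A or T) as non-GC, and it drops every other ambiguous code, such as N, from the denominator.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def iter_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; a hypothetical minimal FASTA reader.

    Multi-line sequences are joined and blank lines are skipped. Nothing is
    validated: text before the first '>' header is silently dropped.
    """
    header: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith('>'):
            if header is not None:
                yield header, ''.join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, ''.join(chunks)


def pooled_gc_fraction(lines: Iterable[str]) -> float:
    """Same rule as the script: (G + C) over the total length of all records."""
    gc = total = 0
    for _, seq in iter_fasta(lines):
        seq = seq.lower()
        total += len(seq)
        gc += seq.count('g') + seq.count('c')
    if total == 0:
        raise ValueError('no sequence data found')
    return gc / total


# Assumed convention: only residues whose G/C status is unambiguous count.
GC_CODES = set('gcs')   # S = G or C (strong)
AT_CODES = set('atuw')  # W = A or T (weak); U for RNA input


def gc_fraction_resolved(lines: Iterable[str]) -> float:
    """Hypothetical variant: S counts as GC; N, R, Y, ... are skipped entirely."""
    gc = total = 0
    for _, seq in iter_fasta(lines):
        for base in seq.lower():
            if base in GC_CODES:
                gc += 1
                total += 1
            elif base in AT_CODES:
                total += 1
    if total == 0:
        raise ValueError('no unambiguous residues found')
    return gc / total


example = """>seq1
ACGT
GG
>seq2
ATAT
"""
print(pooled_gc_fraction(example.splitlines()))  # → 0.4
```

On a single record ACGTNNSW, the script's rule gives 2/8 = 0.25 and the resolved variant gives 3/6 = 0.5. Which convention fits depends on how the ambiguous positions should be interpreted.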