@jmberros
Last active August 25, 2023 06:13
Get the country field for the given nuccore accession numbers.
```python
from Bio import Entrez

# Read the accessions from a file (one per line), skipping blank lines
# such as the empty string a trailing newline would otherwise produce
accessions_file = 'accessions.txt'
with open(accessions_file) as f:
    ids = [line.strip() for line in f if line.strip()]

# Fetch the entries from Entrez as GenBank (GBSeq) XML
Entrez.email = 'name@example.org'  # Insert your email here
handle = Entrez.efetch(db='nuccore', id=ids, rettype='gb', retmode='xml')
response = Entrez.read(handle)
handle.close()

# Parse the entries to get the country
def extract_countries(entry):
    """Yield the value of every /country qualifier in the record's source features."""
    sources = [feature for feature in entry['GBSeq_feature-table']
               if feature['GBFeature_key'] == 'source']
    for source in sources:
        qualifiers = [qual for qual in source.get('GBFeature_quals', [])
                      if qual['GBQualifier_name'] == 'country']
        for qualifier in qualifiers:
            yield qualifier['GBQualifier_value']

# Print one "accession,country" line per country found
for entry in response:
    accession = entry['GBSeq_primary-accession']
    for country in extract_countries(entry):
        print(accession, country, sep=',')
```
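To check the parsing logic without querying NCBI, here is a minimal sketch that runs a copy of the script's `extract_countries` generator (lightly restructured, with `.get` defaults for missing keys) on a hand-built record. The `sample_record` dict, its accession `XX000001` and the `to_csv` helper are hypothetical; they only assume the nested list-of-dicts shape that `Entrez.read` returns for GBSeq XML, using the key names from the script above. The sketch writes rows with the standard `csv` module because a country value can contain commas, which `print(..., sep=',')` would not quote.

```python
import csv
import io

def extract_countries(entry):
    """Yield every /country qualifier value from the record's source features."""
    for feature in entry.get('GBSeq_feature-table', []):
        if feature['GBFeature_key'] != 'source':
            continue
        for qual in feature.get('GBFeature_quals', []):
            if qual['GBQualifier_name'] == 'country':
                yield qual['GBQualifier_value']

# Hypothetical record mimicking the structure Entrez.read returns (no network needed)
sample_record = {
    'GBSeq_primary-accession': 'XX000001',
    'GBSeq_feature-table': [
        {'GBFeature_key': 'source',
         'GBFeature_quals': [
             {'GBQualifier_name': 'organism', 'GBQualifier_value': 'Homo sapiens'},
             {'GBQualifier_name': 'country', 'GBQualifier_value': 'USA: Boston, MA'},
         ]},
        {'GBFeature_key': 'CDS', 'GBFeature_quals': []},  # ignored: not a source feature
    ],
}

def to_csv(records):
    """Return accession,country rows as CSV text, quoting values that contain commas."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    for entry in records:
        accession = entry['GBSeq_primary-accession']
        for country in extract_countries(entry):
            writer.writerow([accession, country])
    return buffer.getvalue()

print(to_csv([sample_record]), end='')
# XX000001,"USA: Boston, MA"
```

In the real script, the same `to_csv(response)` call could replace the final print loop; the quoted field keeps spreadsheet tools from splitting the country across two columns.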
@jmberros
Author

@JackCrook If I understood correctly, you just need to add a `print(accession)` before the `for country in ...` line.
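A variant of that suggestion, sketched below, keeps the output as two-column `accession,country` lines by emitting the accession with an empty country field when a record has no country qualifier. The `rows_with_missing` helper and the mock records (accessions `XX000001`, `XX000002`) are hypothetical; they assume the same `Entrez.read` output shape the script uses.

```python
def extract_countries(entry):
    """Yield every /country qualifier value from the record's source features."""
    for feature in entry.get('GBSeq_feature-table', []):
        if feature['GBFeature_key'] == 'source':
            for qual in feature.get('GBFeature_quals', []):
                if qual['GBQualifier_name'] == 'country':
                    yield qual['GBQualifier_value']

def rows_with_missing(records):
    """Yield (accession, country) pairs; country is '' when the record has none."""
    for entry in records:
        accession = entry['GBSeq_primary-accession']
        countries = list(extract_countries(entry))
        if not countries:
            yield accession, ''
        for country in countries:
            yield accession, country

# Hypothetical records: one with a country qualifier, one without
records = [
    {'GBSeq_primary-accession': 'XX000001',
     'GBSeq_feature-table': [{'GBFeature_key': 'source',
                              'GBFeature_quals': [{'GBQualifier_name': 'country',
                                                   'GBQualifier_value': 'Mexico'}]}]},
    {'GBSeq_primary-accession': 'XX000002',
     'GBSeq_feature-table': [{'GBFeature_key': 'source', 'GBFeature_quals': []}]},
]

for accession, country in rows_with_missing(records):
    print(accession, country, sep=',')
# XX000001,Mexico
# XX000002,
```

Collecting the countries into a list first is what makes the "no country" case detectable, since a generator cannot be checked for emptiness without consuming it.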

@AnyaKovalenko

super cool script! Thank you so much!!!
