@jmberros
Last active August 25, 2023 06:13
Get the country field for the given nuccore accession numbers.
from Bio import Entrez

# Read the accessions from a file, one per line, skipping blank lines
accessions_file = 'accessions.txt'
with open(accessions_file) as f:
    ids = [line.strip() for line in f if line.strip()]

# Fetch the entries from Entrez
Entrez.email = 'name@example.org'  # Insert your email here
handle = Entrez.efetch('nuccore', id=ids, retmode='xml')
response = Entrez.read(handle)
handle.close()

# Parse the entries to get the country
def extract_countries(entry):
    """Yield every 'country' qualifier value found in the entry's source features."""
    sources = [feature for feature in entry['GBSeq_feature-table']
               if feature['GBFeature_key'] == 'source']
    for source in sources:
        qualifiers = [qual for qual in source['GBFeature_quals']
                      if qual['GBQualifier_name'] == 'country']
        for qualifier in qualifiers:
            yield qualifier['GBQualifier_value']

# Print one "accession,country" line per country found (requires Python 3)
for entry in response:
    accession = entry['GBSeq_primary-accession']
    for country in extract_countries(entry):
        print(accession, country, sep=',')
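
To see what extract_countries expects without querying NCBI, here is a minimal sketch that runs the same logic on a hand-built record. The dictionary is a hypothetical stand-in that mimics only the keys the script reads; the accession and qualifier values are invented for illustration, and real Entrez.read output contains many more fields.

```python
# Hypothetical record that mimics only the keys used by the script.
# The accession and qualifier values are invented for illustration.
mock_entry = {
    'GBSeq_primary-accession': 'XX000001',
    'GBSeq_feature-table': [
        {
            'GBFeature_key': 'source',
            'GBFeature_quals': [
                {'GBQualifier_name': 'organism', 'GBQualifier_value': 'Example virus'},
                {'GBQualifier_name': 'country', 'GBQualifier_value': 'Argentina'},
            ],
        },
        {
            'GBFeature_key': 'gene',
            'GBFeature_quals': [
                {'GBQualifier_name': 'gene', 'GBQualifier_value': 'exampleA'},
            ],
        },
    ],
}


def extract_countries(entry):
    """Same logic as the script: yield 'country' values from 'source' features."""
    for feature in entry['GBSeq_feature-table']:
        if feature['GBFeature_key'] != 'source':
            continue  # only 'source' features carry the country qualifier
        for qual in feature['GBFeature_quals']:
            if qual['GBQualifier_name'] == 'country':
                yield qual['GBQualifier_value']


countries = list(extract_countries(mock_entry))
print(mock_entry['GBSeq_primary-accession'], countries)
```

The 'gene' feature is skipped entirely, and within the 'source' feature only the qualifier named 'country' is returned.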
@Enrique-SC

Hi,
I tried to use your script. I pass in my file containing only the accession numbers, as you mention in the forum, but when I run it I get a syntax error on the last line: "print(accession, country, sep=',')". Do you know what could be causing it?
Thanks a lot.
Regards

@jmberros
Author

It works in Python 3; the error on the print line is most likely because you are running it with Python 2.
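
If running under Python 3 is not possible, one version-neutral workaround (not part of the original script) is to build the comma-separated line first and pass a single string to print, which parses the same way under Python 2 and Python 3. The accession and country values below are placeholders.

```python
# Placeholder values standing in for one parsed record.
accession = 'XX000001'
country = 'Argentina'

# Join the fields explicitly instead of relying on print's sep= keyword,
# which the Python 2 print statement does not accept.
line = ','.join([accession, country])
print(line)
```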

@Enrique-SC

Exactly, that was it. Thanks :)

@JackCrook

Hello, I have used parts of this code to extract the country information for my accessions. However, where there is no country entry it is just omitting the accession. How can I use this code / alter it so it includes the accessions without a country entry in the output?

Thanks

@jmberros
Author

@JackCrook If I understood correctly, you just need to add a print(accession) before the for country in ... line
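
A sketch of one way to keep accessions that lack a country: collect the countries first and fall back to an empty field when none are found, so every accession produces at least one output line. This differs slightly from adding a bare print(accession), which would also print an extra line for entries that do have a country. The helper name rows_with_fallback and the sample records are hypothetical.

```python
def extract_countries(entry):
    """Yield 'country' qualifier values from the entry's 'source' features."""
    for feature in entry['GBSeq_feature-table']:
        if feature['GBFeature_key'] == 'source':
            # .get() guards against a source feature with no qualifiers at all
            for qual in feature.get('GBFeature_quals', []):
                if qual['GBQualifier_name'] == 'country':
                    yield qual['GBQualifier_value']


def rows_with_fallback(entries, missing=''):
    """Hypothetical helper: yield (accession, country), using `missing` if none."""
    for entry in entries:
        accession = entry['GBSeq_primary-accession']
        countries = list(extract_countries(entry)) or [missing]
        for country in countries:
            yield accession, country


# Invented sample records, shaped like the parsed Entrez output.
sample = [
    {'GBSeq_primary-accession': 'XX000001',
     'GBSeq_feature-table': [
         {'GBFeature_key': 'source',
          'GBFeature_quals': [
              {'GBQualifier_name': 'country', 'GBQualifier_value': 'Argentina'}]}]},
    {'GBSeq_primary-accession': 'XX000002',
     'GBSeq_feature-table': [
         {'GBFeature_key': 'source', 'GBFeature_quals': []}]},
]

rows = list(rows_with_fallback(sample))
for accession, country in rows:
    print(accession, country, sep=',')
```

In the original script, the same idea amounts to replacing the final loop with one over rows_with_fallback(response); passing something like missing='NA' makes the gaps easier to spot in the output.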

@AnyaKovalenko

super cool script! Thank you so much!!!
