Created
October 7, 2014 15:02
Function to retrieve UniProt XML (removes the xmlns namespace attribute)
import urllib2  # Python 2 standard library


def retrieve_uniprot(search_string, maxreadlength=100000000):
    '''
    Searches the UniProt database given a search string, and retrieves an XML
    file, which is returned as a string.
    maxreadlength is the maximum size in bytes which will be read from the website
    (default 100MB)
    Example search string: 'domain:"Protein kinase" AND reviewed:yes'
    The function also removes the xmlns attribute from the <uniprot> tag, as this
    makes xpath searching annoying
    '''
    import msmseeder.core
    base_url = 'http://www.uniprot.org/uniprot/?query='
    # Any '=' in the search string is converted to ':' before URL-encoding
    search_string_encoded = msmseeder.core.encode_url_query(search_string.replace('=', ':'))
    query_url = base_url + search_string_encoded + '&format=xml'
    response = urllib2.urlopen(query_url)
    # Read at most maxreadlength bytes of the response
    page = response.read(maxreadlength)
    # Remove only the first occurrence, i.e. the one on the root <uniprot> tag
    page = page.replace('xmlns="http://uniprot.org/uniprot" ', '', 1)
    return page
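The docstring says the xmlns attribute "makes xpath searching annoying". The following is a minimal Python 3 sketch of why, using only the standard library and no network access. The helper names `build_query_url` and `strip_uniprot_namespace`, the `SAMPLE` document, and the accession `EXAMPLE1` are all made up for illustration. `urllib.parse.quote` is an assumed stand-in for `msmseeder.core.encode_url_query`, whose exact behavior is not shown in the gist.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

# The attribute (with its trailing space) that the gist removes from the root tag.
UNIPROT_NS = 'xmlns="http://uniprot.org/uniprot" '


def build_query_url(search_string,
                    base_url='http://www.uniprot.org/uniprot/?query='):
    # Same URL layout as the gist; quote() is an assumption standing in
    # for msmseeder.core.encode_url_query.
    return base_url + quote(search_string.replace('=', ':')) + '&format=xml'


def strip_uniprot_namespace(page):
    # Remove only the first occurrence, which sits on the root <uniprot> tag.
    return page.replace(UNIPROT_NS, '', 1)


# Hypothetical, heavily trimmed stand-in for a UniProt XML response.
SAMPLE = (
    '<uniprot xmlns="http://uniprot.org/uniprot" '
    'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">'
    '<entry><accession>EXAMPLE1</accession></entry>'
    '</uniprot>'
)

with_ns = ET.fromstring(SAMPLE)
without_ns = ET.fromstring(strip_uniprot_namespace(SAMPLE))

# With the default namespace in place, plain tag names do not match,
# because every tag is really '{http://uniprot.org/uniprot}entry'.
print(with_ns.findall('entry'))  # []

# After stripping, plain paths work.
print([e.text for e in without_ns.findall('entry/accession')])  # ['EXAMPLE1']
```

The alternative to stripping is to keep the namespace and qualify every path step, for example `findall('{http://uniprot.org/uniprot}entry')`, which is the verbosity the gist avoids.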