@chapmanb
Created October 26, 2010 12:21
(comment "
Retrieve UniProt FASTA files for provided input identifiers using REST API.
http://www.uniprot.org/faq/28
Usage:
cljr run fetch_uniprot_fasta.clj [Any number of UniProt IDs]
")
(ns biostar.uniprot
  (:import [java.net URLEncoder])
  (:use [clojure.contrib.duck-streams :only (with-out-writer)])
  (:require [clojure.contrib.http.agent :as http]
            [clojure.contrib.str-utils2 :as str]))
(defn- url-encode [v] (URLEncoder/encode (str v) "utf-8"))
(defn id-to-acc
  "Map a named identifier to accessions using the UniProt REST API."
  [uid]
  (let [map-url "http://www.uniprot.org/mapping/"
        params {:from "ID" :to "ACC" :format "tab" :query uid}
        param-str (str/join "&" (for [[k v] params]
                                  (str (name k) "=" (url-encode v))))
        cur-url (format "%s?%s" map-url param-str)
        result (http/string (http/http-agent cur-url))]
    ;; Skip the header line, then take the accession from the second column.
    (for [line (rest (str/split-lines result))]
      (second (str/split line #"\t")))))
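;; A minimal offline sketch of the two string-handling steps in id-to-acc,
;; using only clojure.string so it can be tried without clojure.contrib or
;; network access. The names build-query and parse-mapping are hypothetical,
;; and the sample response below is hand-written on the assumption that the
;; mapping service returns a header line followed by tab-separated rows.
(require '[clojure.string :as cstr])

(defn build-query
  "Join a map of parameters into a URL query string, encoding each value."
  [params]
  (cstr/join "&" (for [[k v] params]
                   (str (name k) "="
                        (java.net.URLEncoder/encode (str v) "utf-8")))))

(defn parse-mapping
  "Drop the header line and return the second tab-separated column of each row."
  [response]
  (for [line (rest (cstr/split-lines response))]
    (second (cstr/split line #"\t"))))

;; Example with the assumed response layout:
;; (build-query {:from "ID" :to "ACC"})                ;; => "from=ID&to=ACC"
;; (parse-mapping "From\tTo\nP53_HUMAN\tP04637\n")     ;; => ("P04637")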
(defn id-to-fasta
  "Get FASTA records from the UniProt REST API, dropping accessions that
  return an identical record."
  [uid]
  (let [fasta-url "http://www.uniprot.org/uniprot/"
        get-fasta (fn [acc]
                    (let [acc-url (format "%s%s.fasta" fasta-url acc)]
                      (http/string (http/http-agent acc-url))))]
    (distinct (for [acc (id-to-acc uid)] (get-fasta acc)))))
(defn write-as-fasta
  "Write out FASTA records for a UniProt identifier."
  [uid]
  (let [out-file (format "%s.fa" uid)]
    (with-out-writer out-file
      (doseq [fa (id-to-fasta uid)]
        (println fa)))))
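;; A hedged sketch of the same write step without with-out-writer, for
;; Clojure setups where the monolithic clojure.contrib is not on the
;; classpath. It relies only on clojure.java.io; the name write-records is
;; hypothetical and is not part of the original script.
(require '[clojure.java.io :as io])

(defn write-records
  "Write each record to out-file, one println per record."
  [out-file records]
  (with-open [w (io/writer out-file)]
    ;; Rebind *out* so println targets the file, as with-out-writer does.
    (binding [*out* w]
      (doseq [r records]
        (println r)))))

;; Example: (write-records "P53_HUMAN.fa" [">record-1\nMEEPQ"])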
(when *command-line-args*
  (doseq [uid *command-line-args*]
    (write-as-fasta uid))
  (shutdown-agents))