@karelfiser
Created November 30, 2012 17:25
find_kinase_consensus_in_proteins
## Find a kinase (here ABL1) consensus sequence in protein sequences using R.
proteins <- c("P00519", "P46109", "P61769") # ABL1, CRKL, B2M
for (ii in proteins) {
  prot_url <- paste("http://www.uniprot.org/uniprot/", ii, ".fasta", sep="") # url of protein fasta
  protein_fasta <- scan(file=url(prot_url), what="character", sep="\t") # read the protein fasta
  protein_seq <- paste(protein_fasta[2:length(protein_fasta)], collapse="") # amino acid sequence only
  # print(protein_seq)
  for (ik in which(strsplit(protein_seq, '')[[1]]=='Y')) {
    tyr_context <- substr(protein_seq, ik-1, ik+3)
    # print(tyr_context)
    # search for ABL1 specific sequence: I/V/L-Y-X-X-P/F:
    x <- grep(pattern = "[IVL]Y..[PF]", x = tyr_context, value = TRUE)
    if (length(x)!=0) {
      print(paste(ii, ": ", x, sep=""))
    }
  }
}
# Homework:
# 1) Try to search through protein_seq directly (don't use strsplit by Y).
# 2) Find target sequences of other Tyr or Ser/Thr kinases.
# 3) Convert UniProt ids to gene symbols.
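
One possible approach to homework item 1 is sketched below, assuming base R only. `find_motif` is a hypothetical helper name, not part of the original script. It wraps the consensus pattern in a lookahead so that overlapping hits are reported, which matches the behaviour of the per-tyrosine loop above. The demo sequence is made up for illustration and is not a real protein.

```r
# Hypothetical helper (not in the original gist): search a sequence directly
# for a consensus pattern and return every hit with its 1-based start position.
find_motif <- function(protein_seq, pattern = "[IVL]Y..[PF]") {
  # A lookahead around a capture group lets the regex engine report
  # overlapping occurrences; lookahead requires perl = TRUE.
  m <- gregexpr(paste0("(?=(", pattern, "))"), protein_seq, perl = TRUE)[[1]]
  if (m[1] == -1) {
    # No match: return an empty result with the same columns.
    return(data.frame(start = integer(0), match = character(0),
                      stringsAsFactors = FALSE))
  }
  # The first capture group holds the actual motif text.
  starts <- as.integer(attr(m, "capture.start")[, 1])
  lens <- as.integer(attr(m, "capture.length")[, 1])
  data.frame(start = starts,
             match = substring(protein_seq, starts, starts + lens - 1),
             stringsAsFactors = FALSE)
}

# Made-up toy sequence with two overlapping hits and one separate hit.
demo_seq <- "MAVYIYPPFKLYAAF"
hits <- find_motif(demo_seq)
print(hits)
```

The tyrosine itself sits at `start + 1` in each hit. Inside the main loop, `find_motif(protein_seq)` could stand in for the inner `for (ik in ...)` loop; passing a different `pattern` would be one way to start on homework item 2.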