@clairemcwhite
Last active February 21, 2020 20:49
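library(magrittr)  # provides the %>% pipe used below

# Read a FASTA file (local path or URL) into a data frame with one row per sequence.
# Requires the seqinr and broom packages.
read_fasta <- function(fasta_filename, annot = FALSE){
  fasta <- seqinr::read.fasta(fasta_filename, as.string = TRUE)

  # Convert the list of seqinr SeqFastadna objects to a data.frame;
  # x[1:length(x)] drops the attributes and keeps only the sequence string
  fasta_df <- fasta %>%
    sapply(function(x){x[1:length(x)]}) %>%
    as.data.frame() %>%
    broom::fix_data_frame(newcol = "ID", newnames = "Sequence")

  if(annot == TRUE){
    # Optionally add the full header line of each record as an Annot column
    annot_df <- seqinr::getAnnot(fasta) %>%
      sapply(function(x){x[1:length(x)]}) %>%
      as.data.frame() %>%
      broom::fix_data_frame(newnames = "Annot")
    fasta_df <- cbind(fasta_df, annot_df)
  }
  return(fasta_df)
}

read_fasta('human_uniprot-proteome_human_reviewed.fasta')
read_fasta('https://www.uniprot.org/uniprot/?query=PGH1&format=fasta&limit=10')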

clairemcwhite commented Feb 21, 2020

Tidy a fasta file

> read_fasta('/project/cmcwhite/data/peptide_elutions/protein_identification/proteomes/human_uniprot-proteome_human_reviewed.fasta')

# A tibble: 20,191 x 2
  ID                    Sequence
  <chr>                 <fct>
1 sp|P31946|1433B_HUMAN mtmdkselvqkaklaeqaeryddmaaa..
2 sp|P04439|1A03_HUMAN  mavmaprtlllllsgalaltqtwagshsmr...
3 sp|P01889|1B07_HUMAN  mlvmaprtvllllsaalaltetwagshsmry...
4 sp|P30464|1B15_HUMAN  mrvtaprtvllllsgalaltetwagshsmryf...
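
Each ID in the output above packs three pipe-separated fields, which in UniProt FASTA headers are the database, the accession, and the entry name. As a small sketch (not part of the gist), they could be split into separate columns with base R. The column names `Database`, `Accession`, and `EntryName` are my own choice, and the two IDs are copied from the output above.

```r
# Two IDs copied from the tibble printed above
ids <- c("sp|P31946|1433B_HUMAN", "sp|P04439|1A03_HUMAN")

# Split on the literal "|" (fixed = TRUE avoids treating it as regex alternation)
parts <- do.call(rbind, strsplit(ids, "|", fixed = TRUE))

# Column names here are illustrative, not taken from the gist
id_fields <- data.frame(
  Database  = parts[, 1],
  Accession = parts[, 2],
  EntryName = parts[, 3],
  stringsAsFactors = FALSE
)
print(id_fields)
```

This assumes every ID has exactly three fields, which holds for the UniProt-style IDs shown here but not necessarily for FASTA files from other sources.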

Compare with the raw seqinr::read.fasta output, which is a named list with one entry per sequence (only the first entry is shown):

> seqinr::read.fasta('/project/cmcwhite/data/peptide_elutions/protein_identification/proteomes/human_uniprot-proteome_human_reviewed.fasta', as.string = TRUE)
$`sp|P31946|1433B_HUMAN`
[1] "mtmdkselvqkaklaeqaeryddmaaamkavteqghelsneernllsvayknvvgarrsswrvissieqkternekkqqmgkeyrekieaelqdicndvlelldkylipnatqpeskvfylkmkgdyfrylsevasgdnkqttvsnsqqayqeafeiskkemqpthpirlglalnfsvfyyeilnspekacslaktafdeaiaeldtlneesykdstlimqllrdnltlwtsenqgdegdagegen"
attr(,"name")
[1] "sp|P31946|1433B_HUMAN"
attr(,"Annot")
[1] ">sp|P31946|1433B_HUMAN 14-3-3 protein beta/alpha OS=Homo sapiens GN=YWHAB PE=1 SV=3"
attr(,"class")
[1] "SeqFastadna"
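
The printout above shows that each list entry is a sequence string carrying `name`, `Annot`, and `class` attributes. As a rough sketch of the same flattening without the broom dependency, the list could be converted with base R alone. `fasta_list_to_df` is a hypothetical helper name, not part of the gist, and it assumes the input has the structure shown above.

```r
# Hypothetical helper (not part of the gist): flatten the list returned by
# seqinr::read.fasta() into a plain data.frame using base R only.
fasta_list_to_df <- function(fasta, annot = FALSE) {
  df <- data.frame(
    ID = names(fasta),
    # paste() returns a bare string, so the name/Annot/class attributes are dropped
    Sequence = vapply(fasta, function(x) paste(x, collapse = ""),
                      character(1), USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
  if (isTRUE(annot)) {
    # Each entry keeps its full header line in the "Annot" attribute
    df$Annot <- vapply(fasta, function(x) attr(x, "Annot"),
                       character(1), USE.NAMES = FALSE)
  }
  df
}

# Usage, assuming seqinr is installed and the file from the gist is available:
# fasta <- seqinr::read.fasta('human_uniprot-proteome_human_reviewed.fasta', as.string = TRUE)
# fasta_list_to_df(fasta, annot = TRUE)
```

Unlike the original function, this sketch returns a base data.frame with a character `Sequence` column instead of a tibble with a factor column.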
