Grantham Matrix
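library(magrittr)  # provides the %>% pipe used below
# Read the upper-triangular matrix and reshape it to long format, one row per amino acid pair.
grantham <- readr::read_tsv("https://gist.githubusercontent.com/danielecook/501f03650bca6a3db31ff3af2d413d2a/raw/5583a134b36b60762be6cd54002a0f4044338cd0/grantham.tsv") %>%
  tidyr::gather(SECOND, SCORE, -FIRST) %>%
  dplyr::filter(SCORE > 0)
# Return the Grantham score for two one-letter amino acid codes, given in either order.
calculate_grantham <- function(a1, a2) {
  (grantham %>% dplyr::filter((FIRST == a1 & SECOND == a2) | (FIRST == a2 & SECOND == a1)))$SCORE
}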
FIRST R L P T A V G I F Y C H Q N K D E M W
S 110 145 74 58 99 124 56 142 155 144 112 89 68 46 121 65 80 135 177
R 0 102 103 71 112 96 125 97 97 77 180 29 43 86 26 96 54 91 101
L 0 0 98 92 96 32 138 5 22 36 198 99 113 153 107 172 138 15 61
P 0 0 0 38 27 68 42 95 114 110 169 77 76 91 103 108 93 87 147
T 0 0 0 0 58 69 59 89 103 92 149 47 42 65 78 85 65 81 128
A 0 0 0 0 0 64 60 94 113 112 195 86 91 111 106 126 107 84 148
V 0 0 0 0 0 0 109 29 50 55 192 84 96 133 97 152 121 21 88
G 0 0 0 0 0 0 0 135 153 147 159 98 87 80 127 94 98 127 184
I 0 0 0 0 0 0 0 0 21 33 198 94 109 149 102 168 134 10 61
F 0 0 0 0 0 0 0 0 0 22 205 100 116 158 102 177 140 28 40
Y 0 0 0 0 0 0 0 0 0 0 194 83 99 143 85 160 122 36 37
C 0 0 0 0 0 0 0 0 0 0 0 174 154 139 202 154 170 196 215
H 0 0 0 0 0 0 0 0 0 0 0 0 24 68 32 81 40 87 115
Q 0 0 0 0 0 0 0 0 0 0 0 0 0 46 53 61 29 101 130
N 0 0 0 0 0 0 0 0 0 0 0 0 0 0 94 23 42 142 174
K 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 101 56 95 110
D 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 45 160 181
E 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 126 152
M 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 67

joshyates1980 commented Aug 2, 2016

what is this?

danielecook (Author) commented Aug 2, 2016

uhhh... This is a function I'm putting together in R for calculating Grantham scores.

From this article:

Grantham scores, which categorize codon replacements into classes of increasing chemical dissimilarity, were designated conservative (0-50), moderately conservative (51-100), moderately radical (101-150), or radical (≥151) according to the classification proposed by Li et al. (28).
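
The thresholds quoted above map directly onto a binning step. Here is a minimal sketch in base R, assuming integer scores like those in the matrix; `classify_grantham` is a hypothetical helper name and is not part of the gist:

```r
# Hypothetical helper: bin Grantham scores into the four classes quoted above.
# Intervals are closed on the right: (-Inf, 50], (50, 100], (100, 150], (150, Inf).
classify_grantham <- function(score) {
  cut(
    score,
    breaks = c(-Inf, 50, 100, 150, Inf),
    labels = c("conservative", "moderately conservative",
               "moderately radical", "radical"),
    right = TRUE
  )
}

# Example scores taken from the matrix: I/L = 5, S/T = 58, S/R = 110, C/W = 215.
classes <- as.character(classify_grantham(c(5, 58, 110, 215)))
print(classes)
```

With those four inputs, each score falls into a different class, from conservative (I/L) to radical (C/W).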

These numbers are computed from differences in three physicochemical properties of the amino acids: composition, polarity, and molecular volume. They also line up with how often substitutions are observed between species. Pairs with low scores are swapped much more frequently, indicating that those changes are 'tolerable' and don't have much of an effect on protein function. Pairs with high scores are swapped much less frequently, indicating they likely cause severe changes to proteins.

In biology, we use them to estimate how severe a mutation (or variant) might be when we identify one through genotyping.
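
As a rough illustration of that use, the sketch below scores a few made-up amino acid changes and ranks them by severity. It is base R only and uses a handful of pairs copied from the matrix rather than the full table; `scores`, `lookup_score`, and the `variants` data frame are hypothetical names, and returning 0 for an unchanged residue is an assumption, since the matrix has no diagonal entries.

```r
# A few pairs copied from the matrix above, in long format (not the full table).
scores <- data.frame(
  FIRST  = c("S", "L", "C", "R"),
  SECOND = c("T", "I", "W", "K"),
  SCORE  = c(58, 5, 215, 26),
  stringsAsFactors = FALSE
)

# Hypothetical lookup: symmetric in its arguments, 0 for identical residues
# (an assumption), NA when the pair is not in the table.
lookup_score <- function(ref, alt) {
  if (ref == alt) return(0)
  hit <- scores$SCORE[(scores$FIRST == ref & scores$SECOND == alt) |
                      (scores$FIRST == alt & scores$SECOND == ref)]
  if (length(hit) == 0) NA_real_ else hit
}

# Made-up example variants: reference and alternate amino acids.
variants <- data.frame(
  ref = c("T", "I", "W", "K"),
  alt = c("S", "L", "C", "K"),
  stringsAsFactors = FALSE
)
variants$grantham <- mapply(lookup_score, variants$ref, variants$alt,
                            USE.NAMES = FALSE)

# Most severe changes first.
print(variants[order(-variants$grantham), ])
```

The lookup checks both orderings because the matrix stores each pair only once, in its upper triangle.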

I like to use GitHub Gist to store little snippets like this until I find a more permanent home for them (e.g. a project).
