Skip to content

Instantly share code, notes, and snippets.

@mcfrank
Created November 2, 2022 22:11
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save mcfrank/585756fd5899da5a0ce27cd8bca73b07 to your computer and use it in GitHub Desktop.
Save mcfrank/585756fd5899da5a0ce27cd8bca73b07 to your computer and use it in GitHub Desktop.
simple example of fitting GAMs to CDI data
library(tidyverse)
library(gamlss)
d <- readxl::read_excel("sample_data.xlsx")
# model
max_vocab <- max(d$`Vocabulary production`)
# transformation to 0-1 for beta model
# note that beta data cannot be exactly 0 or 1, it may be necessary to add/subtract .001 for data including 0s and 1s
mod_data <- d |>
mutate(vocab = `Vocabulary production`,
age = `Age in months`) |>
select(vocab, age) |>
mutate(vocab = vocab / max_vocab) |>
filter(vocab > 0, vocab < 1)
# gam will only work with complete data
mod_data <- mod_data[complete.cases(mod_data),]
# fit model
# note very smooth, monotonic splines on both mean and SD of age
gam_mod <- gamlss(vocab ~ pbm(age, lambda = 10000),
sigma.formula = ~ pbm(age, lambda = 10000),
family = BE,
control = gamlss.control(c.crit = .1),
data = mod_data)
# output appropriate percentiles and reshape for plotting
centiles <- centiles.pred(gam_mod, cent = c(10, 25, 50, 75, 90),
xname = "age", xvalues = 8:18,
data = mod_data) |>
pivot_longer(-x, names_to = "percentile", values_to = "predicted") |>
mutate(predicted = predicted * max_vocab)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment