Skip to content

Instantly share code, notes, and snippets.

Embed
What would you like to do?
stat_smooth_func <- function(mapping = NULL, data = NULL,
geom = "smooth", position = "identity",
...,
method = "auto",
formula = y ~ x,
se = TRUE,
n = 80,
span = 0.75,
fullrange = FALSE,
level = 0.95,
method.args = list(),
na.rm = FALSE,
show.legend = NA,
inherit.aes = TRUE,
xpos = NULL,
ypos = NULL) {
layer(
data = data,
mapping = mapping,
stat = StatSmoothFunc,
geom = geom,
position = position,
show.legend = show.legend,
inherit.aes = inherit.aes,
params = list(
method = method,
formula = formula,
se = se,
n = n,
fullrange = fullrange,
level = level,
na.rm = na.rm,
method.args = method.args,
span = span,
xpos = xpos,
ypos = ypos,
...
)
)
}
StatSmoothFunc <- ggproto("StatSmooth", Stat,
setup_params = function(data, params) {
# Figure out what type of smoothing to do: loess for small datasets,
# gam with a cubic regression basis for large data
# This is based on the size of the _largest_ group.
if (identical(params$method, "auto")) {
max_group <- max(table(data$group))
if (max_group < 1000) {
params$method <- "loess"
} else {
params$method <- "gam"
params$formula <- y ~ s(x, bs = "cs")
}
}
if (identical(params$method, "gam")) {
params$method <- mgcv::gam
}
params
},
compute_group = function(data, scales, method = "auto", formula = y~x,
se = TRUE, n = 80, span = 0.75, fullrange = FALSE,
xseq = NULL, level = 0.95, method.args = list(),
na.rm = FALSE, xpos=NULL, ypos=NULL) {
if (length(unique(data$x)) < 2) {
# Not enough data to perform fit
return(data.frame())
}
if (is.null(data$weight)) data$weight <- 1
if (is.null(xseq)) {
if (is.integer(data$x)) {
if (fullrange) {
xseq <- scales$x$dimension()
} else {
xseq <- sort(unique(data$x))
}
} else {
if (fullrange) {
range <- scales$x$dimension()
} else {
range <- range(data$x, na.rm = TRUE)
}
xseq <- seq(range[1], range[2], length.out = n)
}
}
# Special case span because it's the most commonly used model argument
if (identical(method, "loess")) {
method.args$span <- span
}
if (is.character(method)) method <- match.fun(method)
base.args <- list(quote(formula), data = quote(data), weights = quote(weight))
model <- do.call(method, c(base.args, method.args))
m = model
eq <- substitute(italic(y) == a + b %.% italic(x)*","~~italic(r)^2~"="~r2,
list(a = format(coef(m)[1], digits = 3),
b = format(coef(m)[2], digits = 3),
r2 = format(summary(m)$r.squared, digits = 3)))
func_string = as.character(as.expression(eq))
if(is.null(xpos)) xpos = min(data$x)*0.9
if(is.null(ypos)) ypos = max(data$y)*0.9
data.frame(x=xpos, y=ypos, label=func_string)
},
required_aes = c("x", "y")
)
@matteorizzuto

This comment has been minimized.

Copy link

matteorizzuto commented Feb 4, 2016

Hi,

I am trying to use this function, but every time I source it I get the error:

Error in eval(expr, envir, enclos) : could not find function "eval"

Do you know how I can fix it? I am using R version 3.2.3 and all my packages are up to date.

@tomhawkinsphoto

This comment has been minimized.

Copy link

tomhawkinsphoto commented Feb 5, 2016

Same error here. The function has been running without issue for the last several months, and decided yesterday to throw out the "Error in eval(expr, envir, enclos) : could not find function "eval"" error. Any ideas on what might have happened?

@jbeaulie

This comment has been minimized.

Copy link

jbeaulie commented Feb 17, 2016

Same error here.

sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

@andrese52

This comment has been minimized.

Copy link

andrese52 commented Feb 20, 2016

same error!

@kdauria

This comment has been minimized.

Copy link
Owner Author

kdauria commented Feb 21, 2016

Sorry. ggplot2 was updated and I don't know what is causing the eval error. I believe it is related to http://stackoverflow.com/questions/35462240/r-error-in-evalexpr-envir-enclos-could-not-find-function-eval.

@kdauria

This comment has been minimized.

Copy link
Owner Author

kdauria commented Feb 21, 2016

OK. I figured it out. The Gist should now work with the most current version of ggplot2.

@mlist

This comment has been minimized.

Copy link

mlist commented Apr 16, 2016

It does. Thanks a bunch!!

@gennadybm

This comment has been minimized.

Copy link

gennadybm commented May 3, 2016

It worked! Thank you!!!

@erkent

This comment has been minimized.

Copy link

erkent commented May 31, 2016

Hi,
When I source the file I get the message:

Error in ggproto("StatSmooth", Stat, setup_params = function(data, params) { :
object 'Stat' not found

Am I missing something? Any ideas?

@fawda123

This comment has been minimized.

Copy link

fawda123 commented Jul 20, 2016

Hi there, do you have any suggestions for different x/y locations using xpos and ypos with facets?

@kdauria

This comment has been minimized.

Copy link
Owner Author

kdauria commented Jul 21, 2016

Hey @fawda123, the snippet I wrote above moves the position of the equation according to the data range. For example, xpos = min(data$x)*0.9 means that the equation will be started a little left of the minimum x value. If you want to do something fancier, it shouldn't be too hard to write up another function. Some of the extensions at https://www.ggplot2-exts.org/ might give you some ideas. Maybe looking at the source of ggpmisc's stat_poly_eq would help?

@fawda123

This comment has been minimized.

Copy link

fawda123 commented Jul 21, 2016

@kdauria Thanks, I'll have a look.

@low-decarie

This comment has been minimized.

Copy link

low-decarie commented Nov 17, 2016

Great stuff! Thanks! I need to get around to editing it so that the equation and R2 values are on seperate lines. And add p-value.

@jsg51483

This comment has been minimized.

Copy link

jsg51483 commented Nov 29, 2016

Been using this successfully for awhile but seeming to run into a problem with the new ggplot2 version (ggplot2_2.2.0).

Error: StatSmooth was built with an incompatible version of ggproto.
Please reinstall the package that provides this extension.

Anyone else have (or hopefully, solve) this issue?

@ucohen

This comment has been minimized.

Copy link

ucohen commented Dec 9, 2016

for ggplot2_2.2.0, use temporary workaround with filename specification:
devtools::source_gist("524eade46135f6348140", filename = "ggplot_smooth_func.R")

see: http://stackoverflow.com/questions/38345894/r-source-gist-not-working

@psychelzh

This comment has been minimized.

Copy link

psychelzh commented May 5, 2017

It always gets r2 = NULL. What is the problem?

Sorry , I figured out the reason. I used method = 'glm'. Thank you very much! 🎉 💯

@camilla163

This comment has been minimized.

Copy link

camilla163 commented Jul 5, 2017

Thanks! Worked like a charm! :)

@eduardocrichard

This comment has been minimized.

Copy link

eduardocrichard commented Jul 6, 2017

Greeeaaat! Thanks a lot for the code.
Did anyone managed to include p-value on a new line?

I managed to get p-value using lmp ()

lmp <- function (modelobject) {
    if (class(modelobject) != "lm") stop("Not an object of class 'lm' ")
    f <- summary(modelobject)$fstatistic
    p <- pf(f[1],f[2],f[3],lower.tail=F)
    attributes(p) <- NULL
    return(p)
}

And adjusted eq as the following:

                        eq <- substitute(italic(y) == a + b %.% italic(x)*","~~italic(r)^2~"="~r2~","~~italic(p)~"="~pval, 
                                         list(a = format(coef(m)[1], digits = 3), 
                                              b = format(coef(m)[2], digits = 3), 
                                              r2 = format(summary(m)$r.squared, digits = 3),
                                              pval = ifelse(lmp(m)<1e-4, "<0.0001",format(lmp(m), digits = 3))))

But still coudlnt find a way to plot it on a new line. If you modify StatSmoothFunc's eq and it's ypos to if(is.null(ypos)) ypos = max(data$y)*0.9 you'll be able to use it, bt you're gonna have to call the fuction twice

@srweintraub

This comment has been minimized.

Copy link

srweintraub commented Jul 9, 2017

This is a great function, thank you! Is it possible to specify unique xpos and ypos if you have multiple equations? For example, if you have three groupings in your data and each has it's own regression, the function does plot all 3, but sometimes they are on top of eachother (unless they happen to have very different min/max values, in which case you luck out and they stagger).

Just a thought...I really appreciate the work.

@owensca

This comment has been minimized.

Copy link

owensca commented Jan 31, 2018

Here is how I was able to add a newline and p-values:

stat_smooth_func_with_pval <- function(mapping = NULL, data = NULL,
                                            geom = "smooth", position = "identity",
                                            ...,
                                            method = "auto",
                                            formula = y ~ x,
                                            se = TRUE,
                                            n = 80,
                                            span = 0.75,
                                            fullrange = FALSE,
                                            level = 0.95,
                                            method.args = list(),
                                            na.rm = FALSE,
                                            show.legend = NA,
                                            inherit.aes = TRUE,
                                            xpos = NULL,
                                            ypos = NULL,
                                            xpos2 = NULL,
                                            ypos2 = NULL) {
  layer(
    data = data,
    mapping = mapping,
    stat = StatSmoothFunc,
    geom = geom,
    position = position,
    show.legend = show.legend,
    inherit.aes = inherit.aes,
    params = list(
      method = method,
      formula = formula,
      se = se,
      n = n,
      fullrange = fullrange,
      level = level,
      na.rm = na.rm,
      method.args = method.args,
      span = span,
      xpos = xpos,
      ypos = ypos,
      xpos2 = xpos2,
      ypos2 = ypos2,
      ...
    )
  )
}

StatSmoothFunc <- ggproto("StatSmooth", Stat,
                          
                          setup_params = function(data, params) {
                            # Figure out what type of smoothing to do: loess for small datasets,
                            # gam with a cubic regression basis for large data
                            # This is based on the size of the _largest_ group.
                            if (identical(params$method, "auto")) {
                              max_group <- max(table(data$group))
                              
                              if (max_group < 1000) {
                                params$method <- "loess"
                              } else {
                                params$method <- "gam"
                                params$formula <- y ~ s(x, bs = "cs")
                              }
                            }
                            if (identical(params$method, "gam")) {
                              params$method <- mgcv::gam
                            }
                            
                            params
                          },
                          
                          compute_group = function(data, scales, method = "auto", formula = y~x,
                                                   se = TRUE, n = 80, span = 0.75, fullrange = FALSE,
                                                   xseq = NULL, level = 0.95, method.args = list(),
                                                   na.rm = FALSE, xpos=NULL, ypos=NULL, 
                                                   xpos2=NULL, ypos2=NULL) {
                            if (length(unique(data$x)) < 2) {
                              # Not enough data to perform fit
                              return(data.frame())
                            }
                            
                            if (is.null(data$weight)) data$weight <- 1
                            
                            if (is.null(xseq)) {
                              if (is.integer(data$x)) {
                                if (fullrange) {
                                  xseq <- scales$x$dimension()
                                } else {
                                  xseq <- sort(unique(data$x))
                                }
                              } else {
                                if (fullrange) {
                                  range <- scales$x$dimension()
                                } else {
                                  range <- range(data$x, na.rm = TRUE)
                                }
                                xseq <- seq(range[1], range[2], length.out = n)
                              }
                            }
                            # Special case span because it's the most commonly used model argument
                            if (identical(method, "loess")) {
                              method.args$span <- span
                            }
                            
                            if (is.character(method)) method <- match.fun(method)
                            
                            base.args <- list(quote(formula), data = quote(data), weights = quote(weight))
                            model <- do.call(method, c(base.args, method.args))
                            
                            m = model
                            eq1 <- substitute(italic(y) == a + b %.% italic(x), 
                                              list(a = format(coef(m)[1], digits = 3), 
                                                   b = format(coef(m)[2], digits = 3)))
                            
                            eq2 <- substitute(italic(r)^2~"="~r2*","~~italic(p)~"="~pval, 
                                              list(r2 = format(summary(m)$r.squared, digits = 3),
                                                   pval = format(summary(m)$coef[2,4], digits = 3)))
                            
                            func_string1 = as.character(as.expression(eq1))
                            func_string2 = as.character(as.expression(eq2))
                            
                            if(is.null(xpos)) xpos = min(data$x)*0.9
                            if(is.null(ypos)) ypos = max(data$y)*0.9
                            if(is.null(xpos2)) xpos2 = xpos
                            if(is.null(ypos2)) ypos2 = max(data$y)*0.6
                            
                            data.frame(x = rbind(xpos, xpos2), 
                                       y = rbind(ypos, ypos2), 
                                       label = rbind(func_string1, func_string2))
                            
                          },
                          
                          required_aes = c("x", "y")
)

# source
# https://gist.github.com/kdauria/524eade46135f6348140#file-ggplot_smooth_func-r-L110-L111
@ghost

This comment has been minimized.

Copy link

ghost commented Jul 19, 2018

Hello, first of all, thank you for this great function. It works great on my plots.

However, I was wondering how to get rid of the c() around the coefficients..

For example, in my graphs, I see:

y = c(9.49) + c(0.797)*x

Thanks,

@falltok

This comment has been minimized.

Copy link

falltok commented Jul 20, 2018

capture
I have the same problem as @jasonbaik94 the c() around the coefficients when I use the function, can someone help ?
Thanks

@carlinstarrs

This comment has been minimized.

Copy link

carlinstarrs commented Jul 26, 2018

@jasonbaik94 and @falltok:

I managed to fix this by changing these lines:

eq1 <- substitute(italic(y) == a + b %.% italic(x), 
                                              list(a = format(coef(m)[1], digits = 3), 
                                                   b = format(coef(m)[2], digits = 3)))

to have a double bracket:

eq1 <- substitute(italic(y) == a + b %.% italic(x), 
                                              list(a = format(coef(m)[[1]], digits = 3), 
                                                   b = format(coef(m)[[2]], digits = 3)))

I don't think that broke anything...

@falltok

This comment has been minimized.

Copy link

falltok commented Jul 28, 2018

It works well

@adamaki

This comment has been minimized.

Copy link

adamaki commented Oct 23, 2018

Excellent! Thanks very much!

@MiG-25Altfuns

This comment has been minimized.

Copy link

MiG-25Altfuns commented Mar 31, 2019

This is a great function, thank you! Is it possible to specify unique xpos and ypos if you have multiple equations? For example, if you have three groupings in your data and each has it's own regression, the function does plot all 3, but sometimes they are on top of eachother (unless they happen to have very different min/max values, in which case you luck out and they stagger).

Just a thought...I really appreciate the work.

Have the exact same problem and can't figure out a way

@bibbers93

This comment has been minimized.

Copy link

bibbers93 commented May 8, 2019

Hi, thank you for creating this. How would you go about changing the size of the equation and r2 value. currently I have a 6x4 facet wrap and the text isn't fully shown.

Thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.