Created
April 1, 2015 19:15
-
-
Save rsnemmen/f2c03beb391db809c90f to your computer and use it in GitHub Desktop.
Calculates the confidence band of a linear regression model at the desired confidence level, using analytical methods
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
def confband(xd,yd,a,b,conf=0.95,x=None): | |
""" | |
Calculates the confidence band of the linear regression model at the desired confidence | |
level, using analytical methods. The 2sigma confidence interval is 95% sure to contain | |
the best-fit regression line. This is not the same as saying it will contain 95% of | |
the data points. | |
Arguments: | |
- conf: desired confidence level, by default 0.95 (2 sigma) | |
- xd,yd: data arrays | |
- a,b: linear fit parameters as in y=ax+b | |
- x: (optional) array with x values to calculate the confidence band. If none is provided, will | |
by default generate 100 points in the original x-range of the data. | |
Returns: | |
Sequence (lcb,ucb,x) with the arrays holding the lower and upper confidence bands | |
corresponding to the [input] x array. | |
Usage: | |
>>> lcb,ucb,x=nemmen.confband(all.kp,all.lg,a,b,conf=0.95) | |
calculates the confidence bands for the given input arrays | |
>>> pylab.fill_between(x, lcb, ucb, alpha=0.3, facecolor='gray') | |
plots a shaded area containing the confidence band | |
References: | |
1. http://en.wikipedia.org/wiki/Simple_linear_regression, see Section Confidence intervals | |
2. http://www.weibull.com/DOEWeb/confidence_intervals_in_simple_linear_regression.htm | |
Author: Rodrigo Nemmen | |
v1 Dec. 2011 | |
v2 Jun. 2012: corrected bug in computing dy | |
""" | |
alpha=1.-conf # significance | |
n=xd.size # data sample size | |
if x==None: x=numpy.linspace(xd.min(),xd.max(),100) | |
# Predicted values (best-fit model) | |
y=a*x+b | |
# Auxiliary definitions | |
sd=scatterfit(xd,yd,a,b) # Scatter of data about the model | |
sxd=numpy.sum((xd-xd.mean())**2) | |
sx=(x-xd.mean())**2 # array | |
# Quantile of Student's t distribution for p=1-alpha/2 | |
q=scipy.stats.t.ppf(1.-alpha/2.,n-2) | |
# Confidence band | |
dy=q*sd*numpy.sqrt( 1./n + sx/sxd ) | |
ucb=y+dy # Upper confidence band | |
lcb=y-dy # Lower confidence band | |
return lcb,ucb,x |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
For those who are wondering,
scatterfit
would appear to be from this gist: https://gist.github.com/rsnemmen/0eb32832c657c19e4d39