@TheDataLeek
Last active August 29, 2015 14:00
F Statistic Issue
{
"metadata": {
"name": "",
"signature": "sha256:c0c35d4739a8dfbfcb06697f3c95b44631ed2541420c0d68dccf7feacac61695"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "code",
"collapsed": false,
"input": [
"%pylab inline\n",
"%load_ext rmagic"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Populating the interactive namespace from numpy and matplotlib\n"
]
}
],
"prompt_number": 6
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Load the `MASS` library, read the data, standardize `MET`, and add its square as `MET2`."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%%R\n",
"library('MASS')\n",
"data <- read.csv('/home/william/class/appm4570/stateExpenditures.csv')\n",
"data$MET <- (data$MET - mean(data$MET))/sd(data$MET)\n",
"data$MET2 <- data$MET^2"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 50
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Fit the full and reduced linear models, using the same small model as the other code. `k` and `l` are the numbers of predictors in the full and reduced models."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%%R\n",
"fit_full <- lm(EX ~ ECAB + MET + MET2 + GROW + YOUNG + OLD + WEST, data=data)\n",
"fit_small <- lm(EX ~ ECAB + MET2 + WEST, data=data)\n",
"k <- 7\n",
"l <- 3\n",
"print(coefficients(fit_full))\n",
"print('============')\n",
"print(coefficients(fit_small))"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "display_data",
"text": [
"(Intercept) ECAB MET MET2 GROW YOUNG \n",
" 44.5614507 1.3954201 -5.0542271 22.4342650 0.6953360 0.6076015 \n",
" OLD WEST \n",
" 4.1207839 34.0730785 \n",
"[1] \"============\"\n",
"(Intercept) ECAB MET2 WEST \n",
" 97.732033 1.519929 22.548872 39.550509 \n"
]
}
],
"prompt_number": 51
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Find the SSE (residual sum of squares) of both fits."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%%R\n",
"sse_full <- deviance(fit_full)\n",
"sse_small <- deviance(fit_small)\n",
"print(c('SSE FULL:', sse_full))\n",
"print(c('SSE SMALL:', sse_small))"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "display_data",
"text": [
"[1] \"SSE FULL:\" \"50155.0203618758\"\n",
"[1] \"SSE SMALL:\" \"54718.1056200931\"\n"
]
}
],
"prompt_number": 52
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Calculate the $f$ statistic for comparing the nested models, where $k$ and $l$ are the numbers of predictors in the full and reduced models and $n$ is the number of observations,\n",
"\n",
"\\begin{equation}\n",
" \\frac{\\left( \\text{SSE}_l - \\text{SSE}_k \\right) / (k - l)}\n",
" {\\text{SSE}_k / \\left[ n - (k + 1) \\right]}\n",
"\\end{equation}"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%%R\n",
"# denominator df is n - (k + 1) = 48 - 8 = 40, not n - (k + l)\n",
"((sse_small - sse_full) / (k - l)) /\n",
"(sse_full / (48 - (k + 1)))"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "display_data",
"text": [
"[1] 0.9097963\n"
]
}
],
"prompt_number": 56
},
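{
"cell_type": "code",
"collapsed": false,
"input": [
"%%R\n",
"# pf() returns the CDF, here P(F <= 0.95) with 4 and 40 df; qf(0.95, 4, 40) would give the upper 5% critical value\n",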
"pf(0.95, 4, 40)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "display_data",
"text": [
"[1] 0.5546512\n"
]
}
],
"prompt_number": 55
},
{
"cell_type": "code",
"collapsed": false,
"input": [],
"language": "python",
"metadata": {},
"outputs": []
}
],
"metadata": {}
}
]
}
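
As a cross-check on the arithmetic in the notebook, here is a minimal Python sketch of the partial F statistic. It assumes the usual denominator degrees of freedom, n - (k + 1), which gives 48 - 8 = 40 and matches the `40` passed to `pf` in the notebook. The function name `partial_f` is hypothetical; the SSE values are copied from the notebook output.

```python
def partial_f(sse_reduced, sse_full, n, k, l):
    """Partial F statistic for nested linear models.

    k and l are the numbers of predictors in the full and reduced models;
    the full model also estimates an intercept, hence n - (k + 1).
    """
    numerator = (sse_reduced - sse_full) / (k - l)
    denominator = sse_full / (n - (k + 1))
    return numerator / denominator


# SSE values as printed by deviance() in the notebook
sse_full = 50155.0203618758
sse_small = 54718.1056200931

f_stat = partial_f(sse_small, sse_full, n=48, k=7, l=3)
print(round(f_stat, 4))  # 0.9098
```

Using n - (k + l) = 38 in the denominator instead reproduces the 0.8643065 printed in the notebook. In R, `anova(fit_small, fit_full)` should report the same nested-model F test directly.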