Skip to content

Instantly share code, notes, and snippets.

# DavidFirth/grade-inflation-example.Rmd Last active Jan 7, 2019

Code for logistic regression (as in OfS report) to measure "unexplained" First Class degrees awarded
 --- title: OfS report on "grade inflation" in English universities author: "David Firth" date: "Updated 7 January 2019" output: pdf_document fontsize: 11pt geometry: margin=2cm header-includes: - \usepackage{hyperref} - \usepackage{color} --- ```{r setup, include=FALSE} knitr::opts_chunk\$set(echo = TRUE) ``` This file is written in _R Markdown_. ## The small example from my original blog post (2019-01-02) See [\textcolor{blue}{https://statgeek.net/2019/01/02/office-for-students-report-on-grade-inflation/}](http://statgeek.net/2019/01/02/office-for-students-report-on-grade-inflation/) Here I will attempt to use the OfS fixed-effects logistic regression method, for the artificial data that appears in my blog post. First, set up the data: ```{r} provider <- factor(rep(c(rep("A", 2), rep("B", 2)), 2)) year <- factor(c(rep("2010-11", 4), rep("2016-17", 4))) student_type <- factor(rep(c("h", "i"), 4)) n_firsts <- c(1000, 0, 500, 500, 1800, 0, 0, 500) n_other <- c(0, 1000, 500, 500, 200, 0, 0, 1500) total <- n_firsts + n_other the_data <- data.frame(provider, year, student_type, n_firsts, n_other) the_data ``` Now fit the logistic regression model as specified in Annex D (p42) of the OfS report. ```{r} outcome <- cbind(n_firsts, n_other) the_model <- glm(outcome ~ provider + year + provider:year + student_type, family = binomial, data = the_data) ``` Compute the fitted values (expected numbers of Firsts) from that model, and print those out: ```{r} expected_firsts <- total * fitted(the_model) model_predictions <- data.frame(year, provider, student_type, expected_firsts) model_predictions ``` Now get "predicted" counts, as described at point 5 of Annex C in the OfS report. First we make a new dataset in which the year is changed to "2010-11" throughout: ```{r} newdata <- the_data newdata\$year <- factor(rep("2010-11", 8)) ``` And then we compute the predicted values based on graduation in year 2010-11 (as predicted probabilities multiplied by the binomial totals): ```{r} predicted_2010_11_counts <- predict(the_model, newdata, type = "response") * total predicted_2010_11_counts ``` The whole-sector "unexplained" Firsts in 2016-17, as defined at point 5 of Annex C in the OfS report, can now be counted: ```{r} sum((n_firsts - predicted_2010_11_counts)[year == "2016-17"]) ``` --- that's 300 "unexplained" Firsts (or 7.5 percentage points, as a fraction of the total 4000 graduates here). ## The better model ("SASA model") from blog post Part 2 (2019-01-07) This is as described in Sec 2.1 of the "Part 2" blog post at [\textcolor{blue}{https://statgeek.net/2019/01/07/part-2-further-comments-on-ofs-grade-inflation-report/}](http://statgeek.net/2019/01/07/part-2-further-comments-on-ofs-grade-inflation-report/) ```{r} SASA_model <- glm(outcome ~ -1 + provider:student_type + provider:year, family = binomial, data = the_data) ``` (The reported warning relates to the 100% probability of a First in University A in year 2010-11. That's correct, and in this instance the warning can be safely ignored.) Then as above we compute the predicted values based on graduation in year 2010-11 (as predicted probabilities multiplied by the binomial totals): ```{r} predicted_2010_11_counts <- predict(SASA_model, newdata, type = "response") * total ``` The whole-sector "unexplained" Firsts in 2016-17 can now be counted as before: ```{r} sum((n_firsts - predicted_2010_11_counts)[year == "2016-17"]) ``` --- so that's a measured _decrease_ of 700 Firsts (which is 23.3% of the expected firsts based on 2010-11 data, or 17.5% of the total 4000 graduates here).
to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.