Code for logistic regression (as in OfS report) to measure "unexplained" First Class degrees awarded
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
--- | |
title: OfS report on "grade inflation" in English universities | |
author: "David Firth" | |
date: "Updated 7 January 2019" | |
output: pdf_document | |
fontsize: 11pt | |
geometry: margin=2cm | |
header-includes: | |
- \usepackage{hyperref} | |
- \usepackage{color} | |
--- | |
```{r setup, include=FALSE} | |
knitr::opts_chunk$set(echo = TRUE) | |
``` | |
This file is written in _R Markdown_. | |
## The small example from my original blog post (2019-01-02) | |
See [\textcolor{blue}{https://davidfirth.github.io/blog/2019/01/02/office-for-students-report-on-grade-inflation/}](http://davidfirth.github.io/blog/2019/01/02/office-for-students-report-on-grade-inflation/) | |
Here I will attempt to use the OfS fixed-effects logistic regression method, for the artificial data that appears in | |
my blog post. | |
First, set up the data: | |
```{r} | |
provider <- factor(rep(c(rep("A", 2), rep("B", 2)), 2)) | |
year <- factor(c(rep("2010-11", 4), rep("2016-17", 4))) | |
student_type <- factor(rep(c("h", "i"), 4)) | |
n_firsts <- c(1000, 0, 500, 500, 1800, 0, 0, 500) | |
n_other <- c(0, 1000, 500, 500, 200, 0, 0, 1500) | |
total <- n_firsts + n_other | |
the_data <- data.frame(provider, year, student_type, n_firsts, n_other) | |
the_data | |
``` | |
Now fit the logistic regression model as specified in Annex D (p42) of the OfS report. | |
```{r} | |
outcome <- cbind(n_firsts, n_other) | |
the_model <- glm(outcome ~ provider + year + provider:year + student_type, | |
family = binomial, data = the_data) | |
``` | |
Compute the fitted values (expected numbers of Firsts) from that model, and print those out: | |
```{r} | |
expected_firsts <- total * fitted(the_model) | |
model_predictions <- data.frame(year, provider, student_type, expected_firsts) | |
model_predictions | |
``` | |
Now get "predicted" counts, as described at point 5 of Annex C in the OfS report. First we make a new dataset in | |
which the year is changed to "2010-11" throughout: | |
```{r} | |
newdata <- the_data | |
newdata$year <- factor(rep("2010-11", 8)) | |
``` | |
And then we compute the predicted values based on graduation in year 2010-11 (as predicted probabilities multiplied | |
by the binomial totals): | |
```{r} | |
predicted_2010_11_counts <- predict(the_model, newdata, type = "response") * total | |
predicted_2010_11_counts | |
``` | |
The whole-sector "unexplained" Firsts in 2016-17, as defined at point 5 of Annex C in the OfS report, can now be counted: | |
```{r} | |
sum((n_firsts - predicted_2010_11_counts)[year == "2016-17"]) | |
``` | |
--- that's 300 "unexplained" Firsts (or 7.5 percentage points, as a fraction of the total 4000 graduates here). | |
## The better model ("SASA model") from blog post Part 2 (2019-01-07) | |
This is as described in Sec 2.1 of the "Part 2" blog post at | |
[\textcolor{blue}{https://davidfirth.github.io/blog/2019/01/07/part-2-further-comments-on-ofs-grade-inflation-report/}](http://davidfirth.github.io/blog/2019/01/07/part-2-further-comments-on-ofs-grade-inflation-report/) | |
```{r} | |
SASA_model <- glm(outcome ~ -1 + provider:student_type + provider:year, | |
family = binomial, data = the_data) | |
``` | |
(The reported warning relates to the 100% probability of a First in University A in year 2010-11. That's correct, and in this instance the warning can be safely ignored.) | |
Then as above we compute the predicted values based on graduation in year 2010-11 (as predicted probabilities multiplied by the binomial totals): | |
```{r} | |
predicted_2010_11_counts <- predict(SASA_model, newdata, type = "response") * total | |
``` | |
The whole-sector "unexplained" Firsts in 2016-17 can now be counted as before: | |
```{r} | |
sum((n_firsts - predicted_2010_11_counts)[year == "2016-17"]) | |
``` | |
--- so that's a measured _decrease_ of 700 Firsts (which is 23.3% of the expected firsts based on 2010-11 data, or 17.5% of the total 4000 graduates here). |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment