Instantly share code, notes, and snippets.

Embed
What would you like to do?
Code for logistic regression (as in OfS report) to measure "unexplained" First Class degrees awarded
---
title: OfS report on "grade inflation" in English universities
author: "David Firth"
date: "Updated 7 January 2019"
output: pdf_document
fontsize: 11pt
geometry: margin=2cm
header-includes:
- \usepackage{hyperref}
- \usepackage{color}
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
This file is written in _R Markdown_.
## The small example from my original blog post (2019-01-02)
See [\textcolor{blue}{https://statgeek.net/2019/01/02/office-for-students-report-on-grade-inflation/}](http://statgeek.net/2019/01/02/office-for-students-report-on-grade-inflation/)
Here I will attempt to use the OfS fixed-effects logistic regression method, for the artificial data that appears in
my blog post.
First, set up the data:
```{r}
provider <- factor(rep(c(rep("A", 2), rep("B", 2)), 2))
year <- factor(c(rep("2010-11", 4), rep("2016-17", 4)))
student_type <- factor(rep(c("h", "i"), 4))
n_firsts <- c(1000, 0, 500, 500, 1800, 0, 0, 500)
n_other <- c(0, 1000, 500, 500, 200, 0, 0, 1500)
total <- n_firsts + n_other
the_data <- data.frame(provider, year, student_type, n_firsts, n_other)
the_data
```
Now fit the logistic regression model as specified in Annex D (p42) of the OfS report.
```{r}
outcome <- cbind(n_firsts, n_other)
the_model <- glm(outcome ~ provider + year + provider:year + student_type,
family = binomial, data = the_data)
```
Compute the fitted values (expected numbers of Firsts) from that model, and print those out:
```{r}
expected_firsts <- total * fitted(the_model)
model_predictions <- data.frame(year, provider, student_type, expected_firsts)
model_predictions
```
Now get "predicted" counts, as described at point 5 of Annex C in the OfS report. First we make a new dataset in
which the year is changed to "2010-11" throughout:
```{r}
newdata <- the_data
newdata$year <- factor(rep("2010-11", 8))
```
And then we compute the predicted values based on graduation in year 2010-11 (as predicted probabilities multiplied
by the binomial totals):
```{r}
predicted_2010_11_counts <- predict(the_model, newdata, type = "response") * total
predicted_2010_11_counts
```
The whole-sector "unexplained" Firsts in 2016-17, as defined at point 5 of Annex C in the OfS report, can now be counted:
```{r}
sum((n_firsts - predicted_2010_11_counts)[year == "2016-17"])
```
--- that's 300 "unexplained" Firsts (or 7.5 percentage points, as a fraction of the total 4000 graduates here).
## The better model ("SASA model") from blog post Part 2 (2019-01-07)
This is as described in Sec 2.1 of the "Part 2" blog post at
[\textcolor{blue}{https://statgeek.net/2019/01/07/part-2-further-comments-on-ofs-grade-inflation-report/}](http://statgeek.net/2019/01/07/part-2-further-comments-on-ofs-grade-inflation-report/)
```{r}
SASA_model <- glm(outcome ~ -1 + provider:student_type + provider:year,
family = binomial, data = the_data)
```
(The reported warning relates to the 100% probability of a First in University A in year 2010-11. That's correct, and in this instance the warning can be safely ignored.)
Then as above we compute the predicted values based on graduation in year 2010-11 (as predicted probabilities multiplied by the binomial totals):
```{r}
predicted_2010_11_counts <- predict(SASA_model, newdata, type = "response") * total
```
The whole-sector "unexplained" Firsts in 2016-17 can now be counted as before:
```{r}
sum((n_firsts - predicted_2010_11_counts)[year == "2016-17"])
```
--- so that's a measured _decrease_ of 700 Firsts (which is 23.3% of the expected firsts based on 2010-11 data, or 17.5% of the total 4000 graduates here).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment