Skip to content

Instantly share code, notes, and snippets.

Embed
What would you like to do?
Code for logistic regression (as in OfS report) to measure "unexplained" First Class degrees awarded
---
title: OfS report on "grade inflation" in English universities
author: "David Firth"
date: "Updated 7 January 2019"
output: pdf_document
fontsize: 11pt
geometry: margin=2cm
header-includes:
- \usepackage{hyperref}
- \usepackage{color}
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
This file is written in _R Markdown_.
## The small example from my original blog post (2019-01-02)
See [\textcolor{blue}{https://statgeek.net/2019/01/02/office-for-students-report-on-grade-inflation/}](http://statgeek.net/2019/01/02/office-for-students-report-on-grade-inflation/)
Here I will attempt to use the OfS fixed-effects logistic regression method, for the artificial data that appears in
my blog post.
First, set up the data:
```{r}
provider <- factor(rep(c(rep("A", 2), rep("B", 2)), 2))
year <- factor(c(rep("2010-11", 4), rep("2016-17", 4)))
student_type <- factor(rep(c("h", "i"), 4))
n_firsts <- c(1000, 0, 500, 500, 1800, 0, 0, 500)
n_other <- c(0, 1000, 500, 500, 200, 0, 0, 1500)
total <- n_firsts + n_other
the_data <- data.frame(provider, year, student_type, n_firsts, n_other)
the_data
```
Now fit the logistic regression model as specified in Annex D (p42) of the OfS report.
```{r}
outcome <- cbind(n_firsts, n_other)
the_model <- glm(outcome ~ provider + year + provider:year + student_type,
family = binomial, data = the_data)
```
Compute the fitted values (expected numbers of Firsts) from that model, and print those out:
```{r}
expected_firsts <- total * fitted(the_model)
model_predictions <- data.frame(year, provider, student_type, expected_firsts)
model_predictions
```
Now get "predicted" counts, as described at point 5 of Annex C in the OfS report. First we make a new dataset in
which the year is changed to "2010-11" throughout:
```{r}
newdata <- the_data
newdata$year <- factor(rep("2010-11", 8))
```
And then we compute the predicted values based on graduation in year 2010-11 (as predicted probabilities multiplied
by the binomial totals):
```{r}
predicted_2010_11_counts <- predict(the_model, newdata, type = "response") * total
predicted_2010_11_counts
```
The whole-sector "unexplained" Firsts in 2016-17, as defined at point 5 of Annex C in the OfS report, can now be counted:
```{r}
sum((n_firsts - predicted_2010_11_counts)[year == "2016-17"])
```
--- that's 300 "unexplained" Firsts (or 7.5 percentage points, as a fraction of the total 4000 graduates here).
## The better model ("SASA model") from blog post Part 2 (2019-01-07)
This is as described in Sec 2.1 of the "Part 2" blog post at
[\textcolor{blue}{https://statgeek.net/2019/01/07/part-2-further-comments-on-ofs-grade-inflation-report/}](http://statgeek.net/2019/01/07/part-2-further-comments-on-ofs-grade-inflation-report/)
```{r}
SASA_model <- glm(outcome ~ -1 + provider:student_type + provider:year,
family = binomial, data = the_data)
```
(The reported warning relates to the 100% probability of a First in University A in year 2010-11. That's correct, and in this instance the warning can be safely ignored.)
Then as above we compute the predicted values based on graduation in year 2010-11 (as predicted probabilities multiplied by the binomial totals):
```{r}
predicted_2010_11_counts <- predict(SASA_model, newdata, type = "response") * total
```
The whole-sector "unexplained" Firsts in 2016-17 can now be counted as before:
```{r}
sum((n_firsts - predicted_2010_11_counts)[year == "2016-17"])
```
--- so that's a measured _decrease_ of 700 Firsts (which is 23.3% of the expected firsts based on 2010-11 data, or 17.5% of the total 4000 graduates here).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.