---
title: "Australia's legal profession and the gender income gap (and learning slopegraphs)"
author: Ciaran
date: '2018-04-24'
slug: australias-legal-profession-and-the-gender-income-gap-and-learning-slopegraphs
categories:
  - rstats
tags:
  - tidy_tuesday
  - inequality
header:
  caption: ''
  image: ''
---

The [R for Data Science](https://www.jessemaegan.com/post/r4ds-the-next-iteration/) community has been running a '[Tidy Tuesday](https://github.com/rfordatascience/tidytuesday)' project for a few weeks. In essence, each week they link to a data-driven article and a somewhat tidy version of its underlying dataset. The challenge is to develop visualisations and other analyses from the data, all within the [R for Data Science](http://r4ds.had.co.nz/) approach to working with R.

This week's challenge is drawn from [an article on Australia's pay gap](http://www.womensagenda.com.au/latest/eds-blog/australia-s-50-highest-paying-jobs-are-paying-men-significantly-more/). The article's data is sourced from [data.gov.au's 2013-14 taxation statistics](https://data.gov.au/dataset/taxation-statistics-2013-14/resource/c506c052-be2f-4fba-8a65-90f9e60f7775?inner_span=True).

```{r setup, echo=FALSE, include=FALSE}
knitr::opts_chunk$set(cache = TRUE,
                      echo = FALSE)
library(tidyverse)
library(cowplot)
library(ggrepel)
library(kableExtra)
suppressWarnings(salaries <- read_csv("../../static/files/data/week4_australian_salary.csv"))
```

```{r compare_salaries}
# Split the dataset in two (there are probably group_by or tidyr::spread options but I find this easier)
male_top_jobs <- salaries %>%
  filter(gender == "Male") %>% # Filter by gender
  arrange(desc(average_taxable_income)) %>%
  select(occupation, `Male Average Taxable Income` = average_taxable_income)

female_top_jobs <- salaries %>%
  filter(gender == "Female") %>% # Filter by gender
  arrange(desc(average_taxable_income)) %>%
  select(occupation, `Female Average Taxable Income` = average_taxable_income)

# Prepare for plotting
male_compared_jobs <- male_top_jobs %>%
  left_join(female_top_jobs, by = "occupation") %>% # Join the two tables again
  arrange(desc(`Male Average Taxable Income`)) %>% # Arrange by male income
  slice(1:100) %>% # Isolate top 100 by male income
  mutate(occupation = str_replace_all(occupation, "\uFFFD", "-")) %>% # Tidy the text hyphen artefact
  mutate(label_male = paste(occupation,
                            paste0("AUS$", prettyNum(`Male Average Taxable Income`, big.mark = ",")))) %>% # Label for male jobs
  mutate(label_female00 = round(`Female Average Taxable Income` / `Male Average Taxable Income` * 100, 0)) %>% # Female wage as % of male wage
  mutate(label_female0 = paste0(label_female00, "%")) %>%
  mutate(label_female = paste(occupation,
                              paste0("AUS$", prettyNum(`Female Average Taxable Income`, big.mark = ","), " (", label_female0, ")"),
                              sep = "\n")) %>% # Assemble female label
  # Show the male label only where the female wage is 38% of the male wage or less
  mutate(label_male = case_when(label_female00 <= 38 ~ label_male,
                                label_female00 > 38 ~ "")) %>%
  # Show the female label only where the female wage is 90% of the male wage or more
  mutate(label_female = case_when(label_female00 > 38 & label_female00 < 90 ~ "",
                                  label_female00 >= 90 ~ label_female)) %>%
  # Differentiate these groups by line colour...
  mutate(line_colour = case_when(label_female00 > 38 & label_female00 < 90 ~ "gray",
                                 label_female00 <= 38 ~ "blue",
                                 label_female00 >= 90 ~ "green")) %>%
  # ...and render the uninteresting lines more transparent (numeric, as alpha expects)
  mutate(line_transparency = case_when(label_female00 > 38 & label_female00 < 90 ~ 0.4,
                                       label_female00 <= 38 | label_female00 >= 90 ~ 0.9))
```

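The comment at the top of the chunk above mentions `group_by` or `tidyr::spread` as alternatives to splitting the data by gender and joining it back together. Here is a minimal sketch of that reshape using `tidyr::pivot_wider()`, the successor to `spread()` in tidyr 1.0.0 and later, which can widen several value columns in one call. It assumes the same long format as the salary data (one row per occupation and gender, with `individuals` and `average_taxable_income` columns); the occupations and numbers in the hypothetical `salaries_demo` table are made up. It computes the two measures this post leans on: women's income as a percentage of men's, and women's share of the people in each occupation.

```r
library(dplyr)
library(tidyr)

# Hypothetical rows in the same long format as the salary data:
# one row per occupation and gender (made-up numbers)
salaries_demo <- tibble(
  occupation = rep(c("Occupation A", "Occupation B"), each = 2),
  gender = rep(c("Male", "Female"), times = 2),
  individuals = c(300, 100, 50, 150),
  average_taxable_income = c(200000, 120000, 80000, 76000)
)

# Widen to one row per occupation. With two value columns, pivot_wider()
# names the new columns <value>_<gender>, e.g. individuals_Male
salaries_wide <- salaries_demo %>%
  pivot_wider(names_from = gender,
              values_from = c(individuals, average_taxable_income))

# Income ratio and share of workers, both as whole percentages
salaries_wide <- salaries_wide %>%
  mutate(
    female_income_pct = round(average_taxable_income_Female /
                                average_taxable_income_Male * 100),
    female_share_pct = round(individuals_Female /
                               (individuals_Male + individuals_Female) * 100)
  )

salaries_wide %>% select(occupation, female_income_pct, female_share_pct)
```

The split-and-join approach in the post is easier to follow line by line; the pivot is shorter but relies on the generated column names.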
```{r plot, message=FALSE}
theend <- 10 # x position of the right-hand (female) axis, used as xend in the ggplot
top_income <- max(male_compared_jobs$`Male Average Taxable Income`) # Height for the axis titles

male_compared_plot <- ggplot(male_compared_jobs) +
  geom_segment(aes(x = 0, y = `Male Average Taxable Income`,
                   xend = theend, yend = `Female Average Taxable Income`),
               alpha = male_compared_jobs$line_transparency,
               colour = male_compared_jobs$line_colour) + # One line per occupation
  theme_void() + # A complete theme: removes axes, ticks, gridlines and background
  xlab("") + # Clean up labels
  ylab("") +
  annotate("text", label = "Male", x = 0, y = top_income,
           hjust = -0.2, vjust = 0, size = 7) + # Title for the left (male) axis
  annotate("text", label = "Female", x = theend, y = top_income,
           hjust = 1.1, vjust = 0, size = 7) + # Title for the right (female) axis
  geom_vline(xintercept = 0, linetype = "dotted") + # Left axis line
  geom_vline(xintercept = theend, linetype = "dotted") + # Right axis line
  geom_text_repel(aes(x = 0, y = `Male Average Taxable Income`, label = label_male),
                  segment.color = "red", na.rm = TRUE) + # Label male points
  geom_text_repel(aes(x = theend, y = `Female Average Taxable Income`, label = label_female),
                  segment.color = "red", na.rm = TRUE) + # Label female points
  labs(title = "The Australian Male-Female Income Gap",
       subtitle = "The 100 best-paid occupations as measured by average male income, with labels highlighting gender disparities")
#ggsave("Male Jobs Compared to Female.png", plot = male_compared_plot)
```

So I took the opportunity to figure out how to build a slopegraph in R. As [Cole Nussbaumer Knaflic](http://www.storytellingwithdata.com/blog/2014/03/more-on-slopegraphs) puts it, slopegraphs are great for highlighting comparisons between two groups, two points in time, and so on. Here is my attempt at visualising some of the data:

```{r print_plot, message=FALSE}
suppressMessages(print(male_compared_plot)) # print() inside, so messages raised while drawing are caught too
```

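Stripped of labels and styling, a slopegraph is one line per item joining its value on two parallel axes. The plot above draws those lines with `geom_segment()` between two numeric x positions. As an alternative, here is a minimal sketch that keeps the data in long format and lets `geom_line()` join the two levels of a factor; the occupations and incomes in the hypothetical `slope_demo` data frame are made up.

```r
library(ggplot2)

# Hypothetical paired values in long format: one row per occupation and gender
slope_demo <- data.frame(
  occupation = rep(c("Occupation A", "Occupation B", "Occupation C"), each = 2),
  gender = factor(rep(c("Male", "Female"), times = 3),
                  levels = c("Male", "Female")), # Male axis on the left
  income = c(200000, 120000, 150000, 60000, 90000, 86000)
)

# Grouping by occupation makes geom_line() draw one line per occupation
# between the "Male" and "Female" positions on the x axis
slope_plot <- ggplot(slope_demo, aes(x = gender, y = income, group = occupation)) +
  geom_line(colour = "gray40") +
  geom_point() +
  labs(x = NULL, y = "Average taxable income (AUS$)") +
  theme_minimal()
```

Printing `slope_plot` draws three sloped lines. The `geom_segment()` version in the post keeps explicit x coordinates (`0` and `theend`), which makes it easy to put dotted axis lines and `ggrepel` labels at known positions; the `geom_line()` version gets its two axis labels from the factor levels instead.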
```{r specific_occupations}
futures_traders <- salaries %>% filter(occupation == "Futures trader")

# str_detect() is used for judges because "Judge" and "law" aren't the full cell value
legal_occupations <- salaries %>%
  filter((str_detect(occupation, "Judge") & str_detect(occupation, "law")) |
           occupation %in% c("Magistrate", "Barrister", "Lawyer; Solicitor"))

law_men <- legal_occupations %>%
  filter(gender == "Male") %>% # There is likely a handier way to do this with tidyr::spread
  rename(Men = individuals,
         `Average taxable income (men)` = average_taxable_income)

law_women <- legal_occupations %>%
  filter(gender == "Female") %>% # There is likely a handier way to do this with tidyr::spread
  rename(Women = individuals,
         `Average taxable income (women)` = average_taxable_income)

law <- full_join(law_men, law_women, by = "occupation") %>%
  mutate(Occupation = str_replace_all(occupation, " \uFFFD law", "")) %>%
  mutate(Occupation = str_replace_all(Occupation, "Lawyer; ", "")) %>%
  mutate(`Men (%)` = round(Men / (Men + Women) * 100, digits = 0)) %>%
  mutate(`Women (%)` = round(Women / (Men + Women) * 100, digits = 0)) %>%
  mutate(`Women's income as % of men's` = paste0(round(`Average taxable income (women)` / `Average taxable income (men)` * 100, digits = 0), "%")) %>%
  mutate(`Average taxable income (men)` = paste0("AUS$", prettyNum(`Average taxable income (men)`, big.mark = ","))) %>%
  select(Occupation, `Men (%)`, `Women (%)`, `Average taxable income (men)`, `Women's income as % of men's`) %>%
  arrange(desc(`Men (%)`))

equal_pay <- law[1:2, ] # The two most male-dominated rows
equal_access <- law[3:4, ] # The two least male-dominated rows
```

So, what we're looking at is a graph visualising the gender pay gap for the 100 best-paid occupations, as measured by average male income. I've used labels to highlight the five occupations with the worst disparities (blue lines, where women earn 38% of the male average or less) and the only three occupations where women earn 90% of men or more (green lines).

So: a critique. The labels obviously need work: they are wordy, and it's not obvious which label applies to which line. The basic design is flawed too: if I include more data by adding more labels, the graph becomes completely cluttered, and the same happens if I add more guidance for navigating the data. A lot of information is also missing, especially regarding how gendered the occupations are in the first place. I thought about placing points of varying size on each axis to signify this, but there is no tidy way to do so within a single graph.

On the data itself, it would be interesting to look for a pattern relating income equality to equality of access, but that's for another day. As an aside, don't be fooled by the futures traders: there are `r futures_traders %>% filter(gender == "Male") %>% pull(individuals)` male futures traders in the dataset and only `r futures_traders %>% filter(gender == "Female") %>% pull(individuals)` women.

Likewise for members of the legal profession. Incomes are more equal where [pay scales apply](http://www.justice.vic.gov.au/home/justice+system/courts+and+tribunals/judicial+salaries+and+entitlements), but women are less likely to occupy those roles. Where access to the occupation is more equal, employers are freer to set salaries, and women are paid less. When it comes to the Bar, the male average income is in all likelihood pulled up by a positively skewed distribution, with big earners at the top who are mostly men:

```{r print_law_table}
knitr::kable(equal_pay, format = "html", align = "l",
             caption = "Income is more equal but women have less access where pay scales apply") %>%
  kable_styling(full_width = FALSE, bootstrap_options = "striped")

knitr::kable(equal_access, format = "html", align = "l",
             caption = "Access is more equal but women have less income where pay scales do not apply") %>%
  kable_styling(full_width = FALSE, bootstrap_options = "striped")
```

To my mind, this reflects the classic patterns of gender discrimination. Unless you are in a tightly regulated profession, disparities persist. And when it comes to the tightly regulated top of the profession, the career necessary for access is likely not available to enough women at all.

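The point about the Bar is a claim about averages: a few very high earners can pull a mean up without moving the median, and that feeds straight into a "women's income as % of men's" figure built from averages. The figures in this post are averages, so this can't be checked for barristers here; the toy incomes below are made up purely to show the mechanism.

```r
# Hypothetical incomes for two groups of ten people, identical
# except that one man is a very high earner
men   <- c(rep(100000, 9), 1000000)
women <- rep(100000, 10)

mean(men)    # 190000: pulled up by the single top earner
median(men)  # 100000: unaffected by him

# Women's income as a % of men's, computed from means and from medians
round(mean(women) / mean(men) * 100)      # 53
round(median(women) / median(men) * 100)  # 100
```

The typical man and woman here earn the same, yet the ratio of averages reports women on 53% of men's income.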
Gist with code [here](https://gist.github.com/cokelly/7ae45d5284d37857c139ce293146ab69).