@thoughtfulbloke
Created July 25, 2022 09:21
```{r}
library(readr)     # read_csv for reading the files off GitHub
library(dplyr)     # joins and grouped summaries
library(lubridate) # ceiling_date for week-ending dates
```
David's usage notes for the wastewater data at: https://github.com/ESR-NZ/covid_in_wastewater
I am reading the files directly off GitHub, rather than downloading and then reading locally.
The two key files are ww_data_all.csv, which contains all of the sampling data, and sites.csv, which contains information about the testing locations. These have been aggregated into site, regional, and national weekly data with accompanying cases for the (combined) catchment areas. For the site-level aggregations, one issue is that the meshblock areas used for cases do not match the catchment population boundaries: cases from a meshblock that straddles a boundary are assigned to every catchment it crosses, which leads to an overestimate of cases at certain sites. Site-level aggregation also suppresses summaries for small sites, where low numbers of people with covid in an area would raise privacy concerns.
```{r}
ww_data_all.csv <- read_csv("https://raw.githubusercontent.com/ESR-NZ/covid_in_wastewater/main/data/ww_data_all.csv",
  col_types = cols(
    SampleLocation = col_character(),
    sars_gcl = col_double(),
    Collected = col_date(format = ""),
    Result = col_character(),
    copies_per_day_per_person = col_double()
  ))
sites.csv <- read_csv("https://raw.githubusercontent.com/ESR-NZ/covid_in_wastewater/main/data/sites.csv",
  col_types = cols(
    SampleLocation = col_character(),
    DisplayName = col_character(),
    SampleType = col_character(),
    Latitude = col_double(),
    Longitude = col_double(),
    Population = col_double(),
    Region = col_character(),
    shp_label = col_character()
  ))
```
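As a quick sanity check that the column specifications took, glimpse() (from dplyr, already loaded) shows the parsed types and a few values from each file:
```{r}
glimpse(ww_data_all.csv)
glimpse(sites.csv)
```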
These can be merged on the basis of SampleLocation:
```{r}
samples <- ww_data_all.csv %>%
  inner_join(sites.csv, by = "SampleLocation")
```
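Since inner_join silently drops sampling rows that have no matching site record, it is cheap to check whether any SampleLocation values fall out:
```{r}
# Sampling locations in ww_data_all.csv with no entry in sites.csv;
# these rows would be silently dropped by the inner_join above
ww_data_all.csv %>%
  anti_join(sites.csv, by = "SampleLocation") %>%
  distinct(SampleLocation)
```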
The sample measurement itself is stored in sars_gcl, the number of SARS-CoV-2 genome copies per litre of wastewater.
Then there is the derived estimate, copies_per_day_per_person, which normalises the measurement by the population of the catchment the site serves.
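As a minimal sketch of working with the raw measure, assuming ggplot2 is also available, the sars_gcl series for one site can be plotted over time; this just takes an arbitrary site from the merged data rather than assuming any particular site name:
```{r}
library(ggplot2)
one_site <- samples$DisplayName[1] # arbitrary example site
samples %>%
  filter(DisplayName == one_site) %>%
  ggplot(aes(x = Collected, y = sars_gcl)) +
  geom_line() +
  labs(title = paste("SARS-CoV-2 genome copies per litre:", one_site),
       x = "Collection date", y = "sars_gcl")
```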
So, for example, we can aggregate up to regional level using week-ending dates (Sunday) rather than collection dates to get catchment populations, then blend in the case data for each region. Note also that there can be multiple samples for a site within a week, so that needs some handling.
```{r}
cases_regional.csv <- read_csv("https://raw.githubusercontent.com/ESR-NZ/covid_in_wastewater/main/data/cases_regional.csv",
  col_types = cols(
    week_end_date = col_date(format = ""),
    Region = col_character(),
    case_7d_avg = col_double()
  ))
ww_regional.csv <- read_csv("https://raw.githubusercontent.com/ESR-NZ/covid_in_wastewater/main/data/ww_regional.csv",
  col_types = cols(
    week_end_date = col_date(format = ""),
    Region = col_character(),
    copies_per_day_per_person = col_double(),
    n_sites = col_double()
  ))
# Collapse to one row per site per week first (a site can be sampled more
# than once in a week), then sum the catchment populations within each region
regional_summary <- samples %>%
  mutate(week_end_date = ceiling_date(Collected, unit = "week")) %>%
  group_by(SampleLocation, Region, week_end_date) %>%
  summarise(Population = mean(Population),
            .groups = "drop") %>%
  group_by(Region, week_end_date) %>%
  summarise(Population = sum(Population),
            sites_in_data = n(),
            .groups = "drop")
regional_summary %>%
  inner_join(cases_regional.csv, by = c("Region", "week_end_date")) %>%
  inner_join(ww_regional.csv, by = c("Region", "week_end_date")) %>%
  View()
```
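One optional cross-check, on the assumption that ww_regional.csv was built from the same sampling runs as ww_data_all.csv: the locally derived sites_in_data can be compared with the published n_sites, with any mismatch flagging weeks where the regional aggregate used a different set of sites:
```{r}
# Weeks where the local site count disagrees with the published n_sites
regional_summary %>%
  inner_join(ww_regional.csv, by = c("Region", "week_end_date")) %>%
  filter(sites_in_data != n_sites) %>%
  arrange(Region, week_end_date)
```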