@halhen
Last active October 25, 2023 01:20
# data from http://ec.europa.eu/eurostat/web/gisco/geodata/reference-data/population-distribution-demography/geostat
# Originally seen at http://spatial.ly/2014/08/population-lines/
# So, this blew up on both Reddit and Twitter. Two bugs fixed (southern Spain was a mess,
# and some countries were missing -- measure twice, submit once, damnit), and two silly superfluous lines removed after
# @hadleywickham pointed that out. Also, switched from geom_segment to geom_line.
# The result of the code below can be seen at http://imgur.com/ob8c8ph
library(tidyverse)
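# Read both 1 km population grids and stack them. The second file gets an empty
# TOT_P_CON_DT column, presumably so its columns match the first file for rbind().
read_csv('../data/geostat-2011/GEOSTAT_grid_POP_1K_2011_V2_0_1.csv') %>%
  rbind(read_csv('../data/geostat-2011/JRC-GHSL_AIT-grid-POP_1K_2011.csv') %>%
          mutate(TOT_P_CON_DT='')) %>%
  # Pull the digits after N and after E/W out of GRD_ID (e.g. '1kmN2689E4337') and divide by 100
  mutate(lat = as.numeric(gsub('.*N([0-9]+)[EW].*', '\\1', GRD_ID))/100,
         lng = as.numeric(gsub('.*[EW]([0-9]+)', '\\1', GRD_ID)) * ifelse(gsub('.*([EW]).*', '\\1', GRD_ID) == 'W', -1, 1) / 100) %>%
  filter(lng > 25, lng < 60) %>%
  # Bin cells to one decimal place and sum the population in each bin
  group_by(lat=round(lat, 1), lng=round(lng, 1)) %>%
  summarize(value = sum(TOT_P, na.rm=TRUE)) %>%
  ungroup() %>%
  # Add NA rows for missing lat/lng combinations so lines break over empty areas
  complete(lat, lng) %>%
  # One line per lat row, raised in proportion to the population in each bin
  ggplot(aes(lng, lat + 5*(value/max(value, na.rm=TRUE)))) +
  geom_line(size=0.4, alpha=0.8, color='#5A3E37', aes(group=lat), na.rm=TRUE) +
  ggthemes::theme_map() +
  coord_equal(0.9)
ggsave('/tmp/europe.png', width=10, height=10)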

halhen commented May 12, 2017

@jwhendy

It appears ggthemes is not loaded via tidyverse; one has to load it, correct?

IIRC, yes. Prefixing with ggthemes:: is probably in my fingers for a reason (which by all means may be that I simply started doing it)

I've never seen such a continuous usage of %>% before! Is the format you used primarily to reproduce as a one-liner? If so, am I correct to believe one's workflow wouldn't typically do this until the final code was known (otherwise you repeat the reading in of data every time you tweak the plot)?

Oh, no, on the contrary. I write 10-30 line %>% flows for most of my analyses. Getting into the pipe way of thinking is super convenient. My hurdle was getting over the vectorization mindset (which is kinda' orthogonal to this anyways). From a naive standpoint %>% simply replaces intermediate variables, or nested functions. If you ever catch yourself doing either, %>% is quite likely a better choice.
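To make the "replaces intermediate variables, or nested functions" point concrete, here is a minimal sketch that writes the same small analysis three ways. It uses the built-in mtcars data set rather than the gist's files, and the variable names (nested, stepwise, piped) are made up for illustration.

```r
library(dplyr)  # dplyr also makes %>% available

# 1. Nested calls: has to be read inside-out
nested <- summarize(group_by(filter(mtcars, cyl > 4), gear), mean_mpg = mean(mpg))

# 2. Intermediate variables: readable, but leaves throwaway names behind
big_engines <- filter(mtcars, cyl > 4)
by_gear     <- group_by(big_engines, gear)
stepwise    <- summarize(by_gear, mean_mpg = mean(mpg))

# 3. Pipe: same steps, read top to bottom, no throwaway names
piped <- mtcars %>%
  filter(cyl > 4) %>%
  group_by(gear) %>%
  summarize(mean_mpg = mean(mpg))

# All three should hold the same result
all.equal(as.data.frame(nested), as.data.frame(piped))
all.equal(as.data.frame(stepwise), as.data.frame(piped))
```

On the worry about re-reading the data on every tweak: one option is to stop the pipe after an expensive step, assign that result to a variable once, and start later pipes from it.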

I use group_by() quite a bit and have never passed it some var = fun(var) argument before. I take it you're grouping by rounded lat and lon to sort of "cluster" your summed populations (some set of lat/lon combination will have the same summed population since they were in the same group, but they retain their individual values for plotting)?

group_by(var = fun(x)) creates a new variable within the data frame named var with fun(x) as its value. It's a convenience over mutate(var=fun(x)) %>% group_by(var)
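A small sketch of that equivalence, using a made-up three-row table (the cells data frame and its values are hypothetical; only the column names follow the gist):

```r
library(dplyr)

# Hypothetical toy data: three grid cells, two of which round to the same bin
cells <- tibble(
  lat   = c(26.81, 26.84, 26.93),
  lng   = c(43.31, 43.34, 43.36),
  TOT_P = c(10, 20, 5)
)

# Shorthand: compute the rounded columns inside group_by()
shorthand <- cells %>%
  group_by(lat = round(lat, 1), lng = round(lng, 1)) %>%
  summarize(value = sum(TOT_P)) %>%
  ungroup()

# Longhand: mutate() first, then group by the new columns
longhand <- cells %>%
  mutate(lat = round(lat, 1), lng = round(lng, 1)) %>%
  group_by(lat, lng) %>%
  summarize(value = sum(TOT_P)) %>%
  ungroup()

# Both should give one row per rounded (lat, lng) bin
all.equal(as.data.frame(shorthand), as.data.frame(longhand))
```

Note that after summarize() there is one row per rounded bin, so the original unrounded coordinates are not retained; the plot is drawn from the binned values.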

Thanks for making this. I've been meaning to dive deeper into Hadley's magical land of tidyverse and keep not getting around to it. I learned the separate() function as a result of this, and am at least more familiar with filter, ungroup, complete, and select, so thanks for posting the code and being an indirect motivation for me!

http://r4ds.had.co.nz/ . Buy it, read it, practice. Tidyverse is a gift from heaven.

@Meredith95

May I ask the meaning of the strings, like '1kmN2689E4337'? Can these represent real geographic coordinates?
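For reference, a minimal sketch of what the gist's two regular expressions pull out of such a string. The regexes are copied from the code above; the interpretation in the comments is an assumption, not something the gist states: as far as I know these IDs follow the European 1 km grid naming, where the numbers are a northing and an easting in kilometres in a projected coordinate system (ETRS89-LAEA), not degrees of latitude and longitude.

```r
# Example ID taken from the question above
grd_id <- '1kmN2689E4337'

# Same patterns as in the gist: digits after "N", then digits after "E" or "W"
north <- as.numeric(gsub('.*N([0-9]+)[EW].*', '\\1', grd_id))
east  <- as.numeric(gsub('.*[EW]([0-9]+)', '\\1', grd_id))

# The gist divides both by 100 and calls them lat and lng.
# Assumption: these are scaled grid coordinates, so they position cells
# correctly relative to each other but are not real-world degrees.
lat <- north / 100
lng <- east / 100

print(c(north = north, east = east, lat = lat, lng = lng))
```

If that assumption holds, getting true longitude and latitude would need a reprojection step, which the plot does not require because it only needs relative positions.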
