@briatte
Last active December 22, 2015 01:48
Use Quandl to download country-year observations from the Luxembourg Income Study.
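## countrycode() and qplot() are used below, so load their packages first
library(countrycode)
library(ggplot2)
setwd("~/Documents/Data/LIS")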
## list of countries
lis = read.csv("lis.tsv", sep = "\t", header = FALSE)
lis$ctn = countrycode(lis$V1, "country.name", "continent")
len = sapply(lis$V1, function(x) max(lis$V3[lis$V1 == x]) - min(lis$V2[lis$V1 == x]))
qplot(data = lis, y = reorder(V1, len, mean), yend = V1, x = V2, xend = V3,
      size = I(2), geom = "segment") +
  theme_minimal(16) +
  labs(y = NULL, x = NULL)
lis = toupper(lis[, 1])
# lis = c("AUSTRIA", "BELGIUM", "BRAZIL", "CANADA", "CHINA", "COLOMBIA",
# "CZECH REPUBLIC", "DENMARK", "ESTONIA", "FINLAND", "FRANCE",
# "GERMANY", "GREECE", "GUATEMALA", "HUNGARY", "INDIA", "IRELAND",
# "ISRAEL", "ITALY", "JAPAN", "LUXEMBOURG", "MEXICO", "NETHERLANDS",
# "NORWAY", "PERU", "POLAND", "ROMANIA", "RUSSIA", "SLOVAK REPUBLIC",
# "SLOVENIA", "SOUTH AFRICA", "SOUTH KOREA", "SPAIN", "SWEDEN",
# "SWITZERLAND", "TAIWAN", "UNITED KINGDOM", "UNITED STATES", "URUGUAY")
## ask for the data
library(Quandl)
library(plyr)
library(reshape)
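lis = lapply(lis, function(country) {
  # dataset codes drop whitespace, e.g. "LIS/LIS_CZECHREPUBLIC"
  q = Quandl(paste0("LIS/LIS_", gsub("\\s", "", country)), collapse = "annual",
             authcode = "[TOKEN]")
  # reshape to long format: one row per year and variable
  q = melt(q, id = "Year")
  # extract the four-digit year from the date
  year = as.numeric(substr(q$Year, 1, 4))
  q = cbind(country, year, q)
  q$Year = NULL
  return(q)
})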
lis = rbind.fill(lis)
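# sort by country and year; the column is named "country", and the result must be assigned
lis = sort_df(lis, c("country", "year"))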
str(lis)
if(!file.exists("lis.rda")) save(lis, file = "lis.rda")
load("lis.rda")
## plot
library(countrycode)
library(ggplot2)
library(RColorBrewer)
eur = (countrycode(lis$country, "country.name", "continent") == "Europe")
usa = (lis$country == "UNITED STATES")
var = "Percentile Ratio (90/10)"
LIS = subset(lis[eur | usa, ], variable == var)
(mu = mean(LIS$value, na.rm = TRUE))
(sd = sd(LIS$value, na.rm = TRUE))
simpleCap <- function(x) {
  s <- strsplit(x, " ")[[1]]
  paste(toupper(substring(s, 1, 1)), substring(s, 2),
        sep = "", collapse = " ")
}
LIS$country = sapply(as.character(tolower(LIS$country)), simpleCap)
qplot(data = LIS, x = year, y = value, group = country,
      colour = factor(as.integer(value)), size = I(1), geom = "line") +
  # reference lines: mean (dashed) and one standard deviation either side (dotted)
  geom_hline(yintercept = mu, linetype = "dashed", colour = "grey") +
  geom_hline(yintercept = mu + sd, linetype = "dotted", colour = "grey") +
  geom_hline(yintercept = mu - sd, linetype = "dotted", colour = "grey") +
  scale_colour_manual("", values = rev(brewer.pal(5, "RdYlGn"))) +
  facet_wrap(~ country) +
  theme_minimal(16) +
  theme(axis.line.y = element_line(size = 1, colour = "black"),
        panel.grid = element_blank(),
        legend.position = "none") +
  labs(x = NULL, y = paste0(var, "\n"))

You will need to insert your authentication token where the code says [TOKEN].
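
One way to avoid pasting the token into the script is to read it from an environment variable. The following is only a sketch: the helper `get_token` and the variable name `QUANDL_TOKEN` are assumptions made for illustration and are not part of the Quandl package.

```r
# Hypothetical helper (not part of the Quandl package): read the token from
# an environment variable so it never appears in the script itself.
# The variable name QUANDL_TOKEN is an assumption made for this sketch.
get_token <- function(var = "QUANDL_TOKEN") {
  token <- Sys.getenv(var)
  if (!nzchar(token)) {
    stop("Set the ", var, " environment variable to your Quandl token.")
  }
  token
}

# Example: set the variable for the current session, then read it back.
Sys.setenv(QUANDL_TOKEN = "my-token")
token <- get_token()
```

The resulting value could then be passed as `authcode = token` in place of the `"[TOKEN]"` string.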

Data availability ranges from none to complete, depending on the country:

Here's a quick plot of the 90/10 percentile ratio for European countries and the United States:

The capitalization of country names is performed by a function from Stack Overflow.
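
As a small illustration of that helper, the sketch below repeats `simpleCap` from the script and applies it to a few upper-case country names. The `countries` vector and the `result` name are made up for this example.

```r
# Same helper as in the script: capitalise the first letter of each word.
simpleCap <- function(x) {
  s <- strsplit(x, " ")[[1]]
  paste(toupper(substring(s, 1, 1)), substring(s, 2), sep = "", collapse = " ")
}

# The function handles one string at a time, so sapply() is needed for vectors.
countries <- c("UNITED STATES", "CZECH REPUBLIC", "FRANCE")
result <- sapply(tolower(countries), simpleCap, USE.NAMES = FALSE)
result
# returns "United States" "Czech Republic" "France"
```

Lower-casing first matters because `simpleCap` only changes the first letter of each word and leaves the rest untouched.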

Australia 1981 2003
Austria 1987 2004
Belgium 1985 2000
Brazil 2006 2006
Canada 1971 2007
China 2002 2002
Colombia 2004 2010
Czech Republic 1992 2004
Denmark 1987 2004
Estonia 2000 2004
Finland 1987 2004
France 1979 2005
Germany 1973 2010
Greece 1995 2010
Guatemala 2006 2006
Hungary 1991 2005
India 2004 2004
Ireland 1987 2004
Israel 1979 2007
Italy 1986 2010
Japan 2008 2008
Luxembourg 1985 2004
Mexico 1984 2004
Netherlands 1983 2004
Norway 1979 2004
Peru 2004 2004
Poland 1986 2004
Romania 1995 1997
Russia 2000 2000
Slovak Republic 1992 2010
Slovenia 1997 2004
South Africa 2008 2010
South Korea 2006 2006
Spain 1980 2010
Sweden 1967 2005
Switzerland 1982 2004
Taiwan 1981 2005
United Kingdom 1969 2010
United States 1974 2010
Uruguay 2004 2004
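
The coverage spans that the script stores in `len` can be sketched on a few rows of the table above. The column names `first`, `last` and `span` are chosen for this example only; the script itself reads the unnamed columns `V1` to `V3` from `lis.tsv`.

```r
# Three rows copied from the table above; in the script they come from lis.tsv.
coverage <- data.frame(
  country = c("Brazil", "Sweden", "United States"),
  first   = c(2006, 1967, 1974),
  last    = c(2006, 2005, 2010)
)

# Years between the first and last available observation for each country.
coverage$span <- coverage$last - coverage$first

# Countries ordered from longest to shortest coverage.
coverage[order(-coverage$span), c("country", "span")]
```

A span of 0 means that a single year is available, as for Brazil.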