@DrSkippy
Created August 24, 2012 17:57
Distribution of languages on Twitter - plot order, freqpoly and grid.arrange different sized plots
#!/usr/bin/env Rscript
library(ggplot2)
library(stringr)
library(gridExtra)
fmt <- function(){
function(x) format(x,nsmall = 2,scientific = FALSE)
}
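# Read the per-language tweet counts (columns: count, language).
Y <- read.csv("./distro.csv", header = TRUE)
# Share of all tweets for each language, in percent.
Y$rat <- 100 * Y$count / sum(Y$count)
# Order the factor levels by descending share so the x axis is drawn in that order.
Y$lan <- factor(Y$language, levels = Y$language[order(-Y$rat)])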
p1 <- ggplot(data=Y) +
geom_bar(aes(lan, rat), color="orange", fill="orange", alpha = 0.5, stat="identity") +
xlab("Language") + ylab("% of Tweets") +
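# opts(), theme_rect() and theme_line() were removed from ggplot2;
# theme(), element_rect(), element_line() and ggtitle() replace them.
theme(legend.position = "none",
panel.background = element_rect(fill = "#545454"),
panel.grid.major = element_line(colour = "#757575"),
panel.grid.minor = element_line(colour = "#757575")) +
ggtitle("Twitter Language Distribution")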
p2 <- ggplot(data=Y) +
geom_freqpoly(aes(lan, rat, group=1), size=2, color="orange", alpha = 0.5, stat="identity") +
scale_y_log10(labels = fmt(), breaks = c(0.001, 0.002, 0.005, 0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 1,2,5, 10)) +
xlab("Language") + ylab("% of Tweets") +
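# opts(), theme_rect() and theme_line() were removed from ggplot2;
# theme(), element_rect(), element_line() and ggtitle() replace them.
theme(legend.position = "none",
panel.background = element_rect(fill = "#545454"),
panel.grid.major = element_line(colour = "#757575"),
panel.grid.minor = element_line(colour = "#757575")) +
ggtitle("Twitter Language Distribution - Detail of Tail (log scale)")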
png(filename = "distro.png", width = 800, height = 600, units = 'px')
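# grid.arrange() draws on the open device itself, so no print() is needed.
# unit() and unit.c() are qualified with grid:: because recent gridExtra
# versions no longer attach the grid package.
grid.arrange(p1, p2, ncol = 1,
heights = grid::unit.c(grid::unit(0.3, "npc"), grid::unit(0.7, "npc")))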
dev.off()
count,language
980049536,en
194203896,ja
188892666,es
103425034,pt
30962072,id
23544888,fr
19304103,nl
14927804,tr
14657986,ko
13140502,ar
11087671,ru
8708691,it
4866055,de
2070109,th
734441,sv
540976,pl
405598,zh-cn
354493,zh-tw
322538,no
247144,fil
214515,da
196013,ca
170394,hu
140598,msa
129151,fi
54120,he
32670,fa
20586,hi
14265,uk
2291,ur
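
The "plot order" part of the script comes down to how the factor levels are set: ggplot2 draws a discrete axis in level order, not in row order. A minimal base-R sketch of that step, using three rows copied from the data above and deliberately shuffled (the shuffling and the inline data frame are for illustration only and are not part of the original script):

```r
# Three rows taken from the data above, in scrambled order (illustration only).
Y <- data.frame(
  count = c(188892666, 980049536, 194203896),
  language = c("es", "en", "ja"),
  stringsAsFactors = FALSE
)

# Percentage share of each language within this small subset.
Y$rat <- 100 * Y$count / sum(Y$count)

# Levels sorted by descending share; a ggplot2 discrete axis follows this order.
Y$lan <- factor(Y$language, levels = Y$language[order(-Y$rat)])

print(levels(Y$lan))
```

The rows of `Y` keep their original order; only the levels of `lan` change, which is enough for the bar chart and the line plot to run from the most to the least common language.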