
benmarwick / docs-per-topic.rmd
Last active Aug 29, 2015
How to find the topic with the highest proportion in a set of documents (after a topic model has been generated with the R package mallet)
Which documents belong to each topic?
Documents don't belong to a single topic; each document has a distribution over all the topics.
But we can find the topic with the highest proportion in each document.
That top-ranking topic might be called the 'topic' of the document, but note
that every document contains every topic in some proportion.
Assume that we start with `topic_docs` from the output of the mallet package.
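A minimal sketch of that step, assuming a document-by-topic matrix of proportions (documents in rows, topics in columns), the shape that `mallet.doc.topics(topic.model, normalized = TRUE)` returns. The toy values and the names `doc_topics`, `top_topic` and `docs_per_topic` are hypothetical; if your `topic_docs` is stored topics-by-documents, transpose it with `t()` first.

```r
# Toy document-by-topic proportion matrix (hypothetical values):
# rows = documents, columns = topics, each row sums to 1
doc_topics <- matrix(c(0.70, 0.20, 0.10,
                       0.15, 0.25, 0.60,
                       0.30, 0.45, 0.25),
                     nrow = 3, byrow = TRUE,
                     dimnames = list(c("doc1", "doc2", "doc3"),
                                     c("topic1", "topic2", "topic3")))

# Column index of the highest-proportion topic in each document (row)
top_topic <- apply(doc_topics, 1, which.max)
# The proportion that the top-ranked topic holds in each document
top_prop <- apply(doc_topics, 1, max)

# Group document names by their top-ranked topic; using every topic as a
# factor level keeps topics that top no document (as empty entries)
docs_per_topic <- split(rownames(doc_topics),
                        factor(top_topic, levels = seq_len(ncol(doc_topics))))
docs_per_topic
```

`max.col(doc_topics, ties.method = "first")` is a vectorised alternative to the `apply(..., which.max)` call and resolves ties the same way.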
gist:9204077
## Shell:
git clone --recursive
sudo apt-get install libxt-dev libcurl4-openssl-dev libcairo2-dev libreadline-dev git
Create a GitHub app according to the instructions here:
Edit conf/rcloud.conf according to the instructions here:
benmarwick / bayes-regression-slopes.R
Created Mar 11, 2014
Simple Bayesian methods of linear regression and testing for significant differences between regression line slopes
# Investigating some simple Bayesian methods of linear regression
# and testing for significant differences between regression line slopes
n <- 100
x <- seq(n)
y1 <- 5 * x + 150
y2 <- 1.5 * x + 50
d1 <- data.frame(x1 = x + rnorm(n, sd = n/10),
                 y1 = y1)
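The preview stops before any modelling. As a hedged illustration of one simple Bayesian approach (not necessarily the one the gist uses), the sketch below draws posterior samples of a regression slope under the standard noninformative prior p(beta, sigma^2) proportional to 1/sigma^2, which has a closed-form posterior and so needs no MCMC package, then compares two slopes through the posterior of their difference. The helper `post_slopes()` is hypothetical, and unlike the gist the noise is added to y rather than x, so ordinary least squares is not biased by error in the predictor.

```r
# Hypothetical helper: posterior draws of the slope b in y = a + b*x + e under
# the noninformative prior p(beta, sigma^2) proportional to 1/sigma^2
post_slopes <- function(x, y, draws = 4000) {
  X <- cbind(1, x)                                  # intercept + slope columns
  n <- nrow(X)
  k <- ncol(X)
  XtX_inv <- solve(crossprod(X))                    # (X'X)^-1
  beta_hat <- drop(XtX_inv %*% crossprod(X, y))     # least-squares estimate
  s2 <- sum((y - X %*% beta_hat)^2) / (n - k)       # residual variance
  # sigma^2 | y ~ scaled inverse chi-square(n - k, s2)
  sigma2 <- (n - k) * s2 / rchisq(draws, df = n - k)
  # slope | sigma^2, y ~ Normal(beta_hat[2], sigma^2 * [(X'X)^-1]_22)
  rnorm(draws, mean = beta_hat[2], sd = sqrt(sigma2 * XtX_inv[2, 2]))
}

set.seed(42)
n <- 100
x <- seq(n)
# same true lines as the gist, but with noise on y (sd = 10)
ya <- 5 * x + 150 + rnorm(n, sd = n/10)
yb <- 1.5 * x + 50 + rnorm(n, sd = n/10)

slope_a <- post_slopes(x, ya)
slope_b <- post_slopes(x, yb)
slope_diff <- slope_a - slope_b

quantile(slope_diff, c(0.025, 0.975))  # 95% credible interval for the difference
mean(slope_diff > 0)                   # posterior probability that slope_a > slope_b
```

If the 95% credible interval for `slope_diff` excludes zero, the two slopes differ under this model.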
benmarwick / rCharts-in-RPres.Rpres
Last active Aug 29, 2015
Testing rCharts with R Presentation, output here:
Interactive charts in the browser with the rCharts package
learning from
```{r setup, cache = FALSE}
opts_chunk$set(results = "asis", comment = NA, tidy = FALSE)
options(RCHART_WIDTH = 600, RCHART_HEIGHT = 400)
```
# Joseph R. Mihaljevic
# July 2013
# (Partial) Bayesian analysis of variance, accounting for heteroscedasticity
# Generate some artificial data:
# Normally distributed groups, but heteroscedastic
a <- rnorm(25, mean=8, sd=10)
b <- rnorm(50, mean=5, sd=2)
c <- rnorm(25, mean=3, sd=.1)
d <- rnorm(25, mean=11, sd=3)
e <- rnorm(50, mean=13, sd=2)
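The preview ends after the data are simulated. As a hedged sketch only (not Mihaljevic's model), one simple way to respect unequal variances is to give each group its own mean and variance with the noninformative prior p(mu_j, sigma_j^2) proportional to 1/sigma_j^2. The marginal posterior of each group mean is then a shifted, scaled t distribution, mean(y_j) + sd(y_j)/sqrt(n_j) * t(n_j - 1), which can be sampled directly. The seed, draw count and object names are assumptions; the groups are stored in a list so that `c` does not mask the base function.

```r
set.seed(2013)
# same heteroscedastic groups as above, kept in a named list
groups <- list(a = rnorm(25, mean = 8,  sd = 10),
               b = rnorm(50, mean = 5,  sd = 2),
               c = rnorm(25, mean = 3,  sd = 0.1),
               d = rnorm(25, mean = 11, sd = 3),
               e = rnorm(50, mean = 13, sd = 2))

draws <- 10000
# one column of posterior draws per group mean; each group uses its own
# standard deviation, so no common variance is assumed
mu_post <- sapply(groups, function(y) {
  n <- length(y)
  mean(y) + sd(y) / sqrt(n) * rt(draws, df = n - 1)
})

# 95% credible interval for each group mean
apply(mu_post, 2, quantile, probs = c(0.025, 0.975))

# posterior probability that group e's mean exceeds group d's
mean(mu_post[, "e"] > mu_post[, "d"])
```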
benmarwick / google-sheet-reliability-test.rmd
Last active Aug 29, 2015
Test for availability of the curriculum forecast Google Sheet. Downloads the sheet every 10 minutes over 24 h and tests whether data are present.
output: html_document
# A quick check of the availability of the UW Anthropology curriculum forecast
I have heard occasional reports from faculty and students that the curriculum forecast on the anthropology department website shows no useful data. I have also observed this myself: the cells in the spreadsheet are sometimes empty or show dashes. Given these reports and observations, I thought it would be good to study the availability of the forecast so that we can make an informed decision about whether the current setup is meeting our needs.
## Method
I wrote a script to automatically access the forecast webpage every 10 minutes, 24 hours per day, for ten days (script is here: []( The script checks to see if there are useful data in the 'this year' columns on the left side of the page and the 'next year' columns on the right side of the page. The script ran from 5 May to 15 May 2014. I informed the oth
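The script itself is linked rather than shown, so the following is only a sketch of the polling logic described above. `has_useful_data()`, `poll_forecast()` and the convention that `fetch()` returns a list with `this_year` and `next_year` cell blocks are all hypothetical; how the Google Sheet is actually downloaded is left to the `fetch` function you supply.

```r
# Hypothetical helper: TRUE if a block of cells holds at least one value that
# is not empty, not missing and not a dash placeholder
has_useful_data <- function(cells) {
  vals <- trimws(as.character(unlist(cells)))
  vals <- vals[!is.na(vals)]
  any(!(vals %in% c("", "-", "--")))
}

# Hypothetical polling loop: call fetch() n_checks times, wait_secs apart,
# recording whether each column block contains useful data.
# fetch() is assumed to return list(this_year = ..., next_year = ...).
poll_forecast <- function(fetch, n_checks, wait_secs = 600) {
  rows <- vector("list", n_checks)
  for (i in seq_len(n_checks)) {
    # a failed download is recorded as "no useful data"
    sheet <- tryCatch(fetch(), error = function(e) NULL)
    rows[[i]] <- data.frame(
      time      = Sys.time(),
      this_year = !is.null(sheet) && has_useful_data(sheet$this_year),
      next_year = !is.null(sheet) && has_useful_data(sheet$next_year)
    )
    if (i < n_checks) Sys.sleep(wait_secs)
  }
  do.call(rbind, rows)
}

# Ten days at one check every 10 minutes is 10 * 24 * 6 = 1440 checks
# (my_fetch is a hypothetical download function):
# results <- poll_forecast(my_fetch, n_checks = 1440)
# colMeans(results[, c("this_year", "next_year")])  # share of checks with data
```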
ipak.R
# ipak function: install and load multiple R packages.
# Check to see if packages are installed. Install them if they are not, then load them into the R session.
ipak <- function(pkg){
  new.pkg <- pkg[!(pkg %in% installed.packages()[, "Package"])]
  if (length(new.pkg))
    install.packages(new.pkg, dependencies = TRUE)
  sapply(pkg, require, character.only = TRUE)
}
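A hedged variant of the same idea, shown with packages that ship with R so nothing is actually installed. The name `ipak2` is hypothetical. It checks for each package with `requireNamespace(quietly = TRUE)` instead of scanning `installed.packages()`, which R's own documentation warns can be slow for this purpose, and it uses `vapply()` so the result is always a named logical vector.

```r
# Hypothetical variant of ipak(): install missing packages, then attach all
ipak2 <- function(pkg) {
  # packages whose namespace cannot be loaded are treated as not installed
  missing <- pkg[!vapply(pkg, requireNamespace, logical(1), quietly = TRUE)]
  if (length(missing))
    install.packages(missing, dependencies = TRUE)
  # attach each package; TRUE/FALSE per package, named by package
  invisible(vapply(pkg, require, logical(1), character.only = TRUE))
}

# usage with base packages, so nothing needs installing
ok <- ipak2(c("stats", "utils", "tools"))
ok
```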
benmarwick / saa2014-tweet-count.r
Last active Aug 29, 2015
Snapshot of the number of unique tweets from SAA2014 (unique means a unique combination of username, message text and time, so a retweet is not counted as a duplicate because it has a different username and time)
# load csv file downloaded from google docs at 10:30am PST, 28 April 2014
saa2014_1 <- read.csv("C:/Users/marwick/Downloads/SAA 2014 Tweets - Archive.csv", stringsAsFactors = FALSE)
# Second tweet archive, csv file downloaded from google docs at 10:30am PST, 28 April 2014
saa2014_2 <- read.csv("C:/Users/marwick/Downloads/%23SAA2014 file 2 - Archive.csv", stringsAsFactors = FALSE)
# combine two files
saa2014 <- rbind(saa2014_1, saa2014_2) # 26,611 rows
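The counting rule in the description (a tweet is a duplicate only when username, text and time all match) can be sketched with `duplicated()` on those columns. The column names `from_user`, `text` and `time` are assumptions about the archive export, and the toy rows stand in for the real CSV files.

```r
# Toy stand-in for the combined archive (hypothetical rows); the column
# names from_user, text and time are assumed, so match them to your CSV
tweets <- data.frame(
  from_user = c("alice", "alice", "bob", "alice"),
  text      = c("Great session #SAA2014",
                "Great session #SAA2014",
                "RT @alice: Great session #SAA2014",
                "Poster hall now #SAA2014"),
  time      = c("10:01", "10:01", "10:05", "10:20"),
  stringsAsFactors = FALSE
)

# A row counts as a duplicate only if username, text AND time all match an
# earlier row, so a retweet (different username and time) is kept
key_cols <- c("from_user", "text", "time")
unique_tweets <- tweets[!duplicated(tweets[, key_cols]), ]
nrow(unique_tweets)  # 3: only the exact repeat of alice's 10:01 tweet is dropped
```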
benmarwick / archaeology-grand-challenges.r
Last active Aug 29, 2015
Sketch of a look at the 'grand challenges' of Kintigh et al. 2014 in five archaeology journals
# looking at American Antiquity, Journal of World Prehistory, World Archaeology,
# major groups of challenges
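The gist's own method is not visible in the preview. One hedged way to look for groups of challenges in journal text is simple keyword matching. In the sketch below the group names, regular expressions and abstracts are all hypothetical placeholders, not Kintigh et al.'s wording or real journal data.

```r
# Hypothetical keyword dictionary: one regular expression per challenge group
challenge_terms <- c(complexity  = "complex|inequalit|hierarch",
                     migration   = "migrat|mobilit|dispers",
                     environment = "climat|environment|resilien")

# Toy stand-ins for article titles or abstracts from the journals
abstracts <- c("Climate change and hunter-gatherer mobility",
               "Social hierarchy in early villages",
               "Ceramic typology of the Late Period")

# Logical matrix: rows = abstracts, columns = challenge groups
hits <- sapply(challenge_terms, grepl, x = abstracts, ignore.case = TRUE)

colSums(hits)  # number of abstracts mentioning each group
rowSums(hits)  # number of groups mentioned by each abstract
```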
benmarwick / channels.csv
Last active Aug 29, 2015
Get elemental data from the Bruker AXS Tracer handheld pXRF
| 130125-brickn679-500sec-5 | Energy1 (keV) | Energy2 (keV) | Chan-Start | Chan-End | Chan-Counts | Compton |
|---|---|---|---|---|---|---|
| CaKa1 | 3.6043 | 3.779 | 180.2158 | 188.9522 | 1208.35 | 10.102452 |
| ScKa1 | 4.001 | 4.1802 | 200.0508 | 209.0092 | 856.45 | 7.1603729 |
| TiKa1 | 4.419 | 4.6027 | 220.9488 | 230.1352 | 1683.66 | 14.076247 |
| V Ka1 | 4.858 | 5.0464 | 242.9 | 252.32 | 671.9 | 5.6174401 |
| CrKa1 | 5.3181 | 5.5113 | 265.9066 | 275.5654 | 477.09 | 3.9887208 |
| MnKa1 | 5.7997 | 5.9978 | 289.9863 | 299.8887 | 1657.74 | 13.859547 |
| FeKa1 | 6.3023 | 6.5053 | 315.1168 | 325.2672 | 99098.04 | 828.50964 |
| CoKa1 | 6.8263 | 7.0343 | 341.3147 | 351.7173 | 8706.4 | 72.789874 |
| NiKa1 | 7.3716 | 7.5847 | 368.5782 | 379.2368 | 892.66 | 7.4630925 |
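These values can be worked with in R once they are back in CSV form. The sketch below uses column names simplified from the original headers (an assumption for convenience). It derives element symbols, checks the channel width implied by the energy and channel columns (about 0.02 keV per channel for every row here), and ranks elements by counts. The ratio of Chan-Counts to Compton is nearly constant (about 119.6) across these rows, which suggests the Compton column is the counts normalized by a single Compton-peak value; that reading is an inference from the numbers, not something the gist states.

```r
# The channels.csv values from the table above, with simplified column names
channels_csv <- "line,Energy1,Energy2,ChanStart,ChanEnd,ChanCounts,Compton
CaKa1,3.6043,3.779,180.2158,188.9522,1208.35,10.102452
ScKa1,4.001,4.1802,200.0508,209.0092,856.45,7.1603729
TiKa1,4.419,4.6027,220.9488,230.1352,1683.66,14.076247
V Ka1,4.858,5.0464,242.9,252.32,671.9,5.6174401
CrKa1,5.3181,5.5113,265.9066,275.5654,477.09,3.9887208
MnKa1,5.7997,5.9978,289.9863,299.8887,1657.74,13.859547
FeKa1,6.3023,6.5053,315.1168,325.2672,99098.04,828.50964
CoKa1,6.8263,7.0343,341.3147,351.7173,8706.4,72.789874
NiKa1,7.3716,7.5847,368.5782,379.2368,892.66,7.4630925"

pxrf <- read.csv(text = channels_csv, stringsAsFactors = FALSE)
# element symbol: drop the "Ka1" line label and padding ("V Ka1" -> "V")
pxrf$element <- trimws(sub("Ka1$", "", pxrf$line))
# energy width of one detector channel, in keV per channel
pxrf$kev_per_channel <- (pxrf$Energy2 - pxrf$Energy1) /
  (pxrf$ChanEnd - pxrf$ChanStart)
# ratio of raw counts to the Compton column
pxrf$counts_to_compton <- pxrf$ChanCounts / pxrf$Compton
# elements ranked by raw counts in their Ka1 window
pxrf[order(-pxrf$ChanCounts), c("element", "ChanCounts", "Compton")]
```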