@benmarwick
benmarwick / docs-per-topic.rmd
Last active August 29, 2015 13:56
How to find the topic with the highest proportion in a set of documents (after a topic model has been generated with the R package mallet)
Which documents belong to each topic?
Documents don't belong to a single topic; each document has a distribution
over topics.
But we can find the topic with the highest proportion for each document.
That top-ranking topic might be called the 'topic' for the document, but note
that all documents contain all topics in varying proportions.
Assume that we start with `topic_docs` from the output of the mallet package.
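
A minimal sketch of that step, assuming `topic_docs` is a matrix with one row per document and one column per topic. The layout, the row and column names, and the toy values are all assumptions for illustration; transpose with `t()` if your object has topics in rows.

```r
# Toy stand-in for the mallet output (assumed layout: documents in rows,
# topics in columns, each row summing to 1).
topic_docs <- matrix(
  c(0.70, 0.20, 0.10,
    0.10, 0.30, 0.60,
    0.25, 0.50, 0.25),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("doc", 1:3), paste0("topic", 1:3))
)

# Column index of the highest-proportion topic for each document
top_topic <- apply(topic_docs, 1, which.max)

# Pair each document with its top topic and that topic's proportion
result <- data.frame(
  document   = rownames(topic_docs),
  top_topic  = colnames(topic_docs)[top_topic],
  proportion = topic_docs[cbind(seq_len(nrow(topic_docs)), top_topic)],
  stringsAsFactors = FALSE
)
result
```

Keeping the proportion next to the label makes it easy to see when the top topic barely outranks the others, in which case the label says little about the document.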
benmarwick / gist:9204077
Last active August 29, 2015 13:56
RCloud - https://github.com/att/rcloud - setup on ubuntu
## Shell:
git clone --recursive https://github.com/cscheid/rcloud.git
sudo apt-get install libxt-dev libcurl4-openssl-dev libcairo2-dev libreadline-dev git
Create github app according to instructions here: https://github.com/att/rcloud
Edit conf/rcloud.conf according to instructions here: https://github.com/att/rcloud
benmarwick / rCharts-in-RPres.Rpres
Last active August 29, 2015 13:57
Testing rCharts with R Presentation, output here: http://rpubs.com/benmarwick/rCharts-in-RPres
Interactive charts in the browser with the rCharts package
====
learning from https://gist.github.com/ramnathv/8118442
```{r setup, cache = F}
require(knitr)
opts_chunk$set(results = "asis", comment = NA, tidy = F)
options(RCHART_WIDTH = 600, RCHART_HEIGHT = 400)
```
# Joseph R. Mihaljevic
# July 2013
# (Partial) Bayesian analysis of variance, accounting for heteroscedasticity
# Generate some artificial data:
# Normally distributed groups, but heteroscedastic
a <- rnorm(25, mean=8, sd=10)
b <- rnorm(50, mean=5, sd=2)
c <- rnorm(25, mean=3, sd=.1)
d <- rnorm(25, mean=11, sd=3)
e <- rnorm(50, mean=13, sd=2)
benmarwick / google-sheet-reliability-test.rmd
Last active August 29, 2015 14:00
Test for availability of curriculum forecast google sheet. Downloads the sheet every 10 mins over 24 h and tests to see if data are present.
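
The polling described above could be sketched roughly as follows. This is not the author's script: `has_useful_data()`, `poll_availability()`, and the `fetch_cells` argument are hypothetical names, and the rule that a cell is "empty" when it is blank, `NA`, or only dashes is an assumption.

```r
# TRUE when at least one cell holds something other than NA,
# an empty string, or a run of dashes (assumed definition of "useful").
has_useful_data <- function(cells) {
  cells <- trimws(as.character(unlist(cells)))
  any(!is.na(cells) & nzchar(cells) & !grepl("^-+$", cells))
}

# Call fetch_cells() n_checks times, waiting interval_secs between calls.
# fetch_cells is a hypothetical function that downloads the sheet and
# returns its cells; a failed download is recorded as unavailable.
poll_availability <- function(fetch_cells, n_checks, interval_secs = 600) {
  results <- data.frame(
    time      = rep(Sys.time(), n_checks),
    available = rep(NA, n_checks)
  )
  for (i in seq_len(n_checks)) {
    cells <- tryCatch(fetch_cells(), error = function(e) NA)
    results$time[i] <- Sys.time()
    results$available[i] <- has_useful_data(cells)
    if (i < n_checks) Sys.sleep(interval_secs)
  }
  results
}
```

With the default 10-minute interval, `n_checks = 144` covers 24 hours; `mean(results$available)` then gives the fraction of checks in which data were present.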
---
output: html_document
---
# A quick check of the availability of the UW Anthropology curriculum forecast
I have heard occasional reports from faculty and students that the curriculum forecast on the anthropology department website shows no useful data. I have also observed this myself: the cells in the spreadsheet are sometimes empty or show dashes. Given these reports and observations, I thought it would be good to study the availability of the forecast so we can make an informed decision about whether the current setup is meeting our needs.
## Method
I wrote a script to automatically access the forecast webpage every 10 minutes, 24 hours per day, for ten days (script is here: [https://gist.github.com/benmarwick/11023760](https://gist.github.com/benmarwick/11023760)). The script checks to see if there are useful data in the 'this year' columns on the left side of the page and the 'next year' columns on the right side of the page. The script ran from 5 May to 15 May 2014. I informed the oth
# ipak function: install and load multiple R packages.
# check to see if packages are installed. Install them if they are not, then load them into the R session.
ipak <- function(pkg) {
  new.pkg <- pkg[!(pkg %in% installed.packages()[, "Package"])]
  if (length(new.pkg))
    install.packages(new.pkg, dependencies = TRUE)
  sapply(pkg, require, character.only = TRUE)
}
benmarwick / saa2014-tweet-count.r
Last active August 29, 2015 14:00
Snapshot of the number of unique tweets from SAA2014 (unique means a unique combination of username, message text and time, so an RT is not counted as a duplicate because it has a different username and time)
# load csv file downloaded from google docs at 10:30am PST, 28 April 2014
# https://docs.google.com/spreadsheet/ccc?key=0Alr3EPKs-tcRdGhVRFNKeHVadDhHNGdGYU84Z255X1E&usp=drive_web
saa2014_1 <- read.csv("C:/Users/marwick/Downloads/SAA 2014 Tweets - Archive.csv", stringsAsFactors = FALSE)
# Second tweet archive, csv file downloaded from google docs at 10:30am PST, 28 April 2014
# https://docs.google.com/spreadsheet/ccc?key=0Ak6w3axv7XKTdHpHdFNfeFNKMk45WFVWQkhCeGdLMWc&usp=drive_web#gid=82
saa2014_2 <- read.csv("C:/Users/marwick/Downloads/%23SAA2014 file 2 - Archive.csv", stringsAsFactors = FALSE)
# combine two files
saa2014 <- rbind(saa2014_1, saa2014_2) # 26,611 rows
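
One way to apply that definition of uniqueness is sketched below on a toy data frame. The column names `from_user`, `text`, and `time` are assumptions, as are the example rows; substitute the names used in your own archive.

```r
# Toy stand-in for the combined archive (column names are assumed)
tweets <- data.frame(
  from_user = c("alice", "bob", "alice", "alice"),
  text = c("Great session #SAA2014",
           "RT @alice: Great session #SAA2014",
           "Great session #SAA2014",
           "Off to the posters"),
  time = c("24/04/2014 09:00:00", "24/04/2014 09:05:00",
           "24/04/2014 09:00:00", "24/04/2014 10:00:00"),
  stringsAsFactors = FALSE
)

# A tweet is a duplicate only if username, text, and time all match
key_cols <- c("from_user", "text", "time")
unique_tweets <- tweets[!duplicated(tweets[, key_cols]), ]
nrow(unique_tweets)
```

Here the third row repeats the first exactly and is dropped, while the retweet by `bob` is kept because its username and time differ.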
benmarwick / archaeology-grand-challenges.r
Last active August 29, 2015 14:00
Sketch of a look at the 'grand challenges' of Kintigh et al. 2014 (http://www.pnas.org/content/111/3/879.full) in five archaeology journals
library(devtools)
install_github("benmarwick/JSTORr")
load("~/teamviewer/five_journals/five_journals.RData")
# looking at American Antiquity, Journal of World Prehistory, World Archaeology,
library(JSTORr)
# major groups of challenges
benmarwick / channels.csv
Last active August 29, 2015 14:01
Get elemental data from the Bruker AXS Tracer handheld pXRF
130125-brickn679-500sec-5 Energy1 Energy2 Chan-Start Chan-End Chan-Counts Compton
CaKa1 3.6043 3.779 180.2158 188.9522 1208.35 10.102452
ScKa1 4.001 4.1802 200.0508 209.0092 856.45 7.1603729
TiKa1 4.419 4.6027 220.9488 230.1352 1683.66 14.076247
V Ka1 4.858 5.0464 242.9 252.32 671.9 5.6174401
CrKa1 5.3181 5.5113 265.9066 275.5654 477.09 3.9887208
MnKa1 5.7997 5.9978 289.9863 299.8887 1657.74 13.859547
FeKa1 6.3023 6.5053 315.1168 325.2672 99098.04 828.50964
CoKa1 6.8263 7.0343 341.3147 351.7173 8706.4 72.789874
NiKa1 7.3716 7.5847 368.5782 379.2368 892.66 7.4630925
benmarwick / 0_reuse_code.js
Created May 28, 2014 18:54
Here are some things you can do with Gists in GistBox.
// Use Gists to store code you would like to remember later on
console.log(window); // log the "window" object to the console