install.packages("devtools")
devtools::install_github("ropensci/rdpla")
library('rdpla')
mykey = "PUT YOUR KEY FROM DP.LA HERE"
# do a query; here we want ids which we can feed to wget
itemlist = items(key=mykey, q="science", date_before=1900, page_size=100, fields=c("id"))
# this writes the ids to a csv file; open it in a spreadsheet and remove the first row if it is not an id
write.csv(itemlist$data, "itemlist.csv", row.names=FALSE)
# save the csv as a txt file (UTF-8); you can then pass it to wget as in Exercise 4 at
# https://github.com/hist3907b-winter2015/module2-findingdata/blob/master/m2-exercises.md
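The spreadsheet step described in the comments above can probably be skipped by writing the ids straight to a plain-text file, one per line. The following is only a minimal sketch: `ids_df` is a hypothetical stand-in for `itemlist$data`, its ids are made up, and the assumption that the query result has a single `id` column is not confirmed by the snippet.

```r
# Hypothetical stand-in for itemlist$data: a data frame with one "id" column.
# The ids below are made up for illustration.
ids_df <- data.frame(id = c("abc123", "def456", "ghi789"),
                     stringsAsFactors = FALSE)

# Write one id per line, with no header row and no quotes, as UTF-8 text.
out_file <- file.path(tempdir(), "itemlist.txt")
con <- file(out_file, open = "w", encoding = "UTF-8")
writeLines(as.character(ids_df$id), con)
close(con)

# Read the file back to confirm what was written.
written <- readLines(out_file, encoding = "UTF-8")
print(written)
```

Because `writeLines()` emits no header and no quoting, there is no first row to delete by hand before passing the file on.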
shawngraham / getting a History Machine
Last active August 29, 2015 14:14
setting up a history research machine. Follow the instructions.
# By Ben Marwick, from: https://gist.github.com/benmarwick/11204658 with modifications by S. Graham
Short instructions to set up a Lubuntu Virtual Machine with
R & RStudio:
1. Download these:
http://lubuntu.net/ (Intel x86 desktop cd)
https://www.virtualbox.org/wiki/Downloads (Oracle VM VirtualBox)
2. Install Oracle VM VirtualBox, open it (if using windows,
setwd("desktop/beals-new")
# give yourself as much memory as you've got
options(java.parameters = "-Xmx5120m")
library(rJava)
## from http://cran.r-project.org/web/packages/mallet/mallet.pdf
library(mallet)
# CND XML file transformed in the browser into a CSV table: copy and paste it into Excel and save as CSV. Cut the column headers and paste them in the line below:
library(RCurl)
shawngraham / archaeology-geolocatedtweets.geojson
Created February 17, 2015 01:44
twarc scrape of "archaeology", geolocated tweets
shawngraham / gist:7efd64c08a94c39a593f
Last active August 29, 2015 14:15
CND-topic-model-with-guidance.rmd
---
title: "Topic Modeling the Colonial Newspaper Database"
author: "Shawn Graham"
date: "February 17, 2015"
output: html_document
---
In [Module 3](https://github.com/hist3907b-winter2015/module3-wranglingdata), we used TEI to mark up primary documents. Melodee Beals has been using TEI to mark up newspaper articles, creating the [Colonial Newspapers Database](https://github.com/mhbeals/Colonial-Newspaper-Database) (which she shared on GitHub). We then used GitHub Pages and an XSLT stylesheet to convert that database into a table of comma-separated values <https://raw.githubusercontent.com/shawngraham/exercise/gh-pages/CND.csv>. We are now going to topic model the text of those newspaper articles, to see what patterns of discourse may lie within.
# Getting Started
shawngraham / geolooting.geojson
Created February 19, 2015 13:57
looting tweets
shawngraham / id.txt
Created February 19, 2015 14:07
list of 'looting' tweets by id - use the twarc hydrate command to retrieve the original tweets again (thus complying with Twitter's terms of service)
id
568407121496137000
568407114378395000
568407104077193000
568407096242253000
568407089673957000
568407069016981000
568407057214234000
568406964599791000
568406941086527000
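Tweet ids are long integers, and every id listed above ends in 000, which suggests the values may have been rounded at some point by being handled as numbers. The following is a minimal sketch of one way to avoid that in R by reading the ids as character strings; the file name and the id in it are hypothetical and do not refer to a real tweet.

```r
# Hypothetical example file with a header row; the id is made up.
id_file <- file.path(tempdir(), "id.txt")
writeLines(c("id", "123456789012345678"), id_file)

# colClasses = "character" keeps every digit instead of converting the
# column to a floating-point number.
ids <- read.csv(id_file, colClasses = "character")
print(ids$id)

# Write the ids back out, one per line with no header, ready to hand
# to a hydrate step.
hydrate_file <- file.path(tempdir(), "ids_for_hydrate.txt")
writeLines(ids$id, hydrate_file)
```

Keeping ids as text throughout means no spreadsheet or numeric conversion can alter them before hydration.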
shawngraham / geolooting-russian.geojson
Created February 19, 2015 14:13
geolocated tweets with russian 'мародерство' ('looting')
shawngraham / antiquities.geojson
Created February 19, 2015 20:24
antiquities via twarc, geotagged
shawngraham / geolootedtweets.json
Created February 19, 2015 22:51
'looted' search on twitter, geolocated tweets