install.packages("devtools")
devtools::install_github("ropensci/rdpla")
library('rdpla')
mykey = "PUT YOUR KEY FROM DP.LA HERE"
# do a query; here we want ids which we can feed to wget
itemlist = items(key=mykey, q="science", date_before=1900, page_size=100, fields=c("id"))
# this writes the ids to a csv file; open it in a spreadsheet and remove the first row if it's not an id
write.csv(itemlist$data, "itemlist.csv", row.names=FALSE)
# save the csv as txt (UTF-8), then you can pass it to wget as in Exercise 4 at
# https://github.com/hist3907b-winter2015/module2-findingdata/blob/master/m2-exercises.md
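The spreadsheet step described in the comments above could probably be skipped by writing the ids straight to a plain-text file, one per line and with no header row. The following is a minimal sketch, not part of the original script: `write_id_list` is a hypothetical helper (it is not an rdpla function), the ids shown are placeholders, and it assumes the ids from the query are available as a vector (for instance an `id` column in `itemlist$data`, which is an assumption about the returned data).

```r
# Hypothetical helper (not part of rdpla): write ids one per line, no header row.
write_id_list <- function(ids, path = "itemlist.txt") {
  ids <- as.character(ids)
  # drop missing or empty entries so the file contains only ids
  ids <- ids[!is.na(ids) & nzchar(ids)]
  # write as UTF-8 text, one id per line
  writeLines(enc2utf8(ids), con = path, useBytes = TRUE)
  invisible(path)
}

# Placeholder ids for illustration only; in the script above they would come
# from the query result (assumed here to be itemlist$data$id).
example_ids <- c("placeholder-id-1", "placeholder-id-2")
out_file <- write_id_list(example_ids, tempfile(fileext = ".txt"))
readLines(out_file)
```

With real query results the call might look like `write_id_list(itemlist$data$id, "itemlist.txt")`, again assuming that column name.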
# By Ben Marwick, from: https://gist.github.com/benmarwick/11204658 with modifications by S. Graham
Short instructions to setup a Lubuntu Virtual Machine with
R & RStudio:
1. Download these:
http://lubuntu.net/ (Intel x86 desktop cd)
https://www.virtualbox.org/wiki/Downloads (Oracle VM VirtualBox)
2. Install Oracle VM VirtualBox, open it (if using windows,
setwd("desktop/beals-new")
# give yourself as much memory as you've got
options(java.parameters = "-Xmx5120m")
library(rJava)
## from http://cran.r-project.org/web/packages/mallet/mallet.pdf
library(mallet)
# CND xml file transformed in browser into csv table. Copy & paste into Excel, save as csv. Cut the column headers and paste them in the line below:
library(RCurl)
---
title: "Topic Modeling the Colonial Newspaper Database"
author: "Shawn Graham"
date: "February 17, 2015"
output: html_document
---
In [Module 3](https://github.com/hist3907b-winter2015/module3-wranglingdata), we used TEI to mark up primary documents. Melodee Beals has been using TEI to mark up newspaper articles, creating the [Colonial Newspaper Database](https://github.com/mhbeals/Colonial-Newspaper-Database), which she shared on GitHub. We then used GitHub Pages and an XSLT stylesheet to convert that database into a table of comma-separated values: <https://raw.githubusercontent.com/shawngraham/exercise/gh-pages/CND.csv>. We are now going to topic model the text of those newspaper articles to see what patterns of discourse may lie within.
# Getting Started
id
568407121496137000
568407114378395000
568407104077193000
568407096242253000
568407089673957000
568407069016981000
568407057214234000
568406964599791000
568406941086527000