Shawn Graham shawngraham

@shawngraham
shawngraham / diary-scrape.r
Last active Nov 11, 2019
why is this not working? the loop is the problem
View diary-scrape.r
library(rvest)
base_url <- "https://www.masshist.org"
# Load the page
main.page <- read_html(x = "https://www.masshist.org/digitaladams/archive/browse/diaries_by_date.php")
# Get link URLs
urls <- main.page %>% # feed `main.page` to the next step
  html_nodes("a") %>% # select every <a> (link) element
  html_attr("href") # extract the URLs
# Get link text
View text-analysis-and-topic-model-from-scraping-one-set-of-diaries.r
#let's fix the first column in scrape
#i want to remove the first three characters, leaving us with a date
#or at least something that looks like a date
#this removes the diary metadata from the date
scrape$id <- substring(scrape$id, 4)
#this creates a new column with just the month extracted
month <- str_sub(scrape$id, 5, 6)
scrape['month'] <- month
View topic-model-from-one-diary-scrape.r
#let's fix the first column in scrape
#i want to remove the first three characters, leaving us with a date
#or at least something that looks like a date
scrape$id <- substring(scrape$id, 4)
library(tm)
View scraping-one-set-of-diaries.r
library("rvest")
library(dplyr)
#https://francojc.github.io/2017/11/02/acquiring-data-for-language-research-web-scraping/
#modified
webpage <- "https://www.masshist.org/digitaladams/archive/browse/diaries_by_date.php"
html <- read_html(webpage) # read the raw html
View diaries-to-topicmodels.r
setwd("~/diaries")
library(tm)
#turn entries into a corpus object
docs <- Corpus(VectorSource(entries))
docs <- tm_map(docs, removePunctuation)
#Transform to lower case
docs <- tm_map(docs,content_transformer(tolower))
#Strip digits
View diary-scraper.r
#after https://francojc.github.io/2015/03/01/web-scraping-with-rvest-in-r/
library(rvest)
library(dplyr)
base_url <- "https://www.masshist.org"
# Load the page
main.page <- read_html(x = "https://www.masshist.org/digitaladams/archive/browse/diaries_by_date.php")
# Get link URLs
View getpics.py
"""
pip install fitz
pip install PyMuPDF
"""
import fitz
doc = fitz.open("file.pdf")
for i in range(len(doc)):
    for img in doc.getPageImageList(i):
        xref = img[0]
@shawngraham
shawngraham / soc-talk.md
Last active Aug 19, 2019
social media talk notes
View soc-talk.md

Me & My Social Media Habits

Shawn Graham, Department of History, Carleton U @electricarchaeo

tenured-minion

...well, everyone needs an avatar, right? Social media is performative. Choose wisely.


[Image: Kate Galloway; see slideshare link at bottom]

View golems-in-the-city-with-colds.nlogo
@shawngraham
shawngraham / network-message.nlogo
Created Mar 25, 2019
message-on-a-network for netlogo
View network-message.nlogo