
Shawn Graham shawngraham

simple-scraper.ipynb
shawngraham / bib.xsl
Last active Oct 15, 2020
Sample metadata from JSTOR DfR (Data for Research), and an XSL file meant to turn it into a citations.tsv with: xsltproc bib.xsl *.xml > citations.tsv
bib.xsl
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:template match="/">
<xsl:for-each select="article/front"> <!-- one tab-separated row per article; all literal quotes, tabs and newlines live in xsl:text so no stray spaces or blank lines leak into the output -->
<xsl:text>"</xsl:text><xsl:value-of select="article-meta/contrib-group/contrib/string-name/surname"/><xsl:text>, </xsl:text><xsl:value-of select="article-meta/contrib-group/contrib/string-name/given-names"/><xsl:text>"&#9;</xsl:text>
<xsl:value-of select="article-meta/pub-date/year"/><xsl:text>&#9;"</xsl:text><xsl:value-of select="article-meta/title-group/article-title"/><xsl:text>"&#9;"</xsl:text>
<xsl:value-of select="journal-meta/journal-title-group/journal-title"/><xsl:text>"&#9;"</xsl:text><xsl:value-of select="article-meta/volume"/><xsl:text>(</xsl:text><xsl:value-of select="article-meta/issue"/><xsl:text>)"&#9;"</xsl:text>
<xsl:value-of select="article-meta/page-range"/><xsl:text>"&#9;"</xsl:text><xsl:value-of select="article-meta/article-id"/><xsl:text>"&#9;</xsl:text>
<xsl:value-of select="article-meta/self-uri"/><xsl:text>&#xA;</xsl:text>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
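Where xsltproc isn't available, the same extraction can be sketched in Python with the standard library's xml.etree.ElementTree. This is a hypothetical equivalent, not part of the gist: it assumes each DfR file has the unnamespaced `article/front` layout the stylesheet expects and, like `xsl:value-of`, it takes only the first match for each path (so only the first author).

```python
import sys
import xml.etree.ElementTree as ET

# Paths mirror the xsl:value-of selects in bib.xsl, relative to article/front.
FIELDS = {
    "surname": "article-meta/contrib-group/contrib/string-name/surname",
    "given": "article-meta/contrib-group/contrib/string-name/given-names",
    "year": "article-meta/pub-date/year",
    "title": "article-meta/title-group/article-title",
    "journal": "journal-meta/journal-title-group/journal-title",
    "volume": "article-meta/volume",
    "issue": "article-meta/issue",
    "pages": "article-meta/page-range",
    "id": "article-meta/article-id",
    "uri": "article-meta/self-uri",
}


def text_of(front, path):
    """Like xsl:value-of: string value of the first match, or '' if none."""
    el = front.find(path)
    return "".join(el.itertext()) if el is not None else ""


def citation_rows(xml_bytes):
    """Yield one tab-separated citation row per article/front element."""
    root = ET.fromstring(xml_bytes)
    fronts = root.findall("front") if root.tag == "article" else []
    for front in fronts:
        v = {key: text_of(front, path) for key, path in FIELDS.items()}
        yield "\t".join([
            f'"{v["surname"]}, {v["given"]}"',
            v["year"],
            f'"{v["title"]}"',
            f'"{v["journal"]}"',
            f'"{v["volume"]}({v["issue"]})"',
            f'"{v["pages"]}"',
            f'"{v["id"]}"',
            v["uri"],
        ])


if __name__ == "__main__":
    # Usage (hypothetical script name): python bib.py *.xml > citations.tsv
    for path in sys.argv[1:]:
        with open(path, "rb") as fh:
            for row in citation_rows(fh.read()):
                print(row)
```

Reading the files as bytes lets the XML parser honour each file's own encoding declaration.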
shawngraham / entities.csv
Last active Jul 22, 2020
Giving nertwork a spin on chapbooks from the National Library of Scotland. I cancelled the script because my machine was running hot, but it did extract some 13,000 entities.
entities.csv
doc entity entityType count
104184105 ABERDEEN organization 1
104184105 Navy organization 1
104184105 Sceptre organization 1
104184105 1 person 1
104184105 Brodie person 2
104184105 Cromar person 2
104184105 Earl person 1
104184105 Fife person 3
104184105 Glen person 1
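A natural next step with output like this is to total the mentions per entity type and find the most-mentioned people or organizations. The sketch below assumes entities.csv is comma-separated with the header doc,entity,entityType,count shown above; the preview here has lost its delimiters, so check that before running it. The function names are illustrative.

```python
import csv
from collections import Counter


def mentions_by_type(rows):
    """Sum the count column per entityType (e.g. person, organization)."""
    totals = Counter()
    for row in rows:
        totals[row["entityType"]] += int(row["count"])
    return totals


def top_entities(rows, entity_type, n=10):
    """Most frequently mentioned entities of one type, summed across all docs."""
    totals = Counter()
    for row in rows:
        if row["entityType"] == entity_type:
            totals[row["entity"]] += int(row["count"])
    return totals.most_common(n)
```

Usage: open the file with `newline=""`, pass `list(csv.DictReader(fh))` as `rows`, then call `top_entities(rows, "person")`.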
splitAudio.py
#!/usr/bin/python
## Split audio files into chunks
## Daniel Pett 1/5/2020
__author__ = 'portableant'
## Tested on Python 2.7.13
import argparse
import os
import speech_recognition as sr
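The preview stops at the imports, so the splitting logic itself isn't shown. As a hypothetical sketch of just that step, the standard-library wave module can cut an uncompressed WAV file into fixed-length chunks. This swaps in wave for whatever the original script uses, the function name and default chunk length are illustrative, and it targets Python 3 rather than the 2.7.13 the script was tested on.

```python
import os
import wave


def split_wav(path, out_dir, chunk_seconds=30):
    """Split a WAV file into consecutive chunk_seconds-long WAV files.

    Returns the list of chunk paths written; the last chunk may be shorter.
    """
    os.makedirs(out_dir, exist_ok=True)
    stem = os.path.splitext(os.path.basename(path))[0]
    written = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = max(1, int(params.framerate * chunk_seconds))
        index = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break
            out_path = os.path.join(out_dir, f"{stem}_{index:03d}.wav")
            with wave.open(out_path, "wb") as dst:
                dst.setparams(params)  # same channels, sample width, frame rate
                dst.writeframes(frames)  # wave fixes the frame count in the header
            written.append(out_path)
            index += 1
    return written
```

Shorter chunks like these are easier to hand to a speech recognizer one at a time.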
shawngraham / getNotes.scpt
Created Mar 11, 2020
Extract notes from Skim to the clipboard.
getNotes.scpt
(* Inspired and modified based on http://drosophiliac.com/2012/09/an-academic-notetaking-workflow.html and https://gist.github.com/smargh/6068104 *)
(* PROPERTIES *)
property LF : (ASCII character 10)
property tid : AppleScript's text item delimiters
(* THE SCRIPT *)
tell application "Skim"
set the clipboard to ""
activate
now, extract the lat,long
import json

# load the Ottawa data into a Python dict
with open('ottawadata.json') as f:
    data = json.load(f)
...now, what cunning piece of code would do the trick? With jqplay I can get, e.g., the latitude with:
.content.indexedStructured.geoLocation[]|.points[].latitude.content
Anyway... off to read some basic Python stuff, I guess.
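One candidate for that cunning piece of code: a plain-Python translation of the jq path. It assumes the structure the jq filter implies (geoLocation is a list of objects, each with a points list). It also assumes a longitude field sits alongside latitude with the same shape; the `longitude` key is a guess, so check it against ottawadata.json.

```python
def lat_longs(data):
    """Collect (lat, long) pairs, mirroring the jq filter
    .content.indexedStructured.geoLocation[] | .points[].latitude.content
    """
    pairs = []
    for location in data["content"]["indexedStructured"]["geoLocation"]:
        for point in location["points"]:
            lat = point["latitude"]["content"]
            lon = point["longitude"]["content"]  # assumed key, by analogy with latitude
            pairs.append((lat, lon))
    return pairs
```

With `data` loaded by `json.load` as above, `lat_longs(data)` returns the list of coordinate pairs.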
topic-model-john-adams-diaries.r
# Topic Modeling John Adams' Diaries
# slightly modified version of
# https://tm4ss.github.io/docs/Tutorial_6_Topic_Models.html
# by Andreas Niekler, Gregor Wiedemann
library(tidyverse)
library(tidytext)
# go get the diaries
# these were scraped from
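Topic models work from document-term counts: for each diary entry, how often each word occurs once stopwords are removed. The R script loads tidyverse and tidytext for this kind of work; a minimal Python sketch of the same preprocessing idea follows, using a tiny hypothetical stopword list in place of a real one.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; use a full list in practice.
STOPWORDS = {"a", "and", "at", "in", "of", "the", "to", "with", "when", "it", "but", "so"}


def tokenize(text):
    """Lowercase and split on anything that is not a letter or apostrophe."""
    return re.findall(r"[a-z']+", text.lower())


def document_term_counts(docs):
    """Map each doc id to a Counter of its non-stopword terms."""
    return {
        doc_id: Counter(t for t in tokenize(text) if t not in STOPWORDS)
        for doc_id, text in docs.items()
    }


def vocabulary(dtm, min_docs=1):
    """Terms appearing in at least min_docs documents, sorted alphabetically."""
    doc_freq = Counter(term for counts in dtm.values() for term in counts)
    return sorted(term for term, n in doc_freq.items() if n >= min_docs)
```

Dropping terms that appear in very few entries (raising `min_docs`) is a common way to shrink the vocabulary before fitting a model.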
johnadams.csv
id date text
1 1753-06-08 At Colledge. A Clowdy ; Dull morning and so continued till about 5 a Clock when it began to rain ; moderately But continued not long But remained Clowdy all night in which night I watched with Powers.
2 1753-06-09 At Colledge the weather still remaining Clowdy all Day till 6 o'Clock when the Clowds were Dissipated and the sun brake forth in all his glory.
3 1753-06-10 At Colledge a clear morning. Heard Mr. Appleton expound those words in I.Cor.12 Chapt. 7 first verses and in the afternoon heard him preach from those words in 26 of Mathew 41 verse watch and pray that ye enter not into temptation.
4 1753-06-11 At Colledge a fair morning and pretty warm. About 2 o'Clock there appeared some symptoms of an approaching shower attended with some thunder and lightning.
5 1753-06-12 At Colledge a Clowdy morning heard Dr. Wigglesworth Preach from the 20 Chapter of exodus 8 9 and 10th. Verses.
6 1753-06-13 At Colledge a Cloudy morning about 10 o'Clock the Sun shone out very warm but abo
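The date column is already in YYYY-MM-DD form, so grouping entries by month only needs the first seven characters. A small sketch, assuming the file is comma-separated with the id,date,text header shown above (the preview has lost its delimiters); the function names are illustrative.

```python
import csv
from collections import Counter


def load_diary(path):
    """Read johnadams.csv into a list of dicts keyed by the header row."""
    with open(path, newline="", encoding="utf-8") as fh:
        return list(csv.DictReader(fh))


def entries_per_month(rows):
    """Count diary entries per YYYY-MM, taken from the ISO date column."""
    return Counter(row["date"][:7] for row in rows)
```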
John-Adams-Diaries.ipynb
text-analysis-and-topic-model-from-scraping-one-set-of-diaries.r
# let's fix the first column in scrape:
# I want to remove the first three characters, leaving us with a date,
# or at least something that looks like a date.
# This removes the diary metadata from the date.
library(stringr)  # provides str_sub()
scrape$id <- substring(scrape$id, 4)
# this creates a new column with just the month extracted
# (characters 5-6, assuming the id now reads like YYYYMMDD)
month <- str_sub(scrape$id, 5, 6)
scrape['month'] <- month
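The same two steps in Python, for comparison. R's `substring(x, 4)` keeps everything from the 4th character on, and `str_sub(x, 5, 6)` takes characters 5 through 6 (both 1-indexed and inclusive), so as 0-indexed slices they become `x[3:]` and `x[4:6]`. The example id is hypothetical: it assumes that what remains after the first three characters reads like YYYYMMDD, which is what makes characters 5-6 the month.

```python
def strip_prefix(raw_id):
    """Drop the first three characters, like R's substring(raw_id, 4)."""
    return raw_id[3:]


def month_of(clean_id):
    """Characters 5-6 (1-indexed), like stringr's str_sub(clean_id, 5, 6)."""
    return clean_id[4:6]


rows = [{"id": "DA-17530608"}]  # hypothetical id format, for illustration only
for row in rows:
    row["id"] = strip_prefix(row["id"])
    row["month"] = month_of(row["id"])
```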