@wragge
wragge / functions-in-recordsearch.md
Last active March 31, 2017 03:04
Functions currently used in RecordSearch
| Term | Number of agencies | Included in thesaurus |
| --- | --- | --- |
| administrative law | 169 | crsthesaurus, agift1, agift2, agift3, recordsearch |
| administrative services | 3 | crsthesaurus, recordsearch |
| agriculture | 118 | crsthesaurus, recordsearch |
| air force | 21 | crsthesaurus, agift2, agift3, recordsearch |
| air force administration | 23 | crsthesaurus, recordsearch |
| air force commands | 77 | crsthesaurus, recordsearch |
| air operations | 140 | crsthesaurus, recordsearch |
| air safety | 6 | crsthesaurus, recordsearch |
@wragge
wragge / question-titles-decade-tfidf.txt
Created April 11, 2017 11:25
Most significant words (via TF-IDF) in titles of questions asked in the House of Reps for each decade 1900-1979
1900
kanakas 0.0725222278008
stripper 0.0604896801498
employes 0.0599783041901
increments 0.0539607103026
creswell 0.0528185096992
drawback 0.0528185096992
masters 0.0528185096992
slanders 0.0528185096992
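
For readers unfamiliar with TF-IDF, here is a minimal sketch of how per-decade scores like those above could be computed. It is not the script that produced this list: the source does not say which library, tokeniser, or weighting variant was used, so the plain `tf * log(N / df)` formula, the function names `tfidf_by_group` and `top_terms`, and the toy tokens are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_by_group(docs):
    """Score every term in every group (e.g. decade) with plain TF-IDF.

    docs maps a group label to a list of tokens. The weighting used here,
    tf * log(N / df), is an assumed textbook variant, not necessarily the
    one behind the published scores.
    """
    n_docs = len(docs)

    # Document frequency: in how many groups does each term appear?
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))

    scores = {}
    for label, tokens in docs.items():
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid dividing by zero for empty groups
        scores[label] = {
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
    return scores


def top_terms(scores, label, n=8):
    """Return the n highest-scoring (term, score) pairs for one group."""
    ranked = sorted(scores[label].items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:n]


if __name__ == "__main__":
    # Hypothetical toy data, invented for this example only.
    toy = {
        "1900": ["kanakas", "tariff", "tariff"],
        "1910": ["tariff", "defence"],
    }
    for term, score in top_terms(tfidf_by_group(toy), "1900"):
        print(term, score)
```

A term that appears in every group scores zero under this formula, which is why words specific to a single decade rise to the top.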
@wragge
wragge / hansard-speaker-similarities.md
Last active April 27, 2017 13:41
Similarities between 1970s members of the House of Reps, based on their speeches in Hansard (showing 10 most similar)

HOWSON, Peter (LP)

* VINER, Ian (LP)                          0.990147362266
* HUNT, Ralph (NCP/NP)                     0.989726340705
* BOWEN, Nigel (LP)                        0.989052644404
* SWARTZ, Reginald (LP)                    0.988994723622
* NEWMAN, Kevin (LP)                       0.988793740506
* GROOM, Ray (LP)                          0.988658546299
* EVERINGHAM, Douglas (ALP)                0.988432787038
* STEWART, Francis (ALP)                   0.98833413308
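
The description does not say how the similarity scores were calculated. One common choice for comparing texts represented as weighted term vectors is cosine similarity, and the sketch below assumes that measure. The function names `cosine_similarity` and `most_similar`, and the toy vectors, are hypothetical and are not taken from the original analysis.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors (dicts of term -> weight)."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def most_similar(target, vectors, n=10):
    """Rank every other entry in vectors by its similarity to target."""
    others = (
        (name, cosine_similarity(vectors[target], vec))
        for name, vec in vectors.items()
        if name != target
    )
    return sorted(others, key=lambda kv: kv[1], reverse=True)[:n]


if __name__ == "__main__":
    # Hypothetical term weights, invented for this example only.
    toy = {
        "Speaker A": {"tariff": 0.9, "defence": 0.1},
        "Speaker B": {"tariff": 0.8, "defence": 0.2},
        "Speaker C": {"wool": 1.0},
    }
    for name, score in most_similar("Speaker A", toy):
        print(name, score)
```

Because the measure depends only on the angle between vectors, long and short speeches can be compared without the longer one dominating.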
@wragge
wragge / similar-years.md
Last active May 4, 2017 03:20
Which years are similar to others (according to TF-IDF)?

1970

* 1971                                     0.999590684514
* 1969                                     0.999531189763
* 1972                                     0.99943879755
* 1968                                     0.99938776453
* 1967                                     0.999269072854
* 1973                                     0.998769846893
* 1974                                     0.998373370334
* 1975                                     0.998230585413

Getting the text content of articles from the Australian Women's Weekly

The TroveHarvester makes it easy to download articles in bulk from Trove's digitised newspapers. With the --text option, you can also save the full text of every article.

However, this doesn't work for the Australian Women's Weekly because its full text is not available through the Trove API. Fortunately, the article text can be downloaded from the web interface.

The one-line script below uses wget, so make sure you have it installed before you go any further. (You can install it with Homebrew if you're using a Mac.)

Instructions

@wragge
wragge / speak.py
Last active August 31, 2017 11:25
Python script for macOS that speaks interjections from Historic Hansard. The interjections, voice, speaking rate, volume, and delay are all set randomly.
import subprocess
from pymongo import MongoClient
import time
import random
import argparse
from credentials import MONGO_URL
HOUSES = {
'hofreps': 'House of Representatives',
@wragge
wragge / trove_copy_permalink.user.js
Last active November 7, 2017 23:35
Userscript to add a handy button to Trove works and versions to save permalink to clipboard.

Keybase proof

I hereby claim:

  • I am wragge on github.
  • I am wragge (https://keybase.io/wragge) on keybase.
  • I have a public key ASD92biquvI-sNM0M9TNP2jaS3vQn_1NQPAUU0YK2Dm-Wwo

To claim this, I am signing this object:

@wragge
wragge / slnsw-ms-catalogue-links.user.js
Created January 26, 2018 04:24
Userscript to change the URL of searches in the SLNSW Pictures & Manuscripts catalogue to something easily shareable.
@wragge
wragge / archway-harvest-demo.csv
Last active June 10, 2018 14:01
Example of records harvested from Archway -- search for series '8333' and keyword 'Chinese'
We can make this file beautiful and searchable if this error is corrected: It looks like row 7 should actually have 15 columns, instead of 12. in line 6.
Access status,Accession,Agency,Alternative no.,Box/Item,Date,Former archives ref,Item ID,Part,Record group,Record no.,Record type,Sep,Series,Title
OPEN ACCESS,,ACGO,,2254 /,no date - no date,IA1,R14991199,,IA1,105/45,Text,,8333,Local Bodies - Chinese sick and wounded fund - Granted by Local Bodies (R14991199)
OPEN ACCESS,,ACGO,,2282 /,no date - no date,IA1,R14992099,,IA1,113/35,Text,,8333,Museum - Chinese curios - Gift from Rewi Alley to Christchurch Museum (R14992099)
RESTRICTED ACCESS,,ACGO,,2821 /,no date - no date,IA1,R19964568,1,IA1,116/2/18,Text,,8333,Chinese citizens naturalised in New Zealand - Lists of (R19964568)
RESTRICTED ACCESS,,ACGO,,2821 /,no date - no date,IA1,R19964569,2,IA1,116/2/18,Text,,8333,Chinese citizens naturalised in New Zealand - Lists of (R19964569)
RESTRICTED ACCESS,,ACGO,,2821 /,no date - no date,IA1,R19964570,3,IA1,116/2/18,Text,,8333,Chinese citizens naturalised in New Zealand - Lists of (R19964570)
RESTRICTED ACCESS,,ACGO,,2821 /,no date - no date,IA1,R19964571,4,IA1,116/2/18,
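
A row with fewer fields than the header, like the truncated final row above, can be found before publishing a CSV with a short check such as the one below. This is a sketch using only Python's standard `csv` module. The function name `find_ragged_rows` and the sample data are made up for illustration and are not part of the harvesting code.

```python
import csv
import io


def find_ragged_rows(csv_text):
    """Return (line_number, fields_found, fields_expected) for each row
    whose field count differs from the header's."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    expected = len(header)

    problems = []
    for row in reader:
        if len(row) != expected:
            # reader.line_num is the number of source lines read so far,
            # so it points at the line where this row ended.
            problems.append((reader.line_num, len(row), expected))
    return problems


if __name__ == "__main__":
    # Hypothetical sample, invented for this example only.
    sample = "Series,Title,Item ID\n8333,Example title,R1\n8333,Cut off\n"
    for line_num, found, wanted in find_ragged_rows(sample):
        print(f"line {line_num}: {found} fields, expected {wanted}")
```

Note that blank lines are reported as rows with zero fields, which may or may not be what you want for a given file.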