Christina Harlow cmharlow

cmharlow / dltn_date_patterns
Last active August 29, 2015 14:26
DLTN Date Patterns
Patterns and sample data taken from the original, local data that all DLTN Phase 1 sets expose to DLTN (162019 records).
Elements pulled: dc:date. Values containing semicolons or ' and ' were split, and all data was lowercased.
**Patterns:**
Already EDTF, with possible whitespace cleanup/removal as needed
^\d{4}$
^\d{4}-\d{2}$
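A minimal sketch of how the preprocessing and the two patterns above could be applied, written in plain JavaScript. The function names, the pattern labels, and the sample value are assumptions for illustration and do not come from the DLTN data or the original gist.

```javascript
// The two patterns listed above for values that are already EDTF.
// The labels are illustrative names, not part of the original gist.
const EDTF_PATTERNS = [
  { label: "year", regex: /^\d{4}$/ },
  { label: "year-month", regex: /^\d{4}-\d{2}$/ },
];

// Mirror the described preprocessing: lowercase the value, split on
// semicolons and ' and ', then trim stray whitespace from each part.
function normalizeDates(raw) {
  return raw
    .toLowerCase()
    .split(/;| and /)
    .map((part) => part.trim())
    .filter((part) => part.length > 0);
}

// Return the label of the first matching pattern, or null if none match.
function classifyDate(value) {
  const hit = EDTF_PATTERNS.find((p) => p.regex.test(value));
  return hit ? hit.label : null;
}

// Hypothetical sample value, not taken from the DLTN records.
const sample = "1923; 1924-05 and Circa 1930";
console.log(normalizeDates(sample).map((v) => [v, classifyDate(v)]));
```

Values that match neither pattern come back as null, which makes it easy to collect the remaining strings that still need a pattern of their own.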
cmharlow / gist:14729d937488e2530b44
Created August 10, 2015 16:06
UTK MODS/RDF Titles Examples
For the various titles of an object, what elements, subelements, and attributes do the institutions of this group deal with?
titleInfo@type=[abbreviated | translated | alternative | uniform]/title
- although discussing a simpler approach (title, then alternativeTitle for the rest) is valid, and is the current approach in MODSRDF v1 from LoC
- uniform is good to have in theory but perhaps not much used in practice currently. I think it is not represented in any RDF-modeled ontology other than MODSRDFv1?
- MODSRDFv1 from LoC currently models modsrdf:title as modsrdf:property with range madsrdf:Title
titleInfo[@authority | @authorityURI | @valueURI]/title
- all captured by use of URIs for titles that will be taken from external authority
titleInfo@usage="primary"/title
- will this be captured by having just 1 title property and rest alternativeTitle properties?
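To make the last question concrete, here is a rough sketch of what "1 title property and the rest alternativeTitle properties" could look like once titleInfo elements are flattened into plain objects. The input shape, the `splitTitles` name, and the fallback to the first untyped title are assumptions for illustration, not a decision recorded in these notes.

```javascript
// Split flattened titleInfo entries into one title and a list of
// alternative titles. Each entry is assumed to look like
// { title: "...", type: "alternative" | undefined, usage: "primary" | undefined }.
function splitTitles(titleInfos) {
  // Prefer the entry marked usage="primary"; otherwise fall back to the
  // first entry without a type attribute (an assumption, see lead-in).
  const primary =
    titleInfos.find((t) => t.usage === "primary") ||
    titleInfos.find((t) => !t.type) ||
    null;
  const rest = titleInfos.filter((t) => t !== primary);
  return {
    title: primary ? primary.title : null,
    alternativeTitle: rest.map((t) => t.title),
  };
}

// Hypothetical record, not taken from any institution's data.
const record = [
  { title: "Abbrev. title", type: "abbreviated" },
  { title: "Main title", usage: "primary" },
  { title: "Other title", type: "alternative" },
];
console.log(splitTitles(record));
```

In this shape the type distinctions (abbreviated, translated, uniform) are lost, which is the trade-off the bullets above are weighing.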
cmharlow / marc_recon.pl
Created August 29, 2015 00:38
MARC Recon with Catmandu
#!/usr/bin/env perl
#
# Match 100,110,700,710,650,655 against LCNAF, LCSH, AAT
#
# this is a watery mirror of the genius that is Patrick Hochstenbach <Patrick.Hochstenbach@UGent.be>
# his work (which sparked the below) on this here: https://gist.github.com/phochste/c87c81c79d8b8a6a2179
#
# the data creature that wrote the following: christina harlow, @cm_harlow on twitter
#
$|++;
function waybackAPIcall(input) {
  var APIURL = "http://www.archive.org/wayback/available?url=";
  var prefyear = "&timestamp=2008";
  var URL = APIURL.concat(input.concat(prefyear));
  var data = UrlFetchApp.fetch(URL);
  Utilities.sleep(1000);
  return data.getContentText();
}
function availableinIA(input) {
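
The request URL built in waybackAPIcall, and the JSON text it returns, can be handled outside Apps Script as well. The sketch below is plain JavaScript with no network call. The helper names and the sample response are assumptions, and the response shape (`archived_snapshots.closest` with `available` and `url`) follows the Wayback availability API as I understand it, so check it against a live response. URL-encoding the input is an addition that the original function does not do.

```javascript
const API_URL = "http://www.archive.org/wayback/available?url=";

// Build the availability request URL for a page and a preferred year.
// encodeURIComponent is an addition here; the original concatenates as-is.
function buildWaybackUrl(input, year) {
  return API_URL + encodeURIComponent(input) + "&timestamp=" + year;
}

// Pull the closest snapshot URL out of the response text, or return null
// when no snapshot is reported as available.
function closestSnapshotUrl(responseText) {
  const parsed = JSON.parse(responseText);
  const closest = parsed.archived_snapshots && parsed.archived_snapshots.closest;
  return closest && closest.available ? closest.url : null;
}

// Hypothetical response body, written by hand for illustration.
const sampleResponse = JSON.stringify({
  archived_snapshots: {
    closest: {
      available: true,
      url: "http://web.archive.org/web/20080101000000/http://example.com/",
      timestamp: "20080101000000",
      status: "200",
    },
  },
});

console.log(buildWaybackUrl("example.com", "2008"));
console.log(closestSnapshotUrl(sampleResponse));
```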
cmharlow / urlemailgeneration.js
Created October 7, 2015 17:15
C4L16 Google Form URL/Email generation script
// This function is taken from Luke's Google Doc, except that a trigger for the email form is added at the end to call the confirmationEmail function
function genEditUrls() {
  // Use data collected from dialog to manipulate the spreadsheet.
  var ss = SpreadsheetApp.getActiveSpreadsheet();
  var currentSheet = ss.getActiveSheet();
  var formUrl = ss.getFormUrl();
  var form = FormApp.openByUrl(formUrl);
  // Change the sheet name as appropriate
  var data = currentSheet.getDataRange().getValues();
cmharlow / YorkUnivDigLibrary.txt
Last active October 13, 2015 20:16
York University Digital Library MODS QA Report (just playing around by Christina)
mods:abstract: |======================   | 22820/25050 | 91%
mods:accessCondition: |=====================    | 21213/25050 | 84%
mods:classification: |                         | 328/25050 | 1%
mods:genre: |=======================  | 24037/25050 | 95%
mods:identifier: |======================== | 24966/25050 | 99%
mods:language: |                         | 51/25050 | 0%
mods:language/mods:languageTerm: |=================        | 17787/25050 | 71%
mods:location/mods:physicalLocation: |====================     | 20955/25050 | 83%
mods:location/mods:url: |                         | 791/25050 | 3%
mods:abstract: |==========               | 17205/41438 | 41%
mods:accessCondition: |======================   | 36939/41438 | 89%
mods:classification: |                         | 461/41438 | 1%
mods:extension/{info:flvc/manifest/v1}flvc/{info:flvc/manifest/v1}objectHistory: |==============           | 24788/41438 | 59%
mods:extension/{info:flvc/manifest/v1}flvc/{info:flvc/manifest/v1}otherLogo: |                         | 345/41438 | 0%
mods:extension/{info:flvc/manifest/v1}flvc/{info:flvc/manifest/v1}owningInstitution: |======================== | 41273/41438 | 99%
mods:extension/{info:flvc/manifest/v1}flvc/{info:flvc/manifest/v1}submittingInstitution: |=================        | 28502/41438 | 68%
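
Each report line pairs a field name with a bar, a count out of the total, and a percentage. The sketch below shows one way such a line could be produced. It assumes a 25-character bar and rounding down for both the bar and the percentage, which is consistent with the figures in these reports but is not confirmed by the tool that generated them. The `reportLine` name is hypothetical.

```javascript
// Assumed bar width, inferred from the figures in the reports above.
const BAR_WIDTH = 25;

// Format one coverage line: field name, bar, count/total, and percentage.
function reportLine(field, count, total) {
  const fraction = total > 0 ? count / total : 0;
  // Round down for both the bar length and the percentage (assumption).
  const filled = Math.floor(fraction * BAR_WIDTH);
  const bar = "=".repeat(filled) + " ".repeat(BAR_WIDTH - filled);
  const percent = Math.floor(fraction * 100);
  return `${field}: |${bar}| ${count}/${total} | ${percent}%`;
}

console.log(reportLine("mods:abstract", 22820, 25050));
console.log(reportLine("mods:language", 51, 25050));
```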
cmharlow / dplaafter2020
Created October 15, 2015 01:28
last metadata qa playing around - for all dpla records after 2020
@context: |======================== | 5219/5223 | 99%
@context.@vocab: |                         | 4/5223 | 0%
@context.LCSH: |                         | 4/5223 | 0%
@context.aggregatedDigitalResource: |                         | 4/5223 | 0%
@context.begin.@id: |                         | 4/5223 | 0%
@context.begin.@type: |                         | 4/5223 | 0%
@context.collection: |                         | 4/5223 | 0%
cmharlow / mashcat22Oct15.txt
Created October 22, 2015 22:10
MashCat Post 10/22/15 Chat (Authorities) TwarcReport CLI Report
Count: 549
Users: 80
User percentiles: █▃▂▁▁▁▁▁▁▁
[58, 17, 9, 5, 3, 1, 1, 1, 1, 1]
Has hashtag: 543 (98.91%)
Hashtags: 22
Hashtags percentiles: █▁▁▁▁▁▁▁▁▁
[102, 4, 1, 1, 0, 0, 0, 0, 0, 0]
Has URL: 107 (19.49%)
URLs: 33
{
  "title": "Getty Linked Data Linked Data Fragments server",
  "baseURL": "http://localhost:5000/",
  "datasources": {
    "getty-aat": {
      "title": "Getty AAT",
      "type": "SparqlDatasource",
      "description": "Getty AAT via Sparql Endpoint (unconfirmed backend)",
      "settings": { "endpoint": "http://vocab.getty.edu/sparql", "defaultGraph": "http://vocab.getty.edu/dataset/aat" },