Mark Bussey mark-dce

  • Data Curation Experts
  • Minneapolis, MN
@mark-dce
mark-dce / 16854582.json
Last active March 4, 2020 21:20
Mark's Sample Manifest - YCAL MSS 702
{
"@context": "http://iiif.io/api/presentation/2/context.json",
"@type": "sc:Manifest",
"@id": "https://raw.githubusercontent.com/curationexperts/YUL-DC/manifests/mark-dce/16854582.json",
"label": "Mark's Sample Manifest - YCAL MSS 702",
"metadata": [
{
"label": "Title",
"value": [
"[Career files unnumbered folder 9] The Thanks Be To Grandmother Winifred Foundation Grantees to be photographed"
mark-dce / manifest.json
Last active February 24, 2020 03:51
Sample IIIF Manifest - from tutorial
{
"@context": "http://iiif.io/api/presentation/2/context.json",
"@type": "sc:Manifest",
"@id": "http://localhost:3000/manifest.json",
"label": "Papillons",
"description": "Four patterns inspired by butterflies.",
"attribution": "Special Collections Research Center at NCSU Libraries",
"logo": "http://localhost:3000/logo.jpg",
"sequences": [
{
mark-dce / gradual.json
Last active February 24, 2020 03:49
Sample Manifest for Yale Manuscript
{
"@context": "http://iiif.io/api/presentation/2/context.json",
"@type": "sc:Manifest",
"@id": "http://localhost:3000/gradual.json",
"label": "Gradual",
"description": "Gradual, 1st half of 12th century",
"attribution": "Beinecke Library",
"logo": "https://beinecke.library.yale.edu/sites/default/files/images/BeineckeUnoffWordmark.png",
"sequences": [
{
mark-dce / mark_analysis_junk_drawer.rake
Last active November 15, 2019 03:42
XML and CSV parsing examples
# frozen_string_literal: true
namespace :dce do
desc "Yellowback analysis sketchpad & junk drawer"
task analyze: :environment do
f = File.open('/Users/mark/Google Drive/DCE Clients/Emory University/2019-01 Repository Migration Development/5. Sample Data/DLP_Publishing_Test_20190919.xml')
doc = Nokogiri::XML(f)
total_records = doc.xpath("//record").count
child_volumes = doc.xpath("//datafield[@tag='856']/subfield[@code='3']/ancestor::record").count #code '3' = volume id, code 'u' = url
parent_mmsids = doc.xpath("//datafield[@tag='856']/subfield[@code='3']/ancestor::record").map{|n| n.xpath("controlfield[@tag='001']").text}
puts "MARC Records"
mark-dce / yellowback-sample-processed.csv
Created November 13, 2019 23:39
Emory Yellowbacks sample manifest
pl_row,work_id,CSV title,type,administrative_unit,content_type,data_classifications,emory_ark,emory_rights_statements,holding_repository,institution,other_identifiers,rights_statement,system_of_record_ID,conference_name,contributor,copyright_date,creator,date_created,date_digitized,date_issued,edition,extent,genre,local_call_number,place_of_production,primary_language,publisher,series_title,subject_geo,subject_names,subject_topics,table_of_contents,title,uniform_title,fileset_label,preservation_master_file,service_file,intermediate_file,transcript_file,extracted,pcdm_use
2,!!!!!,Edge cases,work,"Stuart A. Rose Manuscript, Archives, and Rare Book Library",http://id.loc.gov/vocabulary/resourceTypes/txt,Confidential,!!!!!,Test fixture,"Stuart A. Rose Manuscript, Archives, and Rare Book Library",Emory University,oclc:ocm12345678|barcode:010101010101|digwf:9090,http://rightsstatements.org/vocab/NoC-US/1.0/,Alma:111111111111111111,"International Exhibition (1862 : London, England)","Topp, Chester W., collector. GEU
mark-dce / usage.yml
Created August 25, 2019 21:33
Emory metadata usage definitions
---
# Metadata usage hints
# attribute: usage
abstract: "Free-text, summary description of the content, such as an abstract. Provides additional search terms in natural language; provides important summary information for non-textual resources such as images, audio, and video"
access_right: "Information about access restrictions due to the nature of the information in the materials being described, such as those imposed by the donor, by the repository, or by statutory/regulatory requirements"
administrative_unit: "The name of a sub-unit within the Holding Repository; indicates a stewardship role"
arkivo_checksum: "-- system field - not directly editable --"
author_notes: "Free-text note that contains information from or about the authors of the material; may include or reference corresponding author information"
conference_dates: "Date range for conference at which a conference paper was presented"
mark-dce / Hierarchy Sample.CSV
Last active March 18, 2019 16:00
Potential CSV model for collections and child pages
Object Type,Title,Item ARK,Collection ARK,Parent ARK,Sequence Number
Collection,Miscellaneous Manuscripts,ark:9999/1df68c5,,,
Work,Home Book on Sanitation,ark:9999/2fa8801,ark:9999/1df68c5,,
Child Work,Front Cover,ark:9999/8c500dc,,ark:9999/2fa8801,1
Child Work,Inside Front,ark:9999/6899e32,,ark:9999/2fa8801,2
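
The hierarchy model above ties each child page to its work through the Parent ARK column and orders pages by Sequence Number. A minimal Ruby sketch of reading that structure follows. It assumes a comma-delimited file with exactly the six column headers shown; the `SAMPLE` constant and the `children_by_parent` helper are hypothetical names introduced for illustration, not part of the gist.

```ruby
require "csv"

# Sample rows using the gist's column names. The comma-delimited layout
# is an assumption about how the file is stored.
SAMPLE = <<~CSV
  Object Type,Title,Item ARK,Collection ARK,Parent ARK,Sequence Number
  Collection,Miscellaneous Manuscripts,ark:9999/1df68c5,,,
  Work,Home Book on Sanitation,ark:9999/2fa8801,ark:9999/1df68c5,,
  Child Work,Front Cover,ark:9999/8c500dc,,ark:9999/2fa8801,1
  Child Work,Inside Front,ark:9999/6899e32,,ark:9999/2fa8801,2
CSV

# Hypothetical helper: group child titles under their parent ARK,
# ordered by sequence number. Rows without a Parent ARK are skipped.
def children_by_parent(csv_text)
  rows = CSV.parse(csv_text, headers: true)
  rows.select { |row| row["Parent ARK"] && !row["Parent ARK"].empty? }
      .group_by { |row| row["Parent ARK"] }
      .transform_values do |kids|
        kids.sort_by { |row| row["Sequence Number"].to_i }
            .map { |row| row["Title"] }
      end
end

pages = children_by_parent(SAMPLE)
pages.each { |parent, titles| puts "#{parent}: #{titles.join(', ')}" }
```

Keying on the Parent ARK rather than on row order means the child rows may appear anywhere in the file, as long as each one names its parent.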
mark-dce / loc-images.csv
Last active February 4, 2019 22:40
WPA Posters Import Manifest for Tenejo
title,creator,keyword,rights statement,contributor,abstract or summary,license,date created,subject,language,identifier,location,related url,source,resource type,visibility,representative media,thumbnail,rendering,publisher,files
In March read the books you've always meant to read,unknown,Books|~|Color|~|Illinois|~|Libraries|~|Posters|~|Reading|~|Screen Prints,http://rightsstatements.org/vocab/InC/1.0/,Federal Art Project,"Poster for statewide Library Project showing a windblown woman and books by authors such as Scott, Dumas, Thackeray, Dickens, Austen, and others.",http://creativecommons.org/publicdomain/mark/1.0/,1936,Reading—Illinois—1930-1950|~|Books--1930-1950|~|Libraries--Illinois--1930-1950,English,"POS - WPA - ILL .01 .I5531, no. 1 (B size) [P&P]|~|LOC Control #98507722|~|Reproduction Number: LC-USZC2-5175 (color film copy slide)|~|cph 3f05175 //hdl.loc.gov/loc.pnp/cph.3f05175",http://www.geonames.org/4896861/illinois.html,https://www.loc.gov/item/98507722/|~|https://lccn.loc.gov/98507722|~|http://lo
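
Several fields in the sample row above (keyword, subject, identifier, related url) pack multiple values into one cell separated by `|~|`. A minimal Ruby sketch of splitting such a cell follows. Treating `|~|` as the multi-value delimiter is an assumption based on the sample row, and `split_values` is a hypothetical helper, not a Tenejo API.

```ruby
# Delimiter observed in the sample row. Treating it as the
# multi-value separator is an assumption.
MULTI_VALUE_DELIMITER = "|~|"

# Hypothetical helper: split one CSV cell into its individual values,
# trimming whitespace and dropping empty entries.
def split_values(field)
  field.to_s.split(MULTI_VALUE_DELIMITER).map(&:strip).reject(&:empty?)
end

keywords = split_values("Books|~|Color|~|Illinois|~|Libraries|~|Posters|~|Reading|~|Screen Prints")
puts keywords.inspect
```

A three-character delimiter such as this one is unlikely to collide with punctuation inside a value, which is presumably why the manifest avoids a bare `|` or `;`.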
mark-dce / partner_institution_feature_spec.rb
Created August 21, 2018 15:22
Katalon recording to test that partnering agency is listed on the Submit tab
require "json"
require "selenium-webdriver"
require "rspec"
include RSpec::Expectations
describe "PartneringAgencyDisplaysOnSubmitTab" do
before(:each) do
@driver = Selenium::WebDriver.for :firefox
@base_url = "https://www.katalon.com/"
mark-dce / solr.log.1
Last active April 12, 2018 02:56
Extract of prod-solr log at crash time - /var/solr/log
...
2018-04-12 02:01:10.691 INFO (qtp717356484-385) [ x:etds] o.a.s.u.p.LogUpdateProcessorFactory [etds] webapp=/solr path=/update/extract params={extractOnly=true&wt=json&extractFormat=text}{} 0 283
2018-04-12 02:01:10.692 ERROR (qtp717356484-385) [ x:etds] o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.pdf.PDFParser@54b16458
at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:234)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:173)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2477)
at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:723)