Sean Upton seanupton

  • Mostscript, LLC
  • Salt Lake City, Utah USA
@seanupton
seanupton / ingest-desc-script.md
Last active October 24, 2019 15:09
Slide notes / ingest video script

Slide 3: Newspaper work types and relationships

NewspaperWorks provides five new work types to a Hyrax application:

  • Newspaper Titles represent a publication and contain Issues and Containers.
  • Issues can contain Pages and Articles.
  • Articles, if you have segmented content, can be associated with one or more Pages.
  • Optionally, Pages can relate to Containers in the same publication, representing their respective microform reels.
  • You can read more about these types on the NewspaperWorks repository wiki in samvera-labs.
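
The relationships listed above can be sketched with plain Ruby structs. This is only an illustration of the containment rules, not the NewspaperWorks or Hyrax models: every `Sketch*` name, the `build_sample_title` helper, and the sample values are hypothetical.

```ruby
# Plain-Ruby stand-ins for the five work types (hypothetical names,
# NOT the actual Hyrax work classes).
SketchTitle     = Struct.new(:name, :issues, :containers)
SketchIssue     = Struct.new(:label, :pages, :articles)
SketchPage      = Struct.new(:number, :container)
SketchArticle   = Struct.new(:headline, :pages)
SketchContainer = Struct.new(:label)

# Build one title with a single issue, two pages, one article spanning
# both pages, and one container (microform reel) holding those pages.
def build_sample_title
  reel    = SketchContainer.new('Reel 1')
  pages   = [SketchPage.new(1, reel), SketchPage.new(2, reel)]
  article = SketchArticle.new('Example headline', pages)
  issue   = SketchIssue.new('1905-01-01', pages, [article])
  SketchTitle.new('Example Gazette', [issue], [reel])
end

title = build_sample_title
puts title.issues.first.articles.first.pages.map(&:number).inspect
```

In the real application these links would presumably be persisted membership relations between works rather than in-memory arrays.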
seanupton / retry-job.md
Last active August 28, 2019 14:45
Testing CreateIssuePagesJob retry

Getting started, pre-requisites:

  1. Clone the issue-pdf-composition branch of newspaper_works
  2. Make sure your current working directory is the root of the newspaper_works gem.
  3. Set up the gem and test app:
bundle install
bundle exec rake engine_cart:generate
seanupton / webmock-trivial.rb
Last active August 1, 2019 21:08
WebMock and Faraday in irb console
require 'webmock'
include WebMock::API
# Enable WebMock, but...
WebMock.enable!
# ...allow connection for non-stubbed requests:
WebMock.allow_net_connect!
url = 'https://www.example.com'
path = '/some/path/to/publication/sn99999999'
ingester = NewspaperWorks::Ingest::PDFIngester.new(path)
# if LCCN cannot be determined and validated from path, then raise an error
# during construction above...
# - Presumption: no character padding in LCCNs in either path or
# in any command form.
# - Normalize LCCN:
# - Strip whitespace?
# - Make lower case.
# - Validate LCCN with regex:
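
The normalization and validation steps outlined in the comments above might look roughly like the following. This is a minimal sketch under stated assumptions, not the ingester's actual code: `LCCN_PATTERN`, `normalize_lccn`, and `lccn_from_path` are hypothetical names, the LCCN is assumed to be the last path segment, and the regex (up to three letters followed by 8 or 10 digits) is an assumed shape for a normalized LCCN.

```ruby
# Assumed shape of a normalized LCCN: an optional alphabetic prefix of up
# to three letters, then 8 or 10 digits (for example "sn99999999").
LCCN_PATTERN = /\A[a-z]{0,3}(?:\d{8}|\d{10})\z/

# Remove all whitespace and lower-case the value.
def normalize_lccn(value)
  value.to_s.gsub(/\s+/, '').downcase
end

# Take the last path segment as the LCCN candidate; raise if it does not
# match the assumed pattern after normalization.
def lccn_from_path(path)
  lccn = normalize_lccn(File.basename(path))
  raise ArgumentError, "No valid LCCN in path: #{path}" unless lccn.match?(LCCN_PATTERN)
  lccn
end

puts lccn_from_path('/some/path/to/publication/sn99999999')
```

Raising from a helper like this during construction would match the stated intent of failing early when the LCCN cannot be determined from the path.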
seanupton / geonames_place_uri_finder.rb
Last active April 12, 2019 00:16
Geonames snippet for use in ingest?
require 'faraday'
require 'nokogiri'
require 'uri'
def geonames_place_uri(placename)
  # URI.encode (URI::encode) was removed in Ruby 3.0; use the form encoder.
  query = URI.encode_www_form_component(placename)
  geo_qs = "q=#{query}&username=#{Qa::Authorities::Geonames.username}"
  url = "http://api.geonames.org/search?#{geo_qs}"
  resp = Faraday.get url
  doc = Nokogiri.XML(resp.body)
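
As a side note on the query string built in the snippet above, the standard library can encode all parameters in one call. This sketch only builds the URL and makes no request; `geonames_search_url` and the `'demo'` username are hypothetical, used here so the example runs without the Qa gem.

```ruby
require 'uri'

# Build the GeoNames search URL for a place name. The username is passed
# in explicitly (an assumption for this sketch) rather than read from
# Qa::Authorities::Geonames.
def geonames_search_url(placename, username)
  query = URI.encode_www_form(q: placename, username: username)
  "http://api.geonames.org/search?#{query}"
end

puts geonames_search_url('Salt Lake City', 'demo')
# prints http://api.geonames.org/search?q=Salt+Lake+City&username=demo
```

`URI.encode_www_form` escapes each key and value, so place names containing spaces or ampersands cannot break the query string.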
seanupton / sample_ingest_issue_page.rb
Created February 7, 2019 23:50
Hypothetical (untested) example ingesting an NDNP Issue and its pages, along with associated files and metadata
require 'newspaper_works_fixtures'
def ndnp_ingest_page(issue, page_data)
  page = NewspaperPage.new
  page.title = ["Page #{page_data.metadata.page_number}"]
  page.height = page_data.metadata.height
  page.width = page_data.metadata.width
  page.save!
  work_files = NewspaperWorks::Data::WorkFiles.new(page)
seanupton / SD9_prop3.txt
Created January 29, 2019 07:06
Utah Senate District 9, Proposition 3 2018 (Medicaid Expansion) Results, by Precinct
Via https://results.enr.clarityelections.com/UT/Salt_Lake/92254/Web02.214799/#/ (XML download)
Analysis via: https://gist.github.com/seanupton/7d650ff91f4b94f74d3dbffbe677c535
Total SD9:
FOR: 25647 (57.3%)
AGAINST: 19144 (42.7%)
87 of 99 precincts FOR
11 of 99 precincts AGAINST
1 of 99 precincts tied (SIL001)
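
The percentages in the totals above follow directly from the vote counts. A short check of that arithmetic, using only the figures listed here (`percent_share` is a hypothetical helper name):

```ruby
# Share of the total, as a percentage rounded to one decimal place.
def percent_share(part, total)
  (100.0 * part / total).round(1)
end

votes_for     = 25_647
votes_against = 19_144
total         = votes_for + votes_against # 44791 votes cast on the measure

puts "FOR: #{percent_share(votes_for, total)}%"         # prints FOR: 57.3%
puts "AGAINST: #{percent_share(votes_against, total)}%" # prints AGAINST: 42.7%
```

The precinct counts are also consistent: 87 + 11 + 1 = 99.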
seanupton / prop3_sd9_comp.py
Last active January 29, 2019 07:05
prop3_sd9_comp.py
#!/usr/bin/env python
from lxml import etree


def slco_election_data():
    data = ''
    with open('../data/slco_detail_2018.xml') as infile:
        data = infile.read()
    return etree.fromstring(data)
seanupton / commit_queued_example.rb
Last active January 7, 2019 18:41
Committing queued derivative assignment after fileset creation
module NewspaperWorks
  module Data
    class WorkDerivatives
      #...
      # Given a fileset meeting both of the following conditions:
      # 1. a non-nil import_url value;
      # 2. is attached to a work (persisted in Fedora, if not yet in Solr)...
      # ...this method shall get associated derivative paths queued and attach all.
      def commit_queued!(file_set)
seanupton / file-crud-attach-assign-and-commit.rb
Last active April 23, 2019 00:14
File CRUD+Attachment components, proposed calling semantics
# WorkFiles is all of the following:
# - Adapter of Work
# - A mapping of existing saved files, expressed as a hash-like
# object, with identifier keys, and path-to-file values, with
# transparent just-in-time working-copy checkout of values on
# request from backing storage to local file.
# - Identifiers are presumed to be (PCDM) FileSet persistence ID,
# not persistence id of a file.
# - Provide a means to get a file path (working copy), either by
# global identifier or by "file name" relative to the local work