dmolesUC/c4l16-day2-notes.md

## c4l16-day2-notes.md

      
            
          
    9 March 2016


"Linked Open Dime Novels; or, 19th Century Fiction and 21st Century Data": Matthew Short, Demian Katz

Ontology
Tools
Takeaways


"Beyond the Keyword: Creative Search and Query Expansions based on DBpedia": Marya Sawaf
"So you think you want to migrate to RDF": Eben English, Steven Carl Anderson

Vocabulary reuse
Proper use of predicates
Caveats
Is it worth it?


"How not to waste catalogers' time: Making the most of subject headings": John Mark Ockerbloom
"The Modern Day Sisyphus: #libtech Burnout and You": Becky Yoose
"Janus - Node.js Handler for all Library Searches": David Naughton
"Getty Research Portal Reboot: Angular and Elasticsearch for Metadata Search Aggregation": Susan Ley, Adam Cahan (Getty)
"Architecture is politics: The power and the perils of systems design": Andreas Orphanides (NCSU Libraries)

1. system design influences user behavior
2. system design reflects designer's values & cultural context
3. The system's interests will come into conflict with the user's interests


Transcending Traditional Systems and Labels: An API-First Archives Approach at NPR

API-first design
Architecture


"Building Desktop Applications using Web Technologies with Electron": Jason Ronallo
"Beyond the Bento Box: Using linked data and smart algorithms to integrate repository data in context": Jordan Fields & Mark Noble (Marmot)
"What does it take to get a job these days? Analyzing jobs.code4lib.org data to understand current technology skillsets": Monica Maceli
"Building a user-friendly authorities browse in Blacklight": Jennifer Colt & Frances Webb (Cornell)

"Linked Open Dime Novels; or, 19th Century Fiction and 21st Century Data": Matthew Short, Demian Katz


bibliography of 19th century popular fiction as linked data (LD)

dimenovels.lib.niu.edu
dimenovels.org


Ontology


four classes: CreativeWork, Edition, Copy, Series aligning with current
data model
hard to figure out line between "work" and "expression" etc. in more
abstract models
properties: RDA Unconstrained (not FRBR)
defined English equivalents for opaque URIs to simplify coding
model captures relationships between multiple editions, info previously
only existing in editorial notes
nominal authors (house pen names) and actual authors tracked separately
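
A minimal sketch of the four-class model described above, using plain Python dataclasses rather than RDF. Every field name here (`nominal_authors`, `actual_authors`, `publisher`, `holding_institution`) is a hypothetical stand-in, and attaching authorship to the work is an assumption; the talk did not spell out where those properties live.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Series:
    title: str


@dataclass
class CreativeWork:
    title: str
    # Assumption: authorship hangs off the work. Both lists are kept
    # separately, as the notes say nominal and actual authors are.
    nominal_authors: List[str] = field(default_factory=list)  # names as printed, e.g. house pen names
    actual_authors: List[str] = field(default_factory=list)   # people who wrote the text, when known


@dataclass
class Edition:
    work: CreativeWork
    publisher: str
    series: Optional[Series] = None


@dataclass
class Copy:
    edition: Edition
    holding_institution: str


def editions_of(work, editions):
    """Return the editions that point at `work` (matched by identity)."""
    return [e for e in editions if e.work is work]
```

With explicit `Edition -> CreativeWork` links, "which other editions of this story exist" becomes a query instead of something only recorded in an editorial note.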

Tools


hand-rolled RDF harvester: Murpoint; Fuseki as SPARQL server

Takeaways


"With limited time and resources, you can actually do real things"
MODS and MARC allow you to wedge URIs into records -- using that for
identifiers is a "tiny bit of linked data" that allows interop with
related data in full LD


"Beyond the Keyword: Creative Search and Query Expansions based on DBpedia": Marya Sawaf


semantic search -- within a single knowledge space

DBpedia as source of variations and synonyms


serendipitous or "creative" search -- get outside the original knowledge
space, share ideas across disciplines

synonyms of synonyms of synonyms -> exponential dataset
use word frequencies to filter for common expressions
fault tolerance to get around DBpedia quirks
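
The expansion-and-filter idea above could look roughly like this. It is a sketch under assumptions, not the speaker's implementation: `synonyms` and `frequency` are hypothetical in-memory dictionaries standing in for data pulled from DBpedia and a word-frequency list, and the depth and frequency thresholds are arbitrary.

```python
from collections import deque


def expand_query(term, synonyms, frequency, max_depth=2, min_frequency=10):
    """Breadth-first expansion of `term` through a synonym graph.

    synonyms:  dict mapping a term to a list of related terms (hypothetical).
    frequency: dict mapping a term to a corpus word count (hypothetical).
    """
    seen = {term}
    queue = deque([(term, 0)])
    expansions = []
    while queue:
        current, depth = queue.popleft()
        if depth == max_depth:
            continue  # cap the depth so the result set cannot keep growing
        # Fault tolerance: a term with no entry simply contributes nothing.
        for candidate in synonyms.get(current, []):
            if candidate in seen:
                continue
            seen.add(candidate)
            # Keep (and expand further) only reasonably common expressions.
            if frequency.get(candidate, 0) >= min_frequency:
                expansions.append(candidate)
                queue.append((candidate, depth + 1))
    return expansions
```

Rare candidates are dropped before they are expanded, which is what keeps "synonyms of synonyms of synonyms" from blowing up.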


"So you think you want to migrate to RDF": Eben English, Steven Carl Anderson

Vocabulary reuse


"reuse is how vocabularies gain value"
"always prefer using an existing [predicate] IRI over inventing a new one"
linked open vocabularies, sameAs
"with RDF you're not limited to a single vocabulary, you can mix-and-match"

Proper use of predicates


predicates have domains (valid subjects) and ranges (valid objects); not
all URIs are predicates

...of course, some people (DPLA, Europeana) aren't actually following the
definitions... "there is no Semantic Web police"
try to conform to accepted usages, or
use less popular predicate that does have the right range,
or mint your own
"domains actually mean very little"

you don't have to explicitly declare classes
but try not to do invalid things, e.g. use a predicate with a book
domain for music
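
A rough sketch of the "try not to do invalid things" check above. The `ex:` predicates and classes are made up for illustration, and the declarations are hand-written; a real check would read `rdfs:domain` and `rdfs:range` from the vocabularies themselves. Note that RDFS domains and ranges license inferences rather than validate data, so this only produces warnings, and it ignores subclass relationships.

```python
# Hypothetical declarations written by hand for illustration; a real check
# would load rdfs:domain / rdfs:range from the vocabularies themselves.
DECLARED = {
    "ex:isbn": {"domain": "ex:Book", "range": "literal"},
    "ex:composer": {"domain": "ex:MusicalWork", "range": "ex:Agent"},
}


def check_triple(subject_type, predicate, object_type):
    """Return warnings for one typed triple; an empty list means no mismatch found."""
    declared = DECLARED.get(predicate)
    if declared is None:
        return [f"{predicate}: no declaration on file, nothing to check"]
    warnings = []
    if subject_type != declared["domain"]:
        warnings.append(
            f"{predicate}: subject is {subject_type}, domain is {declared['domain']}"
        )
    if object_type != declared["range"]:
        warnings.append(
            f"{predicate}: object is {object_type}, range is {declared['range']}"
        )
    return warnings
```

Using a book-domain predicate on a musical work, as in the example above, would surface as a single domain warning.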


extinction: URIs that don't resolve

"if there's data that you care about at that URI, you still need to
store that text locally"


don't be afraid to create a new predicate

"we've all seen" enough data jammed where it doesn't belong in MARC, etc.


Caveats


services like id.loc.gov can be rate-limited

"you're going to need to cache everything"
Rails Linked Data Fragments: front end to Blazegraph, Marmotta, or in-memory storage
downside: batch downloads may not be made available often enough
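
"Cache everything" in its simplest form might look like the sketch below. This is not how Rails Linked Data Fragments works; it is a hypothetical file-backed cache where `fetch` is whatever function actually calls the remote service (e.g. one that requests a URI from id.loc.gov), injected so the cache itself needs no network access.

```python
import hashlib
import json
from pathlib import Path


class CachedResolver:
    """Resolve URIs through `fetch`, keeping every response on local disk."""

    def __init__(self, fetch, cache_dir):
        self.fetch = fetch  # callable: uri -> str (hypothetical remote lookup)
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)

    def _path_for(self, uri):
        # Hash the URI so it is always a safe file name.
        digest = hashlib.sha256(uri.encode("utf-8")).hexdigest()
        return self.cache_dir / f"{digest}.json"

    def resolve(self, uri):
        path = self._path_for(uri)
        if path.exists():
            # Cache hit: no request is made, so rate limits are not touched.
            return json.loads(path.read_text(encoding="utf-8"))["body"]
        body = self.fetch(uri)
        path.write_text(json.dumps({"uri": uri, "body": body}), encoding="utf-8")
        return body
```

Keeping the text locally also covers the "extinction" case: if the URI stops resolving, the stored copy is still there. The sketch never expires entries, so stale data is the trade-off.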


Is it worth it?


public users can't tell the difference
RDF doesn't magically mean aggregatable or harvestable
need tightly-defined data structures, need to follow standards
"this is where things are going, you're going to have to deal with it"


"How not to waste catalogers' time: Making the most of subject headings": John Mark Ockerbloom


OPACs don't do a good job with subject browse
Solr and faceting aren't everything

"You can't just throw your subject headings into a weighted search and
call it done"
most relevant book is not necessarily the one with the best term score
faceting is good for slice-and-dice, not for explore: narrow or broaden,
not lateral
if you look at how catalogers work, they assign subjects in a certain order
by relevance

plea for those converting to RDF: "please go out of your way to
preserve subject ordering"


dates in subject headings can be mined to raise scores for works
contemporary to events
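
One way to make use of subject ordering, sketched under assumptions: weight a match by the heading's position in the record, so a book whose first subject matches outranks one where the match is an afterthought. The geometric `decay` factor and the substring match are illustrative choices, not anything proposed in the talk.

```python
def subject_score(subjects, query, decay=0.5):
    """Score a record by where `query` appears in its ordered subject list.

    The first heading weighs 1.0, the second `decay`, the third decay**2, ...
    """
    query = query.casefold()
    score = 0.0
    for position, heading in enumerate(subjects):
        if query in heading.casefold():
            score += decay ** position
    return score
```

For example, a record with subjects `["Whaling -- Fiction", "Sea stories"]` scores higher for "whaling" than one listing the same two headings in the opposite order. This only works if the conversion pipeline preserved the catalogers' ordering in the first place.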


"The Modern Day Sisyphus: #libtech Burnout and You": Becky Yoose


see notes here


"Janus - Node.js Handler for all Library Searches": David Naughton


you have one problem ->

"I'll just use node.js" ->
you have uncountably infinite problems


node.js is not a robust HTTP server

you need nginx etc. as a proxy


you need something else to keep node running if it goes down

forever, Supervisor


apache + passenger + node.js works OK
asynchrony in node is harder than it looks

"What Color is Your Function?"


"Getty Research Portal Reboot: Angular and Elasticsearch for Metadata Search Aggregation": Susan Ley, Adam Cahan (Getty)


angular.js + Elasticsearch
angular.js

Google's MVC framework for JS-based web apps
benefits: dependency injection, 2-way data binding, testability, DOM filtering
large community
good styleguide
(which Getty followed)


Elasticsearch

"you don't have to use Java, you can write a bunch of funky JSON instead"
"Elasticsearch scales"


"Architecture is politics: The power and the perils of systems design": Andreas Orphanides (NCSU Libraries)

Slides

system design controls what users can or can't do
"design ethics: a thing"
3 key lessons in the ethics of system design

1. system design influences user behavior


"persuasive design"
"dark patterns": exploiting cognitive biases

pre-checked opt-in boxes
mixing required and optional checkboxes
highlighting and mis-identifying non-lowest airfares as lowest


clickable things should look clickable
calls to action should be prominently placed
Ethical principles

implement constraints/affordances to the user's benefit
design affordances the user will recognize
don't disguise constraints


2. system design reflects designer's values & cultural context


"architecture is politics" -- Mitch Kapor
e.g. Robert Moses' transit-proof overpasses
design sends a message about how designers value customers
"your metadata schema is a social justice issue"
your design choices reflect your values even if you don't intend it
do you value collecting metadata more than you value user privacy
Ethical principles

seek out & recognize your biases
diversify your design practices (and your team)
understand your culture and its mores


3. The system's interests will come into conflict with the user's interests


80/20 rule

if you spend 80% of your developer time supporting your 20% power
users, you're devaluing the vast majority of your users


content:advertising ratio

popular websites might have 1:5 content:advertising ratio
suggests advertisers are 5x more important than users


"your data validation schema is a social justice issue"

e.g. "your name must match your ID" vs. allowing only roman characters,
modeling names as first/middle/last, etc.


15% of internet users depend on mobile devices

"your mobile website is a social justice issue"


Ethical principles for compassionate design

recognize & acknowledge compromises
know your users
design with empathy


Transcending Traditional Systems and Labels: An API-First Archives Approach at NPR

API-first design


iterating on front end independent of back end development
simple, frequent front-end deployments

Architecture


coming from backbone & jquery

"angular is way easier than backbone"


all application state is stored in the URL

no state in the browser session
everything is bookmarkable, shareable, embeddable in bug reports


proxy layer: an API in front of the API

microservice between UI and API
authentication, caching, connecting to multiple internal APIs


moving from MySQL to NoSQL (Elastic + DynamoDB):

lots of HTTP calls
a million records -> N million API requests

SQL dump: 1 minute per year of data
inserting data into API: 1 hour per year of data
40 years of data -> 1 week to load (40 hours of inserts)
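
The "all application state is stored in the URL" point above can be illustrated with a small round trip. NPR's front end is JavaScript; this is a Python sketch of the idea only, and the parameter names and `example.org` URL are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def state_to_url(base, state):
    """Serialize a dict of search state into a URL.

    Keys are sorted so the same state always produces the same URL.
    """
    return f"{base}?{urlencode(sorted(state.items()), doseq=True)}"


def url_to_state(url):
    """Recover the state from a URL; every value comes back as a list."""
    return parse_qs(urlsplit(url).query)
```

Because the URL alone reproduces the view, it can be bookmarked, shared, or pasted into a bug report, and nothing depends on the browser session.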


"Building Desktop Applications using Web Technologies with Electron": Jason Ronallo


slides
Why desktop applications

stand out from sea of browser tabs
focus w/o distraction by sea of browser tabs


Don't want to learn desktop GUI toolkits? Use HTML/CSS/JS.
Electron: one of several available platforms for that

used by e.g. Slack
Chromium + Node.js


Issues

cross-platform, but:

need to build a native installer
still some OS differences
still need to recompile native modules


"Beyond the Bento Box: Using linked data and smart algorithms to integrate repository data in context": Jordan Fields & Mark Noble (Marmot)


public library users probably want books first

but we also have archives, articles....


Marmot has 16 public, 6 academic, 5 school libraries

one discovery system for all these different user groups
federated discovery across different ILSs


Pika: Marmot's new (alpha) discovery layer

Linked data sources:

Who's On First
Geonames
Find a Grave
Wikipedia
Internal catalog, genealogy, archive


Different subject catalogs between article database, archive, EBSCO catalog
primarily using LD for well-known relationships


"What does it take to get a job these days? Analyzing jobs.code4lib.org data to understand current technology skillsets": Monica Maceli


curriculum, jobs, practitioners
curriculum study

"somebody does that every couple of years" -> automate it via web-scraping to identify trends over time


jobs

code4lib jobs tagged and curated by volunteers
shortimer: "a django web app
that collects job announcements from the code4lib discussion list and
puts them on the Web."
text mining, correlations, groupings with R


"Building a user-friendly authorities browse in Blacklight": Jennifer Colt & Frances Webb (Cornell)

Cornell's Blacklight implementation

existing (Voyager) lists subjects and authors according to vocabulary
new (Blacklight) lists according to field, provides easy access to narrowing
links to main Blacklight catalog search results

headings only appear if heading, "see", or "see also" will provide search results
in main catalog
cross-references come from the authority record


main catalog now indexes alternate forms as well as preferred forms (e.g.
records catalogued as "myocardial infarction" now show up under "heart
attack")

"users will find the records they want, but they won't necessarily
realize we've done anything interesting to help them find the records
they want"
can be done w/o setting up a separate authority browse
separate Solr index to facilitate searching records at the same level
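
Indexing alternate forms alongside preferred forms, as described above, amounts to expanding each record's headings with the cross-references from its authority record before indexing. The sketch below assumes a hypothetical `variants` mapping from preferred heading to alternate forms; Cornell's actual Solr setup was not shown in enough detail to reproduce.

```python
def index_terms(record_headings, variants):
    """Return every term a record should be findable under.

    record_headings: preferred headings assigned to the record.
    variants:        dict mapping a preferred heading to its alternate
                     forms (hypothetical stand-in for authority data).
    """
    terms = set()
    for heading in record_headings:
        terms.add(heading.casefold())
        for variant in variants.get(heading, []):
            terms.add(variant.casefold())
    return terms
```

A record catalogued as "Myocardial infarction" then also carries "heart attack" in its searchable terms, without the user ever visiting a separate authority browse.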