@dmolesUC
Last active March 11, 2016 19:19

9 March 2016

"Linked Open Dime Novels; or, 19th Century Fiction and 21st Century Data": Matthew Short, Demian Katz

Ontology

  • four classes: CreativeWork, Edition, Copy, Series aligning with current data model
  • hard to figure out line between "work" and "expression" etc. in more abstract models
  • properties: RDA Unconstrained (not FRBR)
  • defined English equivalents for opaque URIs to simplify coding
  • model captures relationships between multiple editions, info previously only existing in editorial notes
  • nominal authors (house pen names) and actual authors tracked separately
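A rough sketch of how the four classes might hang together, written as plain Python dataclasses rather than RDF. The field names, the placement of authorship on the work, and the sample records are all assumptions made for illustration; the talk's actual ontology uses RDA Unconstrained properties.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Series:
    title: str


@dataclass
class CreativeWork:
    title: str
    nominal_author: Optional[str] = None  # house pen name printed on the work
    actual_author: Optional[str] = None   # person credited by later research


@dataclass
class Edition:
    work: CreativeWork
    publisher: str
    series: Optional[Series] = None
    # Links to other editions of the same work (one-directional here to avoid cycles).
    related_editions: List["Edition"] = field(default_factory=list)


@dataclass
class Copy:
    edition: Edition
    holder: str


# Hypothetical records, not taken from the talk.
work = CreativeWork("Example Story", nominal_author="Example Pen Name",
                    actual_author="Example Writer")
first = Edition(work, "Example Publisher A", Series("Example Library"))
reprint = Edition(work, "Example Publisher B", related_editions=[first])
copy = Copy(reprint, "Example University Library")
```

Keeping the pen name and the actual author as separate fields means neither has to be squeezed into a free-text note.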

Tools

Takeaways

  • "With limited time and resources, you can actually do real things"
  • MODS and MARC allow you to wedge URIs into records -- using that for identifiers is a "tiny bit of linked data" that allows interop with related data in full LD
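As an illustration of the "tiny bit of linked data" idea, here is a minimal sketch that attaches an identifier URI to a MODS name element using the `valueURI` attribute (part of MODS since version 3.4). The helper name, the author, and the example.org URI are placeholders, not data from the talk.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)


def name_with_uri(display_name: str, uri: str) -> ET.Element:
    """Build a <mods:name> element that carries an identifier URI."""
    name = ET.Element(f"{{{MODS_NS}}}name", {"valueURI": uri})
    part = ET.SubElement(name, f"{{{MODS_NS}}}namePart")
    part.text = display_name
    return name


# Placeholder values for illustration only.
element = name_with_uri("Example Author", "http://example.org/names/1")
xml_text = ET.tostring(element, encoding="unicode")
print(xml_text)
```

The record stays ordinary MODS, but a consumer that understands URIs can follow the identifier out to richer data.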

"Beyond the Keyword: Creative Search and Query Expansions based on DBpedia": Marya Sawaf

  • semantic search -- within a single knowledge space
    • DBpedia as source of variations and synonyms
  • serendipitous or "creative" search -- get outside the original knowledge space, share ideas across disciplines
    • synonyms of synonyms of synonyms -> exponential dataset
    • use word frequencies to filter for common expressions
    • fault tolerance to get around DBpedia quirks
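The expansion idea can be sketched roughly as follows, assuming the synonym data has already been pulled into an in-memory dictionary. The function name, the depth and frequency thresholds, and the sample data are all hypothetical. Reading "filter for common expressions" as "keep only terms above a frequency cutoff" is also an assumption.

```python
from typing import Dict, List


def expand_query(term: str,
                 synonyms: Dict[str, List[str]],
                 frequencies: Dict[str, int],
                 max_depth: int = 2,
                 min_frequency: int = 10) -> List[str]:
    """Follow synonyms of synonyms, keeping only reasonably common terms."""
    seen = {term}
    accepted = []
    frontier = [term]
    for _ in range(max_depth):
        next_frontier = []
        for current in frontier:
            # Missing entries are treated as "no synonyms" instead of failing.
            for candidate in synonyms.get(current, []):
                if candidate in seen:
                    continue
                seen.add(candidate)
                # Rare terms are dropped and not expanded further,
                # which keeps the candidate set from exploding.
                if frequencies.get(candidate, 0) >= min_frequency:
                    accepted.append(candidate)
                    next_frontier.append(candidate)
        frontier = next_frontier
    return sorted(accepted)


# Made-up data for illustration.
synonyms = {
    "car": ["automobile", "motorcar"],
    "automobile": ["vehicle", "horseless carriage"],
    "vehicle": ["conveyance"],
}
frequencies = {"automobile": 500, "motorcar": 3, "vehicle": 900,
               "horseless carriage": 1, "conveyance": 40}

print(expand_query("car", synonyms, frequencies))
```

Raising `max_depth` widens the search toward the "creative" end; raising `min_frequency` pulls it back toward familiar vocabulary.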

"So you think you want to migrate to RDF": Eben English, Steven Carl Anderson

Vocabulary reuse

  • "reuse is how vocabularies gain value"
  • "always prefer using an existing [predicate] IRI over inventing a new one"
  • linked open vocabularies, sameAs
  • "with RDF you're not limited to a single vocabulary, you can mix-and-match"

Proper use of predicates

  • predicates have domains (valid subjects) and ranges (valid objects); not all URIs are predicates
    • ...of course, some people (DPLA, Europeana) aren't actually following the definitions... "there is no Semantic Web police"
    • try to conform to accepted usages, or
    • use less popular predicate that does have the right range, or mint your own
    • "domains actually mean very little"
      • you don't have to explicitly declare classes
      • but try not to do invalid things, e.g. use a predicate with a book domain for music
  • extinction: URIs that don't resolve
    • "if there's data that you care about at that URI, you still need to store that text locally"
  • don't be afraid to create a new predicate
    • "we've all seen" enough data jammed where it doesn't belong in MARC etc.
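One way to "try not to do invalid things" is a simple lint pass over statements before they are written out. The sketch below is an assumption-heavy illustration: the `Predicate` tuple, the class names, and the example.org IRI are invented, and it compares class names literally without any subclass reasoning. It only produces warnings, in keeping with "there is no Semantic Web police".

```python
from typing import List, NamedTuple, Optional


class Predicate(NamedTuple):
    iri: str
    expected_subject: Optional[str]  # declared domain, if any
    expected_object: Optional[str]   # declared range, if any


def lint_triple(subject_class: str, predicate: Predicate,
                object_class: str) -> List[str]:
    """Return warnings for a statement; an empty list means nothing looks off."""
    warnings = []
    if predicate.expected_subject not in (None, subject_class):
        warnings.append(
            f"{predicate.iri}: domain is {predicate.expected_subject}, "
            f"got {subject_class}")
    if predicate.expected_object not in (None, object_class):
        warnings.append(
            f"{predicate.iri}: range is {predicate.expected_object}, "
            f"got {object_class}")
    return warnings


# Hypothetical predicate with a book domain.
page_count = Predicate("http://example.org/vocab/pageCount", "Book", "Integer")

print(lint_triple("Book", page_count, "Integer"))
print(lint_triple("MusicRecording", page_count, "Integer"))
```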

Caveats

  • services like id.loc.gov can be rate-limited
    • "you're going to need to cache everything"
    • Rails Linked Data Fragments: front end to blazegraph, marmotta, in-memory
    • down side: batch downloads may not be made available often enough
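"Cache everything" could look something like this minimal sketch: a resolver that remembers what it has fetched so a rate-limited service is hit at most once per URI within a time window. The class name, the one-day default, and the stand-in fetch function are assumptions. A real deployment would persist the cache and make an actual HTTP request.

```python
import time
from typing import Callable, Dict, Tuple


class CachingResolver:
    """Wrap a lookup function so repeat requests are served locally."""

    def __init__(self, fetch: Callable[[str], str],
                 ttl_seconds: float = 86400.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: Dict[str, Tuple[float, str]] = {}

    def resolve(self, uri: str) -> str:
        now = self._clock()
        entry = self._cache.get(uri)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # fresh enough: no remote call
        label = self._fetch(uri)
        self._cache[uri] = (now, label)
        return label


calls = []


def fake_fetch(uri: str) -> str:
    # Stand-in for a remote lookup; records each call so we can count them.
    calls.append(uri)
    return f"label for {uri}"


resolver = CachingResolver(fake_fetch)
first_label = resolver.resolve("http://example.org/authorities/1")
second_label = resolver.resolve("http://example.org/authorities/1")
print(len(calls))
```

Storing the fetched label alongside the URI also covers the earlier point about keeping the text locally in case the URI stops resolving.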

Is it worth it?

  • public users can't tell the difference
  • RDF doesn't magically mean aggregatable or harvestable
  • need tightly-defined data structures, need to follow standards
  • "this is where things are going, you're going to have to deal with it"

"How not to waste catalogers' time: Making the most of subject headings": John Mark Ockerbloom

  • OPACs don't do a good job with subject browse
  • Solr and faceting aren't everything
    • "You can't just throw your subject headings into a weighted search and call it done"
    • most relevant book is not necessarily the one with the best term score
    • faceting is good for slice-and-dice, not for exploring: narrow or broaden, not lateral
    • if you look at how catalogers work, they assign subjects in a certain order by relevance
      • plea for those converting to RDF: "please go out of your way to preserve subject ordering"
    • dates in subject headings can be mined to raise scores for works contemporary to events
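Two of these ideas lend themselves to a small sketch: weighting a heading by its position in the record, and pulling a date range out of a heading to check whether a work is contemporary with the event. The reciprocal weighting scheme and the function names are assumptions, not the speaker's method.

```python
import re
from typing import Dict, List, Optional, Tuple

YEAR_RANGE = re.compile(r"\b(\d{4})(?:-(\d{4}))?\b")


def position_weights(subjects: List[str]) -> Dict[str, float]:
    """Weight headings by cataloger order: first 1.0, second 1/2, third 1/3."""
    return {heading: 1.0 / (i + 1) for i, heading in enumerate(subjects)}


def heading_years(heading: str) -> Optional[Tuple[int, int]]:
    """Extract the first year or year range found in a subject heading."""
    match = YEAR_RANGE.search(heading)
    if match is None:
        return None
    start = int(match.group(1))
    end = int(match.group(2) or match.group(1))
    return (start, end)


def is_contemporary(publication_year: int, heading: str) -> bool:
    """True if the work was published within the heading's date range."""
    years = heading_years(heading)
    return years is not None and years[0] <= publication_year <= years[1]


heading = "United States--History--Civil War, 1861-1865"
print(heading_years(heading))
print(is_contemporary(1863, heading))
```

None of this works if subject order is lost on the way into RDF, which is the point of the plea above.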

"The Modern Day Sisyphus: #libtech Burnout and You": Becky Yoose


"Janus - Node.js Handler for all Library Searches": David Naughton

  • you have one problem ->
    • "I'll just use node.js" ->
    • you have uncountably infinite problems
  • node.js is not a robust HTTP server
    • you need nginx etc. as a proxy
  • you need something else to keep node running if it goes down
    • forever, Supervisor
  • apache + passenger + node.js works OK
  • asynchrony in node is harder than it looks

"Getty Research Portal Reboot: Angular and Elasticsearch for Metadata Search Aggregation": Susan Ley, Adam Cahan (Getty)

  • AngularJS + Elasticsearch
  • AngularJS
    • Google MVC framework for JS-based web apps
    • benefits: dependency injection, 2-way data binding, testability, DOM filtering
    • large community
    • good styleguide (which Getty followed)
  • Elasticsearch
    • "you don't have to use Java, you can write a bunch of funky JSON instead"
    • "ElasticSearch scales"

"Architecture is politics: The power and the perils of systems design": Andreas Orphanides (NCSU Libraries)

Slides

  • system design controls what users can or can't do
  • "design ethics: a thing"
  • 3 key lessons in the ethics of system design

1. system design influences user behavior

  • "persuasive design"
  • "dark patterns": exploiting cognitive biases
    • pre-checked opt-in boxes
    • mixing required and optional checkboxes
    • highlighting and mis-identifying non-lowest airfares as lowest
  • clickable things should look clickable
  • calls to action should be prominently placed
  • Ethical principles
    • implement constraints/affordances to the user's benefit
    • design affordances the user will recognize
    • don't disguise constraints

2. system design reflects designer's values & cultural context

  • "architecture is politics" -- Mitch Kapor
  • e.g. Robert Moses' transit-proof overpasses
  • design sends a message about how designers value customers
  • "your metadata schema is a social justice issue"
  • your design choices reflect your values even if you don't intend it
  • do you value collecting metadata more than you value user privacy
  • Ethical principles
    • seek out & recognize your biases
    • diversify your design practices (and your team)
    • understand your culture and its mores

3. The system's interests will come into conflict with the user's interests

  • 80/20 rule
    • if you spend 80% of your developer time supporting your 20% power users, you're devaluing the vast majority of your users
  • content:advertising ratio
    • popular websites might have 1:5 content:advertising ratio
    • suggests advertisers are 5x more important than users
  • "your data validation schema is a social justice issue"
    • e.g. "your name must match your ID" vs. allowing only roman characters, modeling names as first/middle/last, etc.
  • 15% of internet users depend on mobile devices
    • "your mobile website is a social justice issue"
  • Ethical principles for compassionate design
    • recognize & acknowledge compromises
    • know your users
    • design with empathy
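The data validation point can be made concrete with a small comparison. Both validators below are illustrative assumptions: the first encodes the "roman characters, first and last name" model, and the second only rejects empty input and control characters.

```python
import re
import unicodedata

ASCII_FIRST_LAST = re.compile(r"^[A-Za-z]+ [A-Za-z]+$")


def restrictive_is_valid(name: str) -> bool:
    """Accepts exactly two unaccented ASCII words."""
    return bool(ASCII_FIRST_LAST.match(name))


def permissive_is_valid(name: str) -> bool:
    """Accepts any non-empty name that has no control characters."""
    stripped = name.strip()
    if not stripped:
        return False
    return not any(unicodedata.category(ch) == "Cc" for ch in stripped)


# Illustrative names only.
for name in ["Ann Lee", "José Álvarez", "山田太郎", "Mary Jane Example"]:
    print(name, restrictive_is_valid(name), permissive_is_valid(name))
```

The restrictive rule rejects accented names, names in other scripts, and names with more or fewer than two parts, none of which are errors on the user's side.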

"Transcending Traditional Systems and Labels: An API-First Archives Approach at NPR"

API-first design

  • iterating on front end independent of back end development
  • simple, frequent front-end deployments

Architecture

  • coming from backbone & jquery
    • "angular is way easier than backbone"
  • all application state is stored in the URL
    • no state in the browser session
    • everything is bookmarkable, shareable, embeddable in bug reports
  • proxy layer: an API in front of the API
    • microservice between UI and API
    • authentication, caching, connecting to multiple internal APIs
  • moving from MySQL to NoSQL (Elastic + DynamoDB):
    • lots of HTTP calls
    • a million records -> N million API requests
      • SQL dump: 1 minute per year of data
      • inserting data into API: 1 hour per year of data
      • 40 years of data -> 1 week to load
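The "all application state is stored in the URL" idea from the architecture notes can be sketched as a round trip between a state dictionary and a query string. This uses only Python's standard library; the base URL, parameter names, and helper names are hypothetical, and the talk's front end did this in the browser rather than in Python.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def state_to_url(base: str, state: dict) -> str:
    """Serialize UI state into a bookmarkable URL (keys sorted for stability)."""
    query = urlencode(sorted(state.items()), doseq=True)
    return f"{base}?{query}"


def url_to_state(url: str) -> dict:
    """Recover the state; repeated parameters come back as lists."""
    parsed = parse_qs(urlsplit(url).query)
    # Note: a one-element list would come back as a plain string.
    return {key: values if len(values) > 1 else values[0]
            for key, values in parsed.items()}


state = {"q": "jazz", "page": "2", "year": ["1975", "1976"]}
url = state_to_url("https://example.org/archive/search", state)
print(url)
print(url_to_state(url) == state)
```

Because nothing lives in the session, pasting the URL into a bug report reproduces exactly what the user saw.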

"Building Desktop Applications using Web Technologies with Electron": Jason Ronallo

  • slides
  • Why desktop applications
    • stand out from sea of browser tabs
    • focus w/o distraction by sea of browser tabs
  • Don't want to learn desktop GUI toolkits? Use HTML/CSS/JS.
  • Electron: one of several available platforms for that
    • used by e.g. Slack
    • Chromium + Node.js
  • Issues
    • cross-platform, but:
      • need to build a native installer
      • still some OS differences
      • still need to recompile native modules

"Beyond the Bento Box: Using linked data and smart algorithms to integrate repository data in context": Jordan Fields & Mark Noble (Marmot)

  • public library users probably want books first
    • but we also have archives, articles....
  • Marmot has 16 public, 6 academic, 5 school libraries
    • one discovery system for all these different user groups
    • federated discovery across different ILSs
  • Pika: Marmot's new (alpha) discovery layer
    • Linked data sources:
      • Who's On First
      • Geonames
      • Find a Grave
      • Wikipedia
      • Internal catalog, genealogy, archive
    • Different subject catalogs between article database, archive, EBSCO catalog
    • primarily using LD for well-known relationships

"What does it take to get a job these days? Analyzing jobs.code4lib.org data to understand current technology skillsets": Monica Maceli

  • curriculum, jobs, practitioners
  • curriculum study
    • "somebody does that every couple of years" -> automate it via web-scraping to identify trends over time
  • jobs
    • code4lib jobs tagged and curated by volunteers
    • shortimer: "a django web app that collects job announcements from the code4lib discussion list and puts them on the Web."
    • text mining, correlations, groupings with R
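The study itself used R. As a rough Python illustration of the "groupings" step, the sketch below counts how often pairs of tags appear on the same job posting. The function name and the sample postings are made up and do not reflect the jobs.code4lib.org data.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List


def tag_cooccurrence(postings: Iterable[List[str]]) -> Counter:
    """Count how many postings each pair of tags shares."""
    counts: Counter = Counter()
    for tags in postings:
        # Sort and de-duplicate so each pair has one canonical form.
        for pair in combinations(sorted(set(tags)), 2):
            counts[pair] += 1
    return counts


# Invented postings for illustration.
postings = [
    ["python", "solr", "linux"],
    ["python", "solr"],
    ["javascript", "solr"],
]
pairs = tag_cooccurrence(postings)
print(pairs.most_common(1))
```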

"Building a user-friendly authorities browse in Blacklight": Jennifer Colt & Frances Webb (Cornell)

Cornell's Blacklight implementation

  • existing (Voyager) lists subjects and authors according to vocabulary
  • new (Blacklight) lists according to field, provides easy access to narrowing
  • links to main Blacklight catalog search results
    • headings only appear if heading, "see", or "see also" will provide search results in main catalog
    • cross-references come from the authority record
  • main catalog now indexes alternate forms as well as preferred forms (e.g. records catalogued as "myocardial infarction" now show up under "heart attack")
    • "users will find the records they want, but they won't necessarily realize we've done anything interesting to help them find the records they want"
    • can be done w/o setting up a separate authority browse
    • separate Solr index to facilitate searching records at the same level
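The alternate-forms behavior can be sketched with plain dictionaries standing in for the authority file and the catalog index. Everything here is an assumption for illustration (function names, record identifiers, the second authority entry); Cornell's implementation does this inside Solr and Blacklight.

```python
from typing import Dict, List


def build_variant_index(authorities: Dict[str, List[str]]) -> Dict[str, str]:
    """Map every preferred and alternate form to its preferred heading."""
    index = {}
    for preferred, variants in authorities.items():
        for form in [preferred, *variants]:
            index[form.lower()] = preferred
    return index


def search(term: str, variant_index: Dict[str, str],
           records_by_heading: Dict[str, List[str]]) -> List[str]:
    """Look up records under the preferred heading for whatever form was typed."""
    preferred = variant_index.get(term.lower(), term)
    return records_by_heading.get(preferred, [])


def browse_entries(authorities: Dict[str, List[str]],
                   records_by_heading: Dict[str, List[str]]) -> List[str]:
    """List only forms whose preferred heading has records in the catalog."""
    entries = []
    for preferred, variants in authorities.items():
        if records_by_heading.get(preferred):
            entries.extend([preferred, *variants])
    return sorted(entries)


authorities = {
    "Myocardial infarction": ["Heart attack"],
    "Example heading": ["Example variant"],  # has no records below
}
records_by_heading = {"Myocardial infarction": ["record-1", "record-2"]}
variant_index = build_variant_index(authorities)

print(search("heart attack", variant_index, records_by_heading))
print(browse_entries(authorities, records_by_heading))
```

The user who types "heart attack" gets the records without ever seeing the mapping, and headings that would lead to an empty result list never appear in the browse.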