current (2018 January 17) thinking on technical debt in metadata management

Forms of Metadata Technical Debt

Code Debt

code debt in the metadata context may include cataloging done outside of the confines of our cataloging rules ... or in violation of established metadata specifications

debt native to the metadata itself -- not enough of it, or metadata insufficient to meet user needs

(works cited below)

Classes of code debt:

  • quantitative debt: unprocessed/uncataloged materials
    • [# of materials] - [# of cataloged materials] (see the sketch after this list)
  • qualitative debt: incompletely cataloged materials
    • Incomplete by what standard?
    • Brief vs. full records
    • For whom are we cataloging? Does our current metadata align with the discovery needs of our community members? Does anyone use or care about the metadata we create now? (equitable cataloging -- Olson, Buckland, ...)
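
A minimal sketch of the quantitative-debt arithmetic above. The function name and the counts are illustrative, not from any particular system; substitute figures from your own ILS reports:

```python
# Quantitative code debt as defined above:
#   [# of materials] - [# of cataloged materials]
# The counts below are made up for illustration.

def quantitative_backlog(total_materials: int, cataloged_materials: int) -> int:
    """Size of the unprocessed/uncataloged backlog."""
    return total_materials - cataloged_materials

# e.g., 120,000 items held, 112,500 with catalog records
backlog = quantitative_backlog(120_000, 112_500)
print(f"{backlog} uncataloged items "
      f"({backlog / 120_000:.1%} of the collection)")  # -> 7500 (6.2%)
```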

Means of incurring code debt:

  • intentional (brief records; vendor loads known to be incomplete)
  • unintentional (vendor loads not known to be incomplete; preferred user access points not covered by existing cataloging practice or system; discovery systems use metadata properties for which values are unassigned)

Metrics:

  • How do you measure quantitative backlogs?
    • Note that the literature offers no consistent definition of what constitutes a backlog, and that's okay -- this will vary by institution, and every community has different needs
  • By what metrics do you measure qualitative backlogs? What steps do you take to address them prior to catalog ingest? (one possible measure is sketched after this list)
  • How does the presence of a discovery system, whose scope is larger than the catalog, impact backlog management tactics?
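
One way to operationalize the qualitative question: classify records as brief or full against a locally defined standard, then track the proportion falling short. A sketch, assuming records as plain dicts keyed by MARC tag and a hypothetical REQUIRED_FIELDS set -- "incomplete by what standard?" is exactly the choice each institution has to make for itself:

```python
# Classify records as "brief" or "full" against a local fullness standard.
# Both the record shape (dicts keyed by MARC tag) and REQUIRED_FIELDS are
# assumptions for illustration, not an actual institutional policy.

REQUIRED_FIELDS = {"100", "245", "264", "300", "650"}

def is_full(record: dict) -> bool:
    """A record is 'full' if every locally required field has a value."""
    return all(record.get(tag) for tag in REQUIRED_FIELDS)

def qualitative_backlog(records: list[dict]) -> float:
    """Share of records falling short of the local standard."""
    if not records:
        return 0.0
    brief = sum(1 for r in records if not is_full(r))
    return brief / len(records)

records = [
    {"100": "Olson, Hope A.", "245": "The power to name",
     "264": "2002", "300": "xi, 261 p.", "650": "Subject cataloging"},
    {"245": "Uncataloged pamphlet"},  # brief record from a vendor load
]
print(f"{qualitative_backlog(records):.0%} of records are brief")  # -> 50%
```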

Howarth, Lynne C., Moor, Les, and Sze, Elisa (2010). Mountains to Molehills: The Past, Present, and Future of Cataloguing Backlogs. Cataloging and Classification Quarterly, 48(5), pp. 423-444. https://doi.org/10.1080/01639371003767227

Overview of the identification and measurement of cataloging backlogs, from Piternick (1969) to the time the article was published. Notes the diversity of opinion regarding how to identify and measure backlogs or arrearages in the first place, and strategies for addressing them.

Snow, Karen (2017). Defining, Assessing, and Rethinking Quality Cataloging. Cataloging and Classification Quarterly, 55(7-8), pp. 438-455. https://doi.org/10.1080/01639374.2017.1350774

Begins from the question: "If the cataloging community cannot agree upon a universal definition of quality cataloging, how are libraries supposed to assess the quality of cataloging work?" From there, a comprehensive overview of the history of measuring library cataloging quality, going back to Charles Cutter, including modern interpretations of the subject (citing Hillmann, Calhoun, others).

Bowker (1883) already said it best: "remember that cooperation does not mean rigid uniformity, and that, among many varieties of situation and circumstance, the best way is often a relative term."

Snow writes about efforts among libraries in the early part of the century to develop "audit tools" for measuring metadata quality, and their general findings:

  • Chapman and Massey (2002) assessed eleven specific areas of focus for cataloging at the University of Bath, finding that such a tool is not objective, but is useful for evaluating accuracy "in the library's own terms" (a toy version of such an audit is sketched after this list)
  • MacEwan and Young () aligned their audit with the FRBR user tasks and assessed quality that way. The study suffered from a small sample size and from the fact that the FRBR user tasks themselves were developed not through user observation, but through the biases of the FRBR design committee
  • Hider and Tan () took a multi-faceted approach to measuring metadata quality in/for public libraries in Singapore, talking to users and cataloging experts to learn their opinions on what constituted "quality." These opinions varied widely, especially among the cataloging experts. They noted that, while time-consuming, such investigations are essential for any definition of "quality" that would be useful to a library or library cooperative.
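
A toy audit tool in the spirit of the Chapman and Massey approach referenced above. The three areas and their checks are hypothetical stand-ins, not their actual eleven; the point is that the rubric encodes quality "in the library's own terms":

```python
# A toy record audit: score each record against locally chosen areas of
# focus. The checks are hypothetical; the rubric is subjective by design.

from typing import Callable

AUDIT_CHECKS: dict[str, Callable[[dict], bool]] = {
    "title present":        lambda r: bool(r.get("245")),
    "subject access":       lambda r: bool(r.get("650")),
    "physical description": lambda r: bool(r.get("300")),
}

def audit(record: dict) -> dict[str, bool]:
    """Run every check; the per-area report, not one score, is the output."""
    return {area: check(record) for area, check in AUDIT_CHECKS.items()}

for area, passed in audit({"245": "Mountains to molehills",
                           "650": "Cataloging backlogs"}).items():
    print(f"{area}: {'pass' if passed else 'FAIL'}")
```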

Snow identifies five strategies for catalogers to follow when seeking to name what "cataloging quality" means at their institution:

  1. identify personal definitions of "quality," then share those with others at their shop, leading to an institutional definition of "quality" that may be audited against existing metadata
  2. study user information needs, either themselves or in collaboration with public services librarians
  3. conduct studies to determine the needs not only of individual users, but of domains of users
  4. incorporate the results of user studies into cataloging standards
  5. embrace design thinking, that is, "human-centered exploration that seeks to address problems by privileging the cyclical approach of design rather than the linear approach of the scientific method" (Clarke 2015)

Design and Architectural Debt

...[design and architectural debt] instead manifests at the level of the systems and applications used to manage metadata for library resources, and/or at the level of standards development and data modeling for the cultural heritage domain...

the ability of standards and systems to meet the resource description needs of curators, catalogers, and processing archivists

Metrics:

  • Evaluation of metadata standards against discovery systems, local user needs, and local metadata practices
  • Evaluation of discovery and metadata management systems against local user needs and existing metadata (one concrete coverage check is sketched below)
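
One concrete form these evaluations can take is a coverage check: for each property the discovery system indexes, measure how often local records leave it unassigned (the unintentional code debt noted earlier). A sketch under assumptions -- the index field list and record shape are hypothetical, and a real discovery layer would supply its own field inventory:

```python
# Coverage check: properties the discovery layer indexes vs. what local
# records actually populate. DISCOVERY_INDEX_FIELDS and the record shape
# are hypothetical stand-ins.

DISCOVERY_INDEX_FIELDS = {"title", "creator", "subject", "genre", "audience"}

def coverage_gaps(records: list[dict]) -> dict[str, float]:
    """For each indexed property, the share of records leaving it unassigned."""
    return {
        field: sum(1 for r in records if not r.get(field)) / len(records)
        for field in DISCOVERY_INDEX_FIELDS
    }

sample = [
    {"title": "A", "creator": "B", "subject": "C"},
    {"title": "D", "subject": "E"},
]
for field, share in sorted(coverage_gaps(sample).items()):
    print(f"{field}: unassigned in {share:.0%} of records")
# genre and audience come back 100% unassigned -- properties the discovery
# system uses for which no values exist.
```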

Research questions:

  1. To what extent are local metadata and metadata practices in alignment with metadata schema (MARC, BIBFRAME, etc.) and content (RDA, DACS) standards?
  2. To what extent are those schema and content standards in alignment with user needs in different contexts?
  3. To what extent do the systems we have in place for cataloging and discovery take advantage of (1) and (2), and are there areas in which they are deficient?

Environmental Debt

environmental debt is typically due to breakdowns in communication between the technical services unit and its collaborators throughout the institution...

effectiveness of the organizational structure for ensuring quality of metadata in resource discovery systems

Means of incurring environmental debt:

  • intentional: insufficient allocation of staff time and resources to systems maintenance (including metadata) and/or user experience research
  • unintentional: time constraints preventing regular communication among library teams (particularly if significant responsibility for system management is in the hands of an offsite vendor? I don't really know what I meant by this...)

Metrics:

(these are hard problems to solve, since you're basically measuring if people talk to each other or not)

  • Do technical services stakeholders understand systems well enough to optimize metadata management workflows for those systems?
  • Do IT stakeholders understand technical services domain experts -- their standard metadata schema and workflows, and their metadata management use cases -- well enough to integrate them into metadata management applications and discovery systems?
  • Do technical services stakeholders and public services stakeholders understand the interactions each has with the discovery and resource management systems well enough to find a balance in optimizing the work of each within those systems, and avoiding focusing too much on one at the expense of the other?

Documentation Debt

Documentation of metadata practices in libraries is a two-fold issue: both the standards themselves and the local practices and rules that complement them may be documented to varying degrees.

(To this I would add institutional memory as an aspect of documentation debt, if I could write this paper over. Libraries and archives typically have a great deal of knowledge of past practice in senior faculty and staff; too often that knowledge disappears with those people when they retire or move on.)

Metrics:

  • Do you document your local practices, or do you rely on standards documentation (e.g. the RDA Toolkit)?
  • Can your documentation be used for validation of existing metadata? In what formats do you keep your documentation? (a documentation-as-data sketch follows this list)
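
If local practice is documented as data rather than prose, the documentation itself can drive validation of existing metadata. A minimal sketch, assuming a hypothetical rule set keyed by MARC tag; the tags and requirement levels are illustrative, not anyone's actual policy:

```python
# Documentation as machine-actionable rules: the same structure that
# documents local practice validates existing metadata. The tags and
# requirement levels below are illustrative only.

LOCAL_PRACTICE = {
    "245": "required",     # title statement always present
    "264": "required",     # publication statement per (hypothetical) policy
    "650": "recommended",  # subject access encouraged, not mandated
}

def validate(record: dict) -> list[str]:
    """Return violations of documented *required* practice for one record."""
    return [f"missing {tag}" for tag, level in LOCAL_PRACTICE.items()
            if level == "required" and not record.get(tag)]

print(validate({"245": "Defining, Assessing, and Rethinking Quality Cataloging"}))
# -> ['missing 264']
```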

Requirements Debt

Maybe the most difficult class of debt to identify, measure, and address in a library setting (for me, it's either this or environmental debt)

Factors (no metrics yet, I think measurement of this comes from measurement of the other classes of debt):

  • the number of use cases a metadata management application must address
  • the number of user classes expected in such a system:
    • varying degrees of familiarity with discovery systems and information retrieval generally
    • variety of backgrounds, whether educational, socioeconomic, or other
    • desired outcomes
  • needs of users vs. needs of librarians (both important!)

...a comprehensive metadata technical debt management approach must take all of these factors into account in order to meet the "optimal requirements specification" for the metadata management system.

Management strategies

  • Design: not only of the metadata records and aggregations, but also of the taxonomies, standards, and data models upon which they are based.
  • System requirements: in terms of the library management application and the resource discovery and retrieval needs of users.
  • Environment and infrastructure: recognizing that library metadata serves a local purpose of connecting users with resources, but also increasingly serves a global role through inclusion in services like WorldCat as well as emerging linked data applications.
  • Workflow: its overall efficiency and the extent to which it is documented; physical as well as intellectual space both matter here.
  • Governance: Who sets the standards and instruments by which technical debt is measured and managed in metadata management environments? Is the process top-down or collaborative?