@tombaker
Last active July 14, 2023 11:36

DCMI DCAP Interest Group cheatsheet

Version: 2020-05-27

This cheatsheet: https://gist.github.com/00e47cf4771dff8566a44529a77aae48.git

Main goal: Simple Tabular Model for Application Profiles (AP-STM)

For human consumption

  • For display in tabular format ("as is")
  • For conversion into HTML or PDF
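
As a rough illustration of the "conversion into HTML" goal, the sketch below renders a small profile as an HTML table using only the Python standard library. The sample rows and the function name `profile_to_html` are hypothetical; only the column names come from the AP-STM element list.

```python
import csv
import html
import io

# Hypothetical two-row profile; column names follow the AP-STM element list.
PROFILE_CSV = """Entity_Shape_ID,Property_ID,Property_Label,Value_Type
book,dct:title,Title,Literal
book,dct:creator,Author,IRI
"""


def profile_to_html(csv_text: str) -> str:
    """Render a profile CSV as a plain HTML table, escaping every cell."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    lines = ["<table>"]
    head_cells = "".join(f"<th>{html.escape(h)}</th>" for h in header)
    lines.append(f"  <tr>{head_cells}</tr>")
    for row in body:
        cells = "".join(f"<td>{html.escape(c)}</td>" for c in row)
        lines.append(f"  <tr>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)


print(profile_to_html(PROFILE_CSV))
```

Because the profile is already tabular, the "as is" display and the HTML rendering share the same rows; nothing beyond escaping is needed.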

For machine processing

  • For generating validation schemas in XML Schema, SHACL, or ShEx

Next call


Key links


In scope for DCAP-IG (potential deliverables)

  1. "Data, instances of"
    • Example data! - data in datasets, aka "instance data"
    • The need to understand or validate instance data is, of course, the whole point.
  2. "Simple Tabular Model for Application Profiles (AP-STM), instances of"
    • Example profiles! CSV files, based on AP-STM, filled in with specific constraints.
    • Templates for validating instance data about books, paintings, languages...
  3. "Simple Tabular Model for Application Profiles (AP-STM), specification of"
  4. "Model and vocabulary specification for Dublin Core Application Profiles (DCAP)"
    • Defines elements of CSV = spreadsheet columns = terminology = vocabulary
      • Entity_Shape_ID / Entity_Shape_Label
      • Property_ID / Property_Label
      • Value_Type / Value_Constraint
      • Cardinality / Annotation / Namespace_Prefix / Namespace_URI
    • Glossary with definitions of general concepts: Entity Shape, Property, Value, Namespace.
    • Definitions - EDIT HERE
  5. "AP-STM Value Types, vocabulary specification of"
    • Starter set of core value types
    • Examples: Literal, Non-literal, Entity Shape Reference, IRI, IRI Stem, Pick List
  6. Scripts in Python, etc., for converting AP-STM instances into validation schemas (ShEx, etc.).
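
A conversion script of the kind named in the last item might look roughly like the sketch below, which turns profile rows into ShExC text. Everything beyond the column names is an assumption made for illustration: the mapping from value types to ShExC node kinds, the use of `Value_Constraint` to hold the referenced shape ID, the use of ShExC-style cardinality markers (`?`, `*`, `+`) in the `Cardinality` column, and the function name `rows_to_shexc`. The group has not settled these points, and the output has not been checked against a ShEx implementation.

```python
# Assumed mapping from AP-STM value types to ShExC node-kind keywords.
NODE_KINDS = {"Literal": "LITERAL", "Non-literal": "NONLITERAL", "IRI": "IRI"}


def rows_to_shexc(rows, prefixes):
    """Build ShExC text from profile rows (dicts keyed by AP-STM column names)."""
    out = [f"PREFIX {prefix}: <{uri}>" for prefix, uri in prefixes.items()]

    # Group rows by shape, preserving the order in which shapes first appear.
    shapes = {}
    for row in rows:
        shapes.setdefault(row["Entity_Shape_ID"], []).append(row)

    for shape_id, props in shapes.items():
        constraints = []
        for row in props:
            value_type = row.get("Value_Type", "")
            if value_type == "Entity Shape Reference":
                # Assumption: the referenced shape ID sits in Value_Constraint.
                value = f"@<{row['Value_Constraint']}>"
            else:
                # Unknown or empty value types fall back to the ShExC wildcard.
                value = NODE_KINDS.get(value_type, ".")
            card = row.get("Cardinality", "")
            suffix = f" {card}" if card else ""
            constraints.append(f"  {row['Property_ID']} {value}{suffix}")
        out.append(f"<{shape_id}> {{\n" + " ;\n".join(constraints) + "\n}")
    return "\n".join(out)


rows = [
    {"Entity_Shape_ID": "book", "Property_ID": "dct:title",
     "Value_Type": "Literal", "Value_Constraint": "", "Cardinality": ""},
    {"Entity_Shape_ID": "book", "Property_ID": "dct:creator",
     "Value_Type": "Entity Shape Reference", "Value_Constraint": "person",
     "Cardinality": "*"},
    {"Entity_Shape_ID": "person", "Property_ID": "foaf:name",
     "Value_Type": "Literal", "Value_Constraint": "", "Cardinality": "?"},
]
prefixes = {
    "dct": "http://purl.org/dc/terms/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}
print(rows_to_shexc(rows, prefixes))
```

Pick lists and IRI stems are left out here; they would need their own mapping to ShExC value sets.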

BEYOND the scope of DCAP-IG

  1. "Application Profiles, instances of" (other than simple tabular APs)
    • Examples: More complex profiles based on DCAT, BIBFRAME, or RDA.
    • AP Instances lie mid-way between data and AP Models:
      • Instance Data is matched against AP Instances (eg, for validation).
      • AP Instances are based on AP Models.
  2. "Application Profiles, generic model and vocabulary for"

Meetings and calls (reverse-chronological)


Where to declare namespace prefixes


Older prototypes


ShEx Lite - aka ShExJ-Lite, subset of ShExJ that should work in any ShEx implementation


ShExStatements (John Samuel)


Other related work


Early requirements and discussion


Some favorite definitions (so far)

AP concepts

AP-STM elements

  • Entity_Shape_ID - the handle for the class of things being described
  • Entity_Shape_Label - human-readable text representing the class of things being described
  • Property_ID - identifier of a property used to describe the resource
  • Property_Label - human-readable text representing the property
  • Value_Type - data type of the value in the instance data for the related property
  • Value_Constraint - a further constraint on the value. Examples: pick list, URI stem.
  • Cardinality - number of pairs of a given property and value allowable for describing an entity
  • Annotation - free-form comments about the statement
  • Namespace_Prefix
  • Namespace_URI
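
To make the element list concrete, here is a minimal sketch that reads a profile CSV with Python's `csv.DictReader`, rejects column names outside the list above, and groups rows by entity shape. The sample data, the strictness about unknown columns, and the function name `load_profile` are illustrative assumptions, not part of any agreed specification.

```python
import csv
import io
from collections import defaultdict

# Column names taken from the AP-STM element list.
COLUMNS = {
    "Entity_Shape_ID", "Entity_Shape_Label", "Property_ID", "Property_Label",
    "Value_Type", "Value_Constraint", "Cardinality", "Annotation",
    "Namespace_Prefix", "Namespace_URI",
}

# Hypothetical profile using a subset of the columns.
SAMPLE = """Entity_Shape_ID,Property_ID,Property_Label,Value_Type,Annotation
book,dct:title,Title,Literal,Main title only
book,dct:creator,Author,IRI,
person,foaf:name,Name,Literal,
"""


def load_profile(text):
    """Return {Entity_Shape_ID: [row, ...]}, rejecting unknown columns."""
    reader = csv.DictReader(io.StringIO(text))
    unknown = set(reader.fieldnames or []) - COLUMNS
    if unknown:
        raise ValueError(f"Unknown columns: {sorted(unknown)}")
    shapes = defaultdict(list)
    for row in reader:
        shapes[row["Entity_Shape_ID"]].append(row)
    return dict(shapes)


shapes = load_profile(SAMPLE)
for shape_id, rows in shapes.items():
    print(shape_id, [row["Property_ID"] for row in rows])
```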

Note: If the vocabulary were small enough, it could be "translated" into library- and computer-science terminology.


Various DCAP-IG resolutions and design decisions

  • Aim at a "minimalist" profile (minimize the number of columns in the spreadsheet).
  • The most minimal AP could consist of just a list of properties.
  • Properties do not have to be URIs; a label is not required.
  • Text fields are needed for generating input forms or human-readable documentation:
    • Property Label
    • Entity Label
    • Annotation
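
The "just a list of properties" case can be sketched as a one-column CSV that a script expands into fuller rows. The defaults used here (a single shape named `default`, and the property ID reused as its label) and the function name `expand_minimal` are hypothetical choices for illustration; the resolutions above do not prescribe any defaults.

```python
import csv
import io

# The most minimal profile: one column, and properties need not be URIs.
MINIMAL = "Property_ID\ntitle\ncreator\ndct:date\n"


def expand_minimal(text, default_shape="default"):
    """Fill in optional columns so downstream tools see uniform rows."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        prop = row["Property_ID"]
        rows.append({
            # Assumed default: all properties belong to a single shape.
            "Entity_Shape_ID": row.get("Entity_Shape_ID") or default_shape,
            "Property_ID": prop,
            # Assumed default: fall back to the ID when no label is given.
            "Property_Label": row.get("Property_Label") or prop,
        })
    return rows


for row in expand_minimal(MINIMAL):
    print(row)
```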

Requirements for application profiles

For human consumption

  • Expressible in TXT, Markdown, HTML, PDF, MSWord, Google Docs...
  • Serves
    • to document community consensus
    • to document the structure of a specific dataset

For machine processing

  • Expressible in an actionable form (XML Schema, SHACL, ShEx)
  • Provides a template or schema for
    • creating instance data
    • consuming instance data
    • displaying data (eg, Web forms)
    • validating instance data.
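
As a small illustration of the validation use case, the sketch below checks one record of instance data against the cardinalities in a profile. It assumes, purely for the example, that the `Cardinality` column holds ShExC-style markers (empty for exactly one, `?`, `*`, `+`) and that a record is a dict mapping property IDs to lists of values; the names `BOUNDS` and `check_cardinality` are hypothetical.

```python
# Assumed cardinality markers and their (minimum, maximum) bounds;
# None means "no upper limit".
BOUNDS = {"": (1, 1), "?": (0, 1), "*": (0, None), "+": (1, None)}


def check_cardinality(profile_rows, record):
    """Return a list of messages for properties that occur too few or too many times."""
    problems = []
    for row in profile_rows:
        prop = row["Property_ID"]
        low, high = BOUNDS[row.get("Cardinality", "")]
        count = len(record.get(prop, []))
        if count < low or (high is not None and count > high):
            upper = high if high is not None else "n"
            problems.append(f"{prop}: found {count}, expected {low}..{upper}")
    return problems


profile = [
    {"Property_ID": "dct:title", "Cardinality": ""},
    {"Property_ID": "dct:creator", "Cardinality": "+"},
]
record = {"dct:title": ["A Title", "Another Title"]}
for message in check_cardinality(profile, record):
    print(message)
```

A real validator would also check value types and value constraints; generating SHACL or ShEx and delegating to an existing engine is the route named in the goals above.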

Write-up...?
