@cneud
Last active October 8, 2018 16:52
SBB API docs

APIs of the Staatsbibliothek zu Berlin - Preußischer Kulturbesitz*

*(to the extent currently implemented)

Programmatic access to the digitised collections and digitised newspapers of the Staatsbibliothek zu Berlin - Preußischer Kulturbesitz (SBB) is currently possible via two distinct APIs.

Metadata for objects in the digitised collections can be retrieved via the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). A wide range of free OAI-PMH client applications in numerous programming languages is available on the web.

The base URL for the OAI-PMH endpoint of the digitised collections of the SBB is
http://digital.staatsbibliothek-berlin.de/oai

Using the six verbs defined by OAI-PMH, requests such as the following can be made:

  • "Which metadata formats are supported by the API?"
    http://digital.staatsbibliothek-berlin.de/oai?verb=ListMetadataFormats

  • "Which digital collections exist?"
    http://digital.staatsbibliothek-berlin.de/oai?verb=ListSets
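Every OAI-PMH request is a plain HTTP GET with the verb and its arguments in the query string, so such URLs can be assembled programmatically. The following Python sketch only builds the URLs (fetching them is left to any HTTP client). The helper name `oai_url` is hypothetical; the base URL is the one given above, and the verb list is the six verbs of OAI-PMH 2.0.

```python
from urllib.parse import urlencode

# Base URL of the SBB OAI-PMH endpoint, as given above.
OAI_BASE = "http://digital.staatsbibliothek-berlin.de/oai"

# The six verbs defined by OAI-PMH 2.0.
OAI_VERBS = {
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
}


def oai_url(verb: str, **params: str) -> str:
    """Build an OAI-PMH request URL; keyword arguments become query parameters."""
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb!r}")
    return f"{OAI_BASE}?{urlencode({'verb': verb, **params})}"


if __name__ == "__main__":
    print(oai_url("ListMetadataFormats"))
    print(oai_url("ListSets"))
    # "set" is an ordinary keyword argument here, not the built-in type.
    print(oai_url("ListIdentifiers", metadataPrefix="oai_dc", set="zeitungen"))
```

Validating the verb locally catches typos before a request is sent; the server would otherwise answer with a `badVerb` error.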

The SBB implements Dublin Core (DC) for basic bibliographic metadata and METS for all metadata about the content and structure of a digital object.

By combining OAI-PMH verbs with the DC metadata format, more specific requests can be formulated, such as:

  • "Which digitised newspapers exist?"
    http://digital.staatsbibliothek-berlin.de/oai?verb=ListIdentifiers&metadataPrefix=oai_dc&set=zeitungen

The response contains a unique identifier for each digital object, which includes its PPN, e.g. oai:digital.staatsbibliothek-berlin.de:PPN867445300. Using this identifier, additional information about the object can be retrieved:
http://digital.staatsbibliothek-berlin.de/oai?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai%3Adigital.staatsbibliothek-berlin.de%3APPN867445300
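ListIdentifiers responses are XML documents in the OAI-PMH namespace. A minimal Python sketch for extracting the PPNs from such a response, assuming the standard OAI-PMH 2.0 layout (`header/identifier`) and the `oai:digital.staatsbibliothek-berlin.de:` prefix shown above. OAI-PMH pages long lists via a `resumptionToken`, which is sent back with the same verb to fetch the next page. The sample response and token value are hypothetical and abridged; `parse_list_identifiers` and `next_page_url` are illustrative names.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_BASE = "http://digital.staatsbibliothek-berlin.de/oai"
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hypothetical, abridged ListIdentifiers response (real headers carry more fields).
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header>
      <identifier>oai:digital.staatsbibliothek-berlin.de:PPN867445300</identifier>
      <setSpec>zeitungen</setSpec>
    </header>
    <resumptionToken>example-token</resumptionToken>
  </ListIdentifiers>
</OAI-PMH>"""


def parse_list_identifiers(xml_text: str) -> tuple[list[str], str | None]:
    """Return the PPNs in a ListIdentifiers response and its resumption token (or None)."""
    root = ET.fromstring(xml_text)
    ppns = [
        # "oai:digital.staatsbibliothek-berlin.de:PPN867445300" -> "PPN867445300"
        el.text.strip().rsplit(":", 1)[-1]
        for el in root.iterfind(".//oai:header/oai:identifier", OAI_NS)
        if el.text
    ]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    # A missing or empty resumptionToken marks the last page of the list.
    token = (token_el.text or "").strip() if token_el is not None else ""
    return ppns, token or None


def next_page_url(token: str) -> str:
    """resumptionToken is an exclusive argument: only the verb is sent alongside it."""
    return f"{OAI_BASE}?{urlencode({'verb': 'ListIdentifiers', 'resumptionToken': token})}"


if __name__ == "__main__":
    ppns, token = parse_list_identifiers(SAMPLE)
    print(ppns, token)
    if token:
        print(next_page_url(token))
```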

By changing the metadataPrefix to mets, the complete METS record, containing references to all related files (images, OCR), can be retrieved:
http://digital.staatsbibliothek-berlin.de/oai?verb=GetRecord&metadataPrefix=mets&identifier=oai%3Adigital.staatsbibliothek-berlin.de%3APPN867445300
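The two GetRecord URLs differ only in `metadataPrefix`; the identifier is the PPN with the OAI prefix, percent-encoded. A small Python sketch (the function name `get_record_url` is hypothetical) that reproduces the URLs above from a PPN. The stdlib `urlencode` encodes `:` as `%3A`, matching the examples.

```python
from urllib.parse import urlencode

OAI_BASE = "http://digital.staatsbibliothek-berlin.de/oai"
# Prefix that turns a PPN into the OAI identifier of the SBB repository.
OAI_ID_PREFIX = "oai:digital.staatsbibliothek-berlin.de:"


def get_record_url(ppn: str, metadata_prefix: str = "oai_dc") -> str:
    """GetRecord URL for a PPN; pass metadata_prefix="mets" for the full METS record."""
    # Convenience: accept "867445300" as well as "PPN867445300".
    if not ppn.startswith("PPN"):
        ppn = "PPN" + ppn
    params = {
        "verb": "GetRecord",
        "metadataPrefix": metadata_prefix,
        "identifier": OAI_ID_PREFIX + ppn,
    }
    # urlencode percent-encodes ":" as %3A, as in the example URLs above.
    return f"{OAI_BASE}?{urlencode(params)}"


if __name__ == "__main__":
    print(get_record_url("PPN867445300"))
    print(get_record_url("PPN867445300", "mets"))
```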

The METS file contains a <fileSec> section whose <fileGrp> child elements reference the various files belonging to the digital object, typically images in either JPG or PNG format...
http://content.staatsbibliothek-berlin.de/dms/PPN867445300/800/0/00000001.jpg
http://content.staatsbibliothek-berlin.de/dms/PPN867445300/800/0/00000001.png

...and OCRed text files in ALTO format
http://digital.staatsbibliothek-berlin.de/ocrresolver/PPN867445300/00000001.xml
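Rather than guessing file URLs, they can be read from the METS record. The sketch below collects the `xlink:href` of every `mets:FLocat`, grouped by the `USE` attribute of its `mets:fileGrp`. It relies only on the standard METS and XLink namespaces; the `USE` values in the abridged sample (`DEFAULT`, `FULLTEXT`) are placeholders, so a real SBB record should be inspected for the group names actually used. `files_by_group` is a hypothetical name.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/"}
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Hypothetical, abridged METS fileSec; the USE values are placeholders.
SAMPLE_METS = """<mets:mets xmlns:mets="http://www.loc.gov/METS/"
    xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:fileSec>
    <mets:fileGrp USE="DEFAULT">
      <mets:file ID="IMG_00000001" MIMETYPE="image/jpeg">
        <mets:FLocat LOCTYPE="URL"
          xlink:href="http://content.staatsbibliothek-berlin.de/dms/PPN867445300/800/0/00000001.jpg"/>
      </mets:file>
    </mets:fileGrp>
    <mets:fileGrp USE="FULLTEXT">
      <mets:file ID="OCR_00000001" MIMETYPE="text/xml">
        <mets:FLocat LOCTYPE="URL"
          xlink:href="http://digital.staatsbibliothek-berlin.de/ocrresolver/PPN867445300/00000001.xml"/>
      </mets:file>
    </mets:fileGrp>
  </mets:fileSec>
</mets:mets>"""


def files_by_group(mets_xml: str) -> dict[str, list[str]]:
    """Map each fileGrp's USE attribute to the hrefs of its files, in document order."""
    root = ET.fromstring(mets_xml)
    groups: dict[str, list[str]] = {}
    # ".//" also finds the fileSec when METS is wrapped in an OAI-PMH response.
    for grp in root.iterfind(".//mets:fileSec/mets:fileGrp", NS):
        hrefs = groups.setdefault(grp.get("USE", ""), [])
        for loc in grp.iterfind("mets:file/mets:FLocat", NS):
            href = loc.get(XLINK_HREF)
            if href:
                hrefs.append(href)
    return groups


if __name__ == "__main__":
    for use, hrefs in files_by_group(SAMPLE_METS).items():
        print(use, hrefs)
```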

1a. Digitised collections: Other goodies

https://content.staatsbibliothek-berlin.de/dc/755410300-0001/full/full/0/default.jpg
https://content.staatsbibliothek-berlin.de/dc/755410300-0001/full/full/0/default.png
https://content.staatsbibliothek-berlin.de/dc/755410300-0001/full/full/0/default.tif
https://content.staatsbibliothek-berlin.de/dc/755410300.ocr.zip
https://content.staatsbibliothek-berlin.de/dc/755410300
https://digital-staging.sbb.spk-berlin.de/metsresolver/?PPN=PPN755410300
https://digital-staging.sbb.spk-berlin.de/ocr-txt/PPN755410300
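The URLs above follow a visible pattern around the numeric part of the PPN. The following sketch regenerates them for another object, under two assumptions inferred from this single example rather than from documentation: the page part is the page number zero-padded to four digits, and the other URL shapes are the same for every PPN. The last two URLs point to a staging host. `object_urls` is a hypothetical name.

```python
from __future__ import annotations

DC_BASE = "https://content.staatsbibliothek-berlin.de/dc"
STAGING_BASE = "https://digital-staging.sbb.spk-berlin.de"


def object_urls(ppn: str, page: int = 1) -> dict[str, str]:
    """URLs following the pattern of the examples above (pattern inferred, not documented)."""
    num = ppn[3:] if ppn.startswith("PPN") else ppn  # "PPN755410300" -> "755410300"
    page_id = f"{num}-{page:04d}"  # assumption: four-digit zero-padded page number
    urls = {
        f"page_{ext}": f"{DC_BASE}/{page_id}/full/full/0/default.{ext}"
        for ext in ("jpg", "png", "tif")
    }
    urls.update(
        ocr_zip=f"{DC_BASE}/{num}.ocr.zip",
        object=f"{DC_BASE}/{num}",
        mets=f"{STAGING_BASE}/metsresolver/?PPN=PPN{num}",
        ocr_txt=f"{STAGING_BASE}/ocr-txt/PPN{num}",
    )
    return urls


if __name__ == "__main__":
    for name, url in object_urls("PPN755410300").items():
        print(f"{name:8} {url}")
```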

Content (images and full text) of the digitised newspapers can be retrieved via the International Image Interoperability Framework (IIIF). Here, too, a growing number of free IIIF clients and libraries in numerous programming languages are available on the web.

Currently, digitised newspaper images and metadata can be retrieved with requests following this schema:
http://content.staatsbibliothek-berlin.de/zefys/SNP{ZDB-ID}-{YYYYMMDD}-{Issue}-{Page}-{Article}-{Version}

The ZDB-ID is a unique identifier for each newspaper title; it can be found either in the ZEFYS newspaper portal or directly in the ZDB (Zeitschriftendatenbank).

Next, the date of issue is given in YYYYMMDD format, e.g. 18900101 for the issue published on January 1st, 1890. Which date ranges have already been digitised for each newspaper title can again be found in the ZEFYS newspaper portal.
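The identifier scheme can be captured in a small data type. This Python sketch (the class `SnpId` and its `parse` method are hypothetical names) builds and parses `SNP` identifiers. It assumes the ZDB-ID is written without a hyphen, as in the example URLs in this document, because the parser splits on `-`.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class SnpId:
    """SNP{ZDB-ID}-{YYYYMMDD}-{Issue}-{Page}-{Article}-{Version}, as described above."""

    zdb_id: str  # assumed hyphen-free, as in the example URLs
    issue_date: date
    issue: int = 0
    page: int = 0  # in the examples, 0 addresses the whole issue, >= 1 a single page
    article: int = 0
    version: int = 0

    def __str__(self) -> str:
        return (
            f"SNP{self.zdb_id}-{self.issue_date:%Y%m%d}-"
            f"{self.issue}-{self.page}-{self.article}-{self.version}"
        )

    @classmethod
    def parse(cls, text: str) -> SnpId:
        if not text.startswith("SNP"):
            raise ValueError(f"not an SNP identifier: {text!r}")
        zdb_id, ymd, *numbers = text[3:].split("-")
        if len(numbers) != 4:
            raise ValueError(f"expected six components: {text!r}")
        issue_date = datetime.strptime(ymd, "%Y%m%d").date()
        return cls(zdb_id, issue_date, *map(int, numbers))


if __name__ == "__main__":
    print(SnpId("27971740", date(1907, 1, 1)))
    print(SnpId.parse("SNP26120215-19630207-0-1-10-0"))
```

Storing the date as a `date` rather than a string means an invalid date such as 19070231 is rejected when parsing instead of producing a URL that silently fails.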

Setting the page number to 0 and appending the extension .xml to the URL returns the METS metadata document of the issue, e.g. http://content.staatsbibliothek-berlin.de/zefys/SNP27971740-19070101-0-0-0-0.xml

[Please note that the functionality described here for retrieving OCR data is not yet implemented!]
By incrementing the page number, the OCRed text files in ALTO format can be requested:
http://content.staatsbibliothek-berlin.de/zefys/SNP27971740-19070101-0-1-0-0.xml for page 1,
http://content.staatsbibliothek-berlin.de/zefys/SNP27971740-19070101-0-2-0-0.xml for page 2, and so on.
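For one issue, the METS URL and the per-page ALTO URLs can be generated together. A sketch with a hypothetical `issue_urls` helper: it keeps the Issue, Article and Version components at 0, as in all examples here, and the number of pages must be supplied by the caller (for instance, taken from the issue's METS document). As noted above, the ALTO URLs may not yet return OCR data.

```python
from __future__ import annotations

ZEFYS_BASE = "http://content.staatsbibliothek-berlin.de/zefys"


def issue_urls(zdb_id: str, yyyymmdd: str, pages: int) -> tuple[str, list[str]]:
    """Return the METS URL and the ALTO URLs of pages 1..pages for one issue."""
    # Issue component kept at 0, as in every example in this document.
    prefix = f"{ZEFYS_BASE}/SNP{zdb_id}-{yyyymmdd}-0"
    mets = f"{prefix}-0-0-0.xml"  # page 0 -> METS document of the issue
    # Page numbers start at 1; article and version stay 0.
    alto = [f"{prefix}-{page}-0-0.xml" for page in range(1, pages + 1)]
    return mets, alto


if __name__ == "__main__":
    mets, alto = issue_urls("27971740", "19070101", pages=2)
    print(mets)
    print(*alto, sep="\n")
```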

To retrieve the scanned page images of a newspaper, the IIIF image parameters /full/{width in pixels},/0/default.jpg are appended to the identifier; the supported widths are 1200, 800 and 250 pixels, e.g.:
http://content.staatsbibliothek-berlin.de/zefys/SNP27971740-19070101-0-1-0-0/full/1200,/0/default.jpg
http://content.staatsbibliothek-berlin.de/zefys/SNP27971740-19070101-0-1-0-0/full/800,/0/default.jpg
http://content.staatsbibliothek-berlin.de/zefys/SNP27971740-19070101-0-1-0-0/full/250,/0/default.jpg

The original TIFF images can also be retrieved via IIIF by replacing the width with full and requesting default.tif instead of default.jpg, e.g.:
http://content.staatsbibliothek-berlin.de/zefys/SNP26120215-19630207-0-1-0-0/full/full/0/default.tif
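The image URLs follow the IIIF Image API path `{identifier}/{region}/{size}/{rotation}/{quality}.{format}`: region `full`, size `{width},` (height scaled to keep the aspect ratio) or `full`, rotation `0`, quality `default`. The Python sketch below builds such URLs and rejects widths other than the three listed above. The function name `page_image_url` is hypothetical.

```python
from __future__ import annotations

ZEFYS_BASE = "http://content.staatsbibliothek-berlin.de/zefys"
SUPPORTED_WIDTHS = (1200, 800, 250)  # as listed in the text above


def page_image_url(snp_id: str, width: int | None = None) -> str:
    """IIIF image URL for one page: {id}/{region}/{size}/{rotation}/{quality}.{format}.

    width=None requests the original TIFF (size "full"); otherwise a JPEG
    whose size is "{width}," (height follows from the aspect ratio).
    """
    if width is None:
        return f"{ZEFYS_BASE}/{snp_id}/full/full/0/default.tif"
    if width not in SUPPORTED_WIDTHS:
        raise ValueError(f"width must be one of {SUPPORTED_WIDTHS}, got {width}")
    return f"{ZEFYS_BASE}/{snp_id}/full/{width},/0/default.jpg"


if __name__ == "__main__":
    page1 = "SNP26120215-19630207-0-1-0-0"
    print(page_image_url(page1))
    print(page_image_url(page1, 1200))
```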

Some working examples:
http://content.staatsbibliothek-berlin.de/zefys/SNP26120215-19630207-0-1-0-0/full/full/0/default.tif -> TIF page 1
http://content.staatsbibliothek-berlin.de/zefys/SNP26120215-19630207-0-1-0-0/full/1200,/0/default.jpg -> JPG page 1
http://content.staatsbibliothek-berlin.de/zefys/SNP26120215-19630207-0-1-10-0/full/1200,/0/default.jpg -> JPG page 1 (article 10 highlighted)
http://content.staatsbibliothek-berlin.de/zefys/SNP26120215-19630207-0-1-0-0.pdf -> PDF page 1
http://content.staatsbibliothek-berlin.de/zefys/SNP26120215-19630207-0-0-0-0.pdf -> PDF all pages
http://content.staatsbibliothek-berlin.de/zefys/SNP26120215-19630207-0-1-0-0.xml -> ALTO page 1
http://content.staatsbibliothek-berlin.de/zefys/SNP26120215-19630207-0-0-0-0.xml -> METS
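ALTO files store recognised words as `String` elements with a `CONTENT` attribute, grouped into `TextLine` elements. A minimal Python sketch that turns an ALTO page into plain text, one output line per `TextLine`. It matches elements by local name because ALTO versions use different namespace URIs, and it ignores hyphenation markup (`HYP`, `SUBS_CONTENT`) and reading-order subtleties. The sample document and the names `local_name` and `alto_to_text` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, abridged ALTO page (ALTO v2 namespace).
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page><PrintSpace><TextBlock>
    <TextLine><String CONTENT="Erste"/><SP/><String CONTENT="Zeile"/></TextLine>
    <TextLine><String CONTENT="Zweite"/><SP/><String CONTENT="Zeile"/></TextLine>
  </TextBlock></PrintSpace></Page></Layout>
</alto>"""


def local_name(tag: object) -> str:
    """Strip the "{namespace}" part, since ALTO versions use different namespace URIs."""
    return tag.rsplit("}", 1)[-1] if isinstance(tag, str) else ""


def alto_to_text(alto_xml: str) -> str:
    """Plain text of an ALTO page: String/@CONTENT joined per TextLine, one line each."""
    root = ET.fromstring(alto_xml)
    lines = []
    for el in root.iter():
        if local_name(el.tag) == "TextLine":
            words = [w.get("CONTENT", "") for w in el if local_name(w.tag) == "String"]
            lines.append(" ".join(words))
    return "\n".join(lines)


if __name__ == "__main__":
    print(alto_to_text(SAMPLE_ALTO))
```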
