Programmatic access to the digitised collections and digitised newspapers of the Staatsbibliothek zu Berlin - Preußischer Kulturbesitz (SBB) is currently possible via two distinct APIs.
1. Digitised collections: OAI-PMH
Metadata for objects in the digitised collections is retrieved using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) standard. A wide range of client applications for OAI-PMH in numerous programming languages is freely available on the web.
The base URL for the OAI-PMH endpoint of the digitised collections of the SBB is
http://digital.staatsbibliothek-berlin.de/oai
Using the six verbs provided by OAI-PMH, requests such as the following can be generated:
- "Which metadata formats are supported by the API?"
http://digital.staatsbibliothek-berlin.de/oai?verb=ListMetadataFormats
- "Which digital collections exist?"
http://digital.staatsbibliothek-berlin.de/oai?verb=ListSets
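The request URLs above can also be assembled programmatically. The following is a minimal sketch using only the Python standard library; the helper name `oai_url` is hypothetical and not part of any SBB client, and the verb list is taken from the OAI-PMH 2.0 specification rather than from this page.

```python
from urllib.parse import urlencode

# Base URL of the OAI-PMH endpoint, as given above.
OAI_BASE_URL = "http://digital.staatsbibliothek-berlin.de/oai"

# The six verbs defined by the OAI-PMH 2.0 specification.
OAI_VERBS = {
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
}


def oai_url(verb, **params):
    """Build a request URL for an OAI-PMH verb (hypothetical helper)."""
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb!r}")
    # urlencode percent-encodes parameter values where necessary.
    return f"{OAI_BASE_URL}?{urlencode({'verb': verb, **params})}"


print(oai_url("ListMetadataFormats"))
print(oai_url("ListSets"))
```

Additional request arguments are passed as keyword arguments, e.g. `oai_url("ListIdentifiers", metadataPrefix="oai_dc")`.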
The SBB implements Dublin Core (DC) for basic bibliographic metadata and METS for all metadata about the contents and structure of a digital object.
By combining OAI-PMH verbs with the DC metadata, more specific requests can be formulated, such as
- "Which digitised newspapers exist?"
http://digital.staatsbibliothek-berlin.de/oai?verb=ListIdentifiers&metadataPrefix=oai_dc&set=zeitungen
The response contains a unique identifier for each digital object, the PPN, e.g. oai:digital.staatsbibliothek-berlin.de:PPN867445300. Using the PPN, additional information about a digital object can be retrieved:
http://digital.staatsbibliothek-berlin.de/oai?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai%3Adigital.staatsbibliothek-berlin.de%3APPN867445300
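To illustrate how the PPNs might be pulled out of a ListIdentifiers response, here is a small sketch. The element names and namespace follow the OAI-PMH 2.0 specification; the sample response is trimmed and hypothetical (including the resumption token value), and the function name `parse_list_identifiers` is an assumption made for this example.

```python
import xml.etree.ElementTree as ET

# Namespace of OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Trimmed, hypothetical response; a real one carries more elements per header.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header>
      <identifier>oai:digital.staatsbibliothek-berlin.de:PPN867445300</identifier>
    </header>
    <resumptionToken>hypothetical-token</resumptionToken>
  </ListIdentifiers>
</OAI-PMH>"""


def parse_list_identifiers(xml_text):
    """Return (list of PPNs, resumption token or None)."""
    root = ET.fromstring(xml_text)
    ppns = [
        # The PPN is the part after the last colon of the OAI identifier.
        el.text.strip().rsplit(":", 1)[-1]
        for el in root.findall(".//oai:header/oai:identifier", OAI_NS)
        if el.text
    ]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    has_token = token_el is not None and token_el.text
    return ppns, (token_el.text.strip() if has_token else None)


print(parse_list_identifiers(SAMPLE_RESPONSE))
```

In OAI-PMH, a non-empty resumption token signals that the list is incomplete; the next batch is requested by sending the same verb with `resumptionToken` as its only other argument.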
By changing the metadataPrefix to mets, the complete METS metadata record, containing references to all related files (images, OCR), can be retrieved:
http://digital.staatsbibliothek-berlin.de/oai?verb=GetRecord&metadataPrefix=mets&identifier=oai%3Adigital.staatsbibliothek-berlin.de%3APPN867445300
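The two GetRecord requests differ only in their metadataPrefix, and the colons in the identifier are percent-encoded as %3A. A minimal sketch of building such URLs follows; `get_record_url` is a hypothetical helper name, while the identifier prefix is taken from the example identifier above.

```python
from urllib.parse import urlencode

OAI_BASE_URL = "http://digital.staatsbibliothek-berlin.de/oai"
# Prefix of the OAI identifiers, as seen in the example above.
OAI_ID_PREFIX = "oai:digital.staatsbibliothek-berlin.de:"


def get_record_url(ppn, metadata_prefix="oai_dc"):
    """Build a GetRecord URL for a PPN such as 'PPN867445300'."""
    query = urlencode(
        {
            "verb": "GetRecord",
            "metadataPrefix": metadata_prefix,
            # urlencode turns the colons of the identifier into %3A.
            "identifier": OAI_ID_PREFIX + ppn,
        }
    )
    return f"{OAI_BASE_URL}?{query}"


print(get_record_url("PPN867445300"))          # Dublin Core record
print(get_record_url("PPN867445300", "mets"))  # full METS record
```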
The METS file contains a <fileSec> section holding child elements of type <fileGrp>, which contain references to the various files that belong to the digital object, typically images in either JPG or PNG format...
http://content.staatsbibliothek-berlin.de/dms/PPN867445300/800/0/00000001.jpg
http://content.staatsbibliothek-berlin.de/dms/PPN867445300/800/0/00000001.png
...and OCRed text files in ALTO format
http://digital.staatsbibliothek-berlin.de/ocrresolver/PPN867445300/00000001.xml
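The file references can be collected from a METS record with a standard XML parser. The sketch below assumes the usual METS layout (`fileSec` > `fileGrp` > `file` > `FLocat` with an `xlink:href` attribute). The sample document is a hand-written, hypothetical excerpt: the `USE` values `DEFAULT` and `FULLTEXT` and the file IDs are assumptions and may differ in actual SBB records.

```python
import xml.etree.ElementTree as ET

NS = {
    "mets": "http://www.loc.gov/METS/",
    "xlink": "http://www.w3.org/1999/xlink",
}

# Hypothetical excerpt; USE values and IDs are assumptions for illustration.
SAMPLE_METS = """<mets:mets xmlns:mets="http://www.loc.gov/METS/"
                            xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:fileSec>
    <mets:fileGrp USE="DEFAULT">
      <mets:file ID="FILE_0001_DEFAULT" MIMETYPE="image/jpeg">
        <mets:FLocat LOCTYPE="URL" xlink:href="http://content.staatsbibliothek-berlin.de/dms/PPN867445300/800/0/00000001.jpg"/>
      </mets:file>
    </mets:fileGrp>
    <mets:fileGrp USE="FULLTEXT">
      <mets:file ID="FILE_0001_FULLTEXT" MIMETYPE="text/xml">
        <mets:FLocat LOCTYPE="URL" xlink:href="http://digital.staatsbibliothek-berlin.de/ocrresolver/PPN867445300/00000001.xml"/>
      </mets:file>
    </mets:fileGrp>
  </mets:fileSec>
</mets:mets>"""


def file_references(mets_xml):
    """Map each fileGrp's USE attribute to the list of URLs it references."""
    root = ET.fromstring(mets_xml)
    href_attr = f"{{{NS['xlink']}}}href"  # namespaced attribute name
    refs = {}
    # './/' also matches when the METS record is wrapped in an OAI-PMH envelope.
    for grp in root.findall(".//mets:fileSec/mets:fileGrp", NS):
        urls = [fl.get(href_attr) for fl in grp.findall("./mets:file/mets:FLocat", NS)]
        refs.setdefault(grp.get("USE", ""), []).extend(urls)
    return refs


for use, urls in file_references(SAMPLE_METS).items():
    print(use, urls)
```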
1a. Digitised collections: Other goodies
https://content.staatsbibliothek-berlin.de/dc/755410300-0001/full/full/0/default.jpg
https://content.staatsbibliothek-berlin.de/dc/755410300-0001/full/full/0/default.png
https://content.staatsbibliothek-berlin.de/dc/755410300-0001/full/full/0/default.tif
https://content.staatsbibliothek-berlin.de/dc/755410300.ocr.zip
https://content.staatsbibliothek-berlin.de/dc/755410300
https://digital-staging.sbb.spk-berlin.de/metsresolver/?PPN=PPN755410300
https://digital-staging.sbb.spk-berlin.de/ocr-txt/PPN755410300
2. Digitised newspapers: IIIF
Retrieval of content (images and full-text) for digitised newspapers is supported via the International Image Interoperability Framework (IIIF) protocol. Here, too, a growing number of free clients and libraries for IIIF in numerous programming languages is available on the web.
Currently, digitised newspaper images and metadata can be retrieved by requests following this schema:
http://content.staatsbibliothek-berlin.de/zefys/SNP{ZDB-ID}-{YYYYMMDD}-{Issue}-{Page}-{Article}-{Version}
The ZDB-ID is a unique identifier for every newspaper title; it can be found either in the ZEFYS newspaper portal or directly in the ZDB.
Next, a date of issue needs to be specified in the YYYYMMDD format, e.g. 18900101 for the issue published on January 1st, 1890. Information about which date ranges have already been digitised per newspaper title can again be found in the ZEFYS newspaper portal.
By then combining the page number 0 with the ending .xml in the URL, the METS metadata document for each newspaper title can be obtained, e.g.
http://content.staatsbibliothek-berlin.de/zefys/SNP27971740-19070101-0-0-0-0.xml
[Please note that the functionality described here for retrieving OCR data is not yet implemented!]
By incrementing the page number, the OCRed text files in ALTO format can be requested:
http://content.staatsbibliothek-berlin.de/zefys/SNP27971740-19070101-0-1-0-0.xml
for page 1,
http://content.staatsbibliothek-berlin.de/zefys/SNP27971740-19070101-0-2-0-0.xml
for page 2, and so forth.
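The URL schema described above can be sketched as a few small functions. The names `zefys_id`, `mets_url` and `alto_url` are hypothetical, and the ZDB-ID is assumed to be passed in the digits-only form used in the example URLs. As noted above, ALTO retrieval may not be available yet, so `alto_url` only shows how such a URL would be formed.

```python
from datetime import date

ZEFYS_BASE_URL = "http://content.staatsbibliothek-berlin.de/zefys"


def zefys_id(zdb_id, issue_date, issue=0, page=0, article=0, version=0):
    """Build SNP{ZDB-ID}-{YYYYMMDD}-{Issue}-{Page}-{Article}-{Version}."""
    return f"SNP{zdb_id}-{issue_date:%Y%m%d}-{issue}-{page}-{article}-{version}"


def mets_url(zdb_id, issue_date):
    """Page number 0 plus the ending .xml yields the METS document."""
    return f"{ZEFYS_BASE_URL}/{zefys_id(zdb_id, issue_date)}.xml"


def alto_url(zdb_id, issue_date, page):
    """Page numbers from 1 upwards plus .xml address the ALTO files."""
    if page < 1:
        raise ValueError("ALTO pages are numbered from 1; page 0 is the METS document")
    return f"{ZEFYS_BASE_URL}/{zefys_id(zdb_id, issue_date, page=page)}.xml"


issue_date = date(1907, 1, 1)
print(mets_url("27971740", issue_date))
print(alto_url("27971740", issue_date, page=1))
```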
To retrieve the scanned images for the newspaper, further information needs to be specified in the URL by appending /full/{width in pixels},/0/default.jpg, where the supported widths in pixels are 1200, 800 and 250, e.g.
http://content.staatsbibliothek-berlin.de/zefys/SNP27971740-19070101-0-1-0-0/full/1200,/0/default.jpg
http://content.staatsbibliothek-berlin.de/zefys/SNP27971740-19070101-0-1-0-0/full/800,/0/default.jpg
http://content.staatsbibliothek-berlin.de/zefys/SNP27971740-19070101-0-1-0-0/full/250,/0/default.jpg
It is also possible to retrieve the original TIFF images via IIIF by replacing the width in pixels with full and specifying default.tif instead of default.jpg in the URL, like:
http://content.staatsbibliothek-berlin.de/zefys/SNP26120215-19630207-0-1-0-0/full/full/0/default.tif
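The image URLs follow the IIIF Image API pattern {identifier}/{region}/{size}/{rotation}/{quality}.{format}. A minimal sketch of a builder for the two variants described above follows; `image_url` is a hypothetical helper name, and the width check simply mirrors the three widths listed in this section.

```python
ZEFYS_BASE_URL = "http://content.staatsbibliothek-berlin.de/zefys"

# JPG widths in pixels listed as supported in this section.
SUPPORTED_WIDTHS = (1200, 800, 250)


def image_url(identifier, width=None):
    """Build an image URL for a page identifier.

    With a width, a scaled JPG is requested; without one, the
    original TIFF at full size.
    """
    if width is None:
        return f"{ZEFYS_BASE_URL}/{identifier}/full/full/0/default.tif"
    if width not in SUPPORTED_WIDTHS:
        raise ValueError(f"width must be one of {SUPPORTED_WIDTHS}")
    # 'w,' in the size segment scales to the given width, keeping the aspect ratio.
    return f"{ZEFYS_BASE_URL}/{identifier}/full/{width},/0/default.jpg"


print(image_url("SNP27971740-19070101-0-1-0-0", 1200))
print(image_url("SNP26120215-19630207-0-1-0-0"))
```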
Some working examples:
http://content.staatsbibliothek-berlin.de/zefys/SNP26120215-19630207-0-1-0-0/full/full/0/default.tif
-> TIF page 1
http://content.staatsbibliothek-berlin.de/zefys/SNP26120215-19630207-0-1-0-0/full/1200,/0/default.jpg
-> JPG page 1
http://content.staatsbibliothek-berlin.de/zefys/SNP26120215-19630207-0-1-10-0/full/1200,/0/default.jpg
-> JPG page 1 (article 10 highlighted)
http://content.staatsbibliothek-berlin.de/zefys/SNP26120215-19630207-0-1-0-0.pdf
-> PDF page 1
http://content.staatsbibliothek-berlin.de/zefys/SNP26120215-19630207-0-0-0-0.pdf
-> PDF all pages
http://content.staatsbibliothek-berlin.de/zefys/SNP26120215-19630207-0-1-0-0.xml
-> ALTO page 1
http://content.staatsbibliothek-berlin.de/zefys/SNP26120215-19630207-0-0-0-0.xml
-> METS