cneud/sbb.api.doc.md

## sbb.api.doc.md

      
APIs of the Staatsbibliothek zu Berlin - Preußischer Kulturbesitz*

*(to the extent currently implemented)

Programmatic access to the digitised collections and digitised newspapers of the Staatsbibliothek zu Berlin - Preußischer Kulturbesitz (SBB) is currently possible via two distinct APIs.
1. Digitised collections: OAI-PMH

Metadata for objects in the digitised collections can be retrieved via the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) standard.
A wide range of client applications for OAI-PMH in numerous programming languages are freely available on the web.
The base URL for the OAI-PMH endpoint of the digitised collections of the SBB is

http://digital.staatsbibliothek-berlin.de/oai
Using the six verbs provided by OAI-PMH, requests such as the following can be generated:


"Which metadata formats are supported by the API?"

http://digital.staatsbibliothek-berlin.de/oai?verb=ListMetadataFormats


"Which digital collections exist?"

http://digital.staatsbibliothek-berlin.de/oai?verb=ListSets
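
Request URLs like the two above can also be assembled programmatically. The following is a minimal Python sketch using only the standard library; the helper name `build_oai_url` is made up for illustration and is not part of any SBB client.

```python
from urllib.parse import urlencode

# Base URL of the OAI-PMH endpoint, as given in this document.
OAI_BASE_URL = "http://digital.staatsbibliothek-berlin.de/oai"


def build_oai_url(verb, **arguments):
    """Return the request URL for an OAI-PMH verb plus optional arguments.

    Hypothetical helper: it only builds the URL and sends no request.
    """
    query = {"verb": verb, **arguments}
    return OAI_BASE_URL + "?" + urlencode(query)


print(build_oai_url("ListMetadataFormats"))
print(build_oai_url("ListSets"))
```

Additional OAI-PMH arguments are passed as keyword arguments, e.g. `build_oai_url("ListIdentifiers", metadataPrefix="oai_dc", set="zeitungen")`.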


The SBB implements Dublin Core (DC) for basic bibliographic metadata and METS for all metadata about the contents and structure of a digital object.
By combining OAI-PMH verbs with the DC metadata, more specific requests can be formulated, such as:

"Which digitised newspapers exist?"

http://digital.staatsbibliothek-berlin.de/oai?verb=ListIdentifiers&metadataPrefix=oai_dc&set=zeitungen
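
A ListIdentifiers response is an XML document in the OAI-PMH namespace with one `header` element per record and, for long lists, a `resumptionToken` for fetching the next batch. The sketch below parses such a response with Python's standard library. The sample XML is abridged and hand-written for illustration (the token value is invented), and `parse_list_identifiers` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 standard.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Abridged, hand-written sample; a real response carries more elements.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header>
      <identifier>oai:digital.staatsbibliothek-berlin.de:PPN867445300</identifier>
      <setSpec>zeitungen</setSpec>
    </header>
    <resumptionToken>hypothetical-token</resumptionToken>
  </ListIdentifiers>
</OAI-PMH>"""


def parse_list_identifiers(xml_text):
    """Return (list of record identifiers, resumption token or None)."""
    root = ET.fromstring(xml_text)
    identifiers = [
        element.text
        for element in root.iterfind(".//oai:header/oai:identifier", OAI_NS)
    ]
    token_element = root.find(".//oai:resumptionToken", OAI_NS)
    # An empty resumptionToken element marks the end of the list.
    has_token = token_element is not None and token_element.text
    token = token_element.text if has_token else None
    return identifiers, token


identifiers, token = parse_list_identifiers(SAMPLE_RESPONSE)
print(identifiers, token)
```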

The response contains a unique identifier for each digital object that includes its PPN, e.g. oai:digital.staatsbibliothek-berlin.de:PPN867445300. Using this identifier, additional information about a digital object can be retrieved:

http://digital.staatsbibliothek-berlin.de/oai?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai%3Adigital.staatsbibliothek-berlin.de%3APPN867445300
By changing the metadata prefix to mets, the complete METS metadata record, containing references to all related files (images, OCR), can be retrieved:

http://digital.staatsbibliothek-berlin.de/oai?verb=GetRecord&metadataPrefix=mets&identifier=oai%3Adigital.staatsbibliothek-berlin.de%3APPN867445300
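
The two GetRecord URLs above differ only in the `metadataPrefix` argument, and the identifier has to be percent-encoded. A small Python sketch that builds both from a PPN follows; `get_record_url` is a hypothetical helper, and the identifier prefix is taken from the example identifier in this document.

```python
from urllib.parse import urlencode

OAI_BASE_URL = "http://digital.staatsbibliothek-berlin.de/oai"
# Prefix of the OAI identifiers, copied from the example above.
OAI_ID_PREFIX = "oai:digital.staatsbibliothek-berlin.de:"


def get_record_url(ppn, metadata_prefix="oai_dc"):
    """Build a GetRecord URL for a PPN such as 'PPN867445300'.

    urlencode percent-encodes the colons in the identifier (':' -> '%3A').
    """
    query = {
        "verb": "GetRecord",
        "metadataPrefix": metadata_prefix,
        "identifier": OAI_ID_PREFIX + ppn,
    }
    return OAI_BASE_URL + "?" + urlencode(query)


print(get_record_url("PPN867445300"))          # Dublin Core record
print(get_record_url("PPN867445300", "mets"))  # full METS record
```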
The METS file contains a section <fileSec> which holds child elements of the type <fileGrp> which contain references to various files that belong to the digital object, typically images in either JPG or PNG format...

http://content.staatsbibliothek-berlin.de/dms/PPN867445300/800/0/00000001.jpg

http://content.staatsbibliothek-berlin.de/dms/PPN867445300/800/0/00000001.png
...and OCRed text files in ALTO format

http://digital.staatsbibliothek-berlin.de/ocrresolver/PPN867445300/00000001.xml
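
The file references can be collected from the `<fileSec>` with a few lines of Python. This is a sketch under assumptions: the sample METS fragment is hand-written, the `USE` values `DEFAULT` and `FULLTEXT` and the file IDs are placeholders that may not match what the SBB actually delivers, and `files_by_group` is a hypothetical helper name. The METS and XLink namespaces are the standard ones.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/"}
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Hand-written fragment; USE values and file IDs are assumptions.
SAMPLE_METS = """<mets:mets xmlns:mets="http://www.loc.gov/METS/"
                 xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:fileSec>
    <mets:fileGrp USE="DEFAULT">
      <mets:file ID="FILE_0001_DEFAULT" MIMETYPE="image/jpeg">
        <mets:FLocat LOCTYPE="URL" xlink:href="http://content.staatsbibliothek-berlin.de/dms/PPN867445300/800/0/00000001.jpg"/>
      </mets:file>
    </mets:fileGrp>
    <mets:fileGrp USE="FULLTEXT">
      <mets:file ID="FILE_0001_FULLTEXT" MIMETYPE="text/xml">
        <mets:FLocat LOCTYPE="URL" xlink:href="http://digital.staatsbibliothek-berlin.de/ocrresolver/PPN867445300/00000001.xml"/>
      </mets:file>
    </mets:fileGrp>
  </mets:fileSec>
</mets:mets>"""


def files_by_group(mets_xml):
    """Map each fileGrp's USE attribute to the list of file URLs it holds."""
    root = ET.fromstring(mets_xml)
    groups = {}
    for group in root.iterfind(".//mets:fileSec/mets:fileGrp", NS):
        use = group.get("USE", "")
        for location in group.iterfind("mets:file/mets:FLocat", NS):
            href = location.get(XLINK_HREF)
            if href:
                groups.setdefault(use, []).append(href)
    return groups


for use, urls in files_by_group(SAMPLE_METS).items():
    print(use, urls)
```

Because the search starts with `.//`, the same function should also work when the METS record is wrapped in the OAI-PMH response envelope.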
1a. Digitised collections: Other goodies

https://content.staatsbibliothek-berlin.de/dc/755410300-0001/full/full/0/default.jpg

https://content.staatsbibliothek-berlin.de/dc/755410300-0001/full/full/0/default.png

https://content.staatsbibliothek-berlin.de/dc/755410300-0001/full/full/0/default.tif

https://content.staatsbibliothek-berlin.de/dc/755410300.ocr.zip

https://content.staatsbibliothek-berlin.de/dc/755410300

https://digital-staging.sbb.spk-berlin.de/metsresolver/?PPN=PPN755410300

https://digital-staging.sbb.spk-berlin.de/ocr-txt/PPN755410300
2. Digitised newspapers: IIIF

Retrieval of content (images and full-text) for digitised newspapers is supported via the International Image Interoperability Framework (IIIF) protocol.
Here too, a growing number of free clients and libraries for IIIF are available on the web in numerous programming languages.
Currently, digitised newspaper images and metadata can be retrieved by requests following this schema:
http://content.staatsbibliothek-berlin.de/zefys/SNP{ZDB-ID}-{YYYYMMDD}-{Issue}-{Page}-{Article}-{Version}
The ZDB-ID is a unique identifier for every newspaper title; it can be found either within the ZEFYS newspaper portal or directly in the ZDB.
Next, a date of issue needs to be specified in the YYYYMMDD format, e.g. 18900101 for the issue published on January 1st, 1890. Information about which date ranges have already been digitised per newspaper title can again be found in the ZEFYS newspaper portal.
By then combining the page number 0 with the ending .xml in the URL, the METS metadata document for each newspaper title can be obtained, e.g.
http://content.staatsbibliothek-berlin.de/zefys/SNP27971740-19070101-0-0-0-0.xml
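
The identifier schema lends itself to a small formatting helper. The Python sketch below fills in the schema from a ZDB-ID and a date; the function names are made up for illustration, and the ZDB-ID is assumed to be passed exactly as it appears in the URL (e.g. `27971740`).

```python
from datetime import date

ZEFYS_BASE_URL = "http://content.staatsbibliothek-berlin.de/zefys"


def zefys_id(zdb_id, issue_date, issue=0, page=0, article=0, version=0):
    """Fill in SNP{ZDB-ID}-{YYYYMMDD}-{Issue}-{Page}-{Article}-{Version}."""
    return f"SNP{zdb_id}-{issue_date:%Y%m%d}-{issue}-{page}-{article}-{version}"


def zefys_xml_url(zdb_id, issue_date, page=0):
    """URL of the .xml document for a page; page 0 addresses the METS file."""
    return f"{ZEFYS_BASE_URL}/{zefys_id(zdb_id, issue_date, page=page)}.xml"


print(zefys_xml_url("27971740", date(1907, 1, 1)))
```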
[Please note that the functionality described here for retrieving OCR data is not yet implemented!]

By incrementing the page number, the OCRed text files in ALTO format can be requested:

http://content.staatsbibliothek-berlin.de/zefys/SNP27971740-19070101-0-1-0-0.xml for page 1,

http://content.staatsbibliothek-berlin.de/zefys/SNP27971740-19070101-0-2-0-0.xml for page 2, and so on.
To retrieve the scanned images for the newspaper, further information needs to be appended to the URL, namely /full/{width in pixels},/0/default.jpg, where the supported widths are 1200, 800 and 250, e.g.
http://content.staatsbibliothek-berlin.de/zefys/SNP27971740-19070101-0-1-0-0/full/1200,/0/default.jpg
http://content.staatsbibliothek-berlin.de/zefys/SNP27971740-19070101-0-1-0-0/full/800,/0/default.jpg
http://content.staatsbibliothek-berlin.de/zefys/SNP27971740-19070101-0-1-0-0/full/250,/0/default.jpg
It is also possible to retrieve the original TIFF images via IIIF by replacing the width in pixels with full and specifying default.tif instead of default.jpg in the URL, e.g.

http://content.staatsbibliothek-berlin.de/zefys/SNP26120215-19630207-0-1-0-0/full/full/0/default.tif
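
These image URLs follow the IIIF Image API pattern `{identifier}/{region}/{size}/{rotation}/{quality}.{format}`. The following Python sketch builds them for the sizes and formats named above; `zefys_image_url` is a hypothetical helper, and restricting widths to 1200, 800 and 250 simply mirrors the list of supported options in this document.

```python
ZEFYS_BASE_URL = "http://content.staatsbibliothek-berlin.de/zefys"
# Widths listed as supported in this document.
SUPPORTED_WIDTHS = (1200, 800, 250)


def zefys_image_url(identifier, width=None, fmt="jpg"):
    """Build an image URL for a page identifier.

    width=None requests the full-size image (size segment 'full');
    otherwise the size segment is '{width},' as in the examples above.
    """
    if width is None:
        size = "full"
    elif width in SUPPORTED_WIDTHS:
        size = f"{width},"
    else:
        raise ValueError(f"unsupported width {width}; use one of {SUPPORTED_WIDTHS}")
    # Region 'full', rotation 0 and quality 'default' are fixed, as in the examples.
    return f"{ZEFYS_BASE_URL}/{identifier}/full/{size}/0/default.{fmt}"


print(zefys_image_url("SNP27971740-19070101-0-1-0-0", width=800))
print(zefys_image_url("SNP26120215-19630207-0-1-0-0", fmt="tif"))
```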
Some working examples:

http://content.staatsbibliothek-berlin.de/zefys/SNP26120215-19630207-0-1-0-0/full/full/0/default.tif -> TIF page 1

http://content.staatsbibliothek-berlin.de/zefys/SNP26120215-19630207-0-1-0-0/full/1200,/0/default.jpg -> JPG page 1

http://content.staatsbibliothek-berlin.de/zefys/SNP26120215-19630207-0-1-10-0/full/1200,/0/default.jpg -> JPG page 1 (article 10 highlighted)

http://content.staatsbibliothek-berlin.de/zefys/SNP26120215-19630207-0-1-0-0.pdf -> PDF page 1

http://content.staatsbibliothek-berlin.de/zefys/SNP26120215-19630207-0-0-0-0.pdf -> PDF all pages

http://content.staatsbibliothek-berlin.de/zefys/SNP26120215-19630207-0-1-0-0.xml -> ALTO page 1

http://content.staatsbibliothek-berlin.de/zefys/SNP26120215-19630207-0-0-0-0.xml -> METS
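
Going the other way, the components of an identifier can be read back out of any of the URLs above. This Python sketch assumes the ZDB-ID part consists of digits only, as in the examples in this document; `parse_zefys_id` is a hypothetical helper name.

```python
import re
from datetime import datetime

# Assumption: the ZDB-ID part is purely numeric, as in the examples above.
ZEFYS_ID_PATTERN = re.compile(
    r"SNP(?P<zdb_id>\d+)-(?P<date>\d{8})"
    r"-(?P<issue>\d+)-(?P<page>\d+)-(?P<article>\d+)-(?P<version>\d+)"
)


def parse_zefys_id(text):
    """Extract the identifier components from an identifier or a full URL."""
    match = ZEFYS_ID_PATTERN.search(text)
    if match is None:
        raise ValueError(f"no ZEFYS identifier found in {text!r}")
    parts = match.groupdict()
    return {
        "zdb_id": parts["zdb_id"],
        "date": datetime.strptime(parts["date"], "%Y%m%d").date(),
        "issue": int(parts["issue"]),
        "page": int(parts["page"]),
        "article": int(parts["article"]),
        "version": int(parts["version"]),
    }


url = (
    "http://content.staatsbibliothek-berlin.de/zefys/"
    "SNP26120215-19630207-0-1-10-0/full/1200,/0/default.jpg"
)
print(parse_zefys_id(url))
```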