To recap Part 1, the Project Gutenberg metadata boils down to the following, expressed in YAML:
# Project Gutenberg Metadata
pgterms:ebook:
url: http://www.gutenberg.org/ebooks/20728
In Part 2 we looked at the data already in Project Gutenberg. Now we’re going to want to bring in metadata from other sources. OpenLibrary is a source of metadata with a reasonably well-designed API. It returns JSON, which can be readily converted to YAML.
The OpenLibrary metadata for one edition (manifestation) of Space Viking is here:
olid:OL7526155M:
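As a rough illustration of the JSON-to-YAML step, here is a minimal sketch that uses only the Python standard library. The sample record is hand-written for illustration and is not the actual OpenLibrary response; the helper names `to_yaml` and `json_to_yaml` are hypothetical. Fetching is left out so the sketch runs offline. In practice the JSON would come from the OpenLibrary API, and a real YAML library would be a better choice than this small emitter.

```python
import json

# Hand-written sample in the shape of an edition record (an assumption,
# NOT the real OpenLibrary response for OL7526155M).
SAMPLE_JSON = """
{"olid:OL7526155M": {
    "title": "Space Viking",
    "key": "/books/OL7526155M",
    "languages": [{"key": "/languages/eng"}]
}}
"""

def to_yaml(value, indent=0):
    """Return block-style YAML lines for nested dicts, lists and scalars."""
    pad = "  " * indent
    lines = []
    if isinstance(value, dict):
        for key, item in value.items():
            if isinstance(item, (dict, list)) and item:
                lines.append(f"{pad}{key}:")
                lines.extend(to_yaml(item, indent + 1))
            else:
                # JSON scalars (and empty {} / []) are also valid YAML.
                lines.append(f"{pad}{key}: {json.dumps(item)}")
    elif isinstance(value, list):
        for item in value:
            if isinstance(item, (dict, list)) and item:
                sub = to_yaml(item, indent + 1)
                # Put the first line of the nested block on the dash line.
                lines.append(f"{pad}- {sub[0].lstrip()}")
                lines.extend(sub[1:])
            else:
                lines.append(f"{pad}- {json.dumps(item)}")
    else:
        lines.append(f"{pad}{json.dumps(value)}")
    return lines

def json_to_yaml(text):
    """Convert a JSON document (as text) to a YAML string."""
    return "\n".join(to_yaml(json.loads(text)))

if __name__ == "__main__":
    print(json_to_yaml(SAMPLE_JSON))
```

Keys are written unquoted to match the `olid:OL7526155M:` style used here; keys containing `: ` or leading special characters would need quoting.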
In Part 3, I looked at the metadata that’s available via API from other metadata sources. But there’s a bunch of metadata internal to GITenberg (and to a lesser extent, Project Gutenberg) that should probably be included in a GITenberg metadata file.
The repo URL. For Space Viking, that would be https://github.com/GITenberg/Space-Viking_20728. Unfortunately, because of Unicode and OS-level issues, it’s not as simple as you might think to derive the URL from other metadata. And maybe for portability, we should just keep "Space-Viking_20728".
Download URLs. Well, maybe not. It probably makes more sense to have a resolver service separate from the repo.
Version info. Again, maybe not. It probably makes sense to use GitHub’s versioning to keep track of this. On the other hand, downstream sites will need to know this stuff. But a MARC record build
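
If only the repo name is stored, the URL can be rebuilt when needed. The following is a minimal sketch under two assumptions: that repos live under the GITenberg organization, and that names end in `_<PG number>`, as in "Space-Viking_20728". The function names `repo_url` and `pg_id` are hypothetical.

```python
from urllib.parse import quote

# Assumption: all repos live under this GitHub organization.
GITHUB_ORG_URL = "https://github.com/GITenberg"

def repo_url(repo_name):
    """Build the repo URL from the stored repo name."""
    # quote() percent-encodes any characters that are unsafe in a URL path.
    return f"{GITHUB_ORG_URL}/{quote(repo_name)}"

def pg_id(repo_name):
    """Extract the Project Gutenberg number from the trailing '_<number>'."""
    return int(repo_name.rsplit("_", 1)[1])

if __name__ == "__main__":
    print(repo_url("Space-Viking_20728"))
    print(pg_id("Space-Viking_20728"))
```

This goes from the name to the URL only. Deriving the repo name from a title is the hard direction, which is the argument for storing the name itself.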
Quite a bit of work has occurred since our last status report, though it’s rather scattered work in progress and still needs to be put together and documented.
We have about 10 PG texts converted to asciidoc
We have a working asciidoc-to-epub build machine
We have the start of a Django website
We’ll be at BEA
We’ll be participating in a hackathon in SFO in June
There’s a problem I’ve been stewing over.
What’s the best way to organize collections of metadata into repos?
Solution 1: keep the metadata file in the repo with the book. People who know the book are best positioned to accept pull requests.
Solution 2: keep the metadata files in a separate repo just for metadata. Metadata specialists are better positioned to make sure the metadata is clean and uniform.
I hereby claim:
To claim this, I am signing this object:
50 | Pi_50 | Dataset | |
---|---|---|---|
52 | The-Square-Root-of-2_52 | Dataset | |
65 | The-First-100-000-Prime-Numbers_65 | Dataset | |
114 | The-Tenniel-Illustrations-for-Carroll-s-Alice-in-Wonderland_114 | StillImage | |
116 | Motion-Pictures-of-the-Apollo-11-Lunar-Landing_116 | MovingImage | |
127 | The-Number--e-_127 | Dataset | |
129 | The-Square-Root-of-2_129 | Dataset | |
239 | Radar-Map-of-the-United-States_239 | StillImage | |
256 | Motion-Picture-of-Rotating-Earth_256 | MovingImage | |
628 | The-Square-Root-of-3_628 | Dataset |
One of the objectives of GITenberg is to provide a GitHub-flavored pathway for the improvement of the metadata for Project Gutenberg ebooks. This runs in two directions:
. Improving the accessibility and usability of PG metadata
. Improving the quality and completeness of PG metadata
The first step in this effort is to figure out what metadata already exists in Project Gutenberg.
Project Gutenberg provides periodic dumps of its metadata in the form of RDF. This is the metadata used to make the "bibrec" pages and also to make ebook files (an EPUB package, for example, stores this metadata in its "OPF" file). The dump consists of a zipped tarfile containing one RDF file per PG text. In the second tranche of repos moved to GitHub (roughly those above 10,000), Seth added the RDF file for each text to the corresponding archive. This saves you from having to deal with opening the archive and letting your operating system deal with 50,000 directori
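
To give a feel for reading one of these per-text RDF files, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The sample document is a hand-written, heavily simplified stand-in for a real PG RDF file, and the function name `read_ebook` is hypothetical. Real files carry many more elements, and their exact layout should be checked against an actual dump.

```python
import xml.etree.ElementTree as ET

# Namespace URIs assumed for the PG RDF files.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dcterms": "http://purl.org/dc/terms/",
    "pgterms": "http://www.gutenberg.org/2009/pgterms/",
}

# Simplified, hand-written sample (an assumption, not a real dump file).
SAMPLE_RDF = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dcterms="http://purl.org/dc/terms/"
         xmlns:pgterms="http://www.gutenberg.org/2009/pgterms/">
  <pgterms:ebook rdf:about="ebooks/20728">
    <dcterms:title>Space Viking</dcterms:title>
  </pgterms:ebook>
</rdf:RDF>
"""

def read_ebook(rdf_text):
    """Pull the ebook URL and title out of one RDF document."""
    root = ET.fromstring(rdf_text.strip())
    ebook = root.find("pgterms:ebook", NS)
    # Attributes use the fully qualified {namespace}name form.
    about = ebook.get(f"{{{NS['rdf']}}}about")
    title = ebook.findtext("dcterms:title", namespaces=NS)
    return {"url": "http://www.gutenberg.org/" + about, "title": title}

if __name__ == "__main__":
    print(read_ebook(SAMPLE_RDF))
```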
<html>
<head>
<title>redspace</title>
<style type="text/css">
.bold {
font-weight: bold;
background-color: red;
}
</style>