Eric Hellman eshellman

@eshellman
eshellman / space.html
Created April 29, 2021 21:59
example file where tidy moves whitespace
<html>
<head>
<title>redspace</title>
<style type="text/css">
.bold {
font-weight: bold;
background-color: red;
}
</style>
50 Pi_50 Dataset
52 The-Square-Root-of-2_52 Dataset
65 The-First-100-000-Prime-Numbers_65 Dataset
114 The-Tenniel-Illustrations-for-Carroll-s-Alice-in-Wonderland_114 StillImage
116 Motion-Pictures-of-the-Apollo-11-Lunar-Landing_116 MovingImage
127 The-Number--e-_127 Dataset
129 The-Square-Root-of-2_129 Dataset
239 Radar-Map-of-the-United-States_239 StillImage
256 Motion-Picture-of-Rotating-Earth_256 MovingImage
628 The-Square-Root-of-3_628 Dataset
eshellman / dual_metadata.asciidoc
Last active August 29, 2015 14:25
dual metadata repos.

There’s a problem I’ve been stewing over.

What’s the best way to organize collections of metadata into repos?

Solution 1: keep the metadata file in the repo with the book. People who know the book are best positioned to accept pull requests.

Solution 2: keep the metadata files in a separate repo just for metadata. Metadata specialists are better positioned to make sure the metadata is clean and uniform.

GITenberg Status Report #2

Quite a bit of work has occurred since our last status report, though it’s rather scattered work in progress and still needs to be put together and documented.

  • We have about 10 PG texts converted to asciidoc

  • We have a working asciidoc-to-epub build machine

  • We have the start of a Django website

  • We’ll be at BEA

  • We’ll be participating in a hackathon in SFO in June

GITenberg metadata

Part 4. Metadata that’s needed, but missing. Also, covers.

In Part 3, I looked at the metadata that’s available via API from other metadata sources. But there’s a bunch of metadata internal to GITenberg (and to a lesser extent, Project Gutenberg) that should probably be included in a GITenberg metadata file.

Housekeeping data

  • the Repo URL. For Space Viking, that would be https://github.com/GITenberg/Space-Viking_20728. Unfortunately, because of Unicode and OS-level issues, it’s not as simple as you might think to derive the URL from other metadata. And maybe for portability, we should just keep "Space-Viking_20728".

  • Download URLs. Well maybe not. It probably makes more sense to have a resolver service separate from the repo.

  • version info. Again, maybe not. It probably makes sense to use GitHub’s versioning to keep track of this. On the other hand, downstream sites will need to know this stuff. But a MARC record build
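
The difficulty with deriving the repo name from other metadata (the Repo URL item above) can be made concrete with a small sketch. The rule below is an assumption for illustration only, not GITenberg's documented naming scheme: it replaces every character that is not an ASCII letter or digit with a hyphen and appends the Project Gutenberg number. The function name `naive_repo_name` and the "Café Society" title with id 1 are hypothetical.

```python
import re


def naive_repo_name(title: str, pg_id: int) -> str:
    """Hypothetical rule: non-ASCII-alphanumeric characters become '-'."""
    slug = re.sub(r"[^A-Za-z0-9]", "-", title)
    return f"{slug}_{pg_id}"


# The simple ASCII case gives the name quoted in the text.
print(naive_repo_name("Space Viking", 20728))

# The same visible title in two Unicode normalization forms:
# precomposed "é" (U+00E9) versus "e" followed by a combining accent (U+0301).
composed = naive_repo_name("Caf\u00e9 Society", 1)
decomposed = naive_repo_name("Cafe\u0301 Society", 1)
print(composed, decomposed, composed == decomposed)
```

Under this naive rule the two spellings of the same title produce different repo names, which is one reason storing the short name directly in the metadata file may be more portable than recomputing it.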

eshellman / pgmetadata3.asciidoc
Last active August 29, 2015 14:17
Other sources of metadata

GITenberg metadata

Part 3. Other sources of metadata

In Part 2 we looked at the data already in Project Gutenberg. Now we’re going to want to bring in metadata from other sources. OpenLibrary is a source of metadata with a reasonably well designed API. It returns JSON, which can be readily converted to YAML.

The OpenLibrary metadata for one edition (manifestation) of Space Viking is here:

olid:OL7526155M:
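
As a rough sketch of the JSON-to-YAML step, the code below builds the Open Library edition URL for an OLID and renders a parsed JSON record as YAML-style lines. Everything here is an assumption for illustration: `edition_url` and `to_yaml` are hypothetical helpers, the emitter handles only dicts, lists and scalars with simple keys (a YAML library would be the more robust choice), and the sample record is a hand-written stand-in rather than the actual API response.

```python
import json


def edition_url(olid: str) -> str:
    # Open Library exposes a JSON view of an edition at /books/<OLID>.json
    return f"https://openlibrary.org/books/{olid}.json"


def to_yaml(value, indent=0):
    """Return YAML-style lines for dicts, lists and scalars (minimal emitter)."""
    pad = "  " * indent
    lines = []
    if isinstance(value, dict):
        for key, item in value.items():
            if isinstance(item, (dict, list)) and item:
                lines.append(f"{pad}{key}:")
                lines.extend(to_yaml(item, indent + 1))
            else:
                # JSON scalars are also valid YAML scalars
                lines.append(f"{pad}{key}: {json.dumps(item)}")
    elif isinstance(value, list):
        for item in value:
            if isinstance(item, (dict, list)) and item:
                sub = to_yaml(item, indent + 1)
                lines.append(f"{pad}- {sub[0].lstrip()}")
                lines.extend(sub[1:])
            else:
                lines.append(f"{pad}- {json.dumps(item)}")
    else:
        lines.append(f"{pad}{json.dumps(value)}")
    return lines


# Hand-written stand-in for a fetched record (not real API output).
sample = json.loads('{"title": "Space Viking", "publishers": ["Example Press"]}')
print(edition_url("OL7526155M"))
print("\n".join(to_yaml(sample)))
```

Fetching the real record would mean requesting the URL printed above and passing the response body to `json.loads` before calling `to_yaml`.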
eshellman / pgdata2.asciidoc
Last active August 29, 2015 14:17
Combing through the PG metadata

GITenberg metadata

Part 2. Combing through the Gutenberg metadata

To recap Part 1, the Project Gutenberg metadata boils down to the following, expressed in YAML:

# Project Gutenberg Metadata
pgterms:ebook:
    url: http://www.gutenberg.org/ebooks/20728

GITenberg metadata

Part 1. Boiling down the Gutenberg RDF

One of the objectives of GITenberg is to provide a GitHub-flavored pathway for the improvement of the metadata for Project Gutenberg ebooks. This runs in two directions:

  • Improving the accessibility and usability of PG metadata

  • Improving the quality and completeness of PG metadata

The first step in this effort is to figure out what metadata already exists in Project Gutenberg.

Project Gutenberg provides periodic dumps of its metadata in the form of RDF. These are the metadata used to make the "bibrec" pages and also to make ebook files (an EPUB package, for example, stores this metadata in its "OPF" file). The dump consists of a zipped tarfile containing one RDF file per PG text. In the second tranche of repos moved to GitHub (roughly those numbered above 10,000), Seth added the RDF file for each text to the corresponding archive. This saves you from having to deal with opening the archive and letting your operating system deal with 50,000 directori
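
To give a feel for what reading one of these per-text RDF files might look like, here is a minimal sketch using Python's standard XML parser. The namespace URIs, the element layout, and the `summarize` helper are assumptions and should be checked against a real file from the dump; the sample document is a heavily simplified stand-in written for this example.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs; verify against an actual Project Gutenberg RDF file.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dcterms": "http://purl.org/dc/terms/",
    "pgterms": "http://www.gutenberg.org/2009/pgterms/",
}

# Simplified stand-in for one per-text RDF file (not copied from the dump).
SAMPLE_RDF = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dcterms="http://purl.org/dc/terms/"
         xmlns:pgterms="http://www.gutenberg.org/2009/pgterms/">
  <pgterms:ebook rdf:about="ebooks/20728">
    <dcterms:title>Space Viking</dcterms:title>
  </pgterms:ebook>
</rdf:RDF>
"""


def summarize(rdf_text: str) -> dict:
    """Pull the ebook identifier and title out of one RDF document."""
    root = ET.fromstring(rdf_text)
    ebook = root.find("pgterms:ebook", NS)
    about = ebook.get(f"{{{NS['rdf']}}}about")
    title = ebook.findtext("dcterms:title", namespaces=NS)
    return {"about": about, "title": title}


print(summarize(SAMPLE_RDF))
```

A real file carries many more properties (creators, subjects, file formats), so a full conversion would walk all child elements rather than picking out two fields.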

Keybase proof

I hereby claim:

  • I am eshellman on github.
  • I am gluejar (https://keybase.io/gluejar) on keybase.
  • I have a public key whose fingerprint is 0CB5 9AA5 2F64 E935 5A91 86B0 9718 4741 A983 947C

To claim this, I am signing this object: