Eric Hellman eshellman

## pgdata2.asciidoc

      
            
          
                eshellman
                / pgdata2.asciidoc
            
            
              Last active
              August 29, 2015 14:17
            
              
                Combing through the PG metadata
              
          
GITenberg metadata


Part 2. Combing through the Gutenberg metadata


To recap Part 1, the Project Gutenberg metadata boils down to the following, expressed in YAML:


# Project Gutenberg Metadata
pgterms:ebook:
    url: http://www.gutenberg.org/ebooks/20728


## pgmetadata3.asciidoc

      
            
          
                eshellman
                / pgmetadata3.asciidoc
            
            
              Last active
              August 29, 2015 14:17
            
              
                Other sources of metadata
              
          
GITenberg metadata


Part 3. Other sources of metadata


In Part 2 we looked at the data already in Project Gutenberg. Now we want to bring in metadata from other sources. OpenLibrary is one such source, with a reasonably well-designed API. It returns JSON, which can be readily converted to YAML.
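As a rough illustration of that JSON-to-YAML step, here is a minimal Python sketch. The URL pattern is OpenLibrary's edition-by-OLID JSON endpoint; the function names (`edition_url`, `fetch_edition`, `to_yaml_lines`) and the sample record are hypothetical, and the tiny emitter only covers the simple nesting shown here. A real project would more likely hand this off to a YAML library.

```python
import json
from urllib.request import urlopen


def edition_url(olid):
    """Build the OpenLibrary JSON URL for an edition ID such as OL7526155M."""
    return "https://openlibrary.org/books/{}.json".format(olid)


def fetch_edition(olid):
    """Fetch and parse one edition record (needs network access)."""
    with urlopen(edition_url(olid)) as response:
        return json.loads(response.read().decode("utf-8"))


def to_yaml_lines(data, indent=0):
    """Render a dict as block-style YAML lines.

    Simplification: non-empty nested dicts become nested blocks; list
    items and scalars are written as JSON literals, which YAML also
    accepts. Keys are emitted unquoted (an assumption about the input).
    """
    pad = " " * indent
    lines = []
    for key, value in data.items():
        if isinstance(value, dict) and value:
            lines.append("{}{}:".format(pad, key))
            lines.extend(to_yaml_lines(value, indent + 4))
        elif isinstance(value, list) and value:
            lines.append("{}{}:".format(pad, key))
            for item in value:
                lines.append("{}  - {}".format(pad, json.dumps(item)))
        else:
            lines.append("{}{}: {}".format(pad, key, json.dumps(value)))
    return lines


# Hand-made sample record, not real API output.
sample = {"olid:OL7526155M": {"title": "Space Viking", "example_list": ["a", "b"]}}
print("\n".join(to_yaml_lines(sample)))
```

Calling `fetch_edition("OL7526155M")` and passing the result to `to_yaml_lines` would be the networked version of the same idea.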


The OpenLibrary metadata for one edition (manifestation) of Space Viking is here:


olid:OL7526155M:


## pgdata4.asciidoc

      
            
          
                eshellman
                / pgdata4.asciidoc
            
            
              Last active
              August 29, 2015 14:17
            
          
GITenberg metadata


Part 4. Metadata that’s needed, but missing. Also, covers.


In Part 3, I looked at the metadata that’s available via API from other metadata sources. But there’s a bunch of metadata internal to GITenberg (and to a lesser extent, Project Gutenberg) that should probably be included in a GITenberg metadata file.


Housekeeping data


The repo URL. For Space Viking, that would be https://github.com/GITenberg/Space-Viking_20728. Unfortunately, because of Unicode and OS-level issues, it’s not as simple as you might think to derive the URL from other metadata. And maybe for portability, we should just keep "Space-Viking_20728".


Download URLs. Well maybe not. It probably makes more sense to have a resolver service separate from the repo.


Version info. Again, maybe not. It probably makes sense to use GitHub’s versioning to keep track of this. On the other hand, downstream sites will need to know this stuff. But a MARC record build
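
On the repo URL item above: if only the short repo name (for example "Space-Viking_20728") is stored, the full URL and the PG number can be recovered from it. The following is a minimal Python sketch of that idea, not GITenberg's actual code. The helper names are hypothetical, and the "slug, underscore, number" pattern is an assumption inferred from the example name.

```python
import re

GITHUB_ORG_URL = "https://github.com/GITenberg/"
# Assumed naming pattern: a title slug, an underscore, then the PG number.
REPO_NAME = re.compile(r"^(?P<slug>.+)_(?P<pg_id>\d+)$")


def repo_url(repo_name):
    """Expand a stored repo name into its GitHub URL."""
    return GITHUB_ORG_URL + repo_name


def split_repo_name(repo_name):
    """Split a repo name into (title slug, PG number)."""
    match = REPO_NAME.match(repo_name)
    if match is None:
        raise ValueError("unexpected repo name: {!r}".format(repo_name))
    return match.group("slug"), int(match.group("pg_id"))


print(repo_url("Space-Viking_20728"))
print(split_repo_name("Space-Viking_20728"))
```

Storing the name rather than the URL keeps the metadata file usable if the repos ever move to a different host.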


## status2.asciidoc

      
            
          
                eshellman
                / status2.asciidoc
            
            
              Last active
              August 29, 2015 14:20
            
          
GITenberg Status Report #2


Quite a bit of work has occurred since our last status report, though it’s rather scattered work in progress and still needs to be put together and documented.


We have about 10 PG texts converted to asciidoc


We have a working asciidoc-to-epub build machine


We have the start of a Django website


We’ll be at BEA


We’ll be participating in a hackathon in SFO in June


## dual_metadata.asciidoc

      
            
          
                eshellman
                / dual_metadata.asciidoc
            
            
              Last active
              August 29, 2015 14:25
            
              
                dual metadata repos.
              
          


There’s a problem I’ve been stewing over.


What’s the best way to organize collections of metadata into repos?


Solution 1: keep the metadata file in the repo with the book. People who know the book are best positioned to accept pull requests.


Solution 2: keep the metadata files in a separate repo just for metadata. Metadata specialists are better positioned to make sure the metadata is clean and uniform.


## keybase.md

      
            
          
                eshellman
                / keybase.md
            
            
              Last active
              December 22, 2015 17:26
            
          
    Keybase proof

I hereby claim:

I am eshellman on github.
I am gluejar (https://keybase.io/gluejar) on keybase.
I have a public key whose fingerprint is 0CB5 9AA5 2F64 E935 5A91  86B0 9718 4741 A983 947C

To claim this, I am signing this object:

  
## to_remove.tsv

          
50	Pi_50	Dataset
52	The-Square-Root-of-2_52	Dataset
65	The-First-100-000-Prime-Numbers_65	Dataset
114	The-Tenniel-Illustrations-for-Carroll-s-Alice-in-Wonderland_114	StillImage
116	Motion-Pictures-of-the-Apollo-11-Lunar-Landing_116	MovingImage
127	The-Number--e-_127	Dataset
129	The-Square-Root-of-2_129	Dataset
239	Radar-Map-of-the-United-States_239	StillImage
256	Motion-Picture-of-Rotating-Earth_256	MovingImage
628	The-Square-Root-of-3_628	Dataset

## pg metadata.asciidoc

      
            
          
                eshellman
                / pg metadata.asciidoc
            
            
              Last active
              May 21, 2020 21:01
            
          
GITenberg metadata


Part 1. Boiling down the Gutenberg RDF


One of the objectives of GITenberg is to provide a GitHub-flavored pathway for the improvement of the metadata for Project Gutenberg ebooks. This runs in two directions:
. Improving the accessibility and usability of PG metadata
. Improving the quality and completeness of PG metadata


The first step in this effort is to figure out what metadata already exists in Project Gutenberg.


Project Gutenberg provides periodic dumps of its metadata in the form of RDF. These are the metadata used to make the "bibrec" pages and also to make ebook files (an EPUB package, for example, stores this metadata in its "OPF" file). The dump consists of a zipped tarfile containing one RDF file per PG text. In the second tranche of repos moved to GitHub (roughly those above 10,000), Seth added the RDF file for each text to the corresponding archive. This saves you from having to deal with opening the archive and letting your operating system deal with 50,000 directori
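
For anyone who does work from the dump directly, the per-text RDF files can be read straight out of the tar archive without unpacking it to disk. The following is a minimal Python sketch using the standard `tarfile` module. The function name `iter_rdf` and the `pgNNNN.rdf` member naming are assumptions, and an outer zip wrapper (if present) would need to be opened first.

```python
import io
import re
import tarfile

# Assumed member naming inside the dump, e.g. ".../pg20728.rdf".
RDF_NAME = re.compile(r"pg(\d+)\.rdf$")


def iter_rdf(fileobj):
    """Yield (pg_id, rdf_bytes) for each RDF member, without extracting to disk."""
    # "r:*" lets tarfile detect gzip/bz2/xz compression transparently.
    with tarfile.open(fileobj=fileobj, mode="r:*") as archive:
        for member in archive:
            match = RDF_NAME.search(member.name)
            if member.isfile() and match:
                yield int(match.group(1)), archive.extractfile(member).read()


# Usage with a tiny in-memory archive standing in for the real dump.
demo = io.BytesIO()
with tarfile.open(fileobj=demo, mode="w:gz") as tar:
    payload = b"<rdf/>"
    info = tarfile.TarInfo("cache/epub/20728/pg20728.rdf")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))
demo.seek(0)

for pg_id, rdf_bytes in iter_rdf(demo):
    print(pg_id, len(rdf_bytes))
```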


## space.html
<html>
  <head>
    <title>redspace</title>
<style type="text/css">

.bold {
    font-weight: bold;
    background-color: red;
}
</style>
50	Pi_50	Dataset
52	The-Square-Root-of-2_52	Dataset
65	The-First-100-000-Prime-Numbers_65	Dataset
114	The-Tenniel-Illustrations-for-Carroll-s-Alice-in-Wonderland_114	StillImage
116	Motion-Pictures-of-the-Apollo-11-Lunar-Landing_116	MovingImage
127	The-Number--e-_127	Dataset
129	The-Square-Root-of-2_129	Dataset
239	Radar-Map-of-the-United-States_239	StillImage
256	Motion-Picture-of-Rotating-Earth_256	MovingImage
628	The-Square-Root-of-3_628	Dataset
	<html>
	<head>
	<title>redspace</title>
	<style type="text/css">

	.bold {
	font-weight: bold;
	background-color: red;
	}
	</style>