Ed Summers edsu

NDNP v3.14159 Batch Data Specification
Please digitize your newspaper pages as TIFF files and place them in a uniquely named BagIt directory of your choosing, using the following structure.
batch_dlc_nirvana/
|-- bag-info.txt
|-- bagit.txt
|-- data
|   `-- sn83030214
|       |-- 1880-01-09
edsu / quadtab.pl
Created September 11, 2010 02:00
#!/usr/bin/env perl
# this script reads n-quads on stdin, and writes the same quads separated with tabs to stdout
# the thought being that it is then easier to munge with unix tools like split, grep, etc
# more about n-quads can be found at: http://sw.deri.org/2008/07/n-quads/
use strict;
my $space = qr/ /;
my $uri = qr/<.+?>/;
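The Perl preview above is cut off after its first two patterns, so the rest of the script is not shown. As a rough illustration of the same idea (split an n-quads line into its terms and rejoin them with tabs), here is a minimal Python sketch. The `TERM` pattern, the `quad_to_tabs` name and the sample line are all assumptions made for this example; they are not taken from quadtab.pl, and the pattern is a simplification of the n-quads grammar.

```python
import re

# One n-quads term: a URI, a blank node, or a quoted literal with an
# optional language tag or datatype. This is a simplified pattern written
# for this sketch, not the regexes from quadtab.pl.
TERM = re.compile(r'''
    <[^>]*>                                             # URI
  | _:\S+                                               # blank node
  | "(?:[^"\\]|\\.)*"(?:@[A-Za-z0-9-]+|\^\^<[^>]*>)?    # literal
''', re.VERBOSE)


def quad_to_tabs(line):
    """Return the terms of one n-quads line joined by tabs.

    Returns None when the line does not hold three or four terms
    (for example a blank line or a comment).
    """
    terms = TERM.findall(line)
    if len(terms) not in (3, 4):
        return None
    return "\t".join(terms)


# Hypothetical sample line, only here to show the transformation.
SAMPLE = ('<http://example.org/s> <http://example.org/p> '
          '"two words"@en <http://example.org/g> .')
print(quad_to_tabs(SAMPLE))
```

Matching whole terms rather than splitting on spaces keeps literals that contain spaces in a single column, which is what makes the output safe to feed to `cut -f`.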
# Hosts that serve up SKOS in the Billion Triple Challenge dataset:
#
# http://challenge.semanticweb.org/
#
# Results are ordered by the number of SKOS triples from the host, and were calculated with the
# following command:
#
# zgrep 'http://www.w3.org/2004/02/skos/core' btc-2010-chunk-*.gz | quadtab.pl | cut -d "<ctrl-v><tab>" -f 4 | sort | uniq -c | sort -rn
#
# where quadtab.pl = http://gist.github.com/574679
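The `cut | sort | uniq -c | sort -rn` tail of the pipeline above tallies the fourth (context) column of the tab-separated quads. A minimal Python sketch of that tally follows. It assumes the input rows are already tab-separated, and it goes one step further than the shell command as shown by reducing each context URI to its hostname; that reduction and the `count_context_hosts` name are assumptions for this example, not something the gist preview confirms.

```python
from collections import Counter
from urllib.parse import urlparse


def count_context_hosts(rows):
    """Count tab-separated quads per hostname of their context URI.

    Returns (host, count) pairs, highest count first. Rows with fewer
    than four columns, or without a parseable host, are skipped.
    """
    counts = Counter()
    for row in rows:
        cols = row.rstrip("\n").split("\t")
        if len(cols) < 4:
            continue
        # Drop the surrounding angle brackets before parsing the URI.
        host = urlparse(cols[3].strip("<>")).netloc
        if host:
            counts[host] += 1
    return counts.most_common()
```

To use it at the end of a pipe, something like `for host, n in count_context_hosts(sys.stdin): print(n, host)` would print output in the same count-then-host layout as the listings here.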
# below are domains that use the SKOS mapping properties to connect concepts together
10571 lod.geospecies.org -> species.geospecies.org
7343 lod.geospecies.org -> bio2rdf.org
7343 bio2rdf.org -> lod.geospecies.org
7177 www.uniprot.org -> lod.geospecies.org
7177 lod.geospecies.org -> www.uniprot.org
5190 lod.geospecies.org -> dbpedia.org
5190 dbpedia.org -> lod.geospecies.org
1901 psh.ntkcz.cz -> id.loc.gov
# These are hosts that link to each other using SKOS mapping properties
# in the Billion Triple Challenge data.
#
# It might be more accurate to use the related skos:ConceptScheme but that
# will take some more work.
15777 lod.geospecies.org -> species.geospecies.org
10868 lod.geospecies.org -> bio2rdf.org
10868 bio2rdf.org -> lod.geospecies.org
10635 www.uniprot.org -> lod.geospecies.org
# The following are hostnames that assert owl:sameAs relations between their resources
# in the billion triple challenge data set. They are ordered by the number of links
# between the hosts.
#
# zgrep -h 'http://www.w3.org/2002/07/owl#sameAs' btc-2010-chunk-*.gz | quadtab.pl | sameas.py | sort | uniq -c | sort -rn > sameas.txt
#
# where quadtab.pl = http://gist.github.com/574679
# and sameas.py = http://gist.github.com/578810
366902 dblp.l3s.de -> bibsonomy.org
edsu / sameas.py
Created September 14, 2010 09:53
#!/usr/bin/env python
import re
import urlparse
import fileinput
def urlize(s):
    return s.lstrip('<').rstrip('>')
for line in fileinput.input():
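The preview of sameas.py stops at the start of its loop, so the body is not shown. Judging only from the `host -> host` lines in the listing it feeds, the loop presumably pulls the hostnames out of the subject and object of each tab-separated quad. A minimal Python 3 sketch of that step is below (the original imports the Python 2 `urlparse` module; `urllib.parse` is its Python 3 home). The `host_link` name, the choice to skip links within a single host, and the example URIs in the usage note are assumptions for illustration.

```python
from urllib.parse import urlparse


def urlize(s):
    """Strip the angle brackets from an n-quads URI term."""
    return s.lstrip('<').rstrip('>')


def host_link(line):
    """Return 'subject-host -> object-host' for one tab-separated quad.

    Returns None when the line has fewer than three columns, when either
    end has no hostname (for example a literal object), or when both ends
    are on the same host. Skipping same-host links is an assumption of
    this sketch.
    """
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 3:
        return None
    s_host = urlparse(urlize(cols[0])).netloc
    o_host = urlparse(urlize(cols[2])).netloc
    if not s_host or not o_host or s_host == o_host:
        return None
    return "%s -> %s" % (s_host, o_host)
```

For instance, a quad whose subject is a made-up URI under `http://dblp.l3s.de/` and whose object is a made-up URI under `http://bibsonomy.org/` would come out as `dblp.l3s.de -> bibsonomy.org`, ready for `sort | uniq -c`.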
esummers@roentgenium:~$ sudo dpkg -i 4store_1.0.3-2_i386.deb
[sudo] password for esummers:
(Reading database ... 61683 files and directories currently installed.)
Preparing to replace 4store 1.0.3-2 (using 4store_1.0.3-2_i386.deb) ...
Unpacking replacement 4store ...
dpkg: dependency problems prevent configuration of 4store:
4store depends on libncurses5 (>= 5.7+20100313); however:
Version of libncurses5 on system is 5.7+20090803-2ubuntu3.
4store depends on librasqal2 (>= 0.9.18); however:
Version of librasqal2 on system is 0.9.17-1.
ed@rorty:~/Projects/bagit$ export PS1='\n\[\033[35m\].-(\[\033[33m\]\u@\h \[\033[36m\]\t\[\033[35m\]) \[\033[0m\]\w\n\[\033[35m\]\`-->\[\033[0m\]'
.-(ed@rorty 15:41:18) ~/Projects/bagit
`-->ls
bagit.egg-info bagit.pyc bench.py README test-data test.pyc
bagit.py bench-data build setup.py test.py
.-(ed@rorty 15:41:20) ~/Projects/bagit
`-->ls
bagit.egg-info bagit.pyc bench.py README test-data test.pyc
266 <http://purl.org/dc/terms/language>
416 <http://purl.org/vocab/relationship/grandparentOf>
484 <http://www.w3.org/2000/01/rdf-schema#seeAlso>
493 <http://purl.org/vocab/relationship/grandchildOf>
1501 <http://d-nb.info/gnd/predecessorWithoutSuccessor>
2764 <http://metadataregistry.org/uri/schema/RDARelationshipsGR2/relatedPersonPerson>
3891 <http://purl.org/vocab/relationship/siblingOf>
4635 <http://d-nb.info/gnd/invalidIdentifierForTheSubject>
4761 <http://d-nb.info/gnd/useConceptsInsteadSWD>
4761 <http://www.w3.org/2000/01/rdf-schema#label>