Skip to content

Instantly share code, notes, and snippets.

Embed
What would you like to do?
Convert a CTS repository of XML files to both tabular and graph representation
/*
Simple groovy script to convert a CTS repository of local XML files to :
1. tabular representation in structured text files
2. graph representation in RDF (TTL)
Usage: groovy cts.groovy <INVENTORY> <ARCHIVEROOT> <SCHEMA>
where INVENTORY is a CTS TextInventory file, ARCHIVEROOT is the root directory
where XML editions are stored, and SCHEMA is the Relax NG schema for validating
the TextInventory.
Run this in a directory where you have write permission; tabular text files
will be put a in a directory named "cts-tabulated" which will be created if it does
not exist. RDF output will be in a file named "cts.ttl" in the current directory.
*/
@GrabResolver(name='beta', root='http://beta.hpcc.uh.edu/nexus/content/repositories/releases')
@Grab(group='edu.holycross.shot', module='hocuspocus', version='0.13.5')
import edu.holycross.shot.hocuspocus.Corpus
if (args.size() != 3) {
System.err.println "Usage: groovy cts.groovy <INVENTORY> <ARCHIVEROOT> <SCHEMA>"
System.exit(-1)
}
// Inputs:
File inv = new File(args[0])
File textArchive = new File(args[1])
File schema = new File(args[2])
// Outputs:
File tabs = new File("cts-tabulated")
if (! tabs.exists()) {
tabs.mkdir()
}
File ttl = new File("cts.ttl")
Corpus c = new Corpus(inv, textArchive, schema)
c.ttl(ttl, true, tabs)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.