@billdueber
Last active December 24, 2015 10:29

Subject: ANNOUNCEMENT: Traject MARC->Solr indexer beta release

Jonathan Rochkind (Johns Hopkins), along with Bill Dueber (University of Michigan), is happy to announce a first beta release of "traject," a framework for indexing MARC data to Solr.

traject, in the vein of solrmarc, allows you to define your indexing rules using simple macro and translation files. However, traject runs under JRuby and is "ruby all the way down," so you can easily provide additional logic by simply requiring ruby files.

traject is currently in a beta release, but is already being used in production to generate the HathiTrust Catalog (http://www.hathitrust.org/). traject was developed under a test-first mentality and has undergone both continuous integration and an extensive benchmarking/profiling period to keep it fast.

You can view the code[1] on github, and easily install it as a (jruby) gem using "gem install traject".

To get a feel for traject, there is a well-documented sample configuration[2] you can clone. It includes a guide to the most important moving parts[3], which will continue to be fleshed out.

Part of that guide is a document[4] describing the major features, but briefly they are:

  • It's all just well-crafted and documented ruby code; easy to program, easy to read, easy to modify (the whole code base is only 6400 lines of code, more than a third of which is tests)
  • Fast. Traject by default indexes using multiple threads, so you can use all your cores!
  • Decoupled from specific readers/writers, so you can use ruby-marc or marc4j to read, and write to solr, a debug file, or anywhere else you'd like with a little extra code.
  • Designed so it's easy to test your own code and distribute it as a gem
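
To make "ruby all the way down" a bit more concrete, here is a minimal sketch of what a configuration file might look like. It follows the general pattern shown in the traject README (settings, to_field, extract_marc, literal), but the file name, Solr URL, output field names, and the "title_length" example are placeholders made up for illustration, and details may differ in the beta, so check the README before relying on it.

```ruby
# config.rb -- hypothetical minimal traject configuration (a sketch, not the
# sample config referenced in this announcement)

settings do
  # Placeholder URL (assumption); point this at your own Solr core
  provide "solr.url", "http://localhost:8983/solr"
end

# Macros pull values out of MARC fields/subfields into Solr fields
to_field "id",     extract_marc("001", :first => true)
to_field "title",  extract_marc("245ab")
to_field "source", literal("example_catalog") # made-up constant value

# Custom logic is just a ruby block: it receives the MARC record and an
# accumulator array, and appends whatever values the field should hold
to_field "title_length" do |record, accumulator|
  field = record['245']
  accumulator << field['a'].to_s.length if field
end
```

A file like this would typically be passed to the command-line tool along the lines of "traject -c config.rb your_records.mrc"; the README has the authoritative usage.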

We're hoping to build up an ecosystem around traject and encourage people to ask questions and contribute code (either directly to the project or via gem releases).

[1] http://github.com/jrochkind/traject
[2] http://github.com/billdueber/traject_sample
[3] https://github.com/billdueber/traject_sample/tree/master/guide
[4] https://github.com/billdueber/traject_sample/blob/master/guide/why_traject.md

@jrochkind

hard to read since gist doesn't wrap .txt files. am I missing a way to look at this more readably?

but looks good on first skim.

@jrochkind

I'll want to review the traject_sample stuff, I guess.

I am still dubious that the traject_sample stuff is needed for people to get started. You really think the extensive README and additional docs in traject aren't enough, and the announcement needs to direct people to the sample stuff on top of the copious documentation in traject?

@billdueber
Author

I really do think the sample config is necessary. For me, anyway, the difference between "build a file and try this" and "try this with an existing file you can skim through" is huge. traject is one of the simplest possible solutions to what is really a pretty complicated problem space; there's a lot to it, and seeing common usage, I think, is useful. And the research about scaffolded learning with just this sort of example stuff is pretty extensive, or at least it was 15 years ago when I was in that world. For all I know, current practice is to just ship a card with a link to Wikipedia on it :-)

But if no one clones it or cares about it, well, no harm done other than me wasting my time repurposing the hathitrust index stuff.

I'd be happy to change the announcement to push at the traject/README.md more heavily, if you're less comfortable with the directions to the traject_sample being front-and-center.
