Bill Dueber billdueber

billdueber / suss_example.rb
Created November 10, 2010 18:05
Example of how to use jruby_streaming_update_solr_server
require 'rubygems'
require 'threach'
require 'jruby_streaming_update_solr_server'
solrURL = 'your solr url'
sussQueueSize = 128 # number of docs to queue up
sussThreads = 1 # number of threads to use to send stuff to solr
threads = 3 # number of threads to use to process the data
billdueber / Hathi catalog.txt
Created December 2, 2010 14:52
All 035 types with more than 1000 records in UMich/HathiTrust
1006 DLI
1040 GyWOH
1059 SciDir
1062 NYU
1077 ItFiC
1150 CSt
1175 FrPJT
1176 DGPO
1182 NjP
1196 BLKDR
billdueber /
Created February 4, 2011 16:02
Create a set of simple sitemap files for google to crawl
# Then just create a simple XML file pointing to the 50k line files
# Don't forget to gzip the files first
my $numfiles = $ARGV[0]; # number of files generated before
my $urlToSitemapDir = '';
print q{<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="">
billdueber / ruby-marc-unicode.rb
Created May 31, 2011 20:39
Simple program to show 1.8 vs 1.9 rubymarc issues with unicode
require 'rubygems'
require 'marc'
require 'open-uri'
r = MARC::Reader.new(open('')).first
puts r
billdueber / breakjump.rb
Created June 24, 2011 18:57
Log and inability to catch breakjump under jruby 1.6.2
require 'java'
def showBreakProblem &blk
threads = 2
consumers = []
threads.times do |i|
consumers << Thread.new(i) do |i|
Thread.current[:num] = i
billdueber / gist:1154163
Created August 18, 2011 14:29
OSX command-line args
1. Go to the app directory
cd /Applications/Google\
2. Rename the app to app.orig
mv Google\ Chrome Google\ Chrome.orig
3. Create a shell script with the original name that uses the args you want
billdueber / parser1.rb
Created November 18, 2011 05:26
Simple term parser
require 'parslet'
require 'pp'
class AdvParser < Parslet::Parser
rule(:space) { match['\\s\\t'].repeat(1) } # at least one space/tab
rule(:space?) { space.maybe } # zero or 1 things that match the 'space' rule
rule(:startexpr) { str('(') >> space? } # '(' followed by optional space
rule(:endexpr) { space? >> str(')') }
billdueber / extend.rb
Created December 19, 2011 16:51
JRuby: #extend 10x slower in 1.7 w/OpenJDK 1.7?
module A
def a
class C
n = 100_000
billdueber /
Created January 25, 2012 18:00
Ruby-marc slow on strict parser

I upgraded my ruby 1.8 to the latest patchlevel and all of a sudden ruby-marc was super-slow. I found the same thing on 1.9 and in JRuby, so I investigated.

There's a marc.bytes.to_a call inside the loop in Reader#decode. All the fix does is move it outside the loop so it only happens once.
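The kind of change described above can be sketched in isolation. This is not ruby-marc's actual `Reader#decode`; the method names `slow_fields` and `fast_fields`, the `raw` string, and the `[offset, length]` entries are hypothetical stand-ins chosen only to show a loop-invariant `bytes.to_a` call being hoisted out of the loop.

```ruby
require 'benchmark'

# Hypothetical stand-in for decoding: slice fields out of a raw record
# using [offset, length] pairs. Not ruby-marc's real data layout.

# Converts the whole string to a byte array on every iteration.
def slow_fields(raw, entries)
  entries.map do |offset, length|
    bytes = raw.bytes.to_a            # rebuilt each time through the loop
    bytes[offset, length].pack('C*')
  end
end

# Converts once, before the loop, and reuses the array.
def fast_fields(raw, entries)
  bytes = raw.bytes.to_a              # built a single time
  entries.map do |offset, length|
    bytes[offset, length].pack('C*')
  end
end

raw     = 'abcdefghij' * 500                   # 5000-byte fake record
entries = (0...100).map { |i| [i * 10, 10] }   # 100 fake directory entries

# Both versions must return the same fields.
raise 'results differ' unless slow_fields(raw, entries) == fast_fields(raw, entries)

Benchmark.bm(8) do |x|
  x.report('inside')  { 20.times { slow_fields(raw, entries) } }
  x.report('outside') { 20.times { fast_fields(raw, entries) } }
end
```

The gap between the two timings grows with both the record size and the number of entries, since the inside-the-loop version redoes the full conversion once per entry.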

You can see the patch in the "slowreadfix" branch at

As you can see, the speedup is about a factor of five.

The test case reads a MARC21 file with about 18K records in it.