Bill Dueber billdueber

@billdueber
billdueber / suss_example.rb
Created November 10, 2010 18:05
Example of how to use jruby_streaming_update_solr_server
require 'rubygems'
require 'threach'
require 'jruby_streaming_update_solr_server'
solrURL = 'your solr url'
sussQueueSize = 128 # number of docs to queue up
sussThreads = 1 # number of threads to use to send stuff to solr
threads = 3 # number of threads to use to process the data
billdueber / Hathi catalog.txt
Created December 2, 2010 14:52
All 035 types with more than 1000 records in UMich/HathiTrust
1006 DLI
1040 GyWOH
1059 SciDir
1062 NYU
1077 ItFiC
1150 CSt
1175 FrPJT
1176 DGPO
1182 NjP
1196 BLKDR
billdueber / sitemapindex.pl
Created February 4, 2011 16:02
Create a set of simple sitemap files for google to crawl
#!/usr/local/bin/perl
# Then just create a simple XML file pointing to the 50k line files
# Don't forget to gzip the files first
my $numfiles = $ARGV[0]; # number of files generated before
my $urlToSitemapDir = 'http://www.my.machine.edu/dir/for/sitemaps';
print q{<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
{
"responseHeader":{
"status":0,
"QTime":0,
"params":{
"explainOther":"",
"fl":"*,score",
"indent":"on",
"start":"0",
billdueber / ruby-marc-unicode.rb
Created May 31, 2011 20:39
Simple program to show 1.8 vs 1.9 rubymarc issues with unicode
require 'rubygems'
require 'marc'
require 'open-uri'
r = MARC::Reader.new(open('http://mirlyn.lib.umich.edu/Record/000039829.marc')).first
puts r
billdueber / breakjump.rb
Created June 24, 2011 18:57
Log and inability to catch breakjump under jruby 1.6.2
require 'java'
def showBreakProblem &blk
threads = 2
consumers = []
threads.times do |i|
consumers << Thread.new(i) do |i|
Thread.current[:num] = i
begin
billdueber / gist:1154163
Created August 18, 2011 14:29
OSX command-line args
1. Go to the app directory
cd /Applications/Google\ Chrome.app/Contents/MacOS/
2. Rename the app to app.orig
mv Google\ Chrome Google\ Chrome.orig
3. Create a shell script with the original name that uses the args you want
billdueber / parser1.rb
Created November 18, 2011 05:26
Simple term parser
require 'parslet'
require 'pp'
class AdvParser < Parslet::Parser
rule(:space) { match['\\s\\t'].repeat(1) } # at least one space/tab
rule(:space?) { space.maybe } # zero or 1 things that match the 'space' rule
rule(:startexpr) { str('(') >> space? } # '(' followed by optional space
rule(:endexpr) { space? >> str(')') }
billdueber / extend.rb
Created December 19, 2011 16:51
JRuby: #extend 10x slower in 1.7 w/OpenJDK 1.7?
module A
def a
end
end
class C
end
n = 100_000
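
The preview stops before the timing code. As a rough sketch only (not the original gist's code), one way to time `#extend` uses the standard `benchmark` library. It assumes the cost of interest is one `#extend` call per freshly created object; the `elapsed` variable and the output format are made up for illustration.

```ruby
require 'benchmark'

# Same shapes as the snippet above: a nearly empty module and an empty class.
module A
  def a
  end
end

class C
end

n = 100_000

# Assumption: we time n fresh objects, each extended once with A.
elapsed = Benchmark.realtime do
  n.times { C.new.extend(A) }
end

puts format('%d extends in %.3f s', n, elapsed)
```

Running the same file under each JRuby/JDK combination and comparing the printed times would be one way to check for a slowdown like the one in the title.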
billdueber / ruby-marc_bench.md
Created January 25, 2012 18:00
Ruby-marc slow on strict parser

I upgraded my ruby 1.8 to the latest patchlevel and all of a sudden ruby-marc was super-slow. I found the same thing on 1.9 and in JRuby, so I investigated.

There's a `marc.bytes.to_a` call inside the loop in `Reader#decode`, so the whole byte array gets rebuilt on every pass. All the fix does is move it outside the loop so it only happens once.
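
To make the pattern concrete, here is a minimal sketch of hoisting that kind of conversion out of a loop. It is not ruby-marc's actual code: `raw`, `entries`, `fields_slow`, and `fields_fast` are hypothetical names, and the data is a made-up stand-in for a raw record and its offset/length pairs.

```ruby
# Hypothetical stand-in for one raw record; not ruby-marc's real data or code.
raw = ('0123456789' * 200).freeze

# Hypothetical [offset, length] pairs, loosely like directory entries.
entries = (0...200).map { |i| [i * 10, 10] }

# Before: the full byte array is rebuilt on every pass through the loop.
def fields_slow(raw, entries)
  entries.map { |offset, length| raw.bytes.to_a[offset, length] }
end

# After: convert once, then slice the same array inside the loop.
def fields_fast(raw, entries)
  bytes = raw.bytes.to_a
  entries.map { |offset, length| bytes[offset, length] }
end

# Both versions produce the same slices; only the amount of work differs.
puts fields_slow(raw, entries) == fields_fast(raw, entries)
```

The slow version does work proportional to the record size for every entry, while the fast version pays that cost once per record.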

You can see the patch in the "slowreadfix" branch at https://github.com/ruby-marc/ruby-marc/commit/beba83745ebe0848218496e967edd65d632fb01e

As you can see, the speedup is about a factor of five.

The test case reads a MARC21 file with about 18K records in it.