Bill Dueber billdueber

@billdueber
billdueber / zip_contents_summary.rb
Last active Feb 8, 2019
Summary of zipfile by mime type
View zip_contents_summary.rb
require 'zip'
require 'mimemagic'
zipfilename = ARGV[0]
class MimeStats
attr_accessor :type, :size, :csize, :count
def initialize(type, size = 0, csize = 0)
@type = type
@size = size
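The preview above is cut off, so the rest of the gist is not shown here. As a rough sketch only, per-type totals might be accumulated like this. The `MimeStatsSketch` class, its `add` method, the `summarize` helper, and the `[type, size, compressed_size]` tuples are all assumptions made for illustration; the real script reads entries with the `zip` and `mimemagic` gems, which this sketch deliberately leaves out.

```ruby
# Hypothetical accumulator modelled on the attributes visible in the preview.
class MimeStatsSketch
  attr_accessor :type, :size, :csize, :count

  def initialize(type, size = 0, csize = 0)
    @type = type
    @size = size
    @csize = csize
    @count = 0
  end

  # Fold one entry's uncompressed and compressed sizes into the totals.
  def add(size, csize)
    @size += size
    @csize += csize
    @count += 1
    self
  end
end

# Group [type, size, compressed_size] tuples by type (tuple shape is assumed).
def summarize(entries)
  stats = Hash.new { |hash, type| hash[type] = MimeStatsSketch.new(type) }
  entries.each { |type, size, csize| stats[type].add(size, csize) }
  stats
end

summary = summarize([
  ['text/plain', 100, 40],
  ['image/png', 500, 480],
  ['text/plain', 50, 20]
])
summary.each_value do |s|
  puts "#{s.type}: #{s.count} files, #{s.size} bytes (#{s.csize} compressed)"
end
```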
@billdueber
billdueber / test_solr_analysis.rb
Last active Jan 22, 2019
How to test solr analysis output against live solr
View test_solr_analysis.rb
require 'simple_solr_client'
# https://github.com/billdueber/simple_solr_client
client = SimpleSolrClient::Client.new("http://localhost:9639/solr")
# What do we have?
client.cores
#=> ['med']
core = client.core('med')
# Can get a list of the field types if you like:
@billdueber
billdueber / threaded_reader_bench.rb
Created Aug 10, 2018
Very dirty benchmark looking at a simple threaded marc-binary reader
View threaded_reader_bench.rb
require 'marc'
require 'benchmark'
require 'concurrent'
require 'concurrent-edge'
module MARC
class ThreadedReader < Reader
def each
records = Concurrent::Channel.new(capacity: 20)
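The preview stops right after the channel is created. The same producer/consumer idea can be sketched with only the Ruby standard library, swapping `Concurrent::Channel` for a core `SizedQueue`. This is not the gist's implementation: `ThreadedEachSketch`, the `DONE` sentinel, and the generic `source` argument are hypothetical names, and no MARC parsing is done here.

```ruby
# Hypothetical stand-in for a threaded reader: a producer thread fills a
# bounded queue while the calling thread drains it and yields each item.
class ThreadedEachSketch
  include Enumerable

  DONE = Object.new # sentinel marking the end of the stream

  def initialize(source, capacity: 20)
    @source = source
    @capacity = capacity
  end

  def each
    return enum_for(:each) unless block_given?

    queue = SizedQueue.new(@capacity) # blocks the producer when full
    producer = Thread.new do
      begin
        @source.each { |item| queue << item }
      ensure
        queue << DONE # always signal completion, even after an error
      end
    end

    loop do
      item = queue.pop
      break if item.equal?(DONE)
      yield item
    end
    producer.join # re-raises any exception raised in the producer
  end
end

puts ThreadedEachSketch.new(1..10, capacity: 3).to_a.inspect
```

One caveat of this sketch: if the consumer stops early (for example via `first`), the producer thread stays blocked on the full queue until the process exits.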
@billdueber
billdueber / marc_vs_marc4j_bench.rb
Created Aug 9, 2018
Very rough look at parsing speed across marc and marc4j.
View marc_vs_marc4j_bench.rb
# A very, *very* imperfect bench, but gives us a rough idea
#
# tl;dr -- marc-binary is a wash, marc-xml is about 3.5 times faster using marc4j
#
# > bundle exec ruby --server bench.rb
# jruby 9.2.0.0 (2.5.0) 2018-05-24 81156a8 Java HotSpot(TM) 64-Bit Server VM 25.112-b16 on 1.8.0_112-b16 +jit [darwin-x86_64]
# Warmup: 45. Runtime: 15
#
# Comparison:
# marc4j-xml: 8197.4 i/s
@billdueber
billdueber / multi_file.rb
Last active Jun 1, 2018
Enumerate over multiple files
View multi_file.rb
class MultiFile
include Enumerable
def initialize(filenames_or_handles, open_mode: 'r:utf-8')
@names_and_handles = Array(filenames_or_handles).map do |fn|
if fn.kind_of?(IO)
name = if fn.respond_to? :to_path
fn.to_path
else
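The rest of `MultiFile` is not visible in this preview. A minimal sketch of the general idea, treating several files or open handles as a single stream of lines, might look like the following. `MultiFileSketch` is a hypothetical name and its behaviour is an assumption, not the gist's code. The sketch wraps a single argument in an array by hand rather than calling `Array()`, because `IO` and `StringIO` respond to `to_a`, so `Array(handle)` would return the handle's lines instead of the handle.

```ruby
require 'stringio'

# Hypothetical sketch: enumerate lines across filenames and IO-like handles.
class MultiFileSketch
  include Enumerable

  def initialize(filenames_or_handles, open_mode: 'r:utf-8')
    # Wrap a lone argument by hand; Array(io) would expand an IO into lines.
    @sources = filenames_or_handles.is_a?(Array) ? filenames_or_handles : [filenames_or_handles]
    @open_mode = open_mode
  end

  # Yield every line of every source, in the order the sources were given.
  def each(&block)
    return enum_for(:each) unless block_given?

    @sources.each do |src|
      if src.respond_to?(:read) # already an open handle (IO, StringIO, ...)
        src.each_line(&block)
      else                      # otherwise treat it as a filename
        File.open(src, @open_mode) { |fh| fh.each_line(&block) }
      end
    end
  end
end

mf = MultiFileSketch.new([StringIO.new("a\nb\n"), StringIO.new("c\n")])
puts mf.map(&:chomp).inspect
```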
View indexer.rb
require 'traject'
require_relative 'recusive_json_reader'
require 'traject/debug_writer'
settings do
store "reader_class_name", "MyJsonHierarchyReader"
store "writer_class_name", "Traject::DebugWriter"
store "output_file", "recursive.out"
end
View recursive_yield_example.rb
require 'json'
class MyJsonHierarchyReader
# @param [#each] input_stream Probably a file, but any iterator will do
# so long as it returns a valid JSON object from #each
def initialize(input_stream, settings)
# ... whatever you need to do. Let's pretend it's
# a newline-delimited JSON file, since you didn't
# specify anything
@json_lines = input_stream
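The preview ends inside `initialize`, so how the gist walks the hierarchy is not shown. A minimal sketch of a reader that recursively yields nested objects from newline-delimited JSON might look like this. The class name `JsonHierarchyReaderSketch`, the `child_key` setting, and the `"children"` key are assumptions made for illustration; the source does not say how the hierarchy is encoded.

```ruby
require 'json'

# Hypothetical reader: yields each JSON object, then each of its descendants.
class JsonHierarchyReaderSketch
  include Enumerable

  def initialize(input_stream, settings = {})
    @json_lines = input_stream
    # Assumed convention: nested records live in an array under this key.
    @child_key = settings.fetch('child_key', 'children')
  end

  def each(&block)
    return enum_for(:each) unless block_given?

    @json_lines.each do |line|
      next if line.strip.empty?
      yield_recursively(JSON.parse(line), &block)
    end
  end

  private

  # Depth-first: hand the node to the caller, then recurse into its children.
  def yield_recursively(node, &block)
    block.call(node)
    Array(node[@child_key]).each { |child| yield_recursively(child, &block) }
  end
end

lines = [
  '{"id": 1, "children": [{"id": 2, "children": [{"id": 3}]}]}',
  '{"id": 4}'
]
JsonHierarchyReaderSketch.new(lines).each { |node| puts node['id'] }
```

Passing the block down explicitly keeps the recursion in an ordinary private method, so the caller sees one flat stream of records regardless of nesting depth.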
@billdueber
billdueber / marc21_changed_code.rb
Last active Oct 31, 2017
changed code and the simplistic config used for the benchmark
View marc21_changed_code.rb
def extract_marc(spec, options = {})
# ... stuff deleted for clarity
## CREATE THE CHAIN
ppchain = Marc21.create_post_processing_chain(options, translation_map)
lambda do |record, accumulator, context|
accumulator.concat extractor.extract(record)
View Talk About Fedora.adoc

Samvera#General talking about Fedora

Tuesday, August 29, 2017

Mike Giarlo (5:56 PM)

Have folks here been hearing all manner of rumors today about Samvera, or certain Samvera institutions, walking away from Fedora and other community components? Some of us are hearing these rumors as of a few hours ago, and we’re trying to figure out where the misinformation is coming from.

It seems to center on Valkyrie. We did discuss Valkyrie and Fedora futures on today’s Fedora Leadership group, but not in the context the rumors are in.

View safer_reindex_everything.rb
require 'active-fedora'
require 'json'
def descendant_uris(uri)
begin
resource = Ldp::Resource::RdfSource.new(ActiveFedora.fedora.connection, uri)
rescue
STDERR.puts "Failed to create resource for uri #{uri}"
return []
end