@jbfink
Created June 9, 2014 14:02
/home/jbfink/.rbenv/versions/2.1.2/gemsets/ruby-marc/gems/marc-0.8.1/lib/marc/marc8/to_unicode.rb:163:in `rescue in transcode': MARC8, input byte offset 20, code set: 0x45, code point: 0xa0 (Encoding::InvalidByteSequenceError)
from /home/jbfink/.rbenv/versions/2.1.2/gemsets/ruby-marc/gems/marc-0.8.1/lib/marc/marc8/to_unicode.rb:144:in `transcode'
from /home/jbfink/.rbenv/versions/2.1.2/gemsets/ruby-marc/gems/marc-0.8.1/lib/marc/reader.rb:397:in `set_encoding'
from /home/jbfink/.rbenv/versions/2.1.2/gemsets/ruby-marc/gems/marc-0.8.1/lib/marc/reader.rb:359:in `block (2 levels) in decode'
from /home/jbfink/.rbenv/versions/2.1.2/gemsets/ruby-marc/gems/marc-0.8.1/lib/marc/reader.rb:358:in `each'
from /home/jbfink/.rbenv/versions/2.1.2/gemsets/ruby-marc/gems/marc-0.8.1/lib/marc/reader.rb:358:in `block in decode'
from /home/jbfink/.rbenv/versions/2.1.2/gemsets/ruby-marc/gems/marc-0.8.1/lib/marc/reader.rb:307:in `upto'
from /home/jbfink/.rbenv/versions/2.1.2/gemsets/ruby-marc/gems/marc-0.8.1/lib/marc/reader.rb:307:in `decode'
from /home/jbfink/.rbenv/versions/2.1.2/gemsets/ruby-marc/gems/marc-0.8.1/lib/marc/reader.rb:247:in `each'
from slice.rb:3:in `<main>'

ghost commented Jun 9, 2014

http://rubydoc.info/github/ruby-marc/ruby-marc/MARC/Reader

Relevant part:
If you have Marc8 data, you really want to convert it to UTF8 outside of ruby, but if you can't:
MARC::Reader.new("marc8.marc", :external_encoding => "binary")
But you probably will have problems subsequently in your own code using the MARC::Record.

ghost commented Jun 9, 2014

Is it actually compiled/binary marc or that weird text representation? If the latter, is it UTF-8?

jbfink commented Jun 9, 2014

It's the Harvard dataset here: http://openmetadata.lib.harvard.edu/bibdata . file / magic sez it's MARC-21 Bibliographic.
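
One way to check the UTF-8 question without ruby-marc: in MARC 21, leader position 09 declares the character coding scheme (blank for MARC-8, "a" for UCS/Unicode). A minimal sketch follows; `declared_encoding` is a hypothetical helper name and the leaders are made-up examples, not taken from the Harvard files.

```ruby
# Reports the character coding scheme a MARC 21 leader declares.
# `declared_encoding` is a hypothetical helper, not part of ruby-marc.
def declared_encoding(leader)
  case leader[9]               # leader position 09, zero-based
  when "a" then "UTF-8"        # "a" = UCS/Unicode
  when " " then "MARC-8"       # blank = MARC-8
  else "unknown"
  end
end

# Made-up 24-character leaders for illustration.
utf8_leader  = "00714cam a2200205 a 4500"
marc8_leader = "00714cam  2200205 a 4500"

declared_encoding(utf8_leader)
declared_encoding(marc8_leader)
```

For a real file, the first 24 bytes of a record are its leader, so something like File.open(path, "rb") { |f| f.read(24) } would supply the argument. The leader only says what the record claims; the bytes inside can still disagree with it.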

jbfink commented Jun 9, 2014

Binary totally works and is probably close enough for government work. I don't want (at this stage) to mess with Catmandu or other things to convert it -- maybe later. Thanks man, you're the best.
