@documentcloud
Created October 5, 2009 15:27
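require 'nokogiri'
# `xml` (the Nokogiri document) and `path` (the text file to index) are
# expected to be defined by the surrounding code.
full_text = Nokogiri::XML::Node.new('full_text', xml)
# Strip every non-printable character so the text is safe to embed in XML.
full_text.content = File.read(path).gsub(/[^[:print:]]/, '')
existence = Nokogiri::XML::Node.new('exists', xml)
existence.content = '1'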
# Alternative approaches that don't work:
# to_xs is far too slow for production, especially if we're rebuilding
# the index all the time. At the very least it would have to be
# parallelized in CloudCrowd.
# full_text.content = File.read(path).to_xs
# full_text << Nokogiri::XML::CDATA.new(xml, File.read(path).to_xs)
# full_text.content = `iconv -f UTF-8 -t UTF-8 #{path}`
# full_text.content = File.read(path).unpack('C*').pack('U*')
# full_text.content = converter.iconv(File.read(path) << ' ')[0..-2]
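The snippet above settles on stripping every non-printable character. In Ruby that character class appears to exclude newlines and tabs as well, so words on adjacent lines may end up joined. As a minimal sketch of a narrower filter, assuming the goal is only to produce text that XML 1.0 accepts, the following drops invalid UTF-8 bytes and XML-illegal characters while keeping whitespace. `XML_ILLEGAL` and `xml_safe_text` are hypothetical names, not part of the original gist, and this has not been measured against the speed concern raised about `to_xs`.

```ruby
# Hypothetical helper, not from the original gist.
# Matches any character outside the XML 1.0 "Char" production:
# tab, LF, CR, U+0020..U+D7FF, U+E000..U+FFFD, U+10000..U+10FFFF.
XML_ILLEGAL = /[^\t\n\r\u0020-\uD7FF\uE000-\uFFFD\u{10000}-\u{10FFFF}]/

def xml_safe_text(raw)
  # Work on a copy so the caller's string (possibly frozen) is untouched.
  text = raw.dup.force_encoding(Encoding::UTF_8)
  # Drop byte sequences that are not valid UTF-8.
  text = text.scrub('')
  # Drop characters that XML 1.0 forbids (NUL, vertical tab, and so on).
  text.gsub(XML_ILLEGAL, '')
end
```

Under these assumptions, the assignment in the gist would become `full_text.content = xml_safe_text(File.read(path))`.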