@jashmenn
Forked from jaydonnell/gist:239107
Created November 20, 2009 02:00
# Use JRuby to read a Hadoop sequence file and copy its first numlines records to a new one
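require 'java'

# Put every jar in the given directory on the JRuby classpath.
def load_libs(libs)
  Dir.glob(File.join(libs, "*.jar")).each { |f| require f }
end

hadoop_home = ENV["HADOOP_HOME"] or raise "HADOOP_HOME is not set"
load_libs hadoop_home
load_libs File.join(hadoop_home, 'lib')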
# Check the argument count before unpacking so a bad call fails with the usage message.
raise "Usage: #{$0} input output numlines" unless ARGV.size == 3
input, output, numlines = *ARGV
module H; end
module J; end
H::Configuration = Java::OrgApacheHadoopConf::Configuration
H::FileSystem = Java::OrgApacheHadoopFs::FileSystem
H::Path = Java::OrgApacheHadoopFs::Path
H::SequenceFile = Java::OrgApacheHadoopIo::SequenceFile
H::Writable = Java::OrgApacheHadoopIo::Writable
H::ReflectionUtils = Java::OrgApacheHadoopUtil::ReflectionUtils
J::URI = Java::JavaNet::URI
conf = H::Configuration.new
uri = input
fs = H::FileSystem.get(J::URI.create(uri), conf)
path = H::Path.new(uri)
uri2 = output
fs2 = H::FileSystem.get(J::URI.create(uri2), conf)
path2 = H::Path.new(uri2)
reader = H::SequenceFile::Reader.new(fs, path, conf)
writer = H::SequenceFile::Writer.new(fs2, conf, path2, reader.getKeyClass(), reader.getValueClass())
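numlines.to_i.times do
  # Fresh key/value holders for each record; reader.next fills them in place.
  key = H::ReflectionUtils.newInstance(reader.getKeyClass(), conf)
  value = H::ReflectionUtils.newInstance(reader.getValueClass(), conf)
  # reader.next returns false at end of file; stop there instead of appending empty records.
  break unless reader.next(key, value)
  writer.append(key, value)
  print "."
  # puts key.to_s + " " + value.to_s # optional
end
puts
reader.close
writer.close
puts "done"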