

@alexbevi
Last active January 14, 2021 16:31
Dump BSON File Contents as JSON

dump_bson.rb

Usage

# Use -f/--filename to pass the BSON file to convert to JSON
ruby dump_bson.rb -f auditLog_prod01.bson
....
{"atype":"authenticate","ts":"2020-11-21 06:25:22 UTC","local":{"ip":"127.0.0.1","port":27013},"remote":{"ip":"127.0.0.1","port":55842},"users":[{"user":"__system","db":"local"}],"roles":[],"param":{"user":"__system","db":"local","mechanism":"SCRAM-SHA-1"},"result":0}
{"atype":"shutdown","ts":"2020-11-21 06:25:23 UTC","local":{"ip":"127.0.0.1","port":27013},"remote":{"ip":"127.0.0.1","port":55842},"users":[{"user":"__system","db":"local"}],"roles":[],"param":{},"result":0}
# If parsing fails, the script stops and prints the offset (pos) and declared length (len) of the document that failed, followed by the error
[pos: 62840087 len: 18415897] Attempted to read 18415897 bytes, but only 4170505 bytes remain
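The error above means the 4-byte little-endian length prefix found at `pos` declares a document larger than what is left in the file, which usually points at corruption or a truncated write. The check can be reproduced without the BSON gem. The sketch below assumes the data is already loaded into a binary String; `length_prefix_check` is a hypothetical helper, not part of the script. The `Raw:` list printed by `--dumpRawBytesOnError` uses the same unsigned-byte representation (`unpack("C*")`), so its first four values are this prefix.

```ruby
# Hypothetical helper (not part of dump_bson.rb): inspect the BSON length
# prefix at byte offset +pos+. A BSON document starts with its total size,
# prefix included, as a little-endian int32 ("V", as the script itself uses).
def length_prefix_check(data, pos)
  header = data.byteslice(pos, 4)
  return nil if header.nil? || header.bytesize < 4

  declared = header.unpack1("V")
  available = data.bytesize - pos # bytes from pos to the end of the data
  { declared: declared, available: available, fits: declared <= available }
end

# The prefix claims 12 bytes, but only 10 bytes are present from offset 0.
truncated = [12, 0, 0, 0, 0x10, 'a'.ord, 0, 1, 0, 0].pack("C*")
p length_prefix_check(truncated, 0)
```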

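If there's an error, the position in the file (pos in the output) can be passed to the -s/--skip parameter to restart parsing at that offset. This lets you resume quickly instead of starting over from the beginning of the file each time.

The -d/--debug flag prints, instead of the full JSON, the position and length of each successfully parsed document followed by the first 31 characters of its JSON. In the example below, parsing resumes at 62839609, two documents before the failing one: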

ruby dump_bson.rb -f auditLog_prod01.bson -s 62839609 -d
[pos: 62839609 len: 271] {"atype":"authenticate","ts":"2 ...
[pos: 62839880 len: 207] {"atype":"shutdown","ts":"2020- ...
[pos: 62840087 len: 18415897] Attempted to read 18415897 bytes, but only 4170505 bytes remain
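In the debug output each document starts where the previous one ended: 62839609 + 271 = 62839880, and 62839880 + 207 = 62840087, the offset of the failing entry. Because of that, candidate `-s` offsets can be listed by walking the length prefixes alone, without decoding anything. The rough sketch below assumes the whole file fits in memory; `document_offsets` is a hypothetical helper, and it only checks sizes, not document contents.

```ruby
# Hypothetical helper (not part of dump_bson.rb): walk a concatenation of BSON
# documents using only their int32 little-endian length prefixes and return
# [pos, len] pairs. Stops at the first prefix that is too small or that would
# run past the end of the data; pos + len of the last pair is where it stopped.
def document_offsets(data, start = 0)
  offsets = []
  pos = start
  while pos + 4 <= data.bytesize
    len = data.byteslice(pos, 4).unpack1("V")
    # The smallest valid BSON document ({}) is 5 bytes long
    break if len < 5 || pos + len > data.bytesize
    offsets << [pos, len]
    pos += len
  end
  offsets
end

empty_doc = [5, 0, 0, 0, 0].pack("C*")                              # {}
int_doc   = [12, 0, 0, 0, 0x10, 'a'.ord, 0, 1, 0, 0, 0, 0].pack("C*") # {"a"=>1}
data = empty_doc + int_doc + [99, 0, 0, 0].pack("C*")               # bogus final prefix

document_offsets(data).each { |pos, len| puts "[pos: #{pos} len: #{len}]" }
```

For the sample data this prints `[pos: 0 len: 5]` and `[pos: 5 len: 12]`, then stops at offset 17, where the bogus prefix claims 99 bytes.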
#!/usr/bin/env ruby
#
# dump_bson.rb
# Script to dump a BSON file to JSON using the Ruby BSON library
#
# If corruption is found, additional options can be passed to attempt to troubleshoot and recover
#
require 'bundler/inline'
require 'optparse'

# Resolve and require the bson gem inline, without a separate Gemfile
gemfile do
  source 'https://rubygems.org'
  gem 'bson'
end

require 'json' # for Hash#to_json
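# Reads one BSON document starting at the current position of +io+.
# Returns its JSON (or a debug summary when --debug is set), or nil when the
# document failed to parse and --bruteForceOnError is set without --debug.
def read_bson_document(io, params)
  off = io.pos
  # Every BSON document starts with its total length as a little-endian int32
  header = io.read(4)
  sz = header.unpack1("V") if header && header.bytesize == 4
  debug_output = "[pos: #{off} len: #{sz}] "
  begin
    raise "Could not read a 4-byte length prefix" if sz.nil?
    # Rebuild the full document (length prefix + body) in a ByteBuffer for the decoder
    bb = BSON::ByteBuffer.new
    bb.put_int32(sz)
    raw = io.read(sz - 4)
    bb.put_bytes(raw)
    bb.rewind!
    doc = Hash.from_bson(bb)
    debug_output += "#{doc.to_json[0..30]} ..."
    return params[:debug] ? debug_output : doc.to_json
  rescue => ex
    msg = "#{debug_output}#{ex.message}"
    if params[:dumpRawBytesOnError] > 0
      # Re-read the bytes at the failing offset and show them as unsigned byte values
      io.seek(off, IO::SEEK_SET)
      raw = io.read(params[:dumpRawBytesOnError]).unpack("C*")
      msg += "\nRaw: #{raw}"
    end
    if params[:bruteForceOnError]
      # Move a few bytes past the failing offset and let the caller retry there
      io.seek(off + params[:bruteForceOnErrorBytes], IO::SEEK_SET)
      return params[:debug] ? "#{msg} ... shifting #{params[:bruteForceOnErrorBytes]} byte(s) and retrying ..." : nil
    end
    puts msg
    exit 1
  end
end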

params = {}
parser = OptionParser.new do |opts|
  opts.banner = "Usage: dump_bson.rb [options]"
  opts.on("-h", "--help", "Prints this help") do
    puts opts
    exit
  end
  opts.on('-f FILENAME', '--filename FILENAME', String, '[Required] Filename of the BSON file to process')
  opts.on('-s NUM', '--skip NUM', Integer, 'Number of bytes to seek into the file before parsing')
  opts.on('-d', '--debug', 'Output debug information about each document instead of JSON')
  opts.on('--dumpRawBytesOnError NUM', Integer, 'Number of bytes to dump to STDOUT on error')
  opts.on('--bruteForceOnError', 'Keep retrying, advancing <bruteForceOnErrorBytes> byte(s) at a time on error')
  opts.on('--bruteForceOnErrorBytes NUM', Integer, 'Bytes to advance when <bruteForceOnError> is set')
end
# Parsed options are stored in params under their long names (e.g. :skip)
parser.parse!(into: params)

# Defaults for options that were not passed
params[:debug] ||= false
params[:skip] ||= 0
params[:dumpRawBytesOnError] ||= 0
params[:bruteForceOnErrorBytes] ||= 1

if params[:filename].nil?
  puts "--filename not provided"
  puts parser.help
  exit 1
end

# Open in binary mode so no newline translation touches the BSON bytes
File.open(params[:filename], 'rb') do |io|
  io.seek(params[:skip], IO::SEEK_SET)
  until io.eof?
    result = read_bson_document(io, params)
    STDOUT << result << "\n" unless result.nil?
  end
end
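
With --bruteForceOnError the script retries a full decode at every shifted offset. A cheaper way to find a resume point is to pre-filter offsets with a structural heuristic and only then hand a candidate to the real decoder via -s/--skip. The sketch below is an assumption, not something dump_bson.rb does: `next_plausible_offset` is a hypothetical helper that accepts an offset when its length prefix is at least 5, fits in the remaining data, and lands on a 0x00 byte, since every BSON document ends with a null terminator. A hit is only a candidate and can still be a false positive.

```ruby
# Hypothetical resynchronization heuristic (not part of dump_bson.rb): starting
# at +from+, return the first offset that could begin a BSON document, or nil.
# A candidate must declare a length of at least 5 bytes, fit inside the data,
# and end with the 0x00 terminator every BSON document carries.
def next_plausible_offset(data, from, step = 1)
  pos = from
  while pos + 5 <= data.bytesize
    len = data.byteslice(pos, 4).unpack1("V")
    if len >= 5 && pos + len <= data.bytesize && data.getbyte(pos + len - 1) == 0
      return pos
    end
    pos += step
  end
  nil
end

garbage = [0xFF, 0xFF, 0xFF].pack("C*")                             # corrupt bytes
int_doc = [12, 0, 0, 0, 0x10, 'a'.ord, 0, 1, 0, 0, 0, 0].pack("C*") # {"a"=>1}
data = garbage + int_doc

p next_plausible_offset(data, 0)
```

For the sample data this returns 3, skipping the three garbage bytes; that offset is what you would then pass to -s.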