#!/usr/bin/ruby -w
require 'set'

# One record per input line; Struct instances compare by (id, instance).
Entry = Struct.new :id, :instance do
  def self.parse(line)
    if /ID=\s*'([^']*)'\s+INSTANCE=\s*'([^']*)'/ =~ line
      new $1, $2
    else
      raise "Cannot parse: %p" % line
    end
  end
end

entries = Set.new

ARGV.each do |file|
  File.foreach file do |line|
    begin
      entry = Entry.parse(line)
      # Print the line if this entry was already seen; otherwise remember it.
      if entries.include? entry
        puts line
      else
        entries << entry
      end
    rescue
      # Ignore lines that don't parse
    end
  end
end
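
The script relies on `Struct` instances comparing and hashing by their member values, which is what lets a `Set` recognise a repeated (id, instance) pair. A minimal sketch of that idea follows. The sample lines and the `duplicate_lines` helper are hypothetical and not part of the gist, and `Set#add?` is swapped in for the `include?`/`<<` pair purely for illustration.

```ruby
require 'set'

# Same shape as the Entry struct in the script above.
Entry = Struct.new(:id, :instance)

# Hypothetical helper: returns the lines whose (id, instance) pair
# has already been seen earlier in the input.
def duplicate_lines(lines)
  seen = Set.new
  lines.select do |line|
    m = /ID=\s*'([^']*)'\s+INSTANCE=\s*'([^']*)'/.match(line)
    next false unless m # skip lines that do not parse
    # Set#add? returns nil when the element is already present.
    seen.add?(Entry.new(m[1], m[2])).nil?
  end
end

# Made-up input in the format the regex expects.
sample = [
  "ID= 'a1' INSTANCE= 'x'",
  "ID= 'a2' INSTANCE= 'x'",
  "not an entry",
  "ID= 'a1' INSTANCE= 'x'",
]

puts duplicate_lines(sample)
```

Whether `add?` is measurably faster than `include?` followed by `<<` is not something the timings below cover, so treat it as a stylistic variant rather than a tested optimisation.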
@presidentbeef

With 1,000,000 entries and 452 duplicates:

$ /usr/bin/time -v ruby doit.rb input > /dev/null
    Command being timed: "ruby doit.rb input"
    User time (seconds): 9.78
    System time (seconds): 0.14
    Percent of CPU this job got: 99%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:09.93
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 913760
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 0
    Minor (reclaiming a frame) page faults: 57211
    Voluntary context switches: 3
    Involuntary context switches: 1013
    Swaps: 0
    File system inputs: 0
    File system outputs: 0
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0
@presidentbeef

And with Robert's original:

$ /usr/bin/time -v ruby doit_orig.rb input > /dev/null
    Command being timed: "ruby doit_orig.rb input"
    User time (seconds): 16.28
    System time (seconds): 0.19
    Percent of CPU this job got: 99%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:16.50
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 913344
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 0
    Minor (reclaiming a frame) page faults: 57191
    Voluntary context switches: 3
    Involuntary context switches: 1656
    Swaps: 0
    File system inputs: 0
    File system outputs: 0
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0

So no memory savings (maximum resident set size is about 913,000 kbytes in both runs), but faster: 9.93 s elapsed versus 16.50 s.
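
To put a number on "faster", the figures from the two runs above can be compared directly. This is only a sketch: the `RUNS` hash and the `ratio` helper are hypothetical names, and the values are copied by hand from the `/usr/bin/time -v` output.

```ruby
# Figures copied from the two /usr/bin/time -v runs above.
RUNS = {
  rewrite:  { elapsed_s: 9.93,  user_s: 9.78,  max_rss_kb: 913_760 },
  original: { elapsed_s: 16.50, user_s: 16.28, max_rss_kb: 913_344 },
}

# Hypothetical helper: how many times larger the original's value is
# than the rewrite's for the given measurement.
def ratio(runs, key)
  runs[:original][key] / runs[:rewrite][key].to_f
end

# Difference in peak memory between the two runs, in kbytes.
rss_diff_kb = RUNS[:rewrite][:max_rss_kb] - RUNS[:original][:max_rss_kb]

puts format("elapsed speedup:    %.2fx", ratio(RUNS, :elapsed_s))
puts format("user-time speedup:  %.2fx", ratio(RUNS, :user_s))
puts format("max RSS difference: %d kbytes", rss_diff_kb)
```

A single run of each script is a small sample, so the ratio is best read as a rough indication rather than a precise benchmark.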
