presidentbeef/7156955
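#!/usr/bin/ruby -w
require 'set'

# One record per input line. Struct compares and hashes by member values,
# so two entries with the same id and instance count as the same Set element.
Entry = Struct.new :id, :instance do
  def self.parse(line)
    if /ID=\s*'([^']*)'\s+INSTANCE=\s*'([^']*)'/ =~ line
      new $1, $2
    else
      raise "Cannot parse: %p" % line
    end
  end
end

entries = Set.new

ARGV.each do |file|
  File.foreach file do |line|
    begin
      entry = Entry.parse(line)

      if entries.include? entry
        # Already seen this id/instance pair: print the duplicate line
        puts line
      else
        entries << entry
      end
    rescue
      # Ignore lines that don't parse
    end
  end
end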
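The duplicate check above depends on `Struct` comparing and hashing by member values rather than by object identity. A minimal sketch of that behavior, using made-up sample values (`'42'` and `'web-1'` are hypothetical and not taken from the real input):

```ruby
require 'set'

# Same shape as the Entry struct in the script above (parse method omitted).
Entry = Struct.new(:id, :instance)

seen = Set.new
a = Entry.new('42', 'web-1')   # hypothetical sample values
b = Entry.new('42', 'web-1')   # distinct object, same member values

seen << a

puts a.equal?(b)        # false: two separate objects
puts a.eql?(b)          # true: Struct#eql? compares members
puts a.hash == b.hash   # true: Struct#hash is derived from the members
puts seen.include?(b)   # true: the Set treats b as already present
```

This is why the script can store `Entry` objects directly in a `Set` without defining `hash` or `eql?` itself.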
@presidentbeef

With 1,000,000 entries and 452 duplicates:

    $ /usr/bin/time -v ruby doit.rb input > /dev/null
    Command being timed: "ruby doit.rb input"
    User time (seconds): 9.78
    System time (seconds): 0.14
    Percent of CPU this job got: 99%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:09.93
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 913760
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 0
    Minor (reclaiming a frame) page faults: 57211
    Voluntary context switches: 3
    Involuntary context switches: 1013
    Swaps: 0
    File system inputs: 0
    File system outputs: 0
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0

@presidentbeef

And with Robert's original:

    $ /usr/bin/time -v ruby doit_orig.rb input > /dev/null
    Command being timed: "ruby doit_orig.rb input"
    User time (seconds): 16.28
    System time (seconds): 0.19
    Percent of CPU this job got: 99%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:16.50
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 913344
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 0
    Minor (reclaiming a frame) page faults: 57191
    Voluntary context switches: 3
    Involuntary context switches: 1656
    Swaps: 0
    File system inputs: 0
    File system outputs: 0
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0

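So no memory savings (maximum resident set size is about 913,000 kB in both runs), but faster: 9.93 s wall clock versus 16.50 s for the original.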
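For reference, the comparison can be worked out from the two `/usr/bin/time` reports above. This is only arithmetic on the reported figures from a single run of each script, so treat the result as a rough indication rather than a careful benchmark; the variable names are made up for the sketch:

```ruby
# Figures copied from the two /usr/bin/time reports above.
new_wall    = 9.93      # seconds, doit.rb
orig_wall   = 16.50     # seconds, doit_orig.rb
new_rss_kb  = 913_760   # maximum resident set size, doit.rb
orig_rss_kb = 913_344   # maximum resident set size, doit_orig.rb

speedup    = orig_wall / new_wall
time_saved = (1 - new_wall / orig_wall) * 100
rss_delta  = (new_rss_kb - orig_rss_kb) / orig_rss_kb.to_f * 100

printf("speedup: %.2fx\n", speedup)                # speedup: 1.66x
printf("wall time saved: %.1f%%\n", time_saved)    # wall time saved: 39.8%
printf("peak RSS change: %+.3f%%\n", rss_delta)    # peak RSS change: +0.046%
```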
