@micronax
Created November 11, 2013 10:50
This small Ruby script removes duplicate fields from each line of a given file, where the fields are separated by a configurable separator. Only the first occurrence of each field is kept. Example: a;b;c;d;e;f;d;e;g;j;k;l is cleaned into a;b;c;d;e;f;g;j;k;l
#!/usr/bin/env ruby
require 'csv'
# SETTINGS
inputFile = './input.csv'
outputFile = './output.csv'
separator = ";"
# DON'T CHANGE BELOW
output = File.open(outputFile, 'w')
# Count the input lines in Ruby instead of shelling out to `wc -l`
count = File.foreach(inputFile).count
print "Starting uniq-operation...\n"
i = 1
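# Parse each line with the configured separator, so lines containing commas are not truncated
CSV.foreach(inputFile, col_sep: separator) do |row|
  print "\rProcessing line #{i} of #{count}"
  # Keep only the first occurrence of each field; empty lines stay empty
  output.write row.uniq.join(separator) + "\n"
  i += 1
end
# Close the file so buffered output is flushed to disk
output.close
print "\r\nOperation Complete!\n"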
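
The per-line step can also be tried on its own, without input or output files. The following is a minimal sketch, assuming a plain split on the separator is sufficient (no quoted fields); the helper name dedupe_fields is hypothetical and not part of the script above.

```ruby
# Hypothetical helper (not part of the original script): removes repeated
# fields from a single line, keeping the first occurrence of each field.
def dedupe_fields(line, separator = ";")
  line.chomp.split(separator).uniq.join(separator)
end

# The example from the description
puts dedupe_fields("a;b;c;d;e;f;d;e;g;j;k;l")
# prints a;b;c;d;e;f;g;j;k;l
```

Array#uniq preserves the order of first occurrences, which is why the remaining fields stay in their original order.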