Created June 9, 2011 15:04
Deduplicate a sorted CSV file, adding a rowcount column
def clean_line(some_text)
  # Takes a comma-delimited string and returns an array of fields with
  # double quotes removed and leading and trailing whitespace stripped.
  # Commas inside quoted fields are not handled.
  some_text.gsub(/"|\n/, "").split(",").map(&:strip)
end

def spit_line(counter, some_line)
  # Takes a counter and an array of strings and prints them as one row
  # in counter,fields CSV format.
  puts [counter, *some_line].join(",")
  STDOUT.flush
end

# init
counter = 1
first_line = ARGF.gets
exit if first_line.nil? # empty input: nothing to print
last_line = clean_line(first_line)

# Input must be sorted so that duplicate rows are adjacent.
ARGF.each do |this_line|
  this_line = clean_line(this_line)
  if this_line == last_line
    counter += 1
  else
    spit_line(counter, last_line)
    counter = 1
    last_line = this_line
  end
end

# wrap up
spit_line(counter, last_line)
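
The loop above counts runs of identical adjacent rows by hand. As a minimal sketch, the same counting can be written with `Enumerable#chunk_while`, assuming Ruby 2.3 or later. The names `clean_fields` and `dedupe_with_counts` are hypothetical helpers introduced for illustration, not part of the original script, and the sketch returns the output rows instead of printing them.

```ruby
# Strip double quotes and newlines, split on commas, and trim each field.
# Like clean_line in the script, this does not handle commas inside
# quoted fields.
def clean_fields(text)
  text.gsub(/"|\n/, "").split(",").map(&:strip)
end

# Hypothetical helper: collapse each run of identical adjacent rows into
# a single "count,field1,field2,..." string. The input is assumed to be
# sorted so that duplicates sit next to each other.
def dedupe_with_counts(lines)
  lines.map { |line| clean_fields(line) }
       .chunk_while { |a, b| a == b }                  # group equal neighbours
       .map { |run| [run.size, *run.first].join(",") } # count, then the fields
end
```

To use it on files or standard input, something like `puts dedupe_with_counts(ARGF.each_line)` should work. Unlike the streaming loop, this version builds the full list of rows in memory before grouping, so the hand-written loop may still be the better fit for very large files.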