@dpritchett
Created June 9, 2011 15:04
Deduplicate a sorted CSV file, adding a rowcount column
def clean_line(some_text)
  # Takes a comma-delimited string and returns an array of fields with
  # double quotes removed and leading/trailing whitespace stripped.
  some_text.gsub(/"|\n/, "").split(',').map(&:strip)
end

def spit_line(counter, some_line)
  # Takes a counter and an array of strings and prints them as one CSV row:
  # the counter first, then the fields.
  puts [counter, *some_line].join(',')
  STDOUT.flush
end

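# init: read the first line; exit quietly on empty input
# (ARGF.readline would raise EOFError here)
first_line = ARGF.gets
exit if first_line.nil?

counter = 1
last_line = clean_line(first_line)

ARGF.each do |this_line|
  this_line = clean_line(this_line)
  if this_line == last_line
    counter += 1
  else
    # a different row starts: emit the previous run with its count
    spit_line(counter, last_line)
    counter = 1
    last_line = this_line
  end
end

# wrap up: emit the final run
spit_line(counter, last_line)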
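
The script only compares each row with the one directly before it, which is why the input has to be sorted first. The sketch below restates the same run-length idea as a function over an array of lines so the behavior is easy to see. The name dedupe_with_counts and the sample rows are assumptions made for illustration; they are not part of the original gist.

```ruby
# Hypothetical helper, not part of the original script: collapses runs of
# identical adjacent rows and prefixes each surviving row with its run length.
def dedupe_with_counts(lines)
  # Same cleaning as clean_line: drop double quotes and newlines, split on
  # commas, strip whitespace around each field.
  rows = lines.map { |line| line.gsub(/"|\n/, "").split(',').map(&:strip) }

  # chunk_while groups consecutive equal rows; each group becomes one CSV row.
  rows.chunk_while { |a, b| a == b }
      .map { |run| [run.size, *run.first].join(',') }
end

# Illustrative input: the last row repeats an earlier one but is not adjacent
# to it, so it is counted separately.
sample = [
  "apple, red\n",
  "apple, red\n",
  "\"banana\",yellow\n",
  "apple, red\n"
]

puts dedupe_with_counts(sample)
# 2,apple,red
# 1,banana,yellow
# 1,apple,red
```

Because the script reads through ARGF, it accepts either file names on the command line or standard input, so something like `sort input.csv | ruby dedupe.rb` should work (both file names here are placeholders).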