@pietrop
Created August 13, 2015 17:00
=begin
@date 23 July 2015
@author: pietro.passarelli@gmail.com
Given a CSV file whose first two columns contain overlapping values,
this script compares the two columns, collects the values that appear in both,
and writes them to a text file (duplicates.txt), one per line.
Run it as `ruby csv_2_list.rb name_of_the_csv_file_.csv`
=end
require 'csv'
filename = ARGV.first
#csv_file = CSV.read(filename)
# Three arrays: one for the first column, one for the second, and one for the duplicates
list_one = []
list_two = []
duplicates = []
# Iterate over the CSV rows, putting column one in list_one and column two in list_two
CSV.foreach(filename) do |r|
  list_one << r[0]
  # skip rows where the second column is empty
  list_two << r[1] unless r[1].nil?
end
puts "### Identifying Duplicates ###"
# Array#& returns a new array containing only the values present in both arrays,
# with repeats removed and in the order they appear in list_one.
# It uses hash-based lookups rather than nested loops, so it copes well with long lists.
duplicates = list_one & list_two
# Print the number of duplicates found
puts duplicates.size
# Write the duplicates to a file, one per line
File.open("duplicates.txt", 'w') do |file|
  duplicates.each do |d|
    # puts appends the newline after each item
    file.puts(d)
  end
end
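
A note on matching: `Array#&` compares values exactly, so a cell with stray whitespace (for example `" bob"`) will not match `"bob"`. The sketch below is a hypothetical variant, not part of the original script, that trims whitespace and drops empty cells before intersecting. The helper name `common_values` and the sample data are assumptions made for illustration, and it assumes every cell is a string or nil, which is what `CSV.foreach` yields by default.

```ruby
# Hypothetical helper (not in the original script): intersect two lists
# after trimming whitespace and dropping nil or empty cells.
def common_values(list_one, list_two)
  normalize = ->(list) { list.compact.map(&:strip).reject(&:empty?) }
  normalize.call(list_one) & normalize.call(list_two)
end

# Made-up sample data standing in for the two CSV columns.
list_one = ["alice", " bob", "carol", nil]
list_two = ["bob", "dave", "alice", "alice"]

# Exact matching: " bob" and "bob" are treated as different values.
p list_one & list_two

# Normalized matching: whitespace is trimmed before comparing.
p common_values(list_one, list_two)
```

Whether trimming is appropriate depends on the data; if leading or trailing spaces are meaningful in your CSV, the plain `list_one & list_two` used in the script is the safer choice.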