@sarchertech
Created April 17, 2011 03:21
Quick script I wrote to find stores with the same name that may be franchises.
require 'yaml'

# Plain record for one store.
class Store
  attr_accessor :name, :address, :city, :state, :zip, :phone_number
end

# Load the stores from every .yml file in the current directory.
# Each file is expected to hold an array of Store objects.
def list_of_stores
  files = Dir.glob('*.yml')
  stores = []
  files.each do |file|
    File.open(file, 'r') { |f| stores += YAML.load(f) }
  end
  return stores
end

# Remove stores whose name prefix, zip, phone number, and address prefix
# match an earlier store. Mutates stores and returns the number removed.
def delete_duplicates(stores)
  seen = []
  marker = []
  counter = 0
  stores.each do |store|
    attr_array = [store.name[0..5], store.zip, store.phone_number, store.address[0..3]]
    if seen.include?(attr_array)
      marker << store
      counter += 1
    else
      seen << attr_array
    end
  end
  marker.each { |m| stores.delete(m) }
  return counter
end

# Print every store name that occurs more than once, most frequent first.
# Only the state of the first store seen for each name is recorded.
def print_multi_stores(stores)
  seen = {}
  stores.each do |store|
    if seen.has_key?(store.name)
      seen[store.name][0] += 1
    else
      seen[store.name] = [1, store.state]
    end
  end
  seen = seen.sort_by { |k, v| v[0] }
  seen.reverse!
  puts ""
  puts "multi stores"
  puts "-------------"
  # sorting converts hash to array of arrays
  seen.each do |k, v|
    num, state = v
    puts num.to_s + "--" + state + "--" + k if num > 1
  end
  puts "-------------"
  puts ""
end

stores = list_of_stores
stores.sort_by! { |s| s.zip }
puts stores.length
puts "deleted " + delete_duplicates(stores).to_s + " duplicate stores"
print_multi_stores(stores)
@JohnFord

You're printing the state with the store count, but if you have a "franchise" in multiple states, you're only going to see the first state encountered. Is that what you meant to do?

@sarchertech
Author

sarchertech commented Apr 17, 2011 via email

@JohnFord

I'm not sure; I guess it just comes down to what you need. I was just messing around with a different way of building up that hash and noticed it. I'm still playing with it. :-)

@JohnFord

So, here are my mods: https://gist.github.com/923754

At first, I was focused on the block at line 45 in your code. Often in Ruby, you can eliminate that whole "if the key doesn't exist, initialize it, otherwise do something else" idiom by telling Ruby ahead of time what to do whenever it encounters a key it hasn't seen. In this case, I'm passing Hash.new a block that I want it to run whenever it encounters a new key. That block, in turn, creates yet another hash that will initialize any new key's value to 0. So calling seen['New Store']['GA'] += 1 will automagically create something like:
{ 'New Store' => { 'GA' => 1} } without us explicitly initializing either hash.
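
For anyone reading along without the linked gist open, here is a minimal sketch of that nested-default idea. It is not the code from the gist: the Store struct and the sample data are hypothetical stand-ins, and only the Hash.new calls are the point.

```ruby
# Hypothetical stand-in for the Store class, with only the fields used here.
Store = Struct.new(:name, :state)

# Sample data invented for illustration.
stores = [
  Store.new('New Store', 'GA'),
  Store.new('New Store', 'GA'),
  Store.new('New Store', 'FL'),
  Store.new('Corner Shop', 'GA')
]

# The outer block runs whenever an unseen key is read: it stores and returns
# an inner hash whose missing keys default to 0.
seen = Hash.new { |hash, key| hash[key] = Hash.new(0) }

# No has_key? check needed; both levels initialize themselves.
stores.each { |store| seen[store.name][store.state] += 1 }

p seen
```

The outer hash needs the block form because a plain Hash.new(Hash.new(0)) would hand the same inner hash to every missing key and would never store it under that key.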

Dave Brady actually posted a screencast with more detail on this just the other day: http://www.heartmindcode.com/blog/2011/04/creating-ruby-hashes/ (There's also a follow-up with JEG2 on Dave's site that's worth watching.)

It ended up being a bit messier than I originally intended since I decided to keep a separate count for each state, as well.
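
To give a feel for where the extra mess comes from, here is a rough sketch of reporting from per-state counts. It assumes the nested hash shape described above; the sample data and the output format (a state:count list in place of the single state) are my own illustration, not what the linked gist prints.

```ruby
# Hypothetical per-state counts, shaped like the nested hash described above.
seen = {
  'New Store'   => { 'GA' => 2, 'FL' => 1 },
  'Corner Shop' => { 'GA' => 1 },
  'Mega Mart'   => { 'AL' => 1, 'GA' => 1 }
}

# Total each name's per-state counts, keep names that occur more than once,
# and list the most common names first.
multi = seen.map { |name, by_state| [name, by_state.values.inject(0, :+), by_state] }
            .select { |_, total, _| total > 1 }
            .sort_by { |_, total, _| -total }

# Build one report line per name: total, then each state with its count.
lines = multi.map do |name, total, by_state|
  states = by_state.map { |state, count| "#{state}:#{count}" }.join(',')
  "#{total}--#{states}--#{name}"
end

puts lines
```

The total now has to be derived by summing the inner hash, which is the extra step the single-counter version did not need.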

@sarchertech
Author

sarchertech commented Apr 17, 2011 via email

@coty

coty commented Apr 20, 2011

I think you could use 1.9's uniq! with a block to simplify your delete_duplicates method: https://gist.github.com/932961
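
A minimal sketch of what that could look like, assuming the same comparison key that delete_duplicates builds. The Store struct and the sample records are hypothetical, and this is not the code from the linked gist.

```ruby
# Hypothetical stand-in for the Store class from the script.
Store = Struct.new(:name, :address, :zip, :phone_number)

# Sample data invented for illustration; the first two share a key.
stores = [
  Store.new('Acme Hardware', '12 Main St', '30301', '555-0100'),
  Store.new('Acme Hardware Inc', '12 Main Street', '30301', '555-0100'),
  Store.new('Acme Hardware', '99 Oak Ave', '30302', '555-0199')
]

before = stores.length

# Keep the first store for each key: name prefix, zip, phone number,
# and address prefix, as in delete_duplicates.
stores.uniq! do |store|
  [store.name[0..5], store.zip, store.phone_number, store.address[0..3]]
end

# uniq! returns nil when nothing was removed, so count via the length change.
deleted = before - stores.length
puts "deleted #{deleted} duplicate stores"
```

Like the original method, this keeps the first store seen for each key and mutates the array in place.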

@sarchertech
Author

sarchertech commented Apr 21, 2011 via email
