@23inhouse
Forked from phoozle/follow.rb
Created August 29, 2012 02:32
Find Twitter and Facebook accounts
#!/usr/bin/env ruby
require 'mechanize'
require 'csv'
require 'cgi'
require 'fileutils'

source_csv = ARGV[0] || "./wineries.csv"
# ARGV values are strings, so convert before comparing against the counter.
max_searches = (ARGV[1] || 100).to_i

# Searches Google for `query` restricted to `site` and returns a hash with
# :<domain>_title and :<domain>_link keys (both nil when nothing matched).
def google_site_search(site, query)
  domain = site.split('.').first
  return_hash = {:"#{domain}_title" => nil, :"#{domain}_link" => nil}

  agent = Mechanize.new
  # Escape the query so names containing spaces or "&" still form a valid URL.
  agent.get("http://www.google.com.au/search?q=site:#{site}+#{CGI.escape(query.to_s)}")

  link = agent.page.link_with(:text => /\| Facebook/) ||
         agent.page.link_with(:text => /on Twitter$/)
  if link
    puts "Found #{query} in #{site}!"
    return_hash[:"#{domain}_title"] = link.text # Title
    return_hash[:"#{domain}_link"] = (link.uri.to_s.match(/url\?q=([^\&]+)/) || [])[1] # Link
  else
    puts "Didn't find #{query} in #{site} :-("
  end

  return_hash
end

output_csv = CSV.open("./wineries_temp.csv", "w")
output_csv << ["Winery Name", "Twitter Title", "Twitter Link", "Facebook Title", "Facebook Link", "Facebook Incorrect", "Twitter Incorrect"]

begin
  google_allowing_us = true
  count = 0

  # The source file is only read here, so no explicit mode is needed.
  CSV.foreach(source_csv) do |row|
    winery_name = row.first
    next if winery_name == 'Winery Name' # skip the header row

    # Stop searching once the limit is reached; remaining rows are copied as-is.
    google_allowing_us = false if count >= max_searches
    count += 1 if row[1].nil? || row[3].nil?

    winery = {:twitter_title => row[1], :twitter_link => row[2], :facebook_title => row[3], :facebook_link => row[4]}

    begin
      # If row doesn't have data and we aren't blocked by Google
      winery.merge!(google_site_search('twitter.com', winery_name)) if row[1].nil? && google_allowing_us
      winery.merge!(google_site_search('facebook.com', winery_name)) if row[3].nil? && google_allowing_us
    rescue Mechanize::ResponseCodeError
      google_allowing_us = false
      puts "Google has blocked us :( \nSkipping Google searches and finishing..."
    end

    output_csv << [winery_name, winery[:twitter_title], winery[:twitter_link], winery[:facebook_title], winery[:facebook_link], row[5], row[6]]
  end
rescue Interrupt
  # Ctrl-C: discard the temp file and leave the existing output untouched.
  puts "\n> Aborting, no changes have been made..."
  output_csv.close
  FileUtils.rm("./wineries_temp.csv")
  exit(0)
end

output_csv.close
FileUtils.mv("./wineries_temp.csv", "./wineries.csv")
puts "> Output saved as wineries.csv"
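
The link column is filled by pulling the real target out of the Google redirect link with the pattern `url\?q=([^\&]+)`. A minimal sketch of that step in isolation, runnable without Mechanize or a network connection. The helper name `extract_target` and the sample hrefs are made up for illustration, and the `CGI.unescape` call is an addition that the script itself does not perform:

```ruby
require 'cgi'

# Hypothetical helper (not part of the gist): returns the target URL from a
# redirect href shaped like "/url?q=<target>&...", or nil if the pattern is absent.
def extract_target(redirect_href)
  # Same pattern as the script: capture everything after "url?q=" up to the next "&".
  encoded = redirect_href.to_s[/url\?q=([^&]+)/, 1]
  # Assumption: the captured value may be percent-encoded, so decode it.
  encoded && CGI.unescape(encoded)
end

# Made-up sample hrefs, only to exercise the helper.
puts extract_target("/url?q=https://twitter.com/example_winery&sa=U")
puts extract_target("/url?q=https%3A%2F%2Fwww.facebook.com%2Fexample&sa=U")
puts extract_target("/search?q=something").inspect
```

Returning nil when the pattern is missing mirrors the `|| []` fallback in the script, which leaves the link cell empty instead of raising.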
@23inhouse

@phoozle

I did some minor refactoring:

  1. made a generic google_site_search method
  2. added a max_searches param
  3. changed == nil to .nil?
  4. changed rescue to rescue Mechanize::ResponseCodeError
  5. changed the skip-first-row check to use next
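
A rough sketch of how points 2 to 5 fit together, written so it runs without Mechanize or Google. `BlockedError` stands in for `Mechanize::ResponseCodeError`, and `fill_missing` and the block passed to it are hypothetical stand-ins for the main loop and `google_site_search`, so treat this as an illustration and not the script itself:

```ruby
require 'csv'

# Hypothetical stand-in for Mechanize::ResponseCodeError.
class BlockedError < StandardError; end

# Hypothetical condensed version of the main loop: fills the second column of
# each row by yielding the name to a lookup block, up to max_searches lookups.
def fill_missing(rows, max_searches)
  searches = 0
  rows.each_with_object([]) do |row, out|
    next if row.first == 'Winery Name'          # point 5: skip the header with next
    if row[1].nil? && searches < max_searches   # points 2 and 3: limit and .nil?
      begin
        row[1] = yield(row.first)
        searches += 1
      rescue BlockedError                       # point 4: rescue one specific error
        searches = max_searches                 # stop looking things up from here on
      end
    end
    out << row
  end
end

rows = CSV.parse("Winery Name,Twitter Title\nAlpha,\nBeta,Beta on Twitter\nGamma,\n")
result = fill_missing(rows, 1) { |name| "#{name} on Twitter" }
result.each { |row| p row }
```

With a limit of 1, only Alpha is looked up; Beta already has a value and Gamma is copied through with an empty cell. Rescuing only the specific error means an unrelated bug (a typo, a nil row) still surfaces instead of being mistaken for a block.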
