@ashaw
Created December 3, 2010 22:45
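require 'cgi'
require 'json'
require 'rest_client'  # rest-client gem
require 'levenshtein'  # assumed: a gem that provides Levenshtein.distance

module ProPublica
  # Searches Google for a facility name and picks the result title that
  # best matches it.
  class GoogleComparer
    BASE_URL = "https://ajax.googleapis.com/ajax/services/search/web?v=1.0&q="
    # Matches everything from the first "-" or "|" onward, e.g. " - Home".
    WEB_TITLE_DELIMITERS = /(-|\|).*$/

    def initialize(facility_name)
      @facility_name = facility_name
      url_encoded_facility_name = CGI.escape(@facility_name)
      response = JSON.parse(RestClient.get(BASE_URL + url_encoded_facility_name))
      # Fall back to an empty list if the response carries no results.
      @results = (response['responseData'] || {})['results'] || []
    end

    # Removes HTML tags and any trailing "- Site name" / "| Site name" suffix.
    def strip_goog(str)
      str.gsub(/<\/?[^>]*>/, "").gsub(WEB_TITLE_DELIMITERS, '')
    end

    # Number of capital letters in s2 beyond one per word of s1.
    # Negative when s2 has fewer capitals than s1 has words.
    def penalize_uppercase(s1, s2)
      expected_uppercase = s1.split(/ /).size
      gotten_uppercase = s2.scan(/[A-Z]/).size
      gotten_uppercase - expected_uppercase
    end

    # Scores every result title (lower is better) and returns { score => title }.
    # Titles with equal scores share a key, so the last one seen is kept.
    def parse
      @evaluated_results = {}
      @results.each do |result|
        title = strip_goog(result['title'])
        score = Levenshtein.distance(@facility_name, title.upcase) +
                penalize_uppercase(@facility_name, title)
        @evaluated_results[score] = title
      end
      @evaluated_results
    end

    # Title with the lowest score, or nil when the search returned nothing.
    def winner_value
      parse unless @evaluated_results
      best = @evaluated_results.min
      best && best[1].strip
    end
  end
end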
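
The scoring heuristic above can be tried without a network call. The following is only a sketch under stated assumptions: the facility name and titles are made up, the helper names (`levenshtein`, `clean_title`, `score`, `best_title`) are hypothetical, a pure-Ruby edit distance stands in for the Levenshtein gem, and titles are trimmed before scoring rather than at the end.

```ruby
# Pure-Ruby Levenshtein distance (stand-in for the gem used in the gist).
def levenshtein(a, b)
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      # deletion, insertion, substitution
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

TITLE_DELIMITERS = /(-|\|).*$/

# Drop HTML tags and any "- Site" / "| Site" suffix, then trim whitespace.
def clean_title(title)
  title.gsub(/<\/?[^>]*>/, "").gsub(TITLE_DELIMITERS, "").strip
end

# Edit distance to the upcased title, plus a penalty for every capital
# letter beyond one per word of the facility name. Lower is better.
def score(facility_name, title)
  cleaned = clean_title(title)
  distance = levenshtein(facility_name, cleaned.upcase)
  penalty = cleaned.scan(/[A-Z]/).size - facility_name.split(" ").size
  distance + penalty
end

def best_title(facility_name, titles)
  titles.map { |t| [score(facility_name, t), clean_title(t)] }.min.last
end

facility = "ACME DIALYSIS CENTER" # hypothetical all-caps facility name
titles = [                        # hypothetical search result titles
  "<b>Acme Dialysis Center</b> - Home",
  "ACME DIALYSIS CENTER | Reviews",
  "Dialysis clinics near you"
]

puts best_title(facility, titles) # prints "Acme Dialysis Center"
```

The first two titles match the facility name exactly once upcased, so only the uppercase penalty separates them: the title-cased one scores 0 and the all-caps one scores 15. This is how the heuristic comes to prefer a properly capitalized name.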