Recognize search engines and spammers using
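require 'net/http'
require 'xmlsimple'
# URL of the XML user-agent list
url = ""
xml_data = Net::HTTP.get_response(URI.parse(url)).body
data = XmlSimple.xml_in(xml_data)
# Keep only entries whose Type contains "R" or "S" (the search-engine and spammer entries)
agents = data['user-agent'].select do |agent|
  type = agent["Type"].first
  type.include?("R") || type.include?("S")
end
# Collect the user-agent string of each matching entry
agent_names = agents.collect { |agent| agent["String"].first }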
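
Once `agent_names` is built, one possible way to use it is to check an incoming request's User-Agent header against the list. The sketch below is only an illustration: `build_bot_lookup` and `known_bot?` are hypothetical helper names, the sample strings are made up, and exact case-insensitive matching is an assumption (substring or prefix matching may suit real traffic better).

```ruby
require 'set'

# Hypothetical helper: normalize the collected names into a Set for fast lookup
def build_bot_lookup(agent_names)
  agent_names.map { |name| name.strip.downcase }.reject(&:empty?).to_set
end

# Hypothetical helper: true when the User-Agent exactly matches a known entry
# (case-insensitive, surrounding whitespace ignored)
def known_bot?(lookup, user_agent)
  return false if user_agent.nil?
  lookup.include?(user_agent.strip.downcase)
end

# Made-up stand-in values; in practice pass the agent_names array built above
lookup = build_bot_lookup(['ExampleCrawler/1.0', ' ExampleHarvester '])
puts known_bot?(lookup, 'examplecrawler/1.0')
puts known_bot?(lookup, 'Mozilla/5.0')
```

Building the Set once and reusing it keeps each check cheap, which matters if the check runs on every request.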
rceee commented Aug 13, 2013

Really outstanding gist! Thanks for this.

XmlSimple is a bit outdated now; do you know of anything more maintained that could do the same job? Would Nokogiri work for this?

jmwelch commented Aug 7, 2015

Regarding your (very old...) question, here's what I ended up using!

require 'net/http'
require 'nokogiri'
url = ""
xml_data = Net::HTTP.get_response(URI.parse(url)).body
xml_doc = Nokogiri::XML(xml_data)
# Keep entries whose Type contains "R" or "S"; select avoids the nil entries that collect + `u if ...` produced
bots = xml_doc.xpath('.//user-agent').select do |u|
  type = u.xpath('.//Type').text
  type.include?("R") || type.include?("S")
end
# Skip entries with no String element; plain Ruby, so ActiveSupport's blank? is not needed
agent_names = bots.map { |b| b.at_xpath('.//String') }.compact.map(&:text)
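
For anyone who wants to try the filter without a network call or extra gems, here is a rough sketch that swaps Nokogiri for REXML, which ships with a default Ruby install. The sample XML is made up and only mirrors the element names the code above relies on (`user-agent`, `Type`, `String`); the root element name and the `bot_names` helper are assumptions for illustration.

```ruby
require 'rexml/document'

# Made-up sample data mirroring the element names used in the snippets above
SAMPLE_XML = <<~XML
  <user-agents>
    <user-agent>
      <String>ExampleCrawler/1.0</String>
      <Type>R</Type>
    </user-agent>
    <user-agent>
      <String>ExampleBrowser/5.0</String>
      <Type>B</Type>
    </user-agent>
    <user-agent>
      <String>ExampleHarvester</String>
      <Type>S</Type>
    </user-agent>
    <user-agent>
      <Type>R</Type>
    </user-agent>
  </user-agents>
XML

# Hypothetical helper: names of entries whose Type contains "R" or "S"
def bot_names(xml)
  doc = REXML::Document.new(xml)
  names = []
  doc.elements.each('//user-agent') do |ua|
    type = (ua.elements['Type']&.text).to_s
    next unless type.include?('R') || type.include?('S')
    string = ua.elements['String']&.text
    # Skip entries that have no usable String element
    names << string unless string.nil? || string.strip.empty?
  end
  names
end

puts bot_names(SAMPLE_XML)
```

The last sample entry has no `String` element, so it is skipped, the same way the Nokogiri version drops entries without one.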

