Recognize search engines and spammers using
require 'net/http'
require 'xmlsimple'

# URL of the user-agent XML feed (left blank in the original gist)
url = ""
xml_data = Net::HTTP.get_response(URI.parse(url)).body
data = XmlSimple.xml_in(xml_data)
# Keep entries whose Type contains "R" (robots/search engines) or "S" (spammers)
agents = data['user-agent'].select { |agent| type = agent["Type"].first; type.include?("R") || type.include?("S") }
# Extract the User-Agent strings themselves
agent_names = agents.collect { |agent| agent["String"].first }
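Once agent_names has been built, it can serve as a simple blocklist for incoming requests. A minimal sketch of that use (the bot_request? helper and the sample strings below are my own illustration, not part of the gist):

```ruby
# Illustrative sample of known bot User-Agent fragments, standing in for
# the agent_names list built by the gist above.
KNOWN_BOT_AGENTS = ["Googlebot", "bingbot", "SpamCrawler"].freeze

# Returns true when the incoming User-Agent header contains any known bot string.
def bot_request?(user_agent, known_agents = KNOWN_BOT_AGENTS)
  return false if user_agent.nil?
  known_agents.any? { |bot| user_agent.include?(bot) }
end

puts bot_request?("Mozilla/5.0 (compatible; Googlebot/2.1)") # => true
puts bot_request?("Mozilla/5.0 (Windows NT 10.0; rv:109.0)") # => false
```

A substring check like this is deliberately loose; real crawlers vary their version strings, so exact equality against the list would miss most of them.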

rceee commented Aug 13, 2013

Really outstanding gist! Thanks for this.

XmlSimple is a bit outdated now; do you know of anything more maintained that could do the same job? Would Nokogiri work for this as well?


jmwelch commented Aug 7, 2015

Regarding your (very old...) question, here's what I ended up using!

require 'net/http'
require 'nokogiri'

# URL of the user-agent XML feed (left blank, as in the original)
url = ""
xml_data = Net::HTTP.get_response(URI.parse(url)).body
xml_doc = Nokogiri::XML(xml_data)
# Collect nodes whose Type contains "R" or "S"; non-matches become nil
bots = xml_doc.xpath('.//user-agent').collect { |u| u if u.xpath('.//Type').text.include?("R") || u.xpath('.//Type').text.include?("S") }
# Drop the nils and any entries with an empty String node.
# Note: .blank? comes from ActiveSupport, so this needs Rails (or activesupport) loaded.
bots_list = bots.reject(&:blank?).reject { |b| b.xpath('.//String').blank? }
agent_names = bots_list.collect { |b| b.xpath('.//String').first.text }
