@Sjors
Created August 3, 2011 00:08
Recognize search engines and spammers using user-agents.org
require 'net/http'
require 'xmlsimple'

# Fetch the full user-agent list from user-agents.org
url = "http://www.user-agents.org/allagents.xml"
xml_data = Net::HTTP.get_response(URI.parse(url)).body
data = XmlSimple.xml_in(xml_data)

# Keep agents whose Type contains "R" (robot/crawler) or "S" (spam),
# then collect their user-agent strings
agents = data['user-agent'].select { |agent| type = agent["Type"].first; type.include?("R") || type.include?("S") }
agent_names = agents.collect { |agent| agent["String"].first }
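
The resulting agent_names list can then be matched against an incoming request's User-Agent header. A minimal sketch, assuming a Rack-style request object; note it only catches exact matches, and real crawler headers often vary:

# Flag a request whose User-Agent exactly matches a known robot/spam string
def bot_request?(request, agent_names)
  agent_names.include?(request.user_agent)
end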

rceee commented Aug 13, 2013

Really outstanding gist! Thanks for this.

Xmlsimple is a bit outdated now; do you know of anything better maintained that could do the same job? Would Nokogiri work for this too?


jmwelch commented Aug 7, 2015

Regarding your (very old...) question, here's what I ended up using!

url = "http://www.user-agents.org/allagents.xml"
xml_data = Net::HTTP.get_response(URI.parse(url)).body
xml_doc = Nokogiri::XML(xml_data)
bots = xml_doc.xpath('.//user-agent').collect{|u| u if u.xpath('.//Type').text.include?("R") || u.xpath('.//Type').text.include?("S")}
bots_list = bots.reject(&:blank?).reject{|b| b.xpath('.//String').blank?}
agent_names = bots_list.collect{|b| b.xpath('.//String').first.text}
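
Since this fetches and parses the XML on every run, the list can be cached locally once it's built; a rough sketch using a YAML file (the file name is just an example):

require 'yaml'

# Dump the list once...
File.write('bot_agents.yml', agent_names.to_yaml)
# ...and load it on later runs instead of re-fetching
agent_names = YAML.load_file('bot_agents.yml')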
