@corny
Created August 29, 2012 16:36
Shoulda macro for testing the robots.txt
#
# Shoulda Macros for testing the robots.txt
# =========================================
#
# source: https://gist.github.com/3515360
# inspiration: https://github.com/fizx/robots/blob/master/lib/robots.rb
# license: WTFPL http://sam.zoy.org/wtfpl/
#
# Example usage:
#
# should allow_index('/news')
# should_not allow_index('/news').user_agent('ia_archiver')
#
class ActiveSupport::TestCase
  module RobotsMatcher
    DEFAULT_AGENT = '*'
    LINE_REGEXP = /^\s*(User-agent|Allow|Disallow):(.+)/i

    def allow_index(path)
      InRobotsMatcher.new(path)
    end

    class InRobotsMatcher
      def initialize(path)
        @path = path
        @agent = DEFAULT_AGENT
      end

      def user_agent(agent)
        @agent = agent
        self
      end

      def matches?(subject)
        if @agent != DEFAULT_AGENT
          # specific user agent first
          result = matches_agent?(@agent)
          # then fall back to the rules for any user agent
          result = matches_agent?(DEFAULT_AGENT) if result.nil?
        else
          result = matches_agent?(@agent)
        end
        # allow if nothing matched
        result = true if result.nil?
        result
      end

      def self.robots_txt
        # cache on first access
        @robots_txt ||= File.read(Rails.root.join("public", "robots.txt"))
      end

      # Returns the verdict of the first rule that matches @path:
      #   true:  allowed
      #   false: disallowed
      #   nil:   no match
      def matches_agent?(expected_agent)
        current_agent = nil
        self.class.robots_txt.each_line do |line|
          match = line.match(LINE_REGEXP) or next
          key = match[1].downcase
          val = match[2].strip
          case key
          when 'user-agent'
            current_agent = val
          when 'allow', 'disallow'
            # skip rules that precede any User-agent line or belong to another agent
            next if current_agent.nil? || current_agent.downcase != expected_agent.downcase
            # an empty value (e.g. "Disallow: ") restricts nothing
            next if val.empty?
            pattern = Regexp.escape(val).gsub(Regexp.escape("*"), ".*")
            regexp = Regexp.compile("^#{pattern}")
            return key == 'allow' if @path =~ regexp
          end
        end
        # no rule matched for this agent
        nil
      end

      def failure_message
        "Expected user agent '#{@agent}' to be allowed to index '#{@path}'"
      end

      def negative_failure_message
        "Expected user agent '#{@agent}' not to be allowed to index '#{@path}'"
      end

      def description
        "allow '#{@agent}' to index '#{@path}'"
      end
    end
  end

  extend RobotsMatcher
end
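
The path matching inside `matches_agent?` can be tried without Rails or shoulda. The following is a minimal sketch, not part of the gist: `robots_pattern_to_regexp`, `first_rule_verdict`, and the `RULES` list are hypothetical names introduced for illustration. It assumes the same approach as the macro: escape the rule value, turn each `*` into `.*`, anchor the pattern at the start of the path, and let the first matching rule decide.

```ruby
# Hypothetical helper: converts a robots.txt rule value into a Regexp,
# treating "*" as a wildcard and everything else literally.
def robots_pattern_to_regexp(value)
  pattern = Regexp.escape(value).gsub(Regexp.escape("*"), ".*")
  Regexp.new("\\A#{pattern}")
end

# Hypothetical helper: returns true (allowed) or false (disallowed) for the
# first rule that matches the path, or nil when no rule matches.
def first_rule_verdict(rules, path)
  rules.each do |directive, value|
    return directive == :allow if path =~ robots_pattern_to_regexp(value)
  end
  nil
end

# Made-up rules for a single user agent, in file order.
RULES = [
  [:disallow, "/private/*.html"],
  [:allow,    "/private/"],
  [:disallow, "/tmp"]
]

p first_rule_verdict(RULES, "/private/a.html")
p first_rule_verdict(RULES, "/private/index")
p first_rule_verdict(RULES, "/news")
```

Because the first matching rule wins, the order of rules matters in this sketch, just as the order of lines in `robots.txt` matters to the macro. A `nil` verdict is what lets the caller fall back to another agent's rules or to the default of allowing the path.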