@szepeviktor
Last active August 29, 2015 13:56
bad robot catcher - "nofollow" and "disallow" trap
# put this in directory called /ajax.googleapis.com/
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteRule ^ index.php [L]
</IfModule>
<?php
// put this in directory called /ajax.googleapis.com/
// resolve the path relative to this file, not the current working directory
require __DIR__ . '/../disallow/index.php';
<?php
// put this in directory called /allow/ as meta-nofollow.php (linked from /allow/index.php)
header( 'X-Robots-Tag: noindex', true );
?><!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title></title>
<meta http-equiv="refresh" content="1;url=/">
<meta name="robots" content="nofollow">
<style type="text/css">body{background:white} a{color:white;text-decoration:none}</style>
</head>
<body>
<a href="../disallow/" title="all links are disallowed by the robots meta tag">-</a>
</body>
</html>
<?php
// put this in directory called /allow/
header( 'X-Robots-Tag: noindex', true );
?><!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title></title>
<meta http-equiv="refresh" content="1;url=/">
<style type="text/css">body{background:white} a{color:white;text-decoration:none}</style>
</head>
<body>
<a href="meta-nofollow.php" title="you can follow this link">+</a>
<a rel="nofollow" href="../disallow/" title="disallowed by rel=nofollow">-</a>
<a href="../disallow/" title="disallowed in robots.txt">-</a>
<a href="//ajax.googleapis.com/ajax/libs/jquery/1.11.0/jquery.min.js" title="this is not a relative link">-</a>
</body>
</html>
<?php
/* put this in directory called /disallow/ */
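// fail2ban got you!
// write the log line $rt_ban_count times so that a single request
// can reach fail2ban's maxretry threshold (match this to your jail)
$rt_ban_count = 6;
for ($i = 1; $i <= $rt_ban_count; $i++) {
    error_log('File does not exist: robot_nofollow_trap');
}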
die;
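The trap above only writes log lines; the ban itself is left to fail2ban. A minimal sketch of a matching filter and jail could look like the following. The filter name `robot-trap`, the log path, and the find/ban times are assumptions and not part of the gist. The regex assumes error-log lines that contain `[client <ip>]` (Apache with mod_php), so check it against your own log first, for example with `fail2ban-regex`.

```ini
# /etc/fail2ban/filter.d/robot-trap.conf (hypothetical filter name)
[Definition]
# matches "[client 1.2.3.4]" as well as "[client 1.2.3.4:5678]"
failregex = \[client <HOST>(?::\d+)?\] .*File does not exist: robot_nofollow_trap
ignoreregex =

# /etc/fail2ban/jail.local
[robot-trap]
enabled  = true
filter   = robot-trap
port     = http,https
# assumed Debian-style log location
logpath  = /var/log/apache2/error.log
# the trap logs 6 lines per request, so one visit reaches the threshold
maxretry = 6
# assumed values, in seconds
findtime = 600
bantime  = 86400
```

Keeping `maxretry` equal to `$rt_ban_count` means a single request to /disallow/ is enough to trigger the ban.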
### this "system" tests the following:
#
# Disallow: /disallow/ (in robots.txt)
# <meta name="robots" content="nofollow">
# <a rel="nofollow" href="nofollow.html">ban me</a>
# <a href="//ajax.googleapis.com/ajax/libs/jquery/1.11.0/jquery.min.js">this is not a relative URL</a>
#
# put a hidden link to /allow/ on your site
###
User-agent: *
Crawl-delay: 10
Disallow: /disallow/
Allow: /allow/
<?php
Class!
is_robots()
add_filter('robots_txt', array($this, 'add_disallow'), 2, 1);
is_admin()
trap type: fail2ban/.htaccess/nginx.conf/CloudFlare API/call itsec
$rt_fail2ban_count = 6
$rt_allow_url = 'allow' ?randomize
$rt_allow_meta_url = 'allow-meta' ?randomize
$rt_disallow = 'disallow' ?randomize
$rt_relative_url = '//'. 'ajax.googleapis.com/ajax/libs/jquery/1.11.0/jquery.min.js'
$rt_crawl_delay_on Y/N
$rt_crawl_delay = 10
-> options-general.php fieldset
!is_admin()
add_rewrite_rule('^allow/?$','index.php?sec_nofollow=allow','bottom');
add_rewrite_rule('^allow-meta/?$','index.php?sec_nofollow=allow_meta','bottom');
add_rewrite_rule('^ajax\.googleapis\.com/ajax/libs/jquery/1\.11\.0/jquery\.min\.js$','index.php?sec_nofollow=rel_url','bottom');
add_rewrite_rule('^disallow/?$','index.php?sec_nofollow=disallow','bottom');
// different traps: rel nofollow, robots meta, robots.txt, protocol-relative URL
add_rewrite_tag('%sec_nofollow%', '([a-z_]+)');
add_disallow()
//'User-agent: *'. PHP_EOL .
'Crawl-delay: 10'. PHP_EOL .'Disallow: /disallow/'. PHP_EOL .'Allow: /allow/'
trap/fail2ban
for ($i = 1; $i <= $rt_fail2ban_count; $i++) {
    error_log('File does not exist: robot_nofollow_trap');
}
die;
which action to hook for queries?
inject allow link in footer? inline display:none?
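The add_disallow() idea in the notes can be sketched as a plain PHP function. This is only an illustration: the function name `rt_add_disallow` and its parameters are hypothetical, and in WordPress it would be registered on the `robots_txt` filter instead of being called directly.

```php
<?php
/**
 * Append the trap rules to an existing robots.txt body.
 * Hypothetical helper; defaults mirror the option values listed in the notes.
 */
function rt_add_disallow($output, $disallow = 'disallow', $allow = 'allow', $crawl_delay = 10)
{
    $lines = array();

    // Crawl-delay is optional (see the $rt_crawl_delay_on note).
    if ($crawl_delay > 0) {
        $lines[] = 'Crawl-delay: ' . (int) $crawl_delay;
    }

    // Normalise the directory names so each rule gets exactly one slash per side.
    $lines[] = 'Disallow: /' . trim($disallow, '/') . '/';
    $lines[] = 'Allow: /' . trim($allow, '/') . '/';

    // One rule per line, appended after the existing output.
    return rtrim($output) . PHP_EOL . implode(PHP_EOL, $lines) . PHP_EOL;
}

// Usage: extend a body that already starts a "User-agent: *" group.
echo rt_add_disallow('User-agent: *');
```

Putting every rule on its own line matters here, since crawlers parse robots.txt line by line.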