Last active
August 29, 2015 13:56
bad robot catcher - "nofollow" and "disallow" trap
# put this in directory called /ajax.googleapis.com/
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteRule ^ index.php [L]
</IfModule>
<?php
// put this in directory called /ajax.googleapis.com/
// __DIR__ keeps the include independent of the current working directory
require __DIR__ . '/../disallow/index.php';
<?php
// put this in directory called /allow/
header( 'X-Robots-Tag: noindex', true );
?><!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title></title>
<meta http-equiv="refresh" content="1;url=/">
<meta name="robots" content="nofollow">
<style type="text/css">body{background:white} a{color:white;text-decoration:none}</style>
</head>
<body>
<a href="../disallow/" title="all links are disallowed by the robots meta tag">-</a>
</body>
</html>
<?php
// put this in directory called /allow/
header( 'X-Robots-Tag: noindex', true );
?><!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title></title>
<meta http-equiv="refresh" content="1;url=/">
<style type="text/css">body{background:white} a{color:white;text-decoration:none}</style>
</head>
<body>
<a href="meta-nofollow.php" title="you can follow this link">+</a>
<a rel="nofollow" href="../disallow/" title="disallowed by rel=nofollow">-</a>
<a href="../disallow/" title="disallowed in robots.txt">-</a>
<a href="//ajax.googleapis.com/ajax/libs/jquery/1.11.0/jquery.min.js" title="this is not a relative link">-</a>
</body>
</html>
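<?php
/* put this in directory called /disallow/ */
// fail2ban got you!
// Number of log lines written per hit; intended to reach fail2ban's retry
// limit so that a single request to the trap is enough for a ban.
$rt_ban_count = 6;
for ($i = 1; $i <= $rt_ban_count; $i++) {
    // Imitates Apache's "File does not exist" error line so a fail2ban filter can match it.
    error_log('File does not exist: robot_nofollow_trap');
}
die;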
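The trap above only writes log lines; the ban itself has to come from a fail2ban filter that matches them. The fragment below is a rough sketch of such a setup, not part of the gist. The filter name `robot-trap`, both file paths, and the log path are assumptions, and the regex assumes the message lands in an Apache error log with a `[client IP]` prefix (as is typical with mod_php). With other setups, such as PHP-FPM or a custom `error_log` target, the line format differs and the regex would need adjusting.

```ini
# /etc/fail2ban/filter.d/robot-trap.conf (hypothetical file name)
[Definition]
# <HOST> is fail2ban's placeholder for the offending address.
failregex = \[client <HOST>(:\d+)?\] .*File does not exist: robot_nofollow_trap
ignoreregex =

# /etc/fail2ban/jail.local (fragment; log path is an assumption)
[robot-trap]
enabled  = true
port     = http,https
filter   = robot-trap
logpath  = /var/log/apache2/error.log
# Matches $rt_ban_count, so one visit to /disallow/ reaches the limit.
maxretry = 6
```

A candidate regex can be checked against a real log with `fail2ban-regex /var/log/apache2/error.log /etc/fail2ban/filter.d/robot-trap.conf` before the jail is enabled.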
### This "system" tests the following:
#
# Disallow: /disallow/ (in robots.txt)
# <meta name="robots" content="nofollow">
# <a rel="nofollow" href="nofollow.html">ban me</a>
# <a href="//ajax.googleapis.com/ajax/libs/jquery/1.11.0/jquery.min.js">this is not a relative URL</a>
#
# Put a hidden link to /allow/ on your site.
###
User-agent: *
Crawl-delay: 10
Disallow: /disallow/
Allow: /allow/
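To illustrate what the rules above ask of a crawler, here is a minimal sketch of prefix-based robots.txt matching, where the longest matching rule wins. The helper name `rt_is_allowed()` and the rule array layout are hypothetical and not part of the gist. Real crawlers also handle wildcards, `$` anchors, and per-agent groups, which this sketch ignores.

```php
<?php
// Hypothetical helper: decides whether $path may be fetched under a list of
// [type, prefix] rules. The longest matching prefix wins; Allow wins ties.
function rt_is_allowed(string $path, array $rules): bool
{
    $bestLength = -1;
    $allowed = true; // no matching rule means the path is allowed

    foreach ($rules as [$type, $prefix]) {
        // Skip empty prefixes and prefixes that do not match the start of the path.
        if ($prefix === '' || strpos($path, $prefix) !== 0) {
            continue;
        }
        $length = strlen($prefix);
        if ($length > $bestLength || ($length === $bestLength && $type === 'allow')) {
            $bestLength = $length;
            $allowed = ($type === 'allow');
        }
    }

    return $allowed;
}

// The two path rules from the robots.txt above.
$rules = [
    ['disallow', '/disallow/'],
    ['allow', '/allow/'],
];

var_dump(rt_is_allowed('/allow/', $rules));    // bool(true)
var_dump(rt_is_allowed('/disallow/', $rules)); // bool(false)
```

A crawler that applies this check never requests `/disallow/`, so only clients that ignore robots.txt reach the trap.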
<?php
Class!
is_robots()
    add_filter('robots_txt', array($this, 'add_disallow'), 2, 1);
is_admin()
    trap type: fail2ban/.htaccess/nginx.conf/CloudFlare API/call itsec
    $rt_fail2ban_count = 6
    $rt_allow_url = 'allow' ?randomize
    $rt_allow_meta_url = 'allow-meta' ?randomize
    $rt_disallow = 'disallow' ?randomize
    $rt_relative_url = '//'. 'ajax.googleapis.com/ajax/libs/jquery/1.11.0/jquery.min.js'
    $rt_crawl_delay_on Y/N
    $rt_crawl_delay = 10
    -> options-general.php fieldset
!is_admin()
    add_rewrite_rule('^allow/?$','index.php?sec_nofollow=allow','bottom');
    add_rewrite_rule('^allow-meta/?$','index.php?sec_nofollow=allow_meta','bottom');
    // WordPress matches rewrite rules against the path without its leading slash
    add_rewrite_rule('^ajax\.googleapis\.com/ajax/libs/jquery/1\.11\.0/jquery\.min\.js$','index.php?sec_nofollow=rel_url','bottom');
    add_rewrite_rule('^disallow/?$','index.php?sec_nofollow=disallow','bottom');
    // different traps: rel nofollow, robots meta, robots.txt, protocol-relative URL
    add_rewrite_tag('%sec_nofollow%', '([a-z_]+)');
add_disallow()
    //'User-agent: *'. PHP_EOL .
    'Crawl-delay: 10'. PHP_EOL .'Disallow: /disallow/'. PHP_EOL .'Allow: /allow/'
trap/fail2ban
    for ($i = 1; $i <= $rt_fail2ban_count; $i++) {
        error_log('File does not exist: robot_nofollow_trap');
    }
    die;
which action to hook for queries?
inject allow link in footer? inline display:none?
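As one possible reading of the `add_disallow()` step in the plan, the sketch below appends the trap rules to the robots.txt text that WordPress generates. The function name `rt_add_disallow()` is hypothetical. The sketch assumes the default output already begins with a `User-agent: *` group, so the appended lines join that group. Because registration is guarded, the file also runs outside WordPress.

```php
<?php
// Hypothetical callback for WordPress's 'robots_txt' filter, which passes the
// generated robots.txt text and the site's "public" flag.
function rt_add_disallow($output, $public)
{
    $rules = array(
        'Crawl-delay: 10',
        'Disallow: /disallow/',
        'Allow: /allow/',
    );

    // Trim trailing whitespace, then append each rule on its own line.
    return rtrim($output) . PHP_EOL . implode(PHP_EOL, $rules) . PHP_EOL;
}

// Register the callback only when WordPress is loaded.
if (function_exists('add_filter')) {
    add_filter('robots_txt', 'rt_add_disallow', 10, 2);
}
```

The path names are hard-coded here. The plan's `?randomize` idea would instead read them from stored plugin options.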