@geerlingguy
Created May 7, 2014 18:48
Detect crawlers/bots/spiders in PHP (simple and fast)
<?php
/**
 * Check if the given user agent string is one of a crawler, spider, or bot.
 *
 * @param string $user_agent
 *   A user agent string (e.g. Googlebot/2.1 (+http://www.google.com/bot.html))
 *
 * @return bool
 *   TRUE if the user agent is a bot, FALSE if not.
 */
function smart_ip_detect_crawler($user_agent) {
  // Use lowercase string for comparison.
  $user_agent = strtolower($_SERVER['HTTP_USER_AGENT']);
  // A list of some common words used only for bots and crawlers.
  $bot_identifiers = array(
    'bot',
    'slurp',
    'crawler',
    'spider',
    'curl',
    'facebook',
    'fetch',
  );
  // See if one of the identifiers is in the UA string.
  foreach ($bot_identifiers as $identifier) {
    if (strpos($user_agent, $identifier) !== FALSE) {
      return TRUE;
    }
  }
  return FALSE;
}
@FarrisFahad

How reliable is this code? I've heard that bot detection can never be 100% foolproof.

@FinlayDaG33k

@FarrisFahad the big problem with this code is that it does not verify that the bot is legitimate.
Anyone can spoof the User-Agent header.
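
For anyone who wants to go one step beyond the User-Agent check, here is a minimal sketch of forward-confirmed reverse DNS, the approach Google describes for verifying Googlebot. It is not part of the gist. The function name, the `$reverse` and `$forward` parameters (there so the logic can be tried without network access), and the exact domain list are assumptions; check the crawler operator's current documentation before relying on it. Note that `gethostbynamel()` only returns IPv4 addresses.

```php
<?php
/**
 * Sketch: confirm a claimed Googlebot request via forward-confirmed reverse DNS.
 *
 * $reverse and $forward are hypothetical injection points; by default they are
 * PHP's gethostbyaddr() and gethostbynamel().
 */
function verify_googlebot_ip($ip, $reverse = 'gethostbyaddr', $forward = 'gethostbynamel') {
  // Step 1: reverse lookup of the connecting IP address.
  $host = call_user_func($reverse, $ip);
  // gethostbyaddr() returns FALSE on malformed input and the unmodified IP on failure.
  if ($host === FALSE || $host === $ip) {
    return FALSE;
  }
  // Step 2: the host name must end in a domain the operator documents (assumed list).
  if (!preg_match('/\.(googlebot|google)\.com$/i', $host)) {
    return FALSE;
  }
  // Step 3: a forward lookup of that host name must include the original IP.
  $addresses = call_user_func($forward, $host);
  return is_array($addresses) && in_array($ip, $addresses, TRUE);
}
```

In a request handler this would be called as `verify_googlebot_ip($_SERVER['REMOTE_ADDR'])`, and only for requests whose User-Agent already claims to be Googlebot, since DNS lookups are slow compared with a string match.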

@akamomer

akamomer commented Mar 8, 2021

Current Facebook header:
User-Agent: cortex/1.0
X-FB-HTTP-Engine: Liger
X-FB-Client-IP: True
X-FB-Server-Cluster: True
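
Since `cortex/1.0` contains none of the identifiers in the gist's list, a request like this would slip past the check. A rough sketch of how one might catch it, assuming the headers listed above are accurate and stay stable (I have not verified either). The function name is hypothetical, and because these headers are as easy to spoof as the User-Agent, treat the result as a hint only.

```php
<?php
/**
 * Sketch: flag requests that carry the Facebook headers reported above.
 *
 * $server is expected to look like $_SERVER; the function name is hypothetical.
 */
function looks_like_facebook_fetch(array $server) {
  $user_agent = isset($server['HTTP_USER_AGENT']) ? strtolower($server['HTTP_USER_AGENT']) : '';
  // "cortex/1.0" is the User-Agent value reported in this comment.
  if (strpos($user_agent, 'cortex/') === 0) {
    return TRUE;
  }
  // PHP exposes the X-FB-HTTP-Engine request header under this $_SERVER key.
  return isset($server['HTTP_X_FB_HTTP_ENGINE']);
}
```

Call it as `looks_like_facebook_fetch($_SERVER)` alongside the existing identifier check.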

@brampta

brampta commented Oct 5, 2021

What's the point of passing $user_agent as an argument and then overwriting it with
$user_agent = strtolower($_SERVER['HTTP_USER_AGENT']);
I guess you don't need that argument.

@brampta

brampta commented Oct 5, 2021

Also, why name the function smart_ip_detect_crawler()? It has nothing to do with IPs. Maybe you could just name it smart_detect_crawler().

@geerlingguy
Author

This script is not meant to be the be-all-and-end-all of bot detection ;)

Hopefully it's helpful if you're working on your own system, and as @FinlayDaG33k mentioned... it's trivial to bypass this. It was originally used just as a rough metric for how much of the traffic coming to a certain page was from bots like Googlebot and Bing's bot.

@FinlayDaG33k

What's the point of passing $user_agent as an argument and then overwriting it with $user_agent = strtolower($_SERVER['HTTP_USER_AGENT']);? I guess you don't need that argument.

I think he just forgot to clean that up.
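
A cleaned-up version might look like the sketch below. It lowercases the argument itself instead of reading `$_SERVER`, and it uses the `smart_detect_crawler()` name suggested earlier in the thread. This is a sketch, not the author's code. The identifier list is copied from the gist unchanged.

```php
<?php
/**
 * Check if the given user agent string looks like a crawler, spider, or bot.
 */
function smart_detect_crawler($user_agent) {
  // Lowercase the argument itself instead of reading $_SERVER.
  $user_agent = strtolower((string) $user_agent);
  $bot_identifiers = array('bot', 'slurp', 'crawler', 'spider', 'curl', 'facebook', 'fetch');
  // Return TRUE as soon as any identifier appears in the UA string.
  foreach ($bot_identifiers as $identifier) {
    if (strpos($user_agent, $identifier) !== FALSE) {
      return TRUE;
    }
  }
  return FALSE;
}

// Usage: the caller decides where the string comes from.
$ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
$is_bot = smart_detect_crawler($ua);
```

Keeping the `$_SERVER` lookup in the caller also makes the function easy to test with arbitrary strings.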
