
@worka
Last active December 10, 2019 00:11

Detect Google and Yandex search bots (crawlers, spiders)

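function detectSearchBot($ip, $agent, &$hostname)
{
    $hostname = $ip;

    // Check the User-Agent first so that gethostbyaddr() (a slow DNS query)
    // is only called for requests claiming to be Googlebot or YandexBot
    if (preg_match('/(?:google|yandex)bot/iu', $agent)) {
        // success - returns the host name; no PTR record - returns the IP;
        // malformed IP - returns false
        $hostname = gethostbyaddr($ip);

        // https://support.google.com/webmasters/answer/80553
        if ($hostname !== false && $hostname !== $ip) {
            // accept only the crawlers' own domains: googlebot.com and google.com
            // for Google; yandex.ru, yandex.net and yandex.com for Yandex
            // (the D modifier makes $ match only at the very end of the string)
            if (preg_match('/\.(?:google(?:bot)?\.com|yandex\.(?:ru|net|com))$/iuD', $hostname)) {
                // forward lookup: returns the host's IPv4 addresses, or false
                // (IPv4 only, so clients connecting over IPv6 will not verify)
                $ips = gethostbynamel($hostname);

                // the host name must resolve back to the ORIGINAL IP,
                // otherwise a spoofed PTR record would pass the check
                if ($ips !== false && in_array($ip, $ips, true)) {
                    return true;
                }
            }
        }
    }

    return false;
}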
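The verification procedure described at the Google link above has three steps: a reverse (PTR) lookup of the client IP, a check that the host name belongs to the search engine's domain, and a forward lookup confirming that the host name resolves back to the same IP. The sketch below is a hypothetical, testable variant of that procedure: the helper name `verifyCrawlerIp` is made up for illustration, the two DNS lookups are passed in as callables so the logic runs without network access, and the domain allow-list (`googlebot.com`, `google.com`, `yandex.ru`, `yandex.net`, `yandex.com`) is an assumption taken from the search engines' published guidance. The DNS data is invented and uses the documentation IP ranges from RFC 5737.

```php
<?php

// Hypothetical helper (not part of the original gist): the reverse lookup /
// domain check / forward lookup procedure with both DNS calls injected.
// $reverse behaves like gethostbyaddr(): host name, or the IP itself if no PTR record.
// $forward behaves like gethostbynamel(): list of IPv4 addresses, or false.
function verifyCrawlerIp(string $ip, callable $reverse, callable $forward): bool
{
    // 1. Reverse DNS (PTR) lookup of the client IP
    $host = $reverse($ip);
    if ($host === false || $host === $ip) {
        return false;
    }

    // 2. The host name must end in one of the crawlers' own domains
    //    (assumed allow-list; check it against the search engines' docs)
    if (!preg_match('/\.(?:google(?:bot)?\.com|yandex\.(?:ru|net|com))$/iD', $host)) {
        return false;
    }

    // 3. Forward DNS lookup: the original IP must be among the results,
    //    otherwise a spoofed PTR record alone would be enough to pass
    $ips = $forward($host);
    return $ips !== false && in_array($ip, $ips, true);
}

// Made-up DNS data using documentation IP ranges (RFC 5737)
$ptr = [
    '192.0.2.10'   => 'crawl-192-0-2-10.googlebot.com',
    '198.51.100.5' => 'spider-198-51-100-5.yandex.com',
    '203.0.113.7'  => 'crawl-192-0-2-10.googlebot.com', // attacker copies a real crawler name
    '203.0.113.8'  => 'crawl.notgooglebot.com',          // look-alike domain
];
$a = [
    'crawl-192-0-2-10.googlebot.com' => ['192.0.2.10'],
    'spider-198-51-100-5.yandex.com' => ['198.51.100.5'],
    'crawl.notgooglebot.com'         => ['203.0.113.8'],
];

$reverse = fn(string $ip) => $ptr[$ip] ?? $ip;      // no PTR record: return the IP unchanged
$forward = fn(string $host) => $a[$host] ?? false;  // no A record: false

foreach (['192.0.2.10', '198.51.100.5', '203.0.113.7', '203.0.113.8', '192.0.2.99'] as $ip) {
    printf("%-13s %s\n", $ip, verifyCrawlerIp($ip, $reverse, $forward) ? 'crawler' : 'not verified');
}
```

Only the first two addresses verify: `203.0.113.7` fails the forward lookup because the copied host name resolves to a different IP, and `203.0.113.8` fails the domain check because the regex requires a dot directly before `google`/`yandex`. On a live site the real resolvers can be passed as string callables, e.g. `verifyCrawlerIp($_SERVER['REMOTE_ADDR'], 'gethostbyaddr', 'gethostbynamel')`; since each check costs up to two DNS queries, caching the result per IP is worth considering.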