@underdown
Created November 15, 2011 18:07
PHP Google Scraping Function
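function findranking($domain, $keyword) {
    // Example User-Agent strings (placeholders; the original read $useragents without defining it).
    $useragents = array(
        "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:8.0) Gecko/20100101 Firefox/8.0",
        "Mozilla/5.0 (X11; Linux x86_64; rv:8.0) Gecko/20100101 Firefox/8.0",
    );
    usleep(400000 * rand(0, 16)); // wait a random interval (0 to 6.4 s) between checks
    $url = "http://www.google.com/search?q=" . urlencode($keyword) . "&num=100";
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_HEADER, 1); // keep response headers; the redirect check below reads them
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the response as a string instead of printing it
    $header = array();
    $header[0] = "Accept: text/xml,application/xml,application/xhtml+xml,";
    $header[0] .= "text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
    $header[] = "Cache-Control: max-age=0";
    $header[] = "Connection: keep-alive";
    $header[] = "Keep-Alive: 300";
    $header[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
    $header[] = "Accept-Language: en-us,en;q=0.5";
    $header[] = "Pragma: "; // browsers keep this blank.
    curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
    curl_setopt($ch, CURLOPT_USERAGENT, $useragents[rand(0, sizeof($useragents) - 1)]);
    $resp = curl_exec($ch); // execute the GET request and capture the response
    curl_close($ch);
    if ($resp === false)
        return -1; // request failed; this error code is an addition, not part of the original
    // A 302 redirect to the "sorry" page means Google has blocked the request.
    if (strpos($resp, "Location: http://sorry.google.com/sorry/") !== false && strpos($resp, "302 Found") !== false)
        return -200;
    $count = 0;
    $rank = 0;
    $bp = 0;
    // strpos() returns false when nothing is found, so compare strictly.
    while (($bp = strpos($resp, "<h3 class=", $bp)) !== false) {
        $count++;
        $end = strpos($resp, "</h3>", $bp);
        if ($end === false) break; // unterminated heading: stop scanning
        $link = substr($resp, $bp, $end - $bp + 5);
        $bp++; // resume the next search just past this match
        // Local business results are not organic results, so do not count them.
        if (stripos($link, ">Local business results for") !== false && stripos($link, "href=\"http://maps.google.com/") !== false) {
            $count--;
        }
        // match all secure and www non-www
        if (stripos($link, "http://".$domain) !== false || stripos($link, "https://".$domain) !== false || stripos($link, "http://www.".$domain) !== false || stripos($link, "https://www.".$domain) !== false) {
            $rank = $count;
            break; // keep the first (highest) position instead of the last match
        }
    }
    if (!$rank) $rank = 101; // not found in the first 100 results
    return $rank;
}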
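
The ranking logic can be tried without contacting Google by separating the parsing step from the request. The following is a minimal sketch under that assumption: `rank_in_html()` and the sample markup are hypothetical and not part of the gist, and the `<h3 class=` marker is taken from the function above, so it may not match current result pages.

```php
<?php
// Hypothetical helper: the same "<h3 class=" scan as findranking(), applied to
// a string already in memory so it can be exercised without a network call.
function rank_in_html(string $html, string $domain, int $notFound = 101): int
{
    $count = 0;
    $offset = 0;
    // Strict comparison: strpos() returns false when there is no further match.
    while (($start = strpos($html, '<h3 class=', $offset)) !== false) {
        $end = strpos($html, '</h3>', $start);
        if ($end === false) {
            break; // unterminated heading: stop scanning
        }
        $link = substr($html, $start, $end - $start + 5);
        $offset = $end + 5; // continue after this heading
        $count++;
        foreach (['http://', 'https://', 'http://www.', 'https://www.'] as $prefix) {
            if (stripos($link, $prefix . $domain) !== false) {
                return $count; // first heading that links to the domain
            }
        }
    }
    return $notFound;
}

// Made-up markup with two result headings.
$sample = '<h3 class="r"><a href="http://other.test/">A</a></h3>'
        . '<h3 class="r"><a href="https://www.example.com/page">B</a></h3>';
echo rank_in_html($sample, 'example.com'), "\n"; // prints 2
```

A caller of `findranking()` would additionally treat -200 as "blocked by Google" and 101 as "not in the first 100 results" before storing the value.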

rafauel commented May 10, 2022

I'm getting some errors on curl_setopt( $ch, CURLOPT_USERAGENT, $useragents[rand(0, sizeof($useragents)-1)] );
And with the while($bp=strpos($resp, "<h3 class=", $bp+1)) {
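
The comment does not quote the error messages, so the causes suggested here are assumptions. The first line reads `$useragents`, which fails whenever that array is not defined inside the function's scope. The second relies on the truthiness of `strpos()`, which returns `false` on no match and `0` for a match at the start of the string. The sketch below shows one way to guard both; `pick_user_agent()` and `contains()` are hypothetical helper names.

```php
<?php
// Hypothetical helper: pick a User-Agent from a list that is passed in
// explicitly, falling back to a default when the list is empty.
function pick_user_agent(array $agents, string $fallback = 'Mozilla/5.0'): string
{
    return $agents ? $agents[array_rand($agents)] : $fallback;
}

// Hypothetical helper: strpos() returns int|false, and position 0 is a valid
// match that is nevertheless falsy, so compare against false strictly.
function contains(string $haystack, string $needle): bool
{
    return strpos($haystack, $needle) !== false;
}

var_dump(contains('<h3 class="r">', '<h3 class=')); // bool(true): match at offset 0
var_dump(contains('no headings here', '<h3 class=')); // bool(false)
echo pick_user_agent([]), "\n"; // prints the fallback, Mozilla/5.0
```

Checking the return value of `curl_exec()` for `false` before parsing is also worth considering, since a failed request leaves nothing to search.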
