@royduin royduin/crawl.php

Last active Sep 24, 2016
Searches the Alexa list of the top 1 million ranked websites for publicly exposed git repositories. See: https://royduineveld.nl/hacking-public-git-repositories/
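<?php
// Download the list from: http://s3.amazonaws.com/alexa-static/top-1m.csv.zip
// Each line of the unzipped CSV has the form "rank,domain".
$lines = file('top-1m.csv', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
if ($lines === false) {
    exit('Could not read top-1m.csv' . PHP_EOL);
}
// Assumption: give up on slow hosts after 5 seconds (the original used PHP's default timeout).
$context = stream_context_create(['http' => ['timeout' => 5]]);
$yeah = [];
$counter = 1;
foreach ($lines as $line) {
    $parts = explode(',', trim($line));
    if (count($parts) < 2) {
        continue; // Skip malformed lines.
    }
    $url = $parts[1];
    echo $counter . ' ' . $url . PHP_EOL;
    $gitUrl = 'http://' . $url . '/.git/config';
    // Suppress warnings for unreachable hosts; file_get_contents() returns false on failure.
    $content = @file_get_contents($gitUrl, false, $context);
    // A git config file contains a [core] section.
    if ($content !== false && strpos($content, '[core]') !== false) {
        echo '- YEAH! ' . $gitUrl . PHP_EOL;
        $yeah[] = $gitUrl;
        file_put_contents('found.csv', $gitUrl . PHP_EOL, FILE_APPEND);
    }
    $counter++;
}
if (!empty($yeah)) {
    echo 'FOUND:' . PHP_EOL;
    foreach ($yeah as $url) {
        echo '- ' . $url . PHP_EOL;
    }
}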
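
The two decisions the script makes for each line (which URL to request, and whether the response looks like a git config) can be pulled out into small pure functions so they can be checked without touching the network. This is only a sketch: the names `gitConfigUrl` and `looksLikeGitConfig` are hypothetical and do not appear in the original script, and it assumes each CSV line has the form `rank,domain`.

```php
<?php
// Hypothetical helpers for illustration; not part of the original script.

// Build the .git/config URL for one "rank,domain" CSV line.
// Returns null when the line is empty or has no domain column.
function gitConfigUrl(string $csvLine): ?string
{
    $parts = explode(',', trim($csvLine));
    if (count($parts) < 2 || $parts[1] === '') {
        return null;
    }
    return 'http://' . $parts[1] . '/.git/config';
}

// True when a response body contains the [core] section header of a git config file.
// Accepts any value because file_get_contents() returns false on failure.
function looksLikeGitConfig($body): bool
{
    return is_string($body) && strpos($body, '[core]') !== false;
}
```

With these in place, the loop body could reduce to building the URL, skipping `null` results, fetching, and testing the response with `looksLikeGitConfig()`.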