@aarvay
Created July 11, 2011 12:48
A Naive HN Crawler to get news about Javascript
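<?php
/**
 * A naive HN crawler that collects news about JavaScript.
 * @author Vignesh Rajagopalan <vignesh@campuspry.com>
 *
 * Motivated by Vijay Anand (@vijayanands)
 */
require('phpQuery.php');
//$fp = fopen("../jsdata.txt", 'w');
$next = 'http://news.ycombinator.com/';
$key = 0; // running index into $details; not reset between pages
$key1 = 'javascript';
$key2 = 'js';
for ($i = 0; $i < 5; $i++) { // crawl at most 5 pages
    $html = file_get_contents($next);
    if ($html === false) {
        break; // the fetch failed, so stop crawling
    }
    // newDocument() also selects this document for the pq() calls below.
    $doc = phpQuery::newDocument($html);
    // Drop the header row and any empty rows and cells before scanning.
    pq('tr:first')->remove();
    pq('tr:empty')->remove();
    pq('td:empty')->remove();
    $rows = pq('table tr');
    $details = array(); // matches found on the current page
    foreach ($rows as $row) {
        $title = strtolower(trim(pq($row)->find('td.title')->text()));
        //echo $title;
        // Match "javascript" anywhere in the title, or "js" as a whole word (e.g. "node.js").
        if (strpos($title, $key1) !== false
            || preg_match('/\b' . preg_quote($key2, '/') . '\b/', $title) === 1) {
            $details[$key]["title"] = $title;
            $details[$key]["url"] = pq($row)->find('td.title')->find('a')->attr('href');
            $key++;
            continue;
        }
        // The "More" row links to the next page.
        if ($title == "more") {
            $next = "http://news.ycombinator.com" . pq($row)->find('a')->attr('href');
            break;
        }
    }
    // print_r() prints directly, so the separator has to be echoed on its own.
    print_r($details);
    echo '<br /><br />';
    //fwrite($fp, json_encode($details)."<br /><br />");
}
?>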

aarvay commented Jul 11, 2011

Crawls only the top 5 pages. Of course, that can be changed :-)
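
A minimal sketch of one way the page limit could be made changeable: pass it in as a parameter, together with the keywords. This is not the gist's code. `crawl()`, `matches_keywords()` and `$fetchPage` are hypothetical names introduced here for illustration, and `$fetchPage` stands in for the download-and-parse step so the loop can run without the network or phpQuery. The stub page data is made up.

```php
<?php
/**
 * Returns true when any keyword occurs in the title as a whole word,
 * case-insensitively. With "js", "node.js" matches but "json" does not.
 */
function matches_keywords(string $title, array $keywords): bool
{
    foreach ($keywords as $keyword) {
        $pattern = '/\b' . preg_quote($keyword, '/') . '\b/i';
        if (preg_match($pattern, $title) === 1) {
            return true;
        }
    }
    return false;
}

/**
 * Follows "next" links for at most $maxPages pages and collects matching stories.
 *
 * $fetchPage is an assumed callable: given a URL it returns
 * ['stories' => [['title' => ..., 'url' => ...], ...], 'next' => string|null].
 */
function crawl(string $startUrl, int $maxPages, array $keywords, callable $fetchPage): array
{
    $matches = [];
    $next = $startUrl;
    for ($page = 0; $page < $maxPages && $next !== null; $page++) {
        $result = $fetchPage($next);
        foreach ($result['stories'] as $story) {
            if (matches_keywords($story['title'], $keywords)) {
                $matches[] = $story;
            }
        }
        $next = $result['next']; // null means there is no further page
    }
    return $matches;
}

// Stub pages (invented data) in place of real HTTP requests.
$pages = [
    'page1' => [
        'stories' => [
            ['title' => 'Show HN: a JavaScript bundler', 'url' => 'http://example.com/a'],
            ['title' => 'Go concurrency patterns', 'url' => 'http://example.com/b'],
        ],
        'next' => 'page2',
    ],
    'page2' => [
        'stories' => [
            ['title' => 'Why node.js matters', 'url' => 'http://example.com/c'],
            ['title' => 'JSON parsing tips', 'url' => 'http://example.com/d'],
        ],
        'next' => 'page3',
    ],
    'page3' => [
        'stories' => [
            ['title' => 'JavaScript everywhere', 'url' => 'http://example.com/e'],
        ],
        'next' => null,
    ],
];
$fetchPage = function (string $url) use ($pages): array {
    return $pages[$url];
};

// Crawl only the first two pages.
$found = crawl('page1', 2, ['javascript', 'js'], $fetchPage);
print_r($found);
```

Passing the fetcher in as a callable is a design choice made for this sketch: it keeps the paging and matching logic testable on its own, while the real HTML fetching and parsing can be swapped in later.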
