@pingyen
Created March 17, 2015 15:04
Crawling PTT
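<?php
// Collect popular posts (50 or more pushes, or marked '爆') from the
// current index page of PTT's Gossiping board.
require('phpQuery/phpQuery.php');

$ch = curl_init();
// Send the user agent string of Facebook's link crawler.
curl_setopt($ch, CURLOPT_USERAGENT, 'facebookexternalhit/1.1 (+https://www.facebook.com/externalhit_uatext.php)');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_URL, 'https://www.ptt.cc/bbs/Gossiping/index.html');

$html = curl_exec($ch);
curl_close($ch);

// curl_exec() returns false on failure; stop instead of parsing nothing.
if ($html === false) {
    exit("Failed to fetch the index page\n");
}

$articles = array();
$doc = phpQuery::newDocument($html);

// Each .r-ent element is one row in the article list.
foreach ($doc['.r-ent'] as $entry) {
    $entry = pq($entry);

    // Push count: a number, or '爆' for the most-pushed posts.
    $like = $entry['.nrec span']->html();

    if ($like === '爆' || intval($like) >= 50) {
        $author = $entry['.author']->html();

        // Skip entries whose author is '-' (deleted posts).
        if ($author === '-') {
            continue;
        }

        $a = $entry['.title a'];

        $articles[] = array(
            'like'   => $like,
            'title'  => $a->html(),
            // href already starts with '/', so the host has no trailing slash.
            'link'   => 'https://www.ptt.cc' . $a->attr('href'),
            'author' => $author,
            'date'   => $entry['.date']->html()
        );
    }
}
?>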