@cherenkov
Created June 2, 2012 16:32
[PHP HTML scraping] Multiple pieces of data from a web page (e.g. htt.. - Hatena Question (人力検索はてな) http://q.hatena.ne.jp/1338651304
1 1(一、いち、ひと、ひとつ)は、最小の正の整数であ...
2 2(二、に、じ、ふた、ふたつ)は、自然数、また整数...
3 3(三、さん、み、みっつ)は、自然数または整数にお...
4 4(四、よん、し、す、よつ、よ)は、自然数および整...
<?php
require_once 'simplehtmldom/simple_html_dom.php';
header('Content-Type:text/html; charset=UTF-8');
// Debug helper: dump a variable inside <pre> tags.
function pr($var) {
    echo '<pre>'; print_r($var); echo '</pre>';
}
// Truncate $str to $n characters (UTF-8 aware) and append '...' if it was cut.
function strAbbr($str, $n) {
    $result = mb_substr($str, 0, $n, 'UTF-8');
    if (mb_strlen($str, 'UTF-8') > $n) {
        $result = $result . '...';
    }
    return $result;
}
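$baseurl = 'http://ja.wikipedia.org/wiki/';
$csv = '';
// Fetch the pages for 1 through 4 and collect "title,abbreviated first paragraph".
for ($i = 1; $i < 5; $i++) {
    $url = $baseurl . $i;
    $html = file_get_html($url);
    if (!$html) {
        continue; // skip pages that could not be fetched or parsed
    }
    $heading = $html->find('#firstHeading', 0);
    $para = $html->find('#mw-content-text p', 0);
    if ($heading && $para) {
        $title = $heading->plaintext;
        $body = strAbbr($para->plaintext, 25);
        $csv .= "{$title},{$body}\n";
    }
    $html->clear(); // release the parsed DOM before the next iteration
}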
echo $csv;
// Append the result to a log file named after the current time (Japan time).
date_default_timezone_set('Asia/Tokyo');
$mytime = date('YmdHi');
$filename = "./log{$mytime}.csv";
file_put_contents($filename, $csv, FILE_APPEND);
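
The script above joins the title and body with a bare comma, so a scraped paragraph that itself contains an ASCII comma, a double quote, or a newline would produce a malformed CSV row. One possible way around this, sketched here under the assumption that standard CSV quoting is acceptable for the log file, is to let PHP's built-in `fputcsv` do the escaping. The helper name `rowsToCsv` and the sample rows are hypothetical and not part of the original gist.

```php
<?php
// Hypothetical helper (not in the original gist): turn an array of rows
// into CSV text, letting fputcsv quote fields that contain commas,
// double quotes, or whitespace.
function rowsToCsv(array $rows) {
    // php://temp is an in-memory stream, so nothing is written to disk here.
    $fp = fopen('php://temp', 'r+');
    foreach ($rows as $row) {
        // Separator, enclosure and escape character are passed explicitly.
        fputcsv($fp, $row, ',', '"', '\\');
    }
    rewind($fp);
    $csv = stream_get_contents($fp);
    fclose($fp);
    return $csv;
}

// Made-up sample rows standing in for the scraped title/body pairs.
$rows = array(
    array('1', '1, one'),
    array('2', 'two'),
);
echo rowsToCsv($rows);
```

In the scraping loop, each `$title` and `$body` pair could be collected into an array like `$rows` and converted once at the end, instead of being concatenated into `$csv` directly.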