@romanodesouza
Created November 6, 2014 12:44
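<?php
// Crawls the O'Reilly Chimera Labs edition of a book page by page, following
// each page's "next" link, and saves the HTML and images under ./ebook/.
define('BASE_URL', 'http://chimera.labs.oreilly.com/books/1230000000545/');

// Downloads one page, saves it under ebook/ and returns it parsed, or null on failure.
function getPage($url)
{
    $full_url = BASE_URL . $url;
    echo "Crawling page $full_url...", PHP_EOL;
    $html = file_get_contents($full_url);
    if ($html === false || $html === '') {
        return null;
    }
    file_put_contents(__DIR__ . '/ebook/' . $url, $html);

    // Parse the downloaded HTML; libxml warnings about malformed markup are suppressed.
    $page = new DOMDocument();
    libxml_use_internal_errors(true);
    $page->loadHTML($html);
    libxml_clear_errors();
    return $page;
}

// Returns the href of the "next page" link (accesskey="n"), or null on the last page.
function getNextUrl($page)
{
    $xpath = new DOMXPath($page);
    $links = $xpath->query('//a[@accesskey="n"]');
    return $links->length > 0 ? $links->item(0)->getAttribute('href') : null;
}

// Downloads every <img> on the page into ebook/images/.
function getImages($page)
{
    $xpath = new DOMXPath($page);
    foreach ($xpath->query('//img') as $image) {
        $image_source = $image->getAttribute('src');
        // Assumes relative src values are relative to the book directory.
        if (parse_url($image_source, PHP_URL_SCHEME) === null) {
            $image_source = BASE_URL . $image_source;
        }
        echo "Downloading image $image_source", PHP_EOL;
        $data = file_get_contents($image_source);
        if ($data !== false) {
            file_put_contents(__DIR__ . '/ebook/images/' . basename($image_source), $data);
        }
    }
}

// Create the output directories up front (file_put_contents() does not create them).
if (!is_dir(__DIR__ . '/ebook/images')) {
    mkdir(__DIR__ . '/ebook/images', 0777, true);
}

// Follow "next" links in a loop rather than by recursion, stopping at the last page.
$url = 'pr01.html';
while ($url !== null && ($page = getPage($url)) !== null) {
    getImages($page);
    $url = getNextUrl($page);
}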
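The crawl hinges on the XPath query `//a[@accesskey="n"]`, which picks each page's "next" navigation link; when no such link exists, the crawler has reached the last page. A minimal offline sketch of just that step, assuming the pages mark their next link with `accesskey="n"` as the script expects. The function name `findNextHref` and the HTML snippets are hypothetical, made up for illustration:

```php
<?php
// Hypothetical helper: returns the href of the first <a accesskey="n"> link,
// or null when the page has no "next" link (i.e. the last page of the book).
function findNextHref(string $html): ?string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate sloppy real-world HTML
    $doc->loadHTML($html);
    libxml_clear_errors();

    $links = (new DOMXPath($doc))->query('//a[@accesskey="n"]');
    if ($links === false || $links->length === 0) {
        return null;
    }
    return $links->item(0)->getAttribute('href');
}

// Made-up pages: one with prev/next navigation, one without a "next" link.
$middle = '<html><body><a accesskey="p" href="pr01.html">Prev</a>'
        . '<a accesskey="n" href="ch02.html">Next</a></body></html>';
$last   = '<html><body><a accesskey="p" href="ch09.html">Prev</a></body></html>';

var_dump(findNextHref($middle)); // string(9) "ch02.html"
var_dump(findNextHref($last));   // NULL
```

Returning `null` instead of reusing the node-list variable gives the crawl loop a clean stop condition, which is what keeps it from fetching a bogus URL after the final page.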
@romanodesouza (gist author) commented:

This was written in 15 minutes. Don't take it seriously.
