@guychouk
Last active January 26, 2021 13:41
Example of using PHP's simple_html_dom package to scrape names, e-mail addresses and voting countries from the European Film Academy (EFA) website.
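<?php
// Scrape name, e-mail and voting country from EFA presentation pages
// and append one pipe-delimited record per page to results.txt.
require 'simple_html_dom.php';

$baseUrl = "https://www.europeanfilmacademy.org/Presentation.presentation.0.html?&no_cache=1&uid=";

for ($i = 1866; $i <= 4500; $i++) {
    $html = new simple_html_dom();

    // load_file() normally reports failure by returning false rather than
    // throwing, so check the return value as well as catching exceptions.
    try {
        $loaded = $html->load_file($baseUrl . $i);
    } catch (Exception $ex) {
        $loaded = false;
    }
    if ($loaded === false) {
        echo "Failed on uid {$i}\n";
        continue;
    }

    // Container that holds the presentation details.
    $details = $html->find('div[class=tx-efapresentation-pi1]');
    if (empty($details)) {
        echo "Failed on uid {$i}\n";
        continue;
    }

    $name = $details[0]->find('h1');
    if (empty($name)) {
        echo "No name on uid {$i}\n";
        continue;
    }
    $nameTxt = $name[0]->text();

    // Each value sits in the first child of the element following the label's parent.
    $email = $details[0]->find('label[plaintext^=e-mail:]');
    if (empty($email)) {
        echo "No email on uid {$i}\n";
        continue;
    }
    $emailTxt = $email[0]->parent()->next_sibling()->first_child()->text();

    $country = $details[0]->find('label[plaintext^=country for voting:]');
    if (empty($country)) {
        echo "No country on uid {$i}\n";
        continue;
    }
    $countryTxt = $country[0]->parent()->next_sibling()->first_child()->text();

    // Build the record without the newline so the log message stays on one line.
    $result = trim($nameTxt) . "|" . trim($emailTxt) . "|" . trim($countryTxt);
    file_put_contents('results.txt', $result . "\n", FILE_APPEND);

    // Release the parsed DOM before the next iteration to limit memory growth.
    $html->clear();
    unset($html);
    unset($details);

    echo "Wrote {$result} to file\n";

    // Pause between requests to avoid hammering the server.
    sleep(5);
}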
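
The e-mail and country lookups above follow the same pattern: find a label, step to its parent's next sibling, and read that element's first child. A minimal sketch of that pattern as a reusable helper is shown below, using PHP's built-in DOMDocument and DOMXPath instead of simple_html_dom so that it runs on its own. The function name `extractLabelledField` and the sample markup are hypothetical; the real EFA pages may be structured differently, and the helper assumes the label text contains no double quotes.

```php
<?php
// Hypothetical helper: returns the text that follows a given label, or null if
// the label is not found. Mirrors parent()->next_sibling()->first_child().
function extractLabelledField(DOMXPath $xpath, DOMNode $context, string $label): ?string
{
    // Assumes $label contains no double quotes (XPath has no escape syntax for them).
    $query = sprintf(
        './/label[starts-with(normalize-space(.), "%s")]/parent::*/following-sibling::*[1]/*[1]',
        $label
    );
    $nodes = $xpath->query($query, $context);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->textContent);
}

// Sample markup invented for illustration; it is not copied from the EFA site.
$sampleHtml = <<<HTML
<div class="tx-efapresentation-pi1">
  <h1>Jane Doe</h1>
  <div><label>e-mail:</label></div>
  <div><span>jane@example.org</span></div>
  <div><label>country for voting:</label></div>
  <div><span>Norway</span></div>
</div>
HTML;

libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML($sampleHtml);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$container = $xpath->query('//div[@class="tx-efapresentation-pi1"]')->item(0);

$email   = extractLabelledField($xpath, $container, 'e-mail:');
$country = extractLabelledField($xpath, $container, 'country for voting:');
$missing = extractLabelledField($xpath, $container, 'phone:');

var_dump($email, $country, $missing);
```

Returning null for a missing label lets the caller decide whether to skip the page or write a partial record, instead of repeating the same empty() check for every field.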