@hubgit
Last active December 3, 2015 09:29
Extract attribute values from JATS wiki tables
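<?php
// Collect libxml parse warnings internally instead of printing them;
// real-world wiki HTML is rarely perfectly well-formed.
libxml_use_internal_errors(true);

$doc = new DOMDocument;
if (!$doc->loadHTMLFile('http://jatswiki.org/wiki/Attribute_values')) {
    throw new RuntimeException('Unable to load the attribute values page');
}
libxml_clear_errors();

$data = [];
$xpath = new DOMXPath($doc);

foreach ($xpath->query('//table') as $table) {
    // The nearest preceding <h2> names the attribute; drop its " values" suffix.
    $type = trim($xpath->evaluate('string(preceding-sibling::h2[1])', $table));
    $type = preg_replace('/ values$/', '', $type);

    // Column names come from the header cells of the first row.
    $keys = [];
    foreach ($xpath->query('tr[1]/th', $table) as $heading) {
        $keys[] = strtolower(trim($heading->textContent));
    }

    // Each remaining row becomes an associative array keyed by column name.
    $items = [];
    foreach ($xpath->query('tr[position()>1]', $table) as $row) {
        $item = [];
        foreach ($xpath->query('td', $row) as $index => $cell) {
            // Fall back to the numeric index if a row has more cells than headings.
            $key = isset($keys[$index]) ? $keys[$index] : $index;
            $item[$key] = trim($cell->textContent);
        }
        $items[] = $item;
    }

    $data[$type] = $items;
}

print json_encode($data, JSON_PRETTY_PRINT);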