@hakre
Created July 14, 2015 21:42
Recursive Method Calls with SimpleXML and DOMDocument Context
<?php
/**
 * Class XmlArrayElement
 *
 * Use an array definition stored as XML to convert a tree structure from another DOMDocument
 *
 * @author hakre <http://hakre.wordpress.com>
 */
class XmlArrayElement extends SimpleXMLElement
{
    /**
     * setter for named fields on SimpleXMLElements
     *
     * @link https://hakre.wordpress.com/2015/03/27/the-simplexmlelement-magic-wonder-world-in-php/
     *
     * @param string $name of the field
     * @param mixed $value to store
     */
    public function setData($name, $value)
    {
        $element = dom_import_simplexml($this);
        $element->data[$name] = $value;
        // self-reference so the DOMElement object and the data on it outlive this call (see @link)
        $element->circref = $element;
    }

    /**
     * getter for named fields on SimpleXMLElements
     *
     * @link https://hakre.wordpress.com/2015/03/27/the-simplexmlelement-magic-wonder-world-in-php/
     *
     * @param string $name of the field
     *
     * @return null|mixed retrieved value or null if field is not set
     */
    public function getData($name)
    {
        $element = dom_import_simplexml($this);
        if (!isset($element->data[$name])) {
            return null;
        }
        return $element->data[$name];
    }

    /**
     * assign a document that is processed
     *
     * this also registers all xpath namespaces
     *
     * @param DOMDocument $doc to process
     */
    public function assignDocument(DOMDocument $doc)
    {
        /** @var self $root */
        $root = $this->xpath('/*')[0];
        $xpath = new DOMXPath($doc);
        foreach ($root->xml->namespace as $namespace) {
            $xpath->registerNamespace($namespace['prefix'], $namespace['uri']);
        }
        $root->setData('xpath', $xpath);
    }

    /**
     * convert the assigned document into an array as described by this definition
     *
     * @param DOMNode|null $context node the child expressions are evaluated against
     *
     * @return array|mixed results keyed by alias or element name, or a single value for a string cast
     */
    public function toArray(DOMNode $context = null)
    {
        /** @var self $root */
        $root = $this->xpath('/*')[0];
        // called on the root element: delegate to its first <array> child
        if ($root == $this) {
            return $this->array[0]->toArray();
        }
        /** @var DOMXPath $xpath */
        $xpath = $root->getData('xpath');
        $return = [];
        foreach ($this as $child) {
            $result = $evaluated = $xpath->evaluate($child['expr'], $context);
            // node lists are processed recursively with each node as the new context
            if ($evaluated instanceof DOMNodeList) {
                if (1 === $evaluated->length) {
                    $result = $child->toArray($evaluated->item(0));
                } else {
                    $result = [];
                    foreach ($evaluated as $node) {
                        $result[] = $child->toArray($node);
                    }
                }
            }
            $name = (string)($child['alias'] ?: $child->getName());
            $return[$name] = $result;
        }
        // single elements can be cast to string
        if ((1 === count($this)) and ('string' === (string)$child['cast'])) {
            $return = reset($return);
        }
        return $return;
    }
}
@Nicero

Nicero commented Jul 31, 2015

Thanks to your excellent script, I can read a complex XML document and extract only the strings I want. I noticed that the XML sometimes contains tables like the following one. In these cases, can I use your script to read both the string "Lorem ipsum..." and the HTML table, keeping the tags?

<corpo>Lorem ipsum...
<h:table border="1" cellspacing="0" cellpadding="0" summary="-" h:style="width: 40em;" xmlns:h="http://www.w3.org/HTML/1998/html4"><h:tr><h:th>UBI</h:th><h:th>CAPITOLO</h:th><h:th>2015</h:th><h:th>2016</h:th><h:th>TOTALE</h:th></h:tr><h:tr><h:td>5.2.1.5047</h:td><h:td align="right">5397</h:td><h:td align="right">100.000</h:td><h:td align="right">100.000</h:td><h:td align="right">200.000</h:td></h:tr><h:tr><h:td>5.2.1.5048</h:td><h:td align="right">5340</h:td><h:td align="right">2.500.000</h:td><h:td align="right">2.500.000</h:td><h:td align="right">5.000.000</h:td></h:tr><h:tr><h:td>5.2.1.5048</h:td><h:td align="right">5390</h:td><h:td align="right">100.000</h:td><h:td align="right">100.000</h:td><h:td align="right">200.000</h:td></h:tr><h:tr><h:td>5.2.1.5049</h:td><h:td align="right">5426</h:td><h:td align="right">1.000.000</h:td><h:td align="right">1.000.000</h:td><h:td align="right">2.000.000</h:td></h:tr><h:tr><h:td>5.2.1.5049</h:td><h:td align="right">5431</h:td><h:td align="right">200.000</h:td><h:td align="right">200.000</h:td><h:td align="right">400.000</h:td></h:tr><h:tr><h:td>5.2.1.5049</h:td><h:td align="right">5434</h:td><h:td align="right">461.000</h:td><h:td align="right">461.000</h:td><h:td align="right">922.000</h:td></h:tr><h:tr><h:td>5.2.1.5050</h:td><h:td align="right">5282</h:td><h:td align="right">700.000</h:td><h:td align="right">700.000</h:td><h:td align="right">1.400.000</h:td></h:tr><h:tr><h:td>5.2.1.5051</h:td><h:td align="right">5398</h:td><h:td align="right">60.000</h:td><h:td align="right">60.000</h:td><h:td align="right">120.000</h:td></h:tr><h:tr><h:td colspan="2" align="right">TOTALE</h:td><h:td align="right">5.121.000</h:td><h:td align="right">5.121.000</h:td><h:td align="right">10.242.000</h:td></h:tr></h:table>
</corpo>

@hakre
Author

hakre commented Aug 16, 2015

In your example, the string "Lorem ipsum..." is the first text child node of the corpo element, so it's no problem to retrieve it. Getting the XML of the table, however, means returning the inner XML, which needs a modification. XPath on its own has no way to retrieve it, but the class could be extended to deal with it, perhaps similar to how the cast parameter is handled.
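
A minimal sketch of both parts of that reply, outside the class: the first text child can be read with a plain XPath string expression, while the inner XML has to be assembled with `DOMDocument::saveXML()`. The sample document is a shortened version of the one in the question, and the helper name `innerXml()` is hypothetical, not part of the gist.

```php
<?php
// Shortened sample based on the question above (assumption: one table row is enough).
$xml = '<corpo>Lorem ipsum...<h:table xmlns:h="http://www.w3.org/HTML/1998/html4">'
     . '<h:tr><h:td>5397</h:td></h:tr></h:table></corpo>';

$doc = new DOMDocument();
$doc->loadXML($xml);

$xpath = new DOMXPath($doc);
$xpath->registerNamespace('h', 'http://www.w3.org/HTML/1998/html4');

$corpo = $doc->documentElement;

// The first text child node of <corpo>, evaluated as a string.
$text = $xpath->evaluate('string(text()[1])', $corpo);

/**
 * Hypothetical helper: serialize all child nodes of $node, tags included.
 */
function innerXml(DOMNode $node): string
{
    $buffer = '';
    foreach ($node->childNodes as $child) {
        $buffer .= $node->ownerDocument->saveXML($child);
    }
    return $buffer;
}

// Everything inside <corpo>: the text followed by the table markup.
$inner = innerXml($corpo);

// Only the table, with its tags kept.
$table = $doc->saveXML($xpath->evaluate('h:table', $corpo)->item(0));
```

Wiring this into `toArray()` would still need a new switch in the definition (for example an attribute next to `cast`); how that should look is left open here.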

@Nicero

Nicero commented Jul 4, 2016

Hi again Hakre! I'm still using your handy script. Would you kindly explain how I can read the id attribute of comma, given for example the following XML?

<articolato>
    <articolo id="art1">
        <num>Art.   1</num>
        <comma id="art1-com1">
            <corpo>Blah blah blah</corpo>
        </comma>
    </articolo>
</articolato>

Shouldn't it be as easy as this:

<comma expr="//comma/@id"/>

The expected result should be art1-com1

I can get the value of num, for example, like this:

<num expr="normalize-space(string(a:num))"/>

but I don't know how to get the id attribute.
Thank you.
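
The thread has no reply to this question, so the following is only a sketch of what seems to be going on, not the author's answer. `DOMXPath::evaluate()` returns a `DOMNodeList` for a location path such as `//comma/@id`, and as far as can be told from `toArray()` above, a node list is handed on to the (empty) `comma` definition element instead of being turned into a string. Wrapping the path in `string()`, as the working `num` expression already does, makes `evaluate()` return the attribute value directly. The sample below uses the XML from the question without a namespace; if the real document needs the `a:` prefix as in the `num` expression, the path would presumably be `a:comma/@id`.

```php
<?php
$doc = new DOMDocument();
$doc->loadXML(
    '<articolato><articolo id="art1"><num>Art.   1</num>'
    . '<comma id="art1-com1"><corpo>Blah blah blah</corpo></comma>'
    . '</articolo></articolato>'
);
$xpath = new DOMXPath($doc);

// A location path yields a node list (here: one DOMAttr node), not a string.
$nodes = $xpath->evaluate('//comma/@id');

// Wrapped in string(), the same path yields the attribute value itself.
$id = $xpath->evaluate('string(//comma/@id)');

var_dump($nodes instanceof DOMNodeList); // bool(true)
var_dump($id);                           // string(9) "art1-com1"
```

Under that assumption, a definition line like `<comma expr="string(//comma/@id)"/>` should produce the expected `art1-com1`.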
