Recursive Method Calls with SimpleXML and DOMDocument Context
<?php
/**
 * Class XmlArrayElement
 *
 * Use an array definition stored as XML to convert a tree structure from another DOMDocument
 *
 * @author hakre <http://hakre.wordpress.com>
 */
class XmlArrayElement extends SimpleXMLElement
{
    /**
     * setter for named fields on SimpleXMLElements
     *
     * @link https://hakre.wordpress.com/2015/03/27/the-simplexmlelement-magic-wonder-world-in-php/
     *
     * @param string $name of the field
     * @param mixed $value to store
     */
    public function setData($name, $value)
    {
        $element = dom_import_simplexml($this);
        $element->data[$name] = $value;
        // circular reference keeps the imported DOMElement object, and the
        // data stored on it, alive between calls (see @link above)
        $element->circref = $element;
    }
    /**
     * getter for named fields on SimpleXMLElements
     *
     * @link https://hakre.wordpress.com/2015/03/27/the-simplexmlelement-magic-wonder-world-in-php/
     *
     * @param string $name of the field
     *
     * @return null|mixed retrieved value or null if field is not set
     */
    public function getData($name)
    {
        $element = dom_import_simplexml($this);
        if (!isset($element->data[$name])) {
            return null;
        }
        return $element->data[$name];
    }
    /**
     * assign a document that is processed
     *
     * this also registers all xpath namespaces
     *
     * @param DOMDocument $doc to process
     */
    public function assignDocument(DOMDocument $doc)
    {
        /** @var self $root */
        $root = $this->xpath('/*')[0];
        $xpath = new DOMXPath($doc);
        foreach ($root->xml->namespace as $namespace) {
            // attributes are SimpleXMLElement objects, pass them on as plain strings
            $xpath->registerNamespace((string)$namespace['prefix'], (string)$namespace['uri']);
        }
        $root->setData('xpath', $xpath);
    }
    /**
     * convert the assigned document into an array as described by this element
     *
     * @param DOMNode|null $context node the xpath expressions are evaluated against
     *
     * @return array|mixed
     */
    public function toArray(?DOMNode $context = null)
    {
        /** @var self $root */
        $root = $this->xpath('/*')[0];
        if ($root == $this) {
            // called on the definition root: start at its first <array> element
            return $this->array[0]->toArray();
        }
        /** @var DOMXPath $xpath */
        $xpath = $root->getData('xpath');
        $return = [];
        foreach ($this as $child) {
            $result = $evaluated = $xpath->evaluate((string)$child['expr'], $context);
            if ($evaluated instanceof DOMNodeList) {
                if (1 === $evaluated->length) {
                    // a single node becomes the context for the child definition
                    $result = $child->toArray($evaluated->item(0));
                } else {
                    // zero or many nodes become a list, one entry per node
                    $result = [];
                    foreach ($evaluated as $node) {
                        $result[] = $child->toArray($node);
                    }
                }
            }
            $name = (string)($child['alias'] ?: $child->getName());
            $return[$name] = $result;
        }
        // single elements can be cast to string
        if ((1 === count($this)) && ('string' === (string)$child['cast'])) {
            $return = reset($return);
        }
        return $return;
    }
}
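
The class takes its configuration from the XML it is built on, so it helps to see what such a definition might look like. The sketch below is inferred only from the element and attribute names the code reads (`xml/namespace` with `prefix` and `uri`, a first `array` element, and the `expr`, `alias` and `cast` attributes). The root element name `definition`, the namespace URI and the field names are made up for illustration. It only inspects the definition with plain SimpleXML and does not run the class itself.

```php
<?php
// Hypothetical definition document; <definition>, the namespace URI and the
// field names are illustrative assumptions, not taken from the gist.
$definition = simplexml_load_string(<<<'XML'
<definition>
    <xml>
        <namespace prefix="a" uri="http://example.com/ns"/>
    </xml>
    <array>
        <num expr="normalize-space(string(//a:num))"/>
        <text alias="body" expr="string(//a:corpo)"/>
    </array>
</definition>
XML
);

// Namespaces that assignDocument() would register on the DOMXPath object
$namespaces = [];
foreach ($definition->xml->namespace as $namespace) {
    $namespaces[(string)$namespace['prefix']] = (string)$namespace['uri'];
}

// Fields that toArray() would walk: array key => XPath expression
$fields = [];
foreach ($definition->array[0] as $child) {
    $name = isset($child['alias']) ? (string)$child['alias'] : $child->getName();
    $fields[$name] = (string)$child['expr'];
}
```

Going by the code above, each key in `$fields` would become a key in the resulting array, with `alias` taking precedence over the element name.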

Nicero commented Jul 31, 2015

Thanks to your excellent script I can read a complex XML document and extract only the strings I need. I noticed that the XML sometimes contains tables like the following one. In these cases, can your script somehow read both the string "Lorem ipsum..." and the HTML table, keeping the tags?

<corpo>Lorem ipsum...
<h:table border="1" cellspacing="0" cellpadding="0" summary="-" h:style="width: 40em;" xmlns:h="http://www.w3.org/HTML/1998/html4"><h:tr><h:th>UBI</h:th><h:th>CAPITOLO</h:th><h:th>2015</h:th><h:th>2016</h:th><h:th>TOTALE</h:th></h:tr><h:tr><h:td>5.2.1.5047</h:td><h:td align="right">5397</h:td><h:td align="right">100.000</h:td><h:td align="right">100.000</h:td><h:td align="right">200.000</h:td></h:tr><h:tr><h:td>5.2.1.5048</h:td><h:td align="right">5340</h:td><h:td align="right">2.500.000</h:td><h:td align="right">2.500.000</h:td><h:td align="right">5.000.000</h:td></h:tr><h:tr><h:td>5.2.1.5048</h:td><h:td align="right">5390</h:td><h:td align="right">100.000</h:td><h:td align="right">100.000</h:td><h:td align="right">200.000</h:td></h:tr><h:tr><h:td>5.2.1.5049</h:td><h:td align="right">5426</h:td><h:td align="right">1.000.000</h:td><h:td align="right">1.000.000</h:td><h:td align="right">2.000.000</h:td></h:tr><h:tr><h:td>5.2.1.5049</h:td><h:td align="right">5431</h:td><h:td align="right">200.000</h:td><h:td align="right">200.000</h:td><h:td align="right">400.000</h:td></h:tr><h:tr><h:td>5.2.1.5049</h:td><h:td align="right">5434</h:td><h:td align="right">461.000</h:td><h:td align="right">461.000</h:td><h:td align="right">922.000</h:td></h:tr><h:tr><h:td>5.2.1.5050</h:td><h:td align="right">5282</h:td><h:td align="right">700.000</h:td><h:td align="right">700.000</h:td><h:td align="right">1.400.000</h:td></h:tr><h:tr><h:td>5.2.1.5051</h:td><h:td align="right">5398</h:td><h:td align="right">60.000</h:td><h:td align="right">60.000</h:td><h:td align="right">120.000</h:td></h:tr><h:tr><h:td colspan="2" align="right">TOTALE</h:td><h:td align="right">5.121.000</h:td><h:td align="right">5.121.000</h:td><h:td align="right">10.242.000</h:td></h:tr></h:table>
</corpo>

hakre commented Aug 16, 2015

In your example the string "Lorem ipsum..." is the first text child node of the corpo element, so retrieving it is no problem. The XML of the table is a different matter: it requires returning the inner XML, which needs a modification. XPath on its own has no way to return it, but the class could be extended to handle it, perhaps similar to how the cast parameter works.
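
As a rough sketch of what "returning the inner XML" could mean with plain DOM, independent of the class: serialize every child node of the element with `DOMDocument::saveXML()`. The helper name `innerXml()` is hypothetical and not part of the gist, the table is shortened to a single cell, and the helper assumes it is given an element node that belongs to a document.

```php
<?php
// Hypothetical helper (not part of the gist): concatenate the serialized
// child nodes of $node, i.e. its text and element markup without the
// enclosing tag. Assumes $node belongs to a document (ownerDocument is set).
function innerXml(DOMNode $node): string
{
    $xml = '';
    foreach ($node->childNodes as $child) {
        // saveXML() with a node argument serializes only that node
        $xml .= $node->ownerDocument->saveXML($child);
    }
    return $xml;
}

$doc = new DOMDocument();
$doc->loadXML(
    '<corpo>Lorem ipsum...'
    . '<h:table xmlns:h="http://www.w3.org/HTML/1998/html4"><h:tr><h:td>5397</h:td></h:tr></h:table>'
    . '</corpo>'
);
$corpo = $doc->documentElement;

// The leading string is the first child (a text node) of <corpo>
$text = $corpo->firstChild->nodeValue;

// Text plus table markup, tags preserved
$inner = innerXml($corpo);
```

Hooking this into `toArray()` would still need a switch in the definition, for example an extra attribute next to `cast`. That part is left open here.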


Nicero commented Jul 4, 2016

Hi again Hakre! I'm still using your handy script. Would you kindly explain how I can read the id attribute of comma, given for example the following XML?

<articolato>
    <articolo id="art1">
        <num>Art.   1</num>
        <comma id="art1-com1">
            <corpo>Blah blah blah</corpo>
        </comma>
    </articolo>
</articolato>

Shouldn't it be as easy as this:

<comma expr="//comma/@id"/> The expected result should be art1-com1

I can get the value of num for example like this

<num expr="normalize-space(string(a:num))"/>

but I don't know how to get the id attribute.
Thank you.
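
One possible explanation, based only on reading `toArray()` and not tested against the class: `//comma/@id` is a location path, so `DOMXPath::evaluate()` returns a `DOMNodeList`. `toArray()` then recurses into the `<comma .../>` definition, which has no children, and the result would be an empty array. Wrapping the path in `string()`, as the working `num` example does, makes `evaluate()` return a plain string. The sketch below shows the difference with `DOMXPath` alone. The sample XML here has no namespace, so no `a:` prefix is used.

```php
<?php
$doc = new DOMDocument();
$doc->loadXML(
    '<articolato><articolo id="art1"><num>Art.   1</num>'
    . '<comma id="art1-com1"><corpo>Blah blah blah</corpo></comma>'
    . '</articolo></articolato>'
);
$xpath = new DOMXPath($doc);

// A location path evaluates to a node list (here: one attribute node)
$nodes = $xpath->evaluate('//comma/@id');

// string() turns the first matching node into a plain string value
$id = $xpath->evaluate('string(//comma/@id)');

// The same relative to a context node, as it would be inside a nested definition
$comma = $xpath->query('//comma')->item(0);
$relative = $xpath->evaluate('string(@id)', $comma);
```

Under that reading, a definition such as `<comma expr="string(//comma/@id)"/>` should give `art1-com1`, with the `a:` prefix added to `comma` if the real document is namespaced.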
