Parsing a huge XML with closure in PHP
<?php
// Open the XML
$handle = fopen('file.xml', 'r');
// Get the node string incrementally from the XML file by defining a callback,
// in this case an anonymous function.
nodeStringFromXMLFile($handle, '<item>', '</item>', function($nodeText) {
    // Transform the XML string into an array and print it
    print_r(getArrayFromXMLString($nodeText));
});
fclose($handle);
/**
 * For every node that starts with $startNode and ends with $endNode, call $callback
 * with the node string as an argument.
 *
 * Note: Sometimes it returns two nodes instead of a single one; this could easily be
 * handled by the callback though. This function's primary job is to split a large file
 * into manageable XML nodes.
 *
 * The callback will receive one parameter, the XML node(s) as a string.
 *
 * @param resource $handle - a file handle
 * @param string $startNode - the opening tag to look for, e.g. <item>
 * @param string $endNode - the closing tag to look for, e.g. </item>
 * @param callable $callback - typically an anonymous function
 */
function nodeStringFromXMLFile($handle, $startNode, $endNode, $callback = null) {
    $cursorPos = 0;
    while (true) {
        // Find the start position
        $startPos = getPos($handle, $startNode, $cursorPos);
        // We reached the end of the file or hit an error
        if ($startPos === false) {
            break;
        }
        // Find where the node ends; stop if the closing tag is missing
        $endTagPos = getPos($handle, $endNode, $startPos);
        if ($endTagPos === false) {
            break;
        }
        // File offsets are counted in bytes, so use strlen() rather than mb_strlen()
        $endPos = $endTagPos + strlen($endNode);
        // Jump back to the start position
        fseek($handle, $startPos);
        // Read the data
        $data = fread($handle, $endPos - $startPos);
        // Pass the $data into the callback, if one was given
        if (is_callable($callback)) {
            $callback($data);
        }
        // The next iteration starts reading from here
        $cursorPos = ftell($handle);
    }
}
/**
 * Returns the position of the first occurrence of $string in a resource.
 *
 * Starting at $startFrom, it seeks and reads $chunk bytes at a time to avoid reading the
 * whole file at once.
 *
 * @param resource $handle - typically a file handle
 * @param string $string - the string to search for
 * @param int $startFrom - byte offset to start searching from
 * @param int $chunk - number of bytes to read per step
 * @param string $prev - the previously read chunk, used to catch matches that span two chunks
 * @return int|bool - the byte offset of the match, or false on EOF or errors
 */
function getPos($handle, $string, $startFrom = 0, $chunk = 1024, $prev = '') {
    while (true) {
        // Set the file cursor on the $startFrom position
        fseek($handle, $startFrom, SEEK_SET);
        // Read data
        $data = fread($handle, $chunk);
        if ($data === false) {
            return false;
        }
        // Try to find the search $string in this chunk. The previous chunk is
        // prepended so a match that straddles a chunk boundary is still found.
        // strpos()/strlen() are used because file offsets are counted in bytes.
        $stringPos = strpos($prev . $data, $string);
        // We found the string, return its position in the file
        if ($stringPos !== false) {
            return $stringPos + $startFrom - strlen($prev);
        }
        // We reached the end of the file
        if ($data === '' || feof($handle)) {
            return false;
        }
        // Move on to the next chunk. Looping instead of recursing keeps large
        // files from hitting the function nesting limit.
        $prev = $data;
        $startFrom += strlen($data);
    }
}
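/**
 * Turn a string version of XML into an array by using SimpleXML
 *
 * @param string $nodeAsString - a string representation of an XML node
 * @return array|string
 */
function getArrayFromXMLString($nodeAsString) {
    // libxml_get_errors() only returns errors when internal error handling is enabled
    $previous = libxml_use_internal_errors(true);
    $simpleXML = simplexml_load_string($nodeAsString);
    $messages = array();
    foreach (libxml_get_errors() as $error) {
        $messages[] = trim($error->message);
    }
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if ($messages) {
        user_error('Libxml reported errors: ' . implode(', ', $messages), E_USER_WARNING);
    }
    return simplexml2array($simpleXML);
}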
/**
 * Turns a SimpleXMLElement into an array
 *
 * @param SimpleXMLElement $xml
 * @return array|string
 */
function simplexml2array($xml) {
    $attributes = array();
    $element = null;
    if ($xml instanceof SimpleXMLElement) {
        foreach ($xml->attributes() as $k => $v) {
            $attributes[$k] = (string) $v;
        }
        $element = $xml;
        $xml = get_object_vars($xml);
    }
    if (is_array($xml)) {
        // No child elements: return the element's text content
        if (count($xml) == 0) {
            return (string) $element;
        }
        $r = array();
        foreach ($xml as $key => $value) {
            $r[$key] = simplexml2array($value);
        }
        // Add the element's attributes under the '@attributes' key
        if ($attributes) {
            $r['@attributes'] = $attributes;
        }
        return $r;
    }
    return (string) $xml;
}
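
The callback approach above can also be expressed as a generator, so the caller pulls nodes one at a time with `foreach` and only one node string is held in memory per step. This is a minimal sketch, not the gist author's code: `findBytePos` and `iterateNodes` are hypothetical names introduced here, the tags are assumed to be plain strings shorter than the chunk size, and all positions are byte offsets found with `strpos()`/`strlen()`.

```php
<?php
/**
 * Hypothetical helper (not part of the gist): return the byte offset of
 * $needle in $handle, searching forward from $offset in $chunk-sized reads.
 */
function findBytePos($handle, $needle, $offset = 0, $chunk = 1024) {
    $prev = '';
    while (true) {
        fseek($handle, $offset, SEEK_SET);
        $data = fread($handle, $chunk);
        if ($data === false || $data === '') {
            return false; // read error or end of file
        }
        // Prepend the previous chunk so a match across a chunk boundary is found
        $pos = strpos($prev . $data, $needle);
        if ($pos !== false) {
            return $offset - strlen($prev) + $pos;
        }
        $prev = $data;
        $offset += strlen($data);
    }
}

/**
 * Hypothetical generator variant: yields each node string instead of
 * passing it to a callback.
 */
function iterateNodes($handle, $startNode, $endNode) {
    $cursor = 0;
    while (($start = findBytePos($handle, $startNode, $cursor)) !== false) {
        $end = findBytePos($handle, $endNode, $start);
        if ($end === false) {
            return; // unterminated node: stop iterating
        }
        $end += strlen($endNode);
        fseek($handle, $start, SEEK_SET);
        yield fread($handle, $end - $start);
        $cursor = $end;
    }
}

// Usage with a small in-memory stream standing in for a large file
$handle = fopen('php://memory', 'w+');
fwrite($handle, '<items><item>a</item><item>b</item></items>');
$nodes = array();
foreach (iterateNodes($handle, '<item>', '</item>') as $nodeText) {
    $nodes[] = $nodeText;
}
fclose($handle);
```

In real use the `foreach` body would process each `$nodeText` directly (for example by passing it to `getArrayFromXMLString`) rather than collecting every node into an array, which is done here only to keep the example checkable.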
@jeffreyroberts

commented Aug 23, 2013

Awesome, Thank you!

@surferxo3

commented Apr 30, 2016

What I want to do is grab the XML using curl (the file will be around 32 MB) and display the parsed data on screen without any lag. The memory limit is 32 MB and there are millions of records.

Is there any way to parse a large XML file using "yield" in PHP?

@regiszanandrea

commented Jun 22, 2016

Hi,

I tried to use this parser, but I modified getArrayFromXMLString to this:

```php
function getArrayFromXMLString($nodeAsString) {
    $simpleXML = simplexml_load_string($nodeAsString);
    if($simpleXML){
       echo "yes";
    }else{
       echo "no";
    }
    if(libxml_get_errors()) {
        user_error('Libxml throws some errors.', implode(',', libxml_get_errors()));
    }
    return simplexml2array($simpleXML);
}
```

I always get "no". My XML file is around 745 KB, with 18k lines. Do you have any idea why? Thanks a lot.

@milansaha

commented Jul 22, 2016

You are the best. Worked smoothly on 2GB xml datasets.

@incredimike

commented Sep 7, 2016

I found this misread some entries and calculated the $endPos incorrectly with my XML. In my case, when the error occurred, it was because the last 2 characters of the tag were cut off in the string before being parsed. I hacked one of the functions to check for the missing "t>" and add it when needed.

@atulopen

commented Dec 8, 2016

Hi, my file is 140 MB. I tried your script but got the error: Fatal error: Maximum function nesting level of '256' reached, aborting!

@jzvikas

commented Dec 21, 2016

Increase the nesting level in php.ini (the limit in that error comes from Xdebug's xdebug.max_nesting_level setting).

@toddmcbrearty

commented Jan 16, 2017

This is a great start for me. Thanks so much.

@ashok2009it

commented Nov 2, 2017

I get the error "simplexml_load_string(): namespace error : Namespace prefix commons on preference-order is not defined". How can I fix it?

@mgdgpt

commented Jan 28, 2018

Hi, how can I use nodeStringFromXMLFile with XML attributes like an ID? Thanks a lot.

Example:

"<reservation id="60613"><reservationNumber>38058</reservationNumber></reservation>"

@mgdgpt

commented Feb 1, 2018

I also get an internal error, and when I check the logs I can't see it. Can you help me please?

@Hlokolozar

commented Aug 25, 2018

Thank you so much, I have been stuck for almost a week now, and I really could not work, and with just this now I can start working.

Thank you, you are an inspiration.
