@pento pento/regex-vs-dom.php
Last active Dec 31, 2015
Testing the performance of searching a block of HTML for hashtags with a regular expression, versus building a DOMDocument and searching its text nodes.
<?php
$html = <<<EOT
<p><strong>Lorem #ipsum dolor sit amet</strong>, consectetur adipiscing elit. In in elit euismod, laoreet sapien eget, tristique ipsum. In #aliquam eros tortor, sit amet aliquet turpis suscipit eget. Maecenas eget vulputate metus. Phasellus at ligula ut nulla placerat imperdiet. Duis laoreet mauris <strong>eget dolor #egestas suscipit</strong>. In et #sodales elit. Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos. In tristique sit amet nisl ultrices rhoncus. Phasellus eget sem vitae urna pulvinar tristique non at velit. Integer eget nulla dolor. Vivamus quis iaculis massa, et faucibus mi. Quisque pretium dapibus massa, id imperdiet quam. #Morbi mollis ipsum eu mauris ultrices, <em>vel #pharetra quam sagittis</em>. Pellentesque auctor lacus massa, in tempor leo viverra id. Cras nisl ante, vehicula nec felis vitae, dictum sollicitudin eros. Donec sagittis id lorem ac tristique.</p>
<p>Duis quis consequat sapien. <a href="http://google.com/">Quisque porta nunc nec #nisi sollicitudin elementum</a>. Vestibulum facilisis tempus tristique. Nullam sed tristique nulla. In #egestas nec sapien quis tincidunt. Phasellus cursus lacinia mi, dictum bibendum dolor mollis condimentum. Suspendisse elementum, est sit amet luctus dapibus, orci tellus rutrum lacus, sit amet facilisis nisi lacus varius arcu.</p>
EOT;
// Benchmark 1: run the regular expression directly against the raw HTML string.
$start = microtime( true );
for ( $i = 0; $i < 1000; $i++ ) {
    $tags = array();
    preg_match_all( '/(?:^|\s)#([\w-]+)\b/', $html, $tags );
}
$end = microtime( true );
echo "preg_match_all: " . ( $end - $start ) . "\n";
// Benchmark 2: parse the HTML into a new DOMDocument on every iteration,
// then run the same regular expression against each text node.
$start = microtime( true );
for ( $i = 0; $i < 1000; $i++ ) {
    $tags = array();
    $dom = new DOMDocument;
    // The XML declaration prefix makes libxml treat the input as UTF-8.
    $dom->loadHTML( '<?xml encoding="UTF-8">' . $html );
    $xpath = new DOMXPath( $dom );
    $textNodes = $xpath->query( '//text()' );
    foreach ( $textNodes as $textNode ) {
        $matches = array();
        if ( preg_match_all( '/(?:^|\s)#([\w-]+)\b/', $textNode->nodeValue, $matches ) ) {
            $tags = array_merge( $tags, $matches[1] );
        }
    }
}
$end = microtime( true );
echo "DOM (new document): " . ( $end - $start ) . "\n";
// Benchmark 3: parse the HTML once, then repeat only the XPath query
// and the per-node regular expression on the cached document.
$start = microtime( true );
$dom = new DOMDocument;
$dom->loadHTML( '<?xml encoding="UTF-8">' . $html );
$xpath = new DOMXPath( $dom );
for ( $i = 0; $i < 1000; $i++ ) {
    $tags = array();
    $textNodes = $xpath->query( '//text()' );
    foreach ( $textNodes as $textNode ) {
        $matches = array();
        if ( preg_match_all( '/(?:^|\s)#([\w-]+)\b/', $textNode->nodeValue, $matches ) ) {
            $tags = array_merge( $tags, $matches[1] );
        }
    }
}
$end = microtime( true );
echo "DOM (cached document): " . ( $end - $start ) . "\n";
@Xeoncross

commented Jan 6, 2014

Sample output, in case anyone wants to see it (Ubuntu 64-bit, Core i3). Times are in seconds for 1000 iterations:

preg_match_all:        0.096717834472656
DOM (new document):    0.21419596672058
DOM (cached document): 0.11462593078613
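
To make the figures above easier to compare, here is a minimal sketch that turns them into per-iteration times and ratios against the plain regex run. The `$timings` array is just the three numbers copied from this sample output, and the names `$timings`, `$baseline`, and `$report` are illustrative rather than part of the gist. It only re-expresses one machine's run, so the ratios (roughly 2.2x for a new document per iteration and 1.2x for a cached document) should not be read as general results.

```php
<?php
// Timings copied from the sample output above (seconds per 1000 iterations).
$timings = array(
    'preg_match_all'        => 0.096717834472656,
    'DOM (new document)'    => 0.21419596672058,
    'DOM (cached document)' => 0.11462593078613,
);

$iterations = 1000; // loop count used in the gist
$baseline   = $timings['preg_match_all'];

$report = array();
foreach ( $timings as $label => $seconds ) {
    $report[ $label ] = array(
        // Average cost of one pass, in microseconds.
        'us_per_iteration' => $seconds / $iterations * 1e6,
        // Slowdown relative to the plain regex run.
        'ratio'            => $seconds / $baseline,
    );
    printf(
        "%-22s %7.1f us/iteration  %.2fx\n",
        $label . ':',
        $report[ $label ]['us_per_iteration'],
        $report[ $label ]['ratio']
    );
}
```

On these numbers, most of the DOM overhead comes from re-parsing the document; once the document is cached, the remaining gap to the plain regex is comparatively small.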