@pento
Last active February 14, 2021 06:38
Testing the performance of searching a lump of HTML for #tags with regular expressions on the raw string vs. parsing it into a DOMDocument first.
<?php
$html = <<<EOT
<p><strong>Lorem #ipsum dolor sit amet</strong>, consectetur adipiscing elit. In in elit euismod, laoreet sapien eget, tristique ipsum. In #aliquam eros tortor, sit amet aliquet turpis suscipit eget. Maecenas eget vulputate metus. Phasellus at ligula ut nulla placerat imperdiet. Duis laoreet mauris <strong>eget dolor #egestas suscipit</strong>. In et #sodales elit. Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos. In tristique sit amet nisl ultrices rhoncus. Phasellus eget sem vitae urna pulvinar tristique non at velit. Integer eget nulla dolor. Vivamus quis iaculis massa, et faucibus mi. Quisque pretium dapibus massa, id imperdiet quam. #Morbi mollis ipsum eu mauris ultrices, <em>vel #pharetra quam sagittis</em>. Pellentesque auctor lacus massa, in tempor leo viverra id. Cras nisl ante, vehicula nec felis vitae, dictum sollicitudin eros. Donec sagittis id lorem ac tristique.</p>
<p>Duis quis consequat sapien. <a href="http://google.com/">Quisque porta nunc nec #nisi sollicitudin elementum</a>. Vestibulum facilisis tempus tristique. Nullam sed tristique nulla. In #egestas nec sapien quis tincidunt. Phasellus cursus lacinia mi, dictum bibendum dolor mollis condimentum. Suspendisse elementum, est sit amet luctus dapibus, orci tellus rutrum lacus, sit amet facilisis nisi lacus varius arcu.</p>
EOT;
// Benchmark 1: run the regex directly against the raw HTML string.
$start = microtime( true );
for ( $i = 0; $i < 1000; $i++ ) {
    $tags = array();
    preg_match_all( '/(?:^|\s)#([\w-]+)\b/', $html, $tags );
}
$end = microtime( true );
echo "preg_match_all: " . ( $end - $start ) . "\n";
// Benchmark 2: parse the HTML into a new DOMDocument on every iteration,
// then run the regex against each text node.
$start = microtime( true );
for ( $i = 0; $i < 1000; $i++ ) {
    $tags = array();
    $dom = new DOMDocument;
    // The XML encoding prefix makes loadHTML() treat the input as UTF-8.
    $dom->loadHTML( '<?xml encoding="UTF-8">' . $html );
    $xpath = new DOMXPath( $dom );
    $textNodes = $xpath->query( '//text()' );
    foreach ( $textNodes as $textNode ) {
        $matches = array();
        if ( preg_match_all( '/(?:^|\s)#([\w-]+)\b/', $textNode->nodeValue, $matches ) ) {
            $tags = array_merge( $tags, $matches[1] );
        }
    }
}
$end = microtime( true );
echo "DOM (new document): " . ( $end - $start ) . "\n";
// Benchmark 3: parse the HTML once (inside the timed region), then reuse the
// same document and XPath object for every iteration.
$start = microtime( true );
$dom = new DOMDocument;
$dom->loadHTML( '<?xml encoding="UTF-8">' . $html );
$xpath = new DOMXPath( $dom );
for ( $i = 0; $i < 1000; $i++ ) {
    $tags = array();
    $textNodes = $xpath->query( '//text()' );
    foreach ( $textNodes as $textNode ) {
        $matches = array();
        if ( preg_match_all( '/(?:^|\s)#([\w-]+)\b/', $textNode->nodeValue, $matches ) ) {
            $tags = array_merge( $tags, $matches[1] );
        }
    }
}
$end = microtime( true );
echo "DOM (cached document): " . ( $end - $start ) . "\n";
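The three timing loops above share one pattern: record the time, run the work 1000 times, print the elapsed seconds. A minimal sketch of factoring that into a helper follows. The names `bench()`, `find_tags_regex()` and `find_tags_dom()` are hypothetical and not part of the original gist, and the shortened `$html` sample is an assumption used only to keep the example small. Wrapping each run in a closure adds a little call overhead, so absolute numbers may differ slightly from the inline loops.

```php
<?php
// Hypothetical helper (not in the original gist): time $fn over $iterations runs.
function bench( string $label, callable $fn, int $iterations = 1000 ): float {
    $start = microtime( true );
    for ( $i = 0; $i < $iterations; $i++ ) {
        $fn();
    }
    $elapsed = microtime( true ) - $start;
    echo $label . ': ' . $elapsed . "\n";
    return $elapsed;
}

// Same regex as the gist, applied to the raw HTML string.
function find_tags_regex( string $html ): array {
    preg_match_all( '/(?:^|\s)#([\w-]+)\b/', $html, $matches );
    return $matches[1];
}

// Same regex as the gist, applied to each text node of a parsed document.
function find_tags_dom( DOMXPath $xpath ): array {
    $tags = array();
    foreach ( $xpath->query( '//text()' ) as $textNode ) {
        if ( preg_match_all( '/(?:^|\s)#([\w-]+)\b/', $textNode->nodeValue, $matches ) ) {
            $tags = array_merge( $tags, $matches[1] );
        }
    }
    return $tags;
}

// Shortened sample input (assumption, for illustration only).
$html = '<p>In #aliquam eros, <em>vel #pharetra quam</em>.</p>';

$dom = new DOMDocument;
$dom->loadHTML( '<?xml encoding="UTF-8">' . $html );
$xpath = new DOMXPath( $dom );

bench( 'preg_match_all', function () use ( $html ) {
    find_tags_regex( $html );
} );
bench( 'DOM (cached document)', function () use ( $xpath ) {
    find_tags_dom( $xpath );
} );
```

Returning the tag list from each function also makes it easy to check that both approaches find the same tags on a given input before comparing their speed.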
@xeoncross

Sample output, in case anyone wants to see it (Ubuntu 64-bit / Core i3):

preg_match_all:        0.096717834472656
DOM (new document):    0.21419596672058
DOM (cached document): 0.11462593078613
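To read these numbers as relative costs, each timing can be divided by the `preg_match_all` baseline. The sketch below does that arithmetic with the three values copied from the output above; `relative_cost()` is a hypothetical helper name, not something from the gist.

```php
<?php
// Timings copied from the sample output above (seconds per 1000 iterations).
$timings = array(
    'preg_match_all'        => 0.096717834472656,
    'DOM (new document)'    => 0.21419596672058,
    'DOM (cached document)' => 0.11462593078613,
);

// Hypothetical helper: express each timing as a multiple of a baseline entry.
function relative_cost( array $timings, string $baseline ): array {
    $ratios = array();
    foreach ( $timings as $label => $seconds ) {
        $ratios[ $label ] = $seconds / $timings[ $baseline ];
    }
    return $ratios;
}

foreach ( relative_cost( $timings, 'preg_match_all' ) as $label => $ratio ) {
    printf( "%-22s %.2fx\n", $label . ':', $ratio );
}
```

On this one machine, building a new document per iteration took roughly 2.2 times as long as the plain regex, while reusing the parsed document took roughly 1.2 times as long. A single run on a single machine is not enough to generalize from.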
