Skip to content

Instantly share code, notes, and snippets.

@rotexdegba
Last active September 9, 2022 01:12
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save rotexdegba/9751711 to your computer and use it in GitHub Desktop.
Save rotexdegba/9751711 to your computer and use it in GitHub Desktop.
Alternative implementation of in_array using either isset or array_key_exists. This approach requires restructuring the original array.
<?php
//This isset optimization is not suitable for arrays that contains null values.
/*
$a = array('a' => null);
isset($a['a']); // Will be FALSE
array_key_exists('a', $a); // Will be TRUE
*/
$total = 10000;
$paragraph = 'this is a sentence. Crud! $$$$!';
$words = explode(' ', $paragraph);
// Create a bunch of letter entries
$sequence = [];
for ($j = 0; $j < 10000; $j++) {
$sequence[] = 'a' . $j;
}
$s = microtime(true);
for ($j = 0; $j < $total; $j++) {
foreach ($words as $word) {
in_array(strtolower($word), $sequence);
}
}
echo "in_array: " . (microtime(true) - $s) . "\n";
// Convert the array to a hash
$hash = array_fill_keys($sequence, true);
$s = microtime(true);
for ($j = 0; $j < $total; $j++) {
foreach ($words as $word) {
isset($hash[strtolower($word)]);
}
}
echo "isset: " . (microtime(true) - $s) . "\n";
$s = microtime(true);
for ($j = 0; $j < $total; $j++) {
foreach ($words as $word) {
array_key_exists(strtolower($word), $hash);
}
}
echo "array_key_exists: " . (microtime(true) - $s) . "\n";
@rotexdegba
Copy link
Author

Alternative implementation of in_array using either isset or array_key_exists. This approach requires restructuring the original array.

http://mtdowling.com/blog/2014/03/17/hash-lookups-over-array-search/

Favor Hash Lookups Over Array Searches

A common programming requirement is to match a string against a set of known strings. For example, let’s say you were iterating over the words in a forum post and testing to see if a word is in a list of prohibited words. A common approach to this problem is to create an array of the known prohibited words and then use PHP’s in_array() function to test if the string is found in the list. However, there’s a simple optimization you can make to significantly improve the performance of the algorithm.

Searching an Array

<?php

    $words = get_all_words_in_text($text);
    $badWords = ['$$$$', '@#$%', 'crud' /** ... */ ];

    foreach ($words as $word) {
        if (in_array(strtolower($word), $badWords)) {
            echo 'Found bad word: ' . $word . "\n";
        }
    }
?>

This approach solves the problem, but it’s inefficient. As we iterate over each word, we then test each word against each word in the bad word list using in_array(). PHP’s in_array() function runs in linear time, meaning that as the size of the bad words array increases, the amount of time it takes to search the array proportionally increases. We can do much better than this.

Hash Lookups

Because the list of bad words is known up front, we can easily restructure our program to run in constant time by utilizing a PHP associative array. Like most hash tables, a key lookup on a PHP associative array is O(1) (except when a collision occurs which isn’t a concern for our use case).

If we restructure our PHP array to become an associative array where the words we are looking for are the keys and each value is true, then we can use PHP’s isset() function and greatly speed things up:

<?php

    $words = get_all_words_in_text($text); 
    $badWords = [ 
        '$$$$' => true, 
        '@#$%' => true, 
        'crud' => true 
        // ... 
    ]; 

    foreach ($words as $word) { 

        if (isset($badWords[strtolower($word)])) { 

            echo 'Found bad word: ' . $word . "\n"; 
        } 
    }
?>

@parijke
Copy link

parijke commented May 4, 2022

Since you have two arrays, you could use the array_intersect instead of a foreach loop?

@rotexdegba
Copy link
Author

Since you have two arrays, you could use the array_intersect instead of a foreach loop?

Thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment