Skip to content

Instantly share code, notes, and snippets.

@nunoveloso
Created March 7, 2012 12:34
Show Gist options
  • Star 20 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save nunoveloso/1992851 to your computer and use it in GitHub Desktop.
Save nunoveloso/1992851 to your computer and use it in GitHub Desktop.
PHP array operations up to 10x faster than the original
/**
* Home mande method to do array_diff ~10x faster that PHP built-in.
*
* @param The array to compare from
* @param An array to compare against
*
* @return an array containing all the entries from array1 that are not present in array2.
*/
function nuno_array_diff($array1, $array2) {
$diff = array();
// we don't care about keys anyway + avoids dupes
foreach ($array1 as $value) {
$diff[$value] = 1;
}
// unset common values
foreach ($array2 as $value) {
unset($diff[$value]);
}
return array_keys($diff);
}
/**
* Home mande method to do array_intersect ~10x faster that PHP built-in.
*
* @param The array to compare from
* @param An array to compare against
*
* @return an array containing all the entries from array1 that are present in array2.
*/
function nuno_array_intersect($array1, $array2) {
$a1 = $a2 = array();
// we don't care about keys anyway + avoids dupes
foreach ($array1 as $value) {
$a1[$value] = $value;
}
foreach ($array2 as $value) {
$a2[$value] = 1;
}
// unset different values values
foreach ($a1 as $value) {
if (!isset($a2[$value])) {
unset($a1[$value]);
}
}
return array_keys($a1);
}
@machitgarha
Copy link

machitgarha commented Jul 10, 2019

Great job! Thank you very much for the code!

(After an extensive edit:)

Let's compare the performance of array_diff() and nuno_array_diff():

PHP 8.0(.10), no JIT

Note: The results for PHP 7.4 are more or less the same. The execution times in PHP8 are slightly faster, however (test it yourself please, no way to include it here, as it makes the comment too long and not so useful).

Array Sizes (#1, #2) Value Range(s) Time #1 Time #2 #2 is Faster than #1 by
(100, 10000) [0, 10] 0.0001s 0.0002s -100% (0.5x)
(100, 10000) [0, 1000] 0.0005s 0.0002s 150% (2.5x)
(100, 10000) [0, 100000] 0.0006s 0.0002s 200% (3x)
(100, 10000) [0, 10000000] 0.0005s 0.0002s 150% (2.5x)
(100, 1000000) [0, 10] 0.0152s 0.0134s 13% (1.1x)
(100, 1000000) [0, 1000] 0.0441s 0.0141s 213% (3.1x)
(100, 1000000) [0, 100000] 0.2323s 0.0175s 1227% (13.3x)
(100, 1000000) [0, 10000000] 0.2000s 0.0206s 871% (9.7x)
(10000, 100) [0, 10] 0.0001s 0.0002s -100% (0.5x)
(10000, 100) [0, 1000] 0.0006s 0.0003s 100% (2x)
(10000, 100) [0, 100000] 0.0005s 0.0007s -40% (0.7x)
(10000, 100) [0, 10000000] 0.0005s 0.0005s 0% (1x)
(10000, 10000) [0, 10] 0.0003s 0.0003s 0% (1x)
(10000, 10000) [0, 1000] 0.0009s 0.0004s 125% (2.3x)
(10000, 10000) [0, 100000] 0.0013s 0.0007s 86% (1.9x)
(10000, 10000) [0, 10000000] 0.0012s 0.0006s 100% (2x)
(10000, 1000000) [0, 10] 0.0154s 0.0137s 12% (1.1x)
(10000, 1000000) [0, 1000] 0.0442s 0.0142s 211% (3.1x)
(10000, 1000000) [0, 100000] 0.2308s 0.0184s 1154% (12.5x)
(10000, 1000000) [0, 10000000] 0.1858s 0.0238s 681% (7.8x)
(1000000, 100) [0, 10] 0.0111s 0.0150s -35% (0.7x)
(1000000, 100) [0, 1000] 0.0551s 0.0208s 165% (2.6x)
(1000000, 100) [0, 100000] 0.0588s 0.0537s 9% (1.1x)
(1000000, 100) [0, 10000000] 0.0634s 0.1784s -181% (0.4x)
(1000000, 10000) [0, 10] 0.0111s 0.0151s -36% (0.7x)
(1000000, 10000) [0, 1000] 0.0416s 0.0169s 146% (2.5x)
(1000000, 10000) [0, 100000] 0.0755s 0.0535s 41% (1.4x)
(1000000, 10000) [0, 10000000] 0.0697s 0.1839s -164% (0.4x)
(1000000, 1000000) [0, 10] 0.0267s 0.0285s -7% (0.9x)
(1000000, 1000000) [0, 1000] 0.0825s 0.0311s 165% (2.7x)
(1000000, 1000000) [0, 100000] 0.4891s 0.0808s 505% (6.1x)
(1000000, 1000000) [0, 10000000] 0.4427s 0.2910s 52% (1.5x)

PHP 8.0(.10), tracing JIT

php.ini configuration:

opcache.jit = On
opcache.jit_buffer_size = 128M
Array Sizes (#1, #2) Value Range(s) Time #1 Time #2 #2 is Faster than #1 by
(100, 10000) [0, 10] 0.0002s 0.0001s 100% (2x)
(100, 10000) [0, 1000] 0.0005s 0.0001s 400% (5x)
(100, 10000) [0, 100000] 0.0007s 0.0002s 250% (3.5x)
(100, 10000) [0, 10000000] 0.0007s 0.0002s 250% (3.5x)
(100, 1000000) [0, 10] 0.0158s 0.0095s 66% (1.7x)
(100, 1000000) [0, 1000] 0.0437s 0.0102s 328% (4.3x)
(100, 1000000) [0, 100000] 0.2283s 0.0124s 1741% (18.4x)
(100, 1000000) [0, 10000000] 0.1943s 0.0160s 1114% (12.1x)
(10000, 100) [0, 10] 0.0002s 0.0001s 100% (2x)
(10000, 100) [0, 1000] 0.0005s 0.0001s 400% (5x)
(10000, 100) [0, 100000] 0.0005s 0.0006s -20% (0.8x)
(10000, 100) [0, 10000000] 0.0006s 0.0004s 50% (1.5x)
(10000, 10000) [0, 10] 0.0002s 0.0002s 0% (1x)
(10000, 10000) [0, 1000] 0.0008s 0.0002s 300% (4x)
(10000, 10000) [0, 100000] 0.0013s 0.0007s 86% (1.9x)
(10000, 10000) [0, 10000000] 0.0012s 0.0006s 100% (2x)
(10000, 1000000) [0, 10] 0.0156s 0.0099s 58% (1.6x)
(10000, 1000000) [0, 1000] 0.0458s 0.0097s 372% (4.7x)
(10000, 1000000) [0, 100000] 0.2482s 0.0157s 1481% (15.8x)
(10000, 1000000) [0, 10000000] 0.1971s 0.0192s 927% (10.3x)
(1000000, 100) [0, 10] 0.0113s 0.0081s 40% (1.4x)
(1000000, 100) [0, 1000] 0.0564s 0.0130s 334% (4.3x)
(1000000, 100) [0, 100000] 0.0590s 0.0420s 40% (1.4x)
(1000000, 100) [0, 10000000] 0.0653s 0.1653s -153% (0.4x)
(1000000, 10000) [0, 10] 0.0116s 0.0082s 41% (1.4x)
(1000000, 10000) [0, 1000] 0.0419s 0.0093s 351% (4.5x)
(1000000, 10000) [0, 100000] 0.0813s 0.0432s 88% (1.9x)
(1000000, 10000) [0, 10000000] 0.0754s 0.1715s -127% (0.4x)
(1000000, 1000000) [0, 10] 0.0287s 0.0194s 48% (1.5x)
(1000000, 1000000) [0, 1000] 0.0919s 0.0208s 342% (4.4x)
(1000000, 1000000) [0, 100000] 0.5627s 0.0783s 619% (7.2x)
(1000000, 1000000) [0, 10000000] 0.4300s 0.2413s 78% (1.8x)

Note: The test code is available here as gist (with the defaults). Both arrays are filled with random values using random_int(), and meaningless results (the first most ones) are excluded.

Looking at the results:

  • In the case of no JIT:
    • If the first array is smaller or both arrays has the same size, then nuno_array_diff() is a better choice. But, if the second one is smaller, then array_diff() might be a better choice. Personally, I think the later case is more common in real-world cases, but at the same time, it should be considered micro-optimizations; because often you don't have to deal with such a large arrays. However, in heavy mathematical operations, nuno_array_diff() should work better.
    • The function is up to 10x (or even 15x) faster that the built-in, but not approximately.
  • In the case of JIT: Use nuno_array_diff(). JIT is super cool. :)

@lesagi
Copy link

lesagi commented Sep 12, 2021

Great job! Thank you very much for the code!

The first function (i.e. nuno_array_diff()) is not 10 times faster than array_diff() internal function, at least on PHP7. Actually, it is faster in many cases (and slower in very few cases), but not ten times. The following is what I've tested (both array are filled with random values using mt_rand(), and the results are approximate):

First Array Size Second Array Size How Much Faster? Performance Factor
100 100 110% faster 2x
100 1000 120% faster 2x
100 10000 230% faster 3x
100 100000 370% faster 3.5x
100 1000000 390% faster 4x
1000 100 70% faster 1.5x
1000 1000 100% faster 2x
1000 10000 140% faster 2.5x
1000 100000 340% faster 4.5x
1000 1000000 390% faster 5x
10000 100 60% faster 1.5x
10000 1000 80% faster 1.5x
10000 10000 120% faster 2x
10000 100000 290% faster 4x
10000 1000000 380% faster 5x
100000 100 190% faster 3x
100000 1000 230% faster 3.5x
100000 10000 260% faster 3.5x
100000 100000 270% faster 3.5x
100000 1000000 360% faster 4.5x
1000000 100 220% faster 3x
1000000 1000 260% faster 3.5x
1000000 10000 320% faster 4x
1000000 100000 260% faster 3.5x
1000000 1000000 290% faster 4x
Cool! In real-world cases, it performs really better (almost 4 times better). One advantage your custom implementation has is that it got faster when the array sizes increases, comparing to array_diff() (as it can be seen in the results). Using a JIT in the future PHP version would make the results even better!

Note: I've tested also with arrays filled by a constant value, in the cases like that, array_diff() performs better sometimes (2x to 3x faster), however, they are not very common cases.

Hi, do you mind explain how you performed the tests?

@machitgarha
Copy link

machitgarha commented Sep 13, 2021

@lesagi

Hi. I posted it more than two years ago, and I don't remember how I tested it. Sorry about that, I should have been included the test code at that time. However, one method is to create an array and fill it with random numbers, run both functions in two different iteration loops, and measure it. I might reproduce and re-run a new test, and update the results (and even test JIT).

@machitgarha
Copy link

@lesagi

Updated my comment.

@muhamadsobari198
Copy link

Thanks for code!

@AnsPunktF
Copy link

AnsPunktF commented Jan 30, 2024

It is nice, aslong you are not goging to compare arrays with numbers as string in it, or with floats, or floats with the lovely exponent letter...

Otherwise you might either run into a type juggling problem later because the returned value will be an integer and not a string anymore as it was in the original array, or you'll have a floating point issue or a key that is an int.

Example for these problems using the intersect function:
$a1 = ['123', 12.34, 1e23, 2.0]; $a2 = [123, 12, "1.0E+23", 200376420512301056, 8.5-6.4-0.1]; $resMain = array_intersect($a1, $a2) $resNuno = nuno_array_intersect($a1, $a2);
So handle with care!

Use for non numeric strings to prevent type juggling or arrays that contains integers.
Don't use for floats or strings of numbers!

Maybe a solution could be to add a prefix ('_'.$value) to the keys, when the values get used as key. It is a bit hacky, but in this way you force php to keep all as string, so it compares all as string.
In the last foreach do isset('_'.$value) instead !isset() and add the result there to an array instead of using array_keys().
Should give then almost the same results as array_unique(array_inteesect()) I think, but I didn't checked the performance. For doing more equal, check in the first foreach if the key isset()

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment