@nunoveloso
Created March 7, 2012 12:34
PHP array operations up to 10x faster than the original
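/**
 * Home-made method to do array_diff() ~10x faster than the PHP built-in.
 *
 * Values are used as array keys, so each value appears only once in the
 * result, the result is re-indexed from 0, and PHP's key casting applies
 * (e.g. the string '123' comes back as the int 123; floats are cast to ints).
 *
 * @param array $array1 The array to compare from.
 * @param array $array2 An array to compare against.
 *
 * @return array The unique values from $array1 that are not present in $array2.
 */
function nuno_array_diff($array1, $array2) {
    $diff = array();
    // we don't care about keys anyway; using values as keys also removes duplicates
    foreach ($array1 as $value) {
        $diff[$value] = 1;
    }
    // unset values that also appear in $array2
    foreach ($array2 as $value) {
        unset($diff[$value]);
    }
    return array_keys($diff);
}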
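/**
 * Home-made method to do array_intersect() ~10x faster than the PHP built-in.
 *
 * Values are used as array keys, so each value appears only once in the
 * result, the result is re-indexed from 0, and PHP's key casting applies
 * (e.g. the string '123' comes back as the int 123; floats are cast to ints).
 *
 * @param array $array1 The array to compare from.
 * @param array $array2 An array to compare against.
 *
 * @return array The unique values from $array1 that are also present in $array2.
 */
function nuno_array_intersect($array1, $array2) {
    $a1 = $a2 = array();
    // we don't care about keys anyway; using values as keys also removes duplicates
    foreach ($array1 as $value) {
        $a1[$value] = $value;
    }
    foreach ($array2 as $value) {
        $a2[$value] = 1;
    }
    // unset values that are not present in $array2
    foreach ($a1 as $value) {
        if (!isset($a2[$value])) {
            unset($a1[$value]);
        }
    }
    return array_keys($a1);
}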
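For comparison, the same "values become keys" idea can also be expressed with PHP's built-in key functions. The sketch below is not the gist's code and has not been benchmarked here; the names `flip_array_diff()` and `flip_array_intersect()` are hypothetical. It assumes int or string values only, since `array_flip()` skips any other value with a warning.

```php
<?php
// Hypothetical helpers: the same "values become keys" technique as the gist,
// written with built-ins instead of explicit foreach loops.
// Assumes every value is an int or a string (array_flip() skips other types with a warning).

function flip_array_diff(array $array1, array $array2): array {
    // array_flip() turns values into keys, which also drops duplicates;
    // array_diff_key() then keeps only the keys missing from the second array.
    return array_keys(array_diff_key(array_flip($array1), array_flip($array2)));
}

function flip_array_intersect(array $array1, array $array2): array {
    // array_intersect_key() keeps the keys of the first array that also exist in the second.
    return array_keys(array_intersect_key(array_flip($array1), array_flip($array2)));
}

$a = [3, 1, 3, 'x', 5];
$b = [5, 'y'];
print_r(array_diff($a, $b));      // keeps the original keys and the duplicate 3
print_r(flip_array_diff($a, $b)); // unique values, re-indexed: 3, 1, x
```

Like `nuno_array_diff()` and `nuno_array_intersect()`, both helpers return a re-indexed list of unique values, whereas `array_diff()` and `array_intersect()` preserve the keys and duplicates of the first array.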
@lesagi

lesagi commented Sep 12, 2021

Great job! Thank you very much for the code!

The first function (i.e. nuno_array_diff()) is not 10 times faster than the internal array_diff() function, at least on PHP 7. It is faster in many cases (and slower in very few), but not ten times faster. The following is what I tested (both arrays are filled with random values using mt_rand(), and the results are approximate):

| First array size | Second array size | How much faster? | Performance factor |
|---:|---:|---:|---:|
| 100 | 100 | 110% faster | 2x |
| 100 | 1000 | 120% faster | 2x |
| 100 | 10000 | 230% faster | 3x |
| 100 | 100000 | 370% faster | 3.5x |
| 100 | 1000000 | 390% faster | 4x |
| 1000 | 100 | 70% faster | 1.5x |
| 1000 | 1000 | 100% faster | 2x |
| 1000 | 10000 | 140% faster | 2.5x |
| 1000 | 100000 | 340% faster | 4.5x |
| 1000 | 1000000 | 390% faster | 5x |
| 10000 | 100 | 60% faster | 1.5x |
| 10000 | 1000 | 80% faster | 1.5x |
| 10000 | 10000 | 120% faster | 2x |
| 10000 | 100000 | 290% faster | 4x |
| 10000 | 1000000 | 380% faster | 5x |
| 100000 | 100 | 190% faster | 3x |
| 100000 | 1000 | 230% faster | 3.5x |
| 100000 | 10000 | 260% faster | 3.5x |
| 100000 | 100000 | 270% faster | 3.5x |
| 100000 | 1000000 | 360% faster | 4.5x |
| 1000000 | 100 | 220% faster | 3x |
| 1000000 | 1000 | 260% faster | 3.5x |
| 1000000 | 10000 | 320% faster | 4x |
| 1000000 | 100000 | 260% faster | 3.5x |
| 1000000 | 1000000 | 290% faster | 4x |
Cool! In real-world cases it performs considerably better (almost 4 times faster). One advantage of your custom implementation is that its lead over array_diff() tends to grow as the array sizes increase, as the results show. A JIT in a future PHP version might make the results even better!

Note: I also tested with arrays filled with a constant value. In cases like that, array_diff() is sometimes faster (2x to 3x), but such cases are not very common.

Hi, would you mind explaining how you performed the tests?

@machitgarha

machitgarha commented Sep 13, 2021

@lesagi

Hi. I posted it more than two years ago, and I don't remember how I tested it. Sorry about that; I should have included the test code at the time. However, one method is to create arrays filled with random numbers, run each function in its own loop, and measure how long each loop takes. I might reproduce the test, re-run it, and update the results (and even test the JIT).
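The method described above could look like the following sketch. The helper names `random_ints()` and `time_ms()`, the array sizes, and the iteration count are assumptions; this is not the test that produced the table above, and timings will vary with the PHP version and the machine.

```php
<?php
// A minimal timing harness in the spirit of the method described above:
// fill arrays with random numbers, run each function in its own loop, compare times.
// Sizes and iteration counts are illustrative assumptions, not the original setup.

// Hypothetical helper: $n random integers in [0, $max].
function random_ints(int $n, int $max): array {
    $values = [];
    for ($i = 0; $i < $n; $i++) {
        $values[] = mt_rand(0, $max);
    }
    return $values;
}

// Hypothetical helper: total wall-clock time, in milliseconds, of $iterations calls.
function time_ms(callable $fn, array $a, array $b, int $iterations): float {
    $start = hrtime(true); // nanoseconds, PHP >= 7.3
    for ($i = 0; $i < $iterations; $i++) {
        $fn($a, $b);
    }
    return (hrtime(true) - $start) / 1e6;
}

// Stand-in with the same logic as nuno_array_diff(), so the block runs on its own;
// pass 'nuno_array_diff' instead when the gist's function is loaded.
$keyDiff = function (array $array1, array $array2): array {
    $diff = [];
    foreach ($array1 as $value) {
        $diff[$value] = 1;
    }
    foreach ($array2 as $value) {
        unset($diff[$value]);
    }
    return array_keys($diff);
};

$a = random_ints(1000, 100000);
$b = random_ints(1000, 100000);
$iterations = 50;

$builtin = time_ms('array_diff', $a, $b, $iterations);
$custom = time_ms($keyDiff, $a, $b, $iterations);
printf("array_diff: %.2f ms, key-based diff: %.2f ms, factor: %.1fx\n",
    $builtin, $custom, $builtin / max($custom, 1e-9));
```

Both functions receive the same input arrays, so only the implementation differs between the two loops; a single run is noisy, so repeating it a few times gives a more reliable factor.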

@machitgarha
Copy link

@lesagi

Updated my comment.

@muhamadsobari198

Thanks for code!

@AnsPunktF

AnsPunktF commented Jan 30, 2024

It is nice, as long as you are not going to compare arrays containing numeric strings, floats, or floats with the lovely exponent letter (e.g. 1e23)...

Otherwise you might run into a type juggling problem later, because a returned value can be an integer instead of the string it was in the original array, or you'll hit a floating-point issue, or a float that has been turned into an integer key.
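The type juggling comes from PHP's array-key casting rules, which both functions rely on when they store values as keys. A small sketch; the helper name `keys_after_casting()` is hypothetical. Non-integral floats such as 12.34 are cast too (to 12), and PHP 8.1+ emits a deprecation notice for that, so the sketch leaves them out.

```php
<?php
// Hypothetical helper: stores each value as an array key, the way the gist's
// functions do, and returns the keys PHP actually kept.
function keys_after_casting(array $values): array {
    $set = [];
    foreach ($values as $value) {
        $set[$value] = true;
    }
    return array_keys($set);
}

// '123' becomes int 123; '0123' and '1.5' stay strings (not canonical decimal ints);
// float 2.0 becomes int 2; true becomes int 1.
var_dump(keys_after_casting(['123', '0123', '1.5', 2.0, true]));
```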

Example of these problems using the intersect function:

    $a1 = ['123', 12.34, 1e23, 2.0];
    $a2 = [123, 12, "1.0E+23", 200376420512301056, 8.5-6.4-0.1];
    $resMain = array_intersect($a1, $a2);
    $resNuno = nuno_array_intersect($a1, $a2);
So handle with care!

Use it for arrays of non-numeric strings (to prevent type juggling) or arrays that contain integers.
Don't use it for floats or numeric strings!

Maybe a solution could be to add a prefix ('_'.$value) to the keys when the values are used as keys. It is a bit hacky, but this way you force PHP to keep every key as a string, so everything is compared as a string.
In the last foreach, check isset($a2['_'.$value]) instead of !isset(), and add the matching values to a result array instead of using array_keys().
That should then give almost the same results as array_unique(array_intersect()), I think, but I didn't check the performance. To match it even more closely, check in the first foreach whether the key is already set with isset().
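A sketch of the prefix idea described above, assuming scalar values only (arrays or objects in the input do not convert to strings cleanly). The name `prefixed_array_intersect()` is hypothetical, and its performance has not been measured here.

```php
<?php
// Hypothetical function: intersection using '_'-prefixed string keys, so PHP never
// casts the keys to int. Values are compared as strings, like array_intersect().
function prefixed_array_intersect(array $array1, array $array2): array {
    // Lookup set built from the second array.
    $lookup = [];
    foreach ($array2 as $value) {
        $lookup['_' . $value] = true;
    }

    $seen = [];   // string forms already added, to skip duplicates
    $result = [];
    foreach ($array1 as $value) {
        $key = '_' . $value;
        if (isset($lookup[$key]) && !isset($seen[$key])) {
            $seen[$key] = true;
            $result[] = $value; // original value, with its original type
        }
    }
    return $result;
}

$a1 = ['123', 'abc', 2.5, '2.5', 7, 'abc'];
$a2 = [123, '2.50', 'abc', 2.5];
var_dump(prefixed_array_intersect($a1, $a2));                    // '123', 'abc', 2.5
var_dump(array_values(array_unique(array_intersect($a1, $a2)))); // same values
```

Unlike array_intersect(), the result is re-indexed and keeps only the first occurrence of each string form, so for scalar inputs it should match array_values(array_unique(array_intersect($a1, $a2))) while returning values with their original types.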
