Skip to content

Instantly share code, notes, and snippets.

@andrey-legayev
Last active January 7, 2024 21:24
Show Gist options
  • Star 1 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save andrey-legayev/c2879927bcb4a648793ad0aea1509773 to your computer and use it in GitHub Desktop.
Save andrey-legayev/c2879927bcb4a648793ad0aea1509773 to your computer and use it in GitHub Desktop.
PHP benchmark: in_array() vs. binary search in sorted array
<?php
/**
* Binary search - search element in sorted array of values and return true/false
*
* @param $value
* @param $array array
* @return bool
*/
function in_array_binary($value, array &$array)
{
// Set the left pointer to 0.
$left = 0;
// Set the right pointer to the length of the array -1.
$right = count($array) - 1;
while ($left <= $right) {
// Set the initial midpoint to the rounded down value of half the length of the array.
$midpoint = intdiv(($left + $right), 2);
$current = $array[$midpoint];
if ($current < $value) {
// The midpoint value is less than the value.
$left = $midpoint + 1;
} elseif ($current > $value) {
// The midpoint value is greater than the value.
$right = $midpoint - 1;
} else {
// This is the key we are looking for - we found it
return true;
}
}
// The value was not found.
return false;
}
function test($size, $iterations)
{
// prepare arrays
$arr = [];
for ($i = 0; $i < $size; $i++) {
$value = rand();
array_push($arr, $value);
}
// run tests
$t1 = microtime(true);
for ($j = 0; $j < $iterations; $j++) {
$x = in_array($j, $arr);
}
$t1 = microtime(true) - $t1;
$t2 = microtime(true);
sort($arr);
for ($j = 0; $j < $iterations; $j++) {
$x = in_array_binary($j, $arr);
}
$t2 = microtime(true) - $t2;
// echo results as CSV
printf("%s, %d, %d, %f, %f\n", phpversion(), $size, $iterations, $t1, $t2);
}
echo "# PHP Version, Array Size, Iterations, in_array(), binary_search()\n";
for ($i = 1; $i <= 100000; $i = $i * 10) {
test($i, 10000);
}
#!/bin/bash
for ver in 7.0 7.1 7.2 7.3; do
echo "# running tests for version $ver"
image=php:$ver-cli
docker run --rm -v "$PWD":/src $image \
bash -c "php /src/perf-test.php"
done
@andrey-legayev
Copy link
Author

andrey-legayev commented Oct 7, 2019

Note: "sort()" is included into binary search time (!)
In normal case data should be sorted before any searches.

My results:
image

Surface diagram:
image

Do you still think that in_array() is good one? because it's readable?

@andrey-legayev
Copy link
Author

On small data sets (<10000 elements) in_array is still be faster.
It's due to the fact that in_array() is native PHP function, but binary search implemented as php code.

This is comparison - see last value in rows: "ratio"

# PHP Version, Array Size, Iterations, in_array(), binary_search(), ratio
7.3.10, 1, 10000, 0.001805, 0.006006, ratio: 0.300504
7.3.10, 10, 10000, 0.001712, 0.013662, ratio: 0.125316
7.3.10, 100, 10000, 0.003379, 0.020435, ratio: 0.165346
7.3.10, 1000, 10000, 0.019565, 0.031215, ratio: 0.626774
7.3.10, 10000, 10000, 0.163645, 0.037205, ratio: 4.398471
7.3.10, 100000, 10000, 2.107781, 0.103909, ratio: 20.284919

@oleksandr-butenko
Copy link

Thank you very detailed explanation and giving point where it's break even.

@igortregub
Copy link

On small data sets (<10000 elements) in_array is still be faster.

probably because in_array_binary function is written on php instead of in_array :)

@Antarian
Copy link

Antarian commented Jan 7, 2024

That break even point can be even lower. Because sort() is inside of the time counter, it is adding a lot to it. If your array comes pre-sorted or you know it is sorted the way you need then binary search will be even quicker.

@andrey-legayev
Copy link
Author

andrey-legayev commented Jan 7, 2024

@Antarian as Igor stated above it could be rewritten in c/c++ (as php module?) and be even more faster.
Also if you perform searches many times then probably it's good idea to convert input array into hashset once and then perform searches: https://www.php.net/manual/en/class.ds-set.php
Or at least do array_flip() and search by key with isset()

@Antarian
Copy link

Antarian commented Jan 7, 2024

@andrey-legayev I have mentioned pre-sorted arrays because php is great in combination with other technologies like SQL.
Yes, if you work with larger datasets (more than 10k list items) in PHP then Ds module seems to be reasonable option. Thanks for the link.
Creating new PHP module in C, is for me on the same level as doing stuff in any other suitable language and using specialized microservice for the task.
PHP array and array functions are like good all-rounder potato, nothing special, but nothing bad. Perfectly suitable until you hit special requirements somewhere.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment