PHP function to split arrays up into chunks to do work on (basically pagination for callbacks).

doChunks()

A very simple but useful PHP function for performing work on large arrays of information. Basically, it's like pagination for callbacks. For example, it can be used for performing bulk database INSERT operations (instead of inserting each item individually). See below for a quick real world example.

Example Usage

$numbers = [ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 ];

$return = doChunks($numbers, 3, function($numbersChunk) {
	return array_sum($numbersChunk);
});

var_dump($return);

This should yield 4 return values (one per chunk):

array (size=4)
  0 => int 6
  1 => int 15
  2 => int 24
  3 => int 10

Example Use Case

Here's a more complex real world example of the power of breaking generic work into smaller portions. In this case, the doChunks function helps optimize database operations. This use case originally inserted large numbers of records through the SilverStripe ORM one at a time. SilverStripe's framework is quite powerful, but that also means an individual write can take a while, which adds up when repeated thousands of times. The doChunks function comes into play because, when the total number of records isn't known ahead of time, you need to cap the maximum number of values you INSERT in any single statement. MySQL enforces an upper limit on the size of an individual query (max_allowed_packet, which can vary by server), which in our case was roughly 16MB. To stay safely under that limit, we capped each INSERT at 500 rows.
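As a rough back-of-the-envelope check (the numbers below are illustrative assumptions, not measurements from the actual import), a 500-row chunk stays far below that limit:

// Illustrative arithmetic only: headroom a 500-row chunk leaves under max_allowed_packet.
$maxAllowedPacket = 16 * 1024 * 1024; // ~16MB server limit (assumed).
$averageRowBytes  = 2048;             // Assumed average size of one escaped row of values.
$rowsPerChunk     = 500;

$estimatedQueryBytes = $rowsPerChunk * $averageRowBytes; // ~1MB, comfortably under ~16MB.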

This optimization (moving from the ORM to bulk INSERT) reduced import time from about 75s down to about 2.5s (roughly a 30x speedup). Less time waiting for an API call to complete means a lower likelihood of a failed and/or unnecessarily repeated import caused by network connectivity issues. It also dramatically improves performance by reducing the CPU hit on the server cluster and offloading the majority of the work to the database itself (a best practice). Note that this is only one of 3 separate usages of doChunks (which in total inserted approximately 6.7k records).

// Insert patents in bulk into database directly, breaking up into chunks of 500. While we're here,
// we'll make sure to also build a flat de-duplicated array of inventors for insertion later.
/** @var RawInventorData[] $inventors */
$inventors = [];
doChunks($patents, 500, function($patents) use (&$inventors) {
	$dbValues = [];

	/** @var RawPatentData[] $patents */
	foreach($patents as $rawPatentData) {
		$dbValues[] = [
			$rawPatentData->matter_file_number,
			$rawPatentData->grant_date,
			$rawPatentData->patent_number,
			$rawPatentData->country,
			$rawPatentData->title,
			$rawPatentData->abstract,
		];

		// Build flattened array of de-duplicated inventors (for bulk insertion later).
		foreach($rawPatentData->inventors as $rawInventorData) {
			if (!isset($inventors[$rawInventorData->qid])) {
				$inventors[$rawInventorData->qid] = $rawInventorData;
			}
		}
	}
	
	// Pass values into insertFieldsBulk(), which will generate the concatenated string of
	// escaped field values in parenthetical groups, e.g. ('val1', 'val2'), ('val1', 'val2'), ...
	$dbFields = [
		'MatterFileNumber',
		'GrantDate',
		'PatentNumber',
		'CountryCode',
		'Title',
		'Abstract',
	];
	$insertFields = insertFieldsBulk($dbFields, $dbValues);
	$query = "INSERT INTO Patent $insertFields";
	DB::query($query);
});
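
The insertFieldsBulk() helper referenced above isn't included in this gist. Here's a minimal sketch of what it might look like; the addslashes() escaping is a simplified stand-in for illustration (a real SilverStripe project would use the framework's own escaping, e.g. Convert::raw2sql()):

// Hypothetical sketch of insertFieldsBulk(): builds the
// "(`Field1`, `Field2`) VALUES ('v1', 'v2'), ('v1', 'v2'), ..." portion of a bulk INSERT.
function insertFieldsBulk(array $fields, array $valueRows) {
	$fieldList = '(`' . implode('`, `', $fields) . '`)';

	$rows = [];
	foreach ($valueRows as $values) {
		// Escape each value and wrap the row in a parenthetical group.
		$escaped = array_map(function($value) {
			return "'" . addslashes($value) . "'";
		}, $values);
		$rows[] = '(' . implode(', ', $escaped) . ')';
	}

	return "$fieldList VALUES " . implode(', ', $rows);
}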
Source

<?php
/**
 * Splits arrays up into chunks to do work on (basically pagination for callbacks).
 *
 * @param array $array
 * @param int $chunkSize
 * @param callable $callback
 * @return array The set of returns from your callback on each chunk.
 *
 * @author Patrick Nelson, pat@catchyour.com
 */
function doChunks(array $array, $chunkSize, $callback) {
	// Just to prevent division by zero.
	$chunkSize = (int) $chunkSize;
	if ($chunkSize < 1) return [];

	// Calculate total chunks to slice up.
	$total = count($array);
	$chunks = (int) ceil($total / $chunkSize);

	// Slice up and pass each chunk into the callback, collecting return values.
	$returns = [];
	for ($i = 1; $i <= $chunks; $i++) {
		$offset = ($i - 1) * $chunkSize;
		$chunk = array_slice($array, $offset, $chunkSize);
		$returns[] = $callback($chunk);
	}

	return $returns;
}
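
As a side note, much of the slicing logic could also lean on PHP's built-in array_chunk(). The variant below is just a sketch (the doChunksAlt name is made up for illustration) and is behaviorally equivalent for the cases above:

// Alternative sketch using array_chunk() instead of manual offsets.
function doChunksAlt(array $array, $chunkSize, $callback) {
	$chunkSize = (int) $chunkSize;
	if ($chunkSize < 1) return [];

	// array_chunk() handles the slicing; array_map() runs the callback on each chunk.
	return array_map($callback, array_chunk($array, $chunkSize));
}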