@jmikola
Last active August 29, 2015 14:23
Testing memory usage with MongoGridFS::storeFile()

Configuration

For our tests, we used random input data created with cat /dev/urandom > random.txt.
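Note that cat /dev/urandom keeps writing until it is interrupted. As a minimal sketch of a bounded alternative, the hypothetical helper below (writeRandomFile() is not part of the original test script) copies a fixed number of bytes from /dev/urandom into a file. It assumes a system where /dev/urandom is readable.

```php
<?php
// Hypothetical helper, not part of the original gist: writes $bytes of random
// data from $source to $path and returns the number of bytes written.
function writeRandomFile($path, $bytes, $source = '/dev/urandom')
{
    $in = fopen($source, 'rb');
    $out = fopen($path, 'wb');
    if ($in === false || $out === false) {
        throw new RuntimeException('Unable to open source or destination');
    }

    $remaining = $bytes;
    while ($remaining > 0) {
        // fread() may return fewer bytes than requested, so loop on the remainder.
        $buffer = fread($in, min($remaining, 65536));
        if ($buffer === false || $buffer === '') {
            break;
        }
        fwrite($out, $buffer);
        $remaining -= strlen($buffer);
    }

    fclose($in);
    fclose($out);

    return $bytes - $remaining;
}
```

For an input comparable to the one used here, a call might look like writeRandomFile(__DIR__ . '/random.txt', 357 * 1024 * 1024).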

Set the URI, database, and GridFS collection names accordingly. Additionally, decide if the GridFS collection should be dropped before inserting any data.

The number of iterations can be customized. In our results, peak memory increased modestly on successive iterations, while real memory usage tended to drop with each new iteration and then climb again.

By default, the driver uses a chunk size of 255 * 1024 bytes (i.e. 255K). This default seems to leak memory (at least until the file insert completes). Increasing the chunk size even slightly, to 256K, yields vastly better results.
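GridFS splits each file into chunk documents of at most the configured chunk size, so the chunk size also determines how many inserts a single file requires. The sketch below illustrates the arithmetic with a hypothetical chunkCount() helper (not part of the original script) and an assumed 1MB file.

```php
<?php
// Hypothetical helper: number of chunks needed to store $fileSize bytes
// when each chunk holds at most $chunkSize bytes.
function chunkCount($fileSize, $chunkSize)
{
    return (int) ceil($fileSize / $chunkSize);
}

$oneMB = 1024 * 1024; // assumed file size, for illustration only

printf("255K chunks: %d\n", chunkCount($oneMB, 255 * 1024)); // prints "255K chunks: 5"
printf("256K chunks: %d\n", chunkCount($oneMB, 256 * 1024)); // prints "256K chunks: 4"
```

The difference in chunk count between 255K and 256K is small, so the chunk count alone is unlikely to explain the gap in real memory usage reported in the results.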

The logging of actual memory usage at each insert operation can be toggled. It's fine to leave this disabled, as we're primarily concerned with peak usage after running all iterations; however, it can be useful to see the rate of increase (particularly with smaller chunk sizes).
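The "PHP" and "Real" figures in the output come from memory_get_usage() and memory_get_peak_usage(), called without and with the real-usage flag. As a rough illustration of how the two differ, the sketch below takes a snapshot before and after an allocation; memorySnapshot() is a hypothetical helper, not part of the original script.

```php
<?php
// Hypothetical helper: collects current and peak memory figures in bytes.
function memorySnapshot()
{
    return [
        // Memory currently used by PHP's own allocator.
        'php' => memory_get_usage(),
        // Memory the allocator has reserved from the system.
        'real' => memory_get_usage(true),
        'peak_php' => memory_get_peak_usage(),
        'peak_real' => memory_get_peak_usage(true),
    ];
}

$before = memorySnapshot();
$buffer = str_repeat('x', 8 * 1024 * 1024); // allocate roughly 8MB
$after = memorySnapshot();
unset($buffer);

printf("PHP usage grew by %d bytes\n", $after['php'] - $before['php']);
printf("Real usage grew by %d bytes\n", $after['real'] - $before['real']);
```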

Lastly, we can configure whether to pass a PHP stream (from fopen()) or a filename to MongoGridFS::storeFile(). In testing, toggling this option did not cause results to significantly differ.

Results

The following tests were run with three iterations and a 357M input file. Inserts were not logged and all tests used streams. The driver and server versions were 1.6.9 and 3.0.4, respectively.

Chunk size: 255K

Iteration: 0
Inserted file 55881b86e84df16c6e8b4567 with size: 356.13MB 
Peak Memory / PHP: 2.34MB / Real: 217.75MB
================================================================================
Iteration: 1
Inserted file 55881b89e84df16c6e8b4aff with size: 356.13MB 
Peak Memory / PHP: 2.38MB / Real: 230.00MB
================================================================================
Iteration: 2
Inserted file 55881b8ce84df16c6e8b5097 with size: 356.13MB 
Peak Memory / PHP: 2.40MB / Real: 230.00MB

Chunk size: 256K

Iteration: 0
Inserted file 55881bebe84df1707d8b4567 with size: 356.13MB 
Peak Memory / PHP: 2.27MB / Real: 3.25MB
================================================================================
Iteration: 1
Inserted file 55881befe84df1707d8b4af9 with size: 356.13MB 
Peak Memory / PHP: 2.28MB / Real: 3.25MB
================================================================================
Iteration: 2
Inserted file 55881bf2e84df1707d8b508b with size: 356.13MB 
Peak Memory / PHP: 2.30MB / Real: 3.25MB

Chunk size: 1024K

Iteration: 0
Inserted file 55881c06e84df18c028b4567 with size: 356.13MB 
Peak Memory / PHP: 3.58MB / Real: 4.50MB
================================================================================
Iteration: 1
Inserted file 55881c10e84df18c028b46cd with size: 356.13MB 
Peak Memory / PHP: 3.59MB / Real: 4.50MB
================================================================================
Iteration: 2
Inserted file 55881c14e84df18c028b4833 with size: 356.13MB 
Peak Memory / PHP: 3.59MB / Real: 4.50MB
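
<?php

// Path to the input file stored on each iteration.
define('INPUT_FILE', __DIR__ . '/random.txt');

// Connection settings and GridFS collection name.
define('MONGODB_URI', 'mongodb://localhost:27017');
define('MONGODB_DATABASE', 'test');
define('MONGODB_GRIDFS', 'memtests');

// Whether to drop the GridFS collection before inserting any data.
define('DROP_GRIDFS', true);

define('ITERATIONS', 5);
define('CHUNK_SIZE', 255 * 1024);

// Whether to log actual memory usage at each insert operation.
define('LOG_INSERTS', false);

// Whether to pass a stream (true) or a filename (false) to storeFile().
define('USE_STREAMS', true);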

function formatBytes($size, $unit = null) {
    if ($unit === 'GB' || (!$unit && $size >= (1 << 30))) {
        return number_format($size / (1 << 30), 2) . 'GB';
    }

    if ($unit === 'MB' || (!$unit && $size >= (1 << 20))) {
        return number_format($size / (1 << 20), 2) . 'MB';
    }

    if ($unit === 'KB' || (!$unit && $size >= (1 << 10))) {
        return number_format($size / (1 << 10), 2) . 'KB';
    }

    return (string) $size;
}

function log_insert($server, $doc, $options) {
    printf(
        "Actual Memory / PHP: %s / Real: %s\n",
        formatBytes(memory_get_usage()),
        formatBytes(memory_get_usage(true))
    );
}

$ctx = stream_context_create([
    'mongodb' => [
        'log_insert' => 'log_insert',
        'log_cmd_insert' => 'log_insert',
    ],
]);
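
// Attach the logging context only when insert logging is enabled.
$mc = new MongoClient(MONGODB_URI, [], LOG_INSERTS ? ['context' => $ctx] : []);
$db = $mc->selectDB(MONGODB_DATABASE);
$gridfs = $db->getGridFS(MONGODB_GRIDFS);

// Honor the DROP_GRIDFS setting instead of dropping unconditionally.
if (DROP_GRIDFS) {
    $gridfs->drop();
}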

for ($i = 0; $i < ITERATIONS; $i++) {
    printf("Iteration: %d\n", $i);

    // Pass either an open stream or the filename, depending on USE_STREAMS.
    $fp = USE_STREAMS ? fopen(INPUT_FILE, 'rb') : INPUT_FILE;
    $id = $gridfs->storeFile($fp, ['chunkSize' => CHUNK_SIZE]);

    if (is_resource($fp)) {
        fclose($fp);
    }

    $file = $gridfs->get($id);
    printf("Inserted file %s with size: %s \n", $id, formatBytes($file->getSize()));
    printf(
        "Peak Memory / PHP: %s / Real: %s\n",
        formatBytes(memory_get_peak_usage()),
        formatBytes(memory_get_peak_usage(true))
    );
    echo str_repeat('=', 80), "\n";
}
jmikola commented Jun 22, 2015
