@joseluisq
Last active March 17, 2022 07:09
Comparative between PHP `stream_get_line` and `fgets` about processing large files


Source: http://ie2.php.net/manual/en/function.fgets.php#113113

Regarding Leigh Purdie's comment (from 4 years ago) about stream_get_line being better for large files, I decided to test this in case it had been optimized since then, and I found that Leigh's comment is just completely incorrect. fgets actually has slightly better performance, but the test Leigh ran was not set up to produce meaningful results.

The suggested test was:

$ time yes "This is a test line" | head -1000000 | php -r '$fp=fopen("php://stdin","r"); while($line=stream_get_line($fp,65535,"\n")) { 1; } fclose($fp);'

0m1.616s
$ time yes "This is a test line" | head -1000000 | php -r '$fp=fopen("php://stdin","r"); while($line=fgets($fp,65535)) { 1; } fclose($fp);'

0m7.392s

The reason this is invalid is that the buffer size of 65535 is completely unnecessary: piping the output of yes "This is a test line" into PHP makes each line 19 characters plus the delimiter. I don't know why stream_get_line performs better with an oversized buffer, but when both buffer sizes are correct (or left at the default) the performance difference is negligible, although stream_get_line is notably more consistent. If you are thinking of switching, make sure you are aware of the difference between the two functions: stream_get_line does NOT include the delimiter in the returned line, and fgets DOES.
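
To make that behavioral difference concrete, here is a minimal sketch (the file name data.txt is just a placeholder):

$fp = fopen('data.txt', 'r');

// fgets() returns the line INCLUDING the trailing "\n" (when one is present)
$line_with_newline = fgets($fp, 8192);

// stream_get_line() returns the line WITHOUT the "\n" delimiter
$line_without_newline = stream_get_line($fp, 8192, "\n");

fclose($fp);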

Here are the results on one of my servers:

Buffer size of 65535:
stream_get_line: 0.340s
fgets:           2.392s

Buffer size of 1024:
stream_get_line: 0.348s
fgets:           0.404s

Buffer size of 8192 (the default for both):
stream_get_line: 0.348s
fgets:           0.552s

Buffer size of 100:
stream_get_line: 0.332s
fgets:           0.368s
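
If you want to reproduce this without the shell, here is a rough timing sketch from inside PHP (the pre-generated test file lines.txt, with one short line per row, is an assumption):

$start = microtime(true);
$fp = fopen('lines.txt', 'r');
while (($line = stream_get_line($fp, 8192, "\n")) !== false) {
    // no-op: we only measure read throughput
}
fclose($fp);
printf("stream_get_line: %.3fs\n", microtime(true) - $start);

$start = microtime(true);
$fp = fopen('lines.txt', 'r');
while (($line = fgets($fp, 8192)) !== false) {
    // no-op
}
fclose($fp);
printf("fgets: %.3fs\n", microtime(true) - $start);

Note that the !== false checks also avoid a subtle bug in the one-liners above: a line containing only "0" would otherwise end the loop early, because stream_get_line strips the delimiter and the bare string "0" is falsy in PHP.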
@hAbd0u commented Apr 10, 2021

Hello,
I was about to run some benchmarks because I have large files to process, so you just saved me some time ;). However, when someone reads your results it seems stream_get_line is the winner here, but you are saying the opposite. Would you explain this to me?

@InfinitumForm commented Mar 17, 2022

This is a perfect explanation. Let me give you something useful in return. If you need to read a file as fast as possible, this is the right approach:

$path = 'some/large/file.json';
$data = '';
$chunk_length = 1024;

$fh = fopen($path, 'r');
// Read the file in fixed-size chunks (no delimiter), appending each chunk
while (($line = stream_get_line($fh, $chunk_length)) !== false) {
    $data .= $line;
}
fclose($fh);
unset($fh);

I use this in a loop to read around 300 JSON files in some cases and merge the data into one array after json_decode. I get results in around 1 second.

NOTE: If you read files in a loop, you don't need unset($fh); inside the foreach. Just unset it once the loop has stopped.
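
A sketch of the multi-file loop described above (the glob pattern, chunk size and merge strategy are assumptions, not the commenter's exact code):

$merged = [];
foreach (glob('data/*.json') as $path) {
    $data = '';
    $fh = fopen($path, 'r');
    while (($chunk = stream_get_line($fh, 1024)) !== false) {
        $data .= $chunk;
    }
    fclose($fh);
    $merged = array_merge($merged, json_decode($data, true));
}
unset($fh); // only needed once, after the loop finishes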
