bwoebi/text.md

Reading of non-blocking TCP sockets in PHP

As a short heads-up for those unfamiliar:

- There is a PHP-level buffer. Reads from the socket happen in chunks (the default chunk size is 8192 bytes); whenever more is actually read from the socket than the user requested in their PHP code, the surplus is stored there.
- There is an OS-level buffer, where all incoming network data lands. The event loop only knows about this one and only checks whether it is non-empty.
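
To make the two buffers tangible, here is a minimal sketch on a local socket pair. It assumes a Unix-like system (for STREAM_PF_UNIX); the variable names are made up for the illustration. The unread_bytes entry of stream_get_meta_data() reports how much data currently sits in the PHP-level buffer.

```php
<?php
// Sketch: observe the PHP-level read buffer on a local socket pair.
// Assumption: Unix-like system, so that STREAM_PF_UNIX is available.
[$reader, $writer] = stream_socket_pair(STREAM_PF_UNIX, STREAM_SOCK_STREAM, STREAM_IPPROTO_IP);

fwrite($writer, str_repeat("x", 100));   // 100 bytes land in the OS buffer of $reader

$data = fread($reader, 10);              // the PHP code only asks for 10 bytes
$buffered = stream_get_meta_data($reader)["unread_bytes"];

// PHP fetched everything that was available (up to one chunk) from the OS;
// the bytes we did not ask for are now held in the PHP-level buffer.
printf("returned: %d, held in the PHP buffer: %d\n", strlen($data), $buffered);
```

An event loop that only polls the file descriptor cannot see those buffered bytes, which is the root of the problems discussed in the rest of this text.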

Trivial single read

The trivial reading function in PHP is fread($socket, $maxSize).
fread() does a single read:

1. If there is data in the (PHP internal) socket buffer, it takes that data first.
2. Then, if the buffer held fewer than $maxSize bytes, it issues a single recv() call for the socket chunk size (which defaults to 8192).
3. It returns up to $maxSize bytes, leaving the rest in the socket buffer.
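
The single-read behaviour can be observed with a small sketch. It assumes a Unix-like system and uses a chunk size of 16 bytes purely as a stand-in for the default 8192, so that the numbers stay small.

```php
<?php
// Sketch: one fread() call on a socket performs at most one chunk-sized read.
// Assumptions: Unix-like system; chunk size 16 is an arbitrary stand-in for 8192.
[$reader, $writer] = stream_socket_pair(STREAM_PF_UNIX, STREAM_SOCK_STREAM, STREAM_IPPROTO_IP);
stream_set_chunk_size($reader, 16);
stream_set_blocking($reader, false);

fwrite($writer, str_repeat("x", 100));   // far more than one chunk is available

$first = fread($reader, 64);    // asks for 64 bytes, but only one chunk is fetched
$second = fread($reader, 64);   // the next call fetches the next chunk

printf("first: %d bytes, second: %d bytes\n", strlen($first), strlen($second));
```

Even though 100 bytes are waiting and 64 were requested, each call should hand back only one chunk.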

Let's then use this in our event loop (proper error handling is omitted here):
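// Assumes Revolt's EventLoop and Amp's DeferredFuture.
use Amp\DeferredFuture;
use Revolt\EventLoop;

function read($socket) {
	$deferred = new DeferredFuture();
	// The callback receives the watcher ID first, then the stream.
	EventLoop::onReadable($socket, function ($watcher, $socket) use ($deferred) {
		EventLoop::cancel($watcher);
		$deferred->complete((string) @fread($socket, 65536));
	});
	return $deferred->getFuture();
}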
This in itself is pretty fine, with two minor caveats:

- We may have chosen a read size of 65536, but fread() only does a single recv() of at most the chunk size. We thus introduce extra latency by reading only once per event loop tick.
- When $socket is a TLS (or gzip or similar) stream, PHP automatically reads a bit more than requested from the stream and buffers a little (to process TLS control messages, for example), no matter what stream_set_read_buffer() is set to. In this case, if everything is already in the buffer (e.g. from a previous call), the onReadable callback will never trigger. This is only an issue, though, if the socket chunk size is larger than or close to the read size.

To circumvent these issues, it is possible to use stream_get_contents() instead of fread(), omitting the size argument. This performs repeated reads until nothing is left in the OS socket buffer, and is guaranteed not to leave anything in the PHP internal socket buffer.
Beware, though: in very high-throughput cases, calling stream_get_contents() without any $maxSize may produce gigantic strings, on which full-string scans may have a noticeable performance impact. It may therefore be better to choose a big chunk size (via stream_set_chunk_size()) and then call fread() with a very large $maxSize.
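
As a rough illustration of the draining behaviour, the following sketch reads everything in one call. It assumes a Unix-like system and again uses a chunk size of 16 bytes as a small stand-in for the default.

```php
<?php
// Sketch: stream_get_contents() without a length keeps reading until nothing is left.
// Assumptions: Unix-like system; chunk size 16 is an arbitrary stand-in for 8192.
[$reader, $writer] = stream_socket_pair(STREAM_PF_UNIX, STREAM_SOCK_STREAM, STREAM_IPPROTO_IP);
stream_set_chunk_size($reader, 16);
stream_set_blocking($reader, false);   // in blocking mode the call would wait for EOF instead

fwrite($writer, str_repeat("x", 100));

$all = stream_get_contents($reader);   // many chunk-sized reads happen inside this one call
$buffered = stream_get_meta_data($reader)["unread_bytes"];

printf("read %d bytes in one call, %d left in the PHP buffer\n", strlen($all), $buffered);
```

All 100 bytes come back at once and nothing is left behind in the PHP buffer, so the next onReadable trigger reflects the true state of the socket.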

Reading of an explicit length

Now, as an additional constraint, we want to limit the number of bytes read: the future should resolve once exactly $maxBytes bytes have arrived.
function readBytes($socket, $maxBytes) {
	$deferred = new DeferredFuture();
	$buffer = "";
	// The callback receives the watcher ID first, then the stream.
	EventLoop::onReadable($socket, function ($watcher, $socket) use ($deferred, &$buffer, $maxBytes) {
		// Only request what is still missing, so we never exceed $maxBytes.
		$buffer .= @stream_get_contents($socket, $maxBytes - strlen($buffer));
		if (strlen($buffer) >= $maxBytes) {
			EventLoop::cancel($watcher);
			$deferred->complete($buffer);
		}
	});
	return $deferred->getFuture();
}
This has a major issue (apart from the missing handling of closed streams and errors) that causes the program to hang, waiting indefinitely for readBytes() to resolve even though all the data has arrived.
The trivial case is two consecutive calls to readBytes() where the sum of both $maxBytes is lower than the chunk size (and the client sends no additional data).
In this case the first call to readBytes() works flawlessly, but PHP will have read up to a full chunk and put everything beyond $maxBytes into its internal buffer. Our onReadable() callback is then never called again, because the event loop is only aware of the OS buffer, which has been emptied.
There are two possible solutions to this specific problem:

- Bypassing the PHP internal buffer by calling stream_set_read_buffer($socket, 0).
- Calling stream_get_contents() explicitly on the socket before installing the event loop handler.

The first solution, simply disabling the PHP internal stream buffer, sounds appealing, but as hinted earlier, this buffer is still used by e.g. TLS sockets. If your connection uses bare TCP sockets, just disabling buffering is the easiest way to go.
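
For bare sockets, the effect of disabling the read buffer can be checked with a short sketch (assuming a Unix-like system; the payload is arbitrary):

```php
<?php
// Sketch: with the read buffer disabled, PHP only takes what was asked for.
// Assumption: Unix-like system, so that STREAM_PF_UNIX is available.
[$reader, $writer] = stream_socket_pair(STREAM_PF_UNIX, STREAM_SOCK_STREAM, STREAM_IPPROTO_IP);
stream_set_read_buffer($reader, 0);   // bypass the PHP internal buffer

fwrite($writer, "0123456789");

$first = fread($reader, 4);           // only 4 bytes are taken from the OS buffer
$buffered = stream_get_meta_data($reader)["unread_bytes"];
$second = fread($reader, 6);          // the remaining 6 bytes were still in the OS buffer

printf("first: %s, PHP buffer: %d bytes, second: %s\n", $first, $buffered, $second);
```

Because the unread bytes stay in the OS buffer, an event loop watching the socket keeps reporting it as readable.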
However, if this does not work out, one needs to manually try to read from the socket before handing control to the event loop:
function readBytes($socket, $maxBytes) {
	$deferred = new DeferredFuture();
	// Try the PHP internal buffer (and the OS buffer) first; a failed read yields "".
	$buffer = (string) @stream_get_contents($socket, $maxBytes);
	if (strlen($buffer) < $maxBytes) {
		// The callback receives the watcher ID first, then the stream.
		EventLoop::onReadable($socket, function ($watcher, $socket) use ($deferred, &$buffer, $maxBytes) {
			$buffer .= @stream_get_contents($socket, $maxBytes - strlen($buffer));
			if (strlen($buffer) >= $maxBytes) {
				EventLoop::cancel($watcher);
				$deferred->complete($buffer);
			}
		});
	} else {
		$deferred->complete($buffer);
	}
	return $deferred->getFuture();
}
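
Why the up-front read matters can be seen without any event loop. The sketch below (assuming a Unix-like system, with a made-up payload) replays the two consecutive reads from the scenario above:

```php
<?php
// Sketch: the second of two small reads is served entirely from the PHP buffer.
// Assumption: Unix-like system, so that STREAM_PF_UNIX is available.
[$reader, $writer] = stream_socket_pair(STREAM_PF_UNIX, STREAM_SOCK_STREAM, STREAM_IPPROTO_IP);
stream_set_blocking($reader, false);

fwrite($writer, "HELLOWORLD");   // 10 bytes, far below the chunk size; nothing else follows

$first = stream_get_contents($reader, 5);
$buffered = stream_get_meta_data($reader)["unread_bytes"];

// No new data will ever arrive, so a readability watcher on the OS buffer would stay silent.
// The up-front read still succeeds, because the bytes are already in the PHP buffer.
$second = stream_get_contents($reader, 5);

printf("first: %s, buffered in between: %d, second: %s\n", $first, $buffered, $second);
```

Without the read before installing the handler, the second request would wait for a readability event that never comes.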

Mistakes to avoid

Avoiding hangs by using read lengths that are integer multiples of the chunk size

Imagine that we want to limit the amount of data processed in each event loop tick, say to at most 8192 bytes. We realize that this is exactly the default chunk size. Thus we're clever and try:
function readUpTo8192($socket) {
	$deferred = new DeferredFuture();
	// The callback receives the watcher ID first, then the stream.
	EventLoop::onReadable($socket, function ($watcher, $socket) use ($deferred) {
		EventLoop::cancel($watcher);
		$deferred->complete((string) @stream_get_contents($socket, 8192));
	});
	return $deferred->getFuture();
}
This is prone to a race condition: stream_get_contents() internally does multiple reads, until our passed $maxSize is reached. Basically:
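// Simplified model of the internal loop; the real implementation lives in C.
function stream_get_contents($socket, $maxSize) {
	$buffer = "";
	// Keep reading until $maxSize is reached or a read yields no data.
	while (strlen($buffer) < $maxSize
		&& ($read = fread($socket, $maxSize - strlen($buffer))) !== false
		&& $read !== "") {
		$buffer .= $read;
	}
	return $buffer;
}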
However, these individual internal fread() calls may be issued with a $maxSize that is not a multiple of the chunk size. Unrolling the loop makes the issue visible:
first fread call: fread($socket, 8192)
// returns a string of size 7000 (because that's what's in the OS buffer at this point)

second fread call: fread($socket, 8192 - 7000)
// the OS buffer happens to have filled up again now, with more than enough data
// thus this now returns a string of size 1192.

Note that each fread() call is individually subject to the chunk size. So, on the second fread() call, the chunk size is still 8192: the OS returns 8192 bytes, of which 1192 are returned to the caller while the other 7000 land in the PHP internal buffer, defeating our clever attempt to keep the PHP internal buffer empty.
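
The timing inside stream_get_contents() cannot be forced from user code, but the two internal reads can be replayed by hand. This sketch assumes a Unix-like system and scales the numbers down: the chunk size is 16 instead of 8192, the first arrival is 10 bytes instead of 7000.

```php
<?php
// Sketch: replaying the two internal reads with data arriving in between.
// Assumptions: Unix-like system; chunk size 16 stands in for 8192.
[$reader, $writer] = stream_socket_pair(STREAM_PF_UNIX, STREAM_SOCK_STREAM, STREAM_IPPROTO_IP);
stream_set_chunk_size($reader, 16);
stream_set_blocking($reader, false);

fwrite($writer, str_repeat("a", 10));             // only 10 bytes have arrived so far
$first = fread($reader, 16);                      // returns those 10 bytes

fwrite($writer, str_repeat("b", 16));             // the OS buffer fills up again
$second = fread($reader, 16 - strlen($first));    // asks for 6, but a whole chunk is fetched

$buffered = stream_get_meta_data($reader)["unread_bytes"];
printf("first: %d, second: %d, stuck in the PHP buffer: %d\n", strlen($first), strlen($second), $buffered);
```

The total returned is exactly the requested 16 bytes, yet 10 bytes end up in the PHP buffer where the event loop cannot see them.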
With the right combination of bad luck, this scenario may never occur in a testing environment and then lead to mysterious hangs in production. Don't do this.

Avoiding hangs by using fread() with a $maxSize lower than ~25 KB on encrypted streams

I already explained this in the first section:

- Yes, using fread() with a $maxSize of at least the chunk size will avoid the internal buffer, but only on bare TCP streams.
- Encrypted streams may always make use of the PHP internal buffer.

TLS encodes data in blocks, interspersed with control frames. OpenSSL thus needs to separate these from the actual data, and may sometimes end up with more or less data than our fread() call requested. If it has more, the surplus lands in the PHP internal buffer, and once the data is there, our onReadable callback is no longer triggered.
(Note: ~25 KB is calculated from the maximum TLS block size (16 KiB) plus a possible subsequent block of 8 KB, including some TLS data frame overhead.)

Final note

stream_select() is also aware of the PHP internal buffer. With stream_select(), many of the issues described here do not exist, because a stream with buffered data is recognized as readable and thus triggers the onReadable callbacks.
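
A quick way to see this, sketched under the assumption of a Unix-like system: leave some bytes in the PHP buffer and then poll with stream_select().

```php
<?php
// Sketch: stream_select() reports a stream as readable when data sits in the PHP buffer.
// Assumption: Unix-like system, so that STREAM_PF_UNIX is available.
[$reader, $writer] = stream_socket_pair(STREAM_PF_UNIX, STREAM_SOCK_STREAM, STREAM_IPPROTO_IP);

fwrite($writer, "0123456789");
fread($reader, 4);   // one read pulls all 10 bytes from the OS; 6 remain in the PHP buffer

$read = [$reader];
$write = null;
$except = null;
$ready = stream_select($read, $write, $except, 0);   // timeout 0: just poll

printf("streams reported readable: %d\n", $ready);
```

An event loop backend that polls only the file descriptor has no such knowledge of the PHP buffer, so it would not report the stream here.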
Using a proper O(1) event loop based on e.g. libev or libuv exposes the problems presented here. This is especially insidious when development happens on a local machine with a stream_select()-based event loop (which is O(n)), while production uses the more efficient O(1) event loops.
I generally recommend using a properly designed library for reading from non-blocking sockets, one that has sorted out all these issues as well as error handling, instead of rolling a custom solution.