CMCDragonkai/http_streaming.md

## http_streaming.md

      
HTTP Streaming (or Chunked vs Store & Forward)

The standard way of understanding the HTTP protocol is via the request-reply
pattern. Each HTTP transaction consists of a finitely bounded HTTP request and
a finitely bounded HTTP response.
However it's also possible for both parts of an HTTP 1.1 transaction to stream
possibly unbounded data. The advantage is that the sender can send data that is
beyond the sender's memory limit, and the receiver can act on the data stream in
chunks immediately instead of waiting for the entire data to arrive. Basically
you're either saving space or you're saving time. The advantages of streaming
are elaborated in Wikipedia's Online algorithm article.
Note that HTTP streaming only involves the HTTP protocol and not websockets.
Streaming is also the basis for HTML5 server sent events.
So we're going to look at HTTP streaming architecture, and how to achieve
streaming in a few different languages.
The first thing to understand is that HTTP streaming involves streaming within
a single HTTP transaction. In a larger context, each HTTP transaction itself
represents an event as part of a larger event stream. This reveals to us that
"streaming" is a context-specific concept: it's relative to what we consider
the "stream" to be.
Firstly we have to consider the HTTP headers that support streaming. Open
https://en.wikipedia.org/wiki/List_of_HTTP_header_fields up for reference:
Content-Length

The Content-Length header determines the byte length of the request/response
body. If you neglect to specify the Content-Length header, HTTP servers will
implicitly add a Transfer-Encoding: chunked header. In that case the receiver
will have no idea what the length of the body is and cannot estimate the download
completion time. The Content-Length and Transfer-Encoding headers should not be
used together. If you do add a Content-Length header, make sure it matches the
entire body in bytes. If it is incorrect, the behaviour of receivers is undefined.
The Content-Length header will not allow streaming, but it is useful for large
binary files, where you want to support partial content serving. This basically
means resumable downloads, paused downloads, partial downloads, and multi-homed
downloads. This requires the use of an additional header called Range. This
technique is called Byte serving.
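As a rough sketch of how byte serving works, here is a hypothetical Python helper (the name serve_range is made up for this example) that takes a single-range Range header and builds the status, headers and payload of a partial response. It only handles the simple bytes=start-end forms. Multi-range requests, malformed values and conditional headers are left out.

```python
def serve_range(body: bytes, range_header: str):
    """Return (status, headers, payload) for a single 'bytes=start-end' range.

    Hypothetical helper for illustration only: multi-range requests,
    malformed values and conditional headers are not handled.
    """
    total = len(body)
    unit, _, spec = range_header.partition("=")
    start_s, _, end_s = spec.partition("-")
    if unit.strip() != "bytes" or "," in spec:
        # Unsupported unit or multi-range: fall back to the whole body.
        return 200, {"Content-Length": str(total)}, body
    if start_s == "":
        # Suffix form "bytes=-N": the last N bytes.
        n = int(end_s)
        start, end = max(total - n, 0), total - 1
    else:
        start = int(start_s)
        end = int(end_s) if end_s else total - 1
        end = min(end, total - 1)
    if start > end or start >= total:
        # Range cannot be satisfied.
        return 416, {"Content-Range": f"bytes */{total}"}, b""
    payload = body[start:end + 1]
    headers = {
        "Content-Range": f"bytes {start}-{end}/{total}",
        "Content-Length": str(len(payload)),
        "Accept-Ranges": "bytes",
    }
    return 206, headers, payload
```

A resumable download is then just the client asking again with bytes=N- where N is the number of bytes it already has.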
Transfer-Encoding

The use of Transfer-Encoding: chunked is what allows streaming within a single
request or response. This means that the data is transmitted in a chunked manner,
and does not impact the representation of the content.
Officially an HTTP client can send a request with a TE header field that
specifies what kinds of transfer encodings, besides chunked, the client is willing
to accept. This is not always sent, but every HTTP 1.1 recipient is required to
understand the chunked encoding, so servers can assume that clients can process it.
The chunked transfer encoding makes better use of persistent TCP connections, which
HTTP 1.1 uses by default, because the end of the body is marked in-band and the
connection does not have to be closed to signal it.
Chunked data is represented in this manner:
4\r\n
Wiki\r\n
5\r\n
pedia\r\n
e\r\n
 in\r\n\r\nchunks.\r\n
0\r\n
\r\n

Each chunk starts with its byte length expressed as a hexadecimal number followed by
optional parameters (chunk extension) and a terminating CRLF sequence, followed by
the chunk data and another CRLF. The body ends with a zero-length chunk, optional
trailer fields, and a final CRLF sequence.
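To make the wire format concrete, here's a minimal Python sketch of a chunk encoder and decoder. The function names are made up for this example. The decoder assumes the whole body is already in memory and ignores trailers, so it's only good for understanding the format, not for production parsing.

```python
from typing import Iterable, Iterator


def encode_chunked(pieces: Iterable[bytes]) -> Iterator[bytes]:
    """Frame each non-empty piece as one HTTP/1.1 chunk, then terminate."""
    for piece in pieces:
        if piece:  # a zero-length chunk would end the body early
            yield b"%x\r\n%s\r\n" % (len(piece), piece)
    yield b"0\r\n\r\n"  # last chunk, no trailers


def decode_chunked(data: bytes) -> bytes:
    """Decode a complete chunked body held in memory (trailers are ignored)."""
    body = bytearray()
    pos = 0
    while True:
        line_end = data.index(b"\r\n", pos)
        # Drop any chunk extension after ';' before parsing the hex size.
        size = int(data[pos:line_end].split(b";", 1)[0], 16)
        if size == 0:
            return bytes(body)
        start = line_end + 2
        body += data[start:start + size]
        pos = start + size + 2  # skip the CRLF that follows the chunk data
```

Feeding the three pieces "Wiki", "pedia" and " in\r\n\r\nchunks." through encode_chunked reproduces the example above byte for byte.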
Chunk extensions can be used to indicate a message digest or an estimated progress.
They are just custom metadata that your layer 7 receiver needs to parse. There's no
standardised format for it. Because of this, it's probably better to just add your
metadata (if any) into the chunk itself for your layer 7.5 application to parse.
For your application to send out chunked data, you must first send out the
Transfer-Encoding header, and then you must flush content in chunks according to
the chunk format. If you don't have an appropriate HTTP server that handles this, then
you need to implement the syntax generator yourself. Sometimes you can use a library
to provide an abstract interface.
For example in PHP, there's the Symfony HTTP Foundation Stream Response,
and in NodeJS, its native HTTP module chunks responses by default.
Chunking is a 2 way street. The HTTP protocol allows the client to chunk HTTP
requests. This allows the client to stream the HTTP request, which is useful for
uploading large files. However not many servers (except NGINX) support this feature,
and most streaming upload implementations rely on Javascript libraries to cut up a
binary file and send it by chunks to the server. Using Javascript gives you more
control over the uploading experience, but the HTTP protocol would be the simplest.
Browsers natively support chunked data. So if your server sends chunked data, they
will start rendering data as soon as they receive it. However there's a minimum amount
of data that a browser needs to receive before it starts rendering. This is different
for each browser, but generally it's 1KB. You can see the limits for various browsers
here: http://stackoverflow.com/a/16909228/582917
If however you want to consume an API that supports streaming, you need to be aware of
how your HTTP library handles chunked data. In most cases, you'll need to attach a
callback handler that executes upon each chunk of data. This means that the API needs
to frame each chunk in a useful manner. If the API splits the data into chunks that are
smaller than a meaningful unit, you may end up needing to buffer the data up into a
"semantic protocol data unit" (PDU) before you can work on it. This of course defeats
the purpose of chunking in the first place. For example in PHP, you can use the Guzzle
library or curl.
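Here's a small Python sketch of that buffering step. It assumes the API frames its PDUs as newline-terminated records (newline-delimited JSON, for example). That framing is an assumption for this example, as is the function name iter_records. The chunks can be split anywhere, and the generator re-assembles them into whole records.

```python
from typing import Iterable, Iterator


def iter_records(chunks: Iterable[bytes], delimiter: bytes = b"\n") -> Iterator[bytes]:
    """Re-frame arbitrary byte chunks into delimiter-terminated records."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # Everything before the last delimiter is a complete record;
        # the remainder stays in the buffer until more data arrives.
        *records, buffer = buffer.split(delimiter)
        yield from records
    if buffer:
        yield buffer  # trailing data without a final delimiter
```

If the sender's chunks already line up with the records, the buffer never holds more than one record and the cost is negligible. The worse the alignment, the more work this function ends up doing.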
In considering performance, you want to make sure that you're not producing chunks
that are too small. The more "chunking" you do, the more overhead that exists in both
producing the chunks and parsing the chunks. Furthermore, it also results in more
executions of buffering functions if the receiver can't make immediate use of the
chunks. Chunking isn't always the right answer: it adds extra complexity on the
recipient. So if you're sending small units of things that won't gain much from
streaming, don't bother with it!
Do note that byte serving is compatible with chunked encoding. This would be applicable
where you know the total content length, want to allow partial or resumable downloads,
but you want to stream each partial response to the client.
Content-Encoding

It is also possible to compress chunked or non-chunked data. This is practically
done via the Content-Encoding header.
Note that the Content-Length is equal to the length of the body after the
Content-Encoding. This means if you have gzipped your response, then the length
calculation happens after compression. You will need to be able to load the entire
body in memory if you want to calculate the length (unless you have that information
elsewhere).
When streaming using chunked encoding, the compression algorithm must also support
online processing. Thankfully, gzip supports stream compression. The content gets
compressed first, and then cut up in chunks, because the content encoding is part of
the content itself while the transfer encoding only describes how the message is sent.
That way, the chunks are received and unframed, then decompressed to acquire the real
content. If it were the other way around, you'd get the compressed stream, and then
decompressing would give us chunks, which doesn't make sense.
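The ordering can be sketched in Python with the standard zlib module: compress the stream incrementally, then wrap whatever compressed bytes come out in chunk framing. The function name is made up for this example, and flushing after every piece is just one possible policy. It gets data out quickly at the cost of a worse compression ratio.

```python
import zlib
from typing import Iterable, Iterator


def gzip_then_chunk(pieces: Iterable[bytes]) -> Iterator[bytes]:
    """Compress a stream with gzip, then frame the compressed bytes as chunks."""
    # wbits=31 selects the gzip container (header and trailer) in zlib.
    compressor = zlib.compressobj(wbits=31)
    for piece in pieces:
        # Z_SYNC_FLUSH forces out the data seen so far, at some cost in ratio.
        out = compressor.compress(piece) + compressor.flush(zlib.Z_SYNC_FLUSH)
        if out:
            yield b"%x\r\n%s\r\n" % (len(out), out)
    tail = compressor.flush()  # finishes the stream and writes the gzip trailer
    if tail:
        yield b"%x\r\n%s\r\n" % (len(tail), tail)
    yield b"0\r\n\r\n"  # last chunk
```

The receiver does the reverse: strip the chunk framing, then feed the bytes to a streaming gzip decompressor.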
A typical compressed stream response may have these headers:
Content-Type: text/html
Content-Encoding: gzip
Transfer-Encoding: chunked

Semantically the usage of Content-Encoding indicates an "end to end" encoding
scheme, which means only the final client or final server is supposed to decode the
content. Proxies in the middle are not supposed to decode the content.
If you want to allow proxies in the middle to decode the content, the correct header
to use is in fact the Transfer-Encoding header. If the HTTP request possessed a
TE: gzip header, then it is legal to respond with Transfer-Encoding: gzip, chunked.
However this is very rarely supported. So you should only use Content-Encoding
for your compression right now.
Buffering Problem

The biggest problem when implementing HTTP streaming is understanding the effect of
buffering. Buffering is the practice of accumulating reads or writes into a temporary
fixed memory space. The advantages of buffering include reducing read or write call
overhead. For example instead of writing 1KB 4096 times, you can just write 4096KB at
once. This means your program can create a write buffer holding 4096KB of temporary
data (which can be aligned to the disk blocksize), and once the space limit is reached,
the buffer is flushed to disk.
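The effect is easy to observe in Python. The sketch below uses a made-up CountingSink class that stands in for the disk and just counts how many write calls actually reach it. The 1KB writes go through a standard io.BufferedWriter.

```python
import io


class CountingSink(io.RawIOBase):
    """Hypothetical raw sink that records how many write calls reach it."""

    def __init__(self) -> None:
        super().__init__()
        self.calls = 0
        self.bytes_written = 0

    def writable(self) -> bool:
        return True

    def write(self, data) -> int:
        self.calls += 1
        self.bytes_written += len(data)
        return len(data)


def count_raw_writes(block: bytes, times: int, buffer_size: int) -> CountingSink:
    """Write `block` `times` times through a buffer and return the sink."""
    sink = CountingSink()
    writer = io.BufferedWriter(sink, buffer_size=buffer_size)
    for _ in range(times):
        writer.write(block)
    writer.flush()  # push whatever is still sitting in the buffer
    return sink
```

Calling count_raw_writes(b"x" * 1024, 4096, 4096 * 1024) delivers all 4096KB to the sink in far fewer than 4096 calls. The flip side is what matters for streaming: nothing reaches the sink until the buffer fills up or is flushed.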
Typical HTTP architectures include these components:
Client <--> Proxy <--> HTTP Server <--> Application Server <--> Database Server

Each one of these components can possess adjustable and varied buffering styles and
limits.
To correctly perform streaming, you have to know and adjust the buffering limits at
each component.
For example, let's investigate the typical PHP stack such as:
Browser <--> Proxy <--> NGINX <--> PHP <--> MySQL

The Client

Firstly browsers have a rendering buffer limit.
You must send as much data as the limit before the browsers will render the content.
Having chunks smaller than the buffer will just make the browser hold the data until
either the buffer is full or when the connection is closed (or after some time limit).
The Proxies

At the proxy level, this could be your ISP or some custom proxy. If the proxy buffers data,
your streamed data from upstream will be stored up in the proxy buffer before being sent
to the browser. Some mobile wireless ISPs will buffer things and you won't be able
to control this behaviour. This is a violation of the end to end principle, and there's
nothing you can do about it technically.
The Web Server

At the NGINX level, buffering is dependent upon the type of the upstream connection. There
are 3 common connection types for HTTP: "proxy", "uwsgi", "fastcgi". If you want your NGINX
server to respect streaming, you can either switch off buffering for your connection type, or
match the buffer size with the upstream chunk size. Switching off buffering can be done
using a buffering directive (proxy_buffering, uwsgi_buffering, fastcgi_buffering), or
you can use a special header X-Accel-Buffering: no which tells NGINX to not buffer the
response. The special header is more flexible, as this allows NGINX to buffer responses that
don't need streaming. It also works for all 3 connection types.
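From the application's side, using the special header is a one-liner. Here's a minimal sketch of a Python WSGI application (a stand-in for the PHP application in this stack, with a made-up name) that opts a single response out of NGINX buffering and yields its body piece by piece.

```python
from typing import Callable, Iterator


def streaming_app(environ: dict, start_response: Callable) -> Iterator[bytes]:
    """Hypothetical WSGI app that asks NGINX not to buffer this response."""
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        # No Content-Length: the length is not known up front, so the
        # server in front of the app has to frame the body itself.
        ("X-Accel-Buffering", "no"),
    ]
    start_response("200 OK", headers)
    for i in range(3):
        # Each yielded piece is handed to the server as soon as it is ready.
        yield f"tick {i}\n".encode("utf-8")
```

Other responses from the same application simply leave the header out and keep the default buffering.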
If you instead try to match the buffer size with the chunk size, you have to make sure that
the number of buffers multiplied by the buffer size (equal to a system memory page) is equal
to a single chunk size. If it is greater than a single chunk from upstream, then
your chunks will be accumulated before they are sent downstream. If it is less than the
chunk size, NGINX will buffer to disk. You want to avoid this as it results in extra
overhead when streaming. For more information on buffer size see this gist.
Just a note on buffering optimisation: the larger the total buffer size, the more memory
each connection is likely to use. If each buffer is large, there's a chance that you may not
be using the buffer efficiently, which can cause memory fragmentation. In the end, each
buffer size should match the system memory page size. The number of buffers is what can be
dynamically allocated. If your total buffer size across all connections exceeds your OS's
memory limit, you're either going to meet an OOM error or start paging to disk. To maintain
your NGINX's availability, you have to consider the theoretical number of connections that
a single NGINX server can handle before it exhausts your server's memory limit.
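To put numbers on that last point, here's a back-of-the-envelope calculation in Python. The figures (10,000 connections, each with 8 buffers of 4KB plus one 4KB header buffer) are illustrative assumptions, not measurements or recommendations. The estimate also ignores everything else NGINX allocates per connection.

```python
def proxy_buffer_memory(connections: int, buffers: int, buffer_size: int,
                        header_buffer_size: int) -> int:
    """Rough upper bound, in bytes, of response buffer memory across connections."""
    per_connection = header_buffer_size + buffers * buffer_size
    return connections * per_connection


# Illustrative figures only: 10,000 connections, 8 x 4 KiB buffers + 4 KiB header buffer.
total = proxy_buffer_memory(10_000, 8, 4096, 4096)
print(f"{total / 2**20:.1f} MiB")  # prints "351.6 MiB"
```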
Be aware of the real chunk size after compression. If your upstream is compressing the content,
the resulting chunk size will be different. In most cases, NGINX should be doing the compression,
and it does support compressing each chunk that arrives from upstream. You just need gzip on.
This means your application layer should not be compressing or chunking the content; it should
just flush raw data. NGINX is smart enough to compress the data it receives from upstream as it
arrives and then format it into chunks, which are then flushed downstream.
There's an advantage in keeping buffers available or having a larger buffer size than the
chunk size. It comes from dealing with slow clients. NGINX as a reverse proxy is very fast
and can read the response from your upstream application server very quickly. NGINX itself
can deal with any slow browser that has a slower read rate than your upstream's write rate.
Because NGINX is very light weight (asynchronous IO), the cost of holding a connection in
NGINX is far smaller than holding open a process (that is waiting for the client to finish
reading) in your application server. This is of course relative, as your application server
might also be very light weight, and rely on either green threads or asynchronous IO. This
problem does reveal an interesting property of streaming systems. Any stream will only be as
quick as the slowest link (reader or writer) in the chain. This problem with streaming is
related to the network back pressure issue in distributed systems.
To take advantage of NGINX's ability to handle slow clients while still streaming data as
fast as possible, there will need to be some tuning of both the buffer size and potentially the
*_busy_buffers_size option. You cannot just increase the total buffer size, as that will
just make NGINX wait until the buffer is full. What you need is some buffer size that is
allocated only for slow clients. This has something to do with the *_busy_buffers_size, but
this is poorly documented currently, so I do not know how to make this work.
Here are 2 quotes about the *_busy_buffers_size:

When buffering of responses from the * server is enabled, limits the total size of buffers that can be busy sending a response to the client while the response is not yet fully read. In the meantime, the rest of the buffers can be used for reading the response and, if needed, buffering part of the response to a temporary file. By default, size is limited by the size of two buffers set by the *_buffer_size and *_buffers directives.

NGINX documentation


proxy_busy_buffers_size: This directive sets the maximum size of buffers that can be marked "client-ready" and thus busy. While a client can only read the data from one buffer at a time, buffers are placed in a queue to send to the client in bunches. This directive controls the size of the buffer space allowed to be in this state.

https://www.digitalocean.com/community/tutorials/understanding-nginx-http-proxying-load-balancing-buffering-and-caching


The Application Server

At the PHP level, global buffers can be set inside the php.ini configuration file. There are
3 options defined: output_buffering, output_handler and implicit_flush. They
are explained in the output control section of the PHP documentation.
It is interesting to note that for CLI applications, the output buffering is off by default.
This is so that your CLI application can show you results as it's running. This buffer is controlled
by the server application programming interface "SAPI". You can control it inside your application by
calling flush(), which will flush the entire SAPI buffer.
During runtime, custom buffers can also be created using ob_start(). Once you have added content
to the buffer, you can then flush your custom buffer using ob_flush(). This only flushes the buffer
that you created using ob_start(). Think of the ob_start() as a kind of PHP specific manual
memory management. You're basically asking for some block of memory (fixed or variable), which you
then can only use for your output statements and functions: echo and print.
If you have entered both levels of buffers, you need to call the flush functions in this order:
ob_flush(); flush();.
Both the global SAPI buffer and the custom application buffer have settings that enable automatic
flushing. This can depend on hitting the buffer limit, or on some function call. Check the
documentation for more.
The Upstream Data Source

Finally we reach the MySQL level. This can be replaced with any upstream data source that you
are calling in order to prepare a response. By default all SQL queries are buffered. There are
2 options to achieve unbuffered queries (writes and reads). The first is the unbuffered query
option. This allows one to read large result sets and to process each row as it arrives
(including flushing to the client). The second option works with just one single column of data.
This is useful where a single column contains a large binary or textual content, and you want to
be able to work with a stream on this data specifically. This involves the usage of the large
object option. You can also stream write a large binary or textual content into the database
using the large object option. The streaming of writing rows is just done by running multiple
insert queries.
With regards to the second method, there are some peculiarities you have to keep in mind:
https://www.percona.com/blog/2007/07/06/php-large-result-sets-and-summary-tables/
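The consuming side of the first option looks roughly like this. The sketch uses Python's built-in sqlite3 module instead of PHP and MySQL, purely because it runs anywhere. It only illustrates the pattern of pulling rows in small batches rather than loading the whole result set, not MySQL's actual unbuffered query mechanism. The function name and batch size are made up for this example.

```python
import sqlite3
from typing import Iterator


def stream_rows(conn: sqlite3.Connection, query: str,
                batch_size: int = 2) -> Iterator[tuple]:
    """Yield rows in small batches instead of loading the full result set."""
    cursor = conn.execute(query)
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        # Each row can be processed (and flushed to the client) right away.
        yield from batch
```

Each row that comes out of the generator can be written and flushed downstream before the next batch is fetched.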
A Note About NodeJS

NodeJS has great support for streaming. In fact its entire native HTTP module does streaming by
default for both incoming requests and outgoing responses. Every time you call response.writeHead or
response.write, it is just writing a chunk of data. There may also be a buffer size inside
NodeJS, which is probably the highWaterMark setting, but I have not looked into this further.
NodeJS has a native stream module: https://nodejs.org/api/stream.html that serves as a base object
for all other IO modules.