chrisdickinson/wip.md Secret

## wip.md

      
    Raw
  

              wip.md
            
          
    Streams: Here is What is Supposed to Happen

Node's streams are designed to tackle the problem of asynchronous data spread
out over time — that data might "happen" zero to many times, and it may
terminate either in an errored state or a successful state.
No single dimension of how streams work is particularly hard to mentally model,
but taken altogether it quickly becomes hard to accurately model the entire
system. Nevertheless they are a useful pattern; this document will attempt to
note where liberal abstractions may be safely made.
Streams, by necessity, operate on an asynchronous substrate: events, promises,
or callbacks. Node's streams use EventEmitter events as a substrate. WHATWG
streams operate on Promise's.
The APIs available on streams serve many masters. There are, roughly speaking,
two sets of APIs available:

.read
.write
.end

and:

.push
.pipe

It is not wise to mix the two APIs. The first API of a stream should only
be used if the stream is not currently, and will never be, piped to another
stream. The second API may be used for any stream in a pipeline.
You can cook an egg on an car's engine, or you can drive the car on the
highway. These are both public APIs of a car. It is unwise to attempt to use
both at once.
This document will concentrate on the second API, since most of the power and
complexity of streams are contained in those two methods.
What you need to know about backpressure and watermarks

highWaterMark exists to mask the effects of latency. Backpressure exists
to deal with bandwidth. If those terms don't make sense to you, read on.
Backpressure is the mechanism by which a destination stream may communicate
to a source stream that it is currently busy and should not be sent any more
information.
This mechanism is essential to the composability of streams. Without it, there
is no guarantee that a pipeline would handle differential data rates
appropriately: the author of each link in the pipeline would have to get the
(often tricky) specifics of buffering and communicating the buffered status to
every other stream in the pipeline.
The story you get out of the box with Node streams does not end there.
One of the primary goals of Node is to be fast. Many of Node's streams are
backed by resources that operate by turning them "on" and turning them "off."
You can think of them like a rusty outdoor spigot. When they're on, they
constantly flow; when they're off, nothing comes out.
Turning a rusty spigot on or off takes considerable amounts of time.
Node's streams optimize for this: they attempt to minimize the number of times
a resource will be turned "off" due to backpressure. A stream that has been
told to "back off" will continue to fill up an internal buffer of items until
it's reached a highWaterMark of contents. This happens on both the
readable and writable side of streams.
A writable stream that receives a write while it's currently processing other
data will attempt to pause if highWaterMark is 0. If its highWaterMark is
greater than 0, it buffers the write as a "write request."
When a readable stream receives word from a destination writable that it should
back off from producing data, it won't stop pulling data from its underlying
resource until it hits highWaterMark.
This mitigates the effects of stream latency — that is, a stream that can
consume or produce the same rate of data as other points in the pipeline, but
offsets it in time by a given delay.
Understanding these concepts will help you understand how important events in
a streams lifecycle are communicated.
Events and methods

As mentioned before, streams operate on a substrate made up of another
asynchronous primitive; you could design your own streams using nothing but
node-style callbacks (it's been done before!)
Node's streams are based on EventEmitters by way of inheritance. Anyone with a
reference to an EventEmitter instance may attach listeners for a given
topic on the emitter, or they may emit an event on a topic with
some associated data. Early versions of streams encouraged authors to directly
use EventEmitter machinery. Unfortunately, this was like putting the on-switch
of a garbage disposer inside the drain with the garbage disposer: convenient
for the person installing the garbage disposer, okay for folks who were savvy
to the official garbage disposer activation tools, and very easy to get
disasterously wrong for everyone else.
Even in the newest outings of the Stream API, there are some vestiges of this
Little Shop of Horrors garbage disposal system. As such, you should avoid
calling emit on any stream instance, whether you control it or not. The
pipeline machinery is making use of that mechanism, and it's very easy to jam
that machinery up in novel (and more importantly, hard to diagnose!) ways by
emitting your own events.
Here are the event topics that are part of streams:


topics
PIPE
READABLE
WRITABLE
USER CAN EMIT


data
✅
✅


end
✅
✅


pause

✅


resume

✅


readable
✅
✅


drain
✅

✅


finish
✅

✅


prefinish


✅


pipe


✅


unpipe


✅


close
✅


error
✅


✅
topics	PIPE	READABLE	WRITABLE	USER CAN EMIT
data	✅	✅
end	✅	✅
pause		✅
resume		✅
readable	✅	✅
drain	✅		✅
finish	✅		✅
prefinish			✅
pipe			✅
unpipe			✅
close	✅
error	✅			✅