swatch - streaming with adaptive transmission channels

Swatch is an open-ended, loosely defined project targeting real-time audio/video streaming for ethereum/swarm.

Technically speaking, live streaming to multiple viewers is a multicast/broadcast. In order for viewers to tap into a stream, the solution needs to offer a way for hosts to offer, advertise, launch and cancel sessions, as well as for guests to search, accept, join and drop out of sessions. Secondly, the solution should provide a way to efficiently stream data between nodes, notably, in the one-to-many broadcast case, to stream data towards multiple viewers.

webRTC

In webRTC, the first component is handled by a signalling server and is necessarily centralised. Swatch aims to replace this with a decentralised solution. The second component, while more complex from the point of view of the server-client based web 2.0 world, is actually straightforward deterministic routing between nodes.

Note that this does not solve the issue that direct peer-to-peer broadcast is hugely inefficient. There exist solutions that tackle this inefficiency essentially by turning some of the nodes into relaying nodes. Such a solution naturally offers itself in devp2p/swarm where p2p connections already involve intermediaries as forwarding nodes.

Another problem targeted by webRTC is firewalls preventing dial-in. NAT traversal is handled with the STUN and TURN schemes. While NAT punching is also a problem for swarm, we assume these techniques are applied at the devp2p level, where they need to be solved anyway, independently of webRTC-related media sharing.

To facilitate a piecemeal route to a richly featured decentralised solution, we initially consider using a high-level webRTC framework/library (i.e., one that has high-level code support for session creation, user media selection, etc.) and putting together a simple sample app where you create a room that you host and broadcast video/audio in. Viewers join this room by its id and follow the stream in real time.

Under the hood, webRTC uses a signalling server to negotiate a session, i.e., to pass connection and session information between peers. Based on the host info that a peer receives for the session, it attempts to establish a peer-to-peer connection which will transmit the video/audio data relating to the session.

This approach offers a roadmap for successively swapping out ingredients of the webRTC solution and substituting swarm components. Since in this ideal scenario both session logic and messaging/streaming logic are already implemented, we only need to swap out network communication. Though the signalling server's negotiation logic needs to be implemented, apart from this it is sufficient to swap out the low-level calls that do network transport. At least this is the ideal scenario, if we find an appropriate library.

Implementing swatch using webRTC is useful even if we end up with a pure devp2p solution since it will offer a reasonable baseline for benchmarking.

Recording

Recording can be done in multiple ways. Muaz Khan's webRTC experiments show a wealth of methods to add recording capability to a webRTC broadcast. One solution would be to use a recorder on the host side and, instead of writing to a file, pipe the output to a socket; a swarm upload could then be triggered to read from this socket. Alternatively, viewers could also record the viewed stream, but in that case, in order not to waste resources by recording multiple times, viewers need a consensus on who records the stream. This, however, is not ideal, since the stream data is best available with the host, its publicity is best decided by the host, and its storage is most often paid for by the host (maybe financed from subscriber fees).

native solution

Alternatively, we can take the logic for session negotiation and solve it internally using the same unicast/multicast messaging. Let us assume that session negotiation means the following (a minimal message sketch follows the list):

  • broadcaster creates a room and offers it for others to join (not restricted)
  • the room id is advertised somehow (searchable by potential viewers)
  • anyone can request to view the room (read, listen, watch)
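To make this concrete, here is a minimal sketch of the negotiation messages as Go types, assuming a hypothetical devp2p subprotocol; all type and field names are illustrative, not part of any existing swarm API.

```go
package swatch

// RoomAnnounce is published by the broadcaster (e.g. via whisper, ENS or a
// swarm-hosted dapp) so that potential viewers can discover the room.
type RoomAnnounce struct {
	RoomID     string // searchable identifier of the room
	Originator []byte // overlay address of the broadcaster
	Metadata   []byte // codec, resolution, title, etc.
}

// ConnectChannel is the distinct message type a viewer sends towards the
// originator to request joining a live stream.
type ConnectChannel struct {
	RoomID string
	MinRes uint32 // optional minimum resolution (see non-live modes below)
}

// ChannelGrant accepts a request, linking a stream id to the peer.
type ChannelGrant struct {
	RoomID   string
	StreamID uint64
}
```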

The primary originator of a broadcast creates a named room. Advertising the room id can be done on any external medium (e.g., published on whisper, the blockchain, ENS or a swarm-hosted dapp) and is thus outside the scope of discussion for now. When the session broadcast goes live, the video/audio inputs specified by the user within the dapp essentially get associated with a channel. The channel is subscribed to by any peer interested in the broadcast. When a channel connection is established between peers, the upstream peer pushes stream data to this channel and relays it to downstream peers in chunks in real time.

[In particular, the stream is read by the pyramid chunker (no length given in advance). The chunker then sends the chunks it reads to a channel, preserving their order.]

Requesting to join a live stream needs to be a distinct message type.

connect_channel(room_id, ...)

If a peer decides to join a room, it sends this channel connection message towards the known originator (to a peer in the kademlia row of the originator's address). A peer receiving a channel connection request either offers the channel or it does not. If it offers the channel with the id specified in the request, the request is accepted and a channel with a stream id gets linked to the peer, allowing transmission of a contiguous stream of chunks. If the peer does not know about a channel with that id, the request is forwarded towards the known originator.
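Continuing the sketch above, a relay node might handle an incoming request as follows; Peer, Kademlia and Channel are illustrative stand-ins, not actual swarm types.

```go
// Peer and Kademlia are minimal stand-ins for devp2p peers and swarm's
// kademlia connectivity table.
type Peer interface {
	Send(msg interface{}) error
}

type Kademlia interface {
	// ClosestPeer returns the connected peer closest by XOR distance
	// to the given overlay address.
	ClosestPeer(addr []byte) Peer
}

// Channel tracks the downstream subscribers of one relayed room.
type Channel struct {
	nextStream uint64
	subs       map[uint64]Peer
}

func (c *Channel) addSubscriber(p Peer) uint64 {
	c.nextStream++
	c.subs[c.nextStream] = p
	return c.nextStream
}

type Node struct {
	kad     Kademlia
	offered map[string]*Channel // channels this node offers, by room id
	origins map[string][]byte   // known originator addresses, by room id
}

// handleConnectChannel accepts the request if this node offers the channel
// with the requested id, otherwise forwards it towards the originator.
func (n *Node) handleConnectChannel(req ConnectChannel, from Peer) error {
	if ch, ok := n.offered[req.RoomID]; ok {
		id := ch.addSubscriber(from)
		return from.Send(ChannelGrant{RoomID: req.RoomID, StreamID: id})
	}
	return n.kad.ClosestPeer(n.origins[req.RoomID]).Send(req)
}
```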

Chunks in the channel are cached so that pausing the stream or jumping/winding back is possible. More on non-live modes of operation below.

Channel requests can potentially be made ahead of time and can help nodes organise their connections in ways that make the broadcast more efficient.

Chunks of a recorded stream can always be passed on, so these channels behave exactly the same way as synchronisation channels corresponding to kademlia rows.

A peer can establish a channel connection to the same stream with multiple peers to help reduce latency.

If a client can have two upstream connections channelling the same chunks, we could apply a two-step process analogous to that used in synchronisation. In the first step, batches of chunk hashes are shown to the downstream peer, which responds by filtering out the ones it already has. For a live experience, this is probably best done chunk by chunk (batch size 1), but it is unclear whether this will bring real improvement.
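A sketch of this two-step process with batch size 1, reusing the types above; the message names are assumptions.

```go
// OfferMsg shows the hash of the next chunk; WantMsg is the downstream
// peer's filter response; ChunkMsg carries the actual data.
type OfferMsg struct {
	StreamID uint64
	Hash     [32]byte
}

type WantMsg struct {
	StreamID uint64
	Hash     [32]byte
	Want     bool // false if the downstream peer already has the chunk
}

type ChunkMsg struct {
	StreamID uint64
	Data     []byte
}

// offerChunk runs on the upstream peer for each chunk pushed to a channel:
// show the hash first, send the data only if the downstream peer wants it.
func offerChunk(p Peer, id uint64, hash [32]byte, data []byte, wants <-chan WantMsg) error {
	if err := p.Send(OfferMsg{StreamID: id, Hash: hash}); err != nil {
		return err
	}
	if w := <-wants; !w.Want {
		return nil // downstream already has the chunk
	}
	return p.Send(ChunkMsg{StreamID: id, Data: data})
}
```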

Similarly to incentivised synchronisation, we can assume that once a peer establishes a channel connection, it commits to keep downloading all the chunks in that channel. As a result, nodes downstream can challenge the peer for a missing chunk.

Since there is no guarantee that chunks shown upstream but not downloaded can still be obtained later, nodes offering the channel need to continuously download all the chunks in the channel. Chunks are always downloaded from upstream before they are advertised downstream.

The profitability of a channel will then hinge on how many nodes we end up relaying chunks to and on what percentage of the downloaded chunks we can pass on.
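Under an assumed per-chunk, swap-style accounting (the pricing model here is purely illustrative), the relationship is simply:

```go
// channelProfit is a back-of-the-envelope estimate: a relaying node pays
// for every chunk it downloads and earns on every chunk delivery it makes
// downstream, so profit grows with the number of subscribers and with the
// fraction of downloaded chunks that get passed on.
func channelProfit(downloaded, deliveries uint64, pricePerChunk int64) int64 {
	// deliveries counts chunk sends summed over all downstream peers
	return int64(deliveries)*pricePerChunk - int64(downloaded)*pricePerChunk
}
```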

As long as a channel is profitable, accepting new downstream subscriptions can only increase profit. Conversely, subscribing to an upstream peer is costless for a client node. As a consequence, channel connections will only be dropped if upstream has no profit in the business or the downstream peer is dissatisfied with the throughput.

    X
 ___|___
 |      |
 |      |
 A      B
 |      |
 |      |
 \______/
    |
    Y

Since in these configurations A and B can never profit, Y's request will be passed up to X directly. X, upon receiving a join request originating from a non-connected node, accepts the connection directly. One consequence of this strategy is that it converges to a multicast tree with ideal properties.

If only nodes that are themselves viewers participate as channel providers, and only one channel connection is used, then the graph of channel connections is isomorphic to the compacted binary trie of the MSB-XOR distances of viewers from the broadcaster (a proximity helper is sketched after the list below). Compactness means that a trie node is posited only if there are 2 paths sharing the node as a common prefix. Or, analogously: intermediate nodes are used only if they have channel connections to both continuations of their common prefix with the originator.

This structure

  • minimises connections given the boundary condition that each node is reachable.
  • limits channel connections to 2
  • number of hops is proportional to logarithmic distance (further nodes, more hops)
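To make the trie construction concrete, here is a small Go helper mirroring swarm's proximity calculation; the multicast trie groups viewers by the length of the address prefix they share with the broadcaster.

```go
import "math/bits"

// proximity returns the number of leading bits shared by addresses a and b,
// i.e. the position of the most significant set bit of their XOR distance;
// higher proximity means logarithmically closer nodes.
func proximity(a, b []byte) int {
	for i := range a {
		if x := a[i] ^ b[i]; x != 0 {
			return i*8 + bits.LeadingZeros8(x)
		}
	}
	return len(a) * 8
}
```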

The other connectivity graph that minimises hops is the one where each node directly connects to the host (broadcaster). This corresponds to the webRTC default. This minimum-hop graph is surely unable to scale.

Channel providers will accept a join request from a peer that is the first in its kademlia row. Channel providers can report back latencies; if the resulting latencies are not acceptable, peers are rejected. Rejected peers are sent the closest node(s) in the corresponding kademlia proximity bin. This strategy is able to transform a graph with one-to-many direct connections into the trie.

The idea here is that peers following the simple strategy of trying to join the originator (and then following the peer suggestions) will end up in a multicast tree where hop count is minimised and throughput is maximised. So in a one-to-one broadcast the peers will most likely end up directly connected, but as the number of viewers grows, the scalable relaying multicast trees emerge.
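A viewer-side sketch of this strategy, reusing the earlier types; RejectMsg and requestChannel are illustrative.

```go
import "errors"

// RejectMsg turns down a join request and suggests closer candidates.
type RejectMsg struct {
	RoomID    string
	Suggested [][]byte // closest node(s) in the relevant proximity bin
}

// requestChannel sends a ConnectChannel message to p and blocks for the
// reply; the transport plumbing is elided in this sketch.
func requestChannel(p Peer, roomID string) (interface{}, error) {
	return nil, errors.New("transport elided in sketch")
}

// joinRoom tries the originator first and follows peer suggestions on
// rejection, converging towards a slot in the multicast trie.
func joinRoom(n *Node, roomID string) (uint64, error) {
	target := n.origins[roomID]
	for {
		resp, err := requestChannel(n.kad.ClosestPeer(target), roomID)
		if err != nil {
			return 0, err
		}
		switch msg := resp.(type) {
		case ChannelGrant:
			return msg.StreamID, nil // accepted: latency within tolerance
		case RejectMsg:
			target = msg.Suggested[0] // retry against a suggested closer node
		}
	}
}
```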

Adaptive bitrate

In the live mode of operation, synchronising the channelled stream is done by the upstream peer. Synchronising here means that when a chunk is pushed through the channel, its latency is checked against a tolerance threshold. If the allowed maximum latency is surpassed, the chunk is dropped. After some monitoring, the encoding rate is adjusted to the estimated throughput. [If the channel has to wait for the next chunk to send, throughput capacity is not fully used, so the rate is upped.] In the simplest implementation, the encoding rate is adjusted simply by dropping one out of every n chunks. Alternatively, the codec's resolution can be adjusted: high-definition encoding results in longer chunks or more chunks per time unit. Ideally the streaming UX supports an adaptive bitrate streaming standard such as MPEG-DASH or RTSP.
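A minimal sketch of the upstream-side latency check; the tolerance value and the feedback policy are illustrative.

```go
import "time"

// maxLatency is an illustrative tolerance threshold for live viewing.
const maxLatency = 500 * time.Millisecond

type timedChunk struct {
	created time.Time
	data    []byte
}

// relay pushes chunks downstream, dropping any that have already exceeded
// the latency tolerance, and periodically reports the observed drop rate so
// the encoder can adjust its rate to the estimated throughput.
func relay(in <-chan timedChunk, send func([]byte) error, adjust func(dropRate float64)) {
	var sent, dropped float64
	for c := range in {
		if time.Since(c.created) > maxLatency {
			dropped++ // too late for live viewing: drop rather than delay
		} else if err := send(c.data); err != nil {
			return
		} else {
			sent++
		}
		if n := sent + dropped; n >= 100 {
			adjust(dropped / n) // a zero rate means spare capacity: the rate can be upped
			sent, dropped = 0, 0
		}
	}
}
```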

Only channels in live mode are considered when selecting candidate upstream peers.

If the downstream peer specifies a minimum resolution, the channel can go into non-live modes of operation, where latency is (potentially continually) increasing with time.

Non-live modes of operation

More complex non-live modes of operation include skipping back in the stream. Recall that the originator reads the stream into the chunker. The pyramid chunker allows chunking infinite streams, i.e., it builds up the chunk tree from the bottom up. The chunker keeps track of a series of top nodes covering the entire stream relayed so far. The host may choose to package up these intermediate chunks in a manifest and publish it to swarm. The root hash of the current manifest is relayed as part of the channel metadata or registered on ENS.

After retrieving the root manifest, random access retrieval using range queries is available in the usual way, so skipping back in the stream is possible. Note that this mode of access is independent of the live channel running. The UX of players is supposed to offer the user a way to switch back to live mode.

Note that viewers can choose to record the stream by letting their local pyramid chunker read from the channel (the pyramid chunker should offer a mode of operation where the data comes prechunked). Using the local chunker results in all intermediate chunks being accessible without recourse to network retrievals, so skipping back in the stream is available without delay.
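A hedged sketch of skipping back: once an intermediate manifest root is known, any byte range of the stream so far can be fetched with a standard HTTP range request against a swarm gateway. The bzz URL layout shown follows the gateway convention, but treat the exact path as an assumption.

```go
import (
	"fmt"
	"io"
	"net/http"
)

// fetchRange retrieves an arbitrary byte range of the recorded stream via a
// swarm HTTP gateway, given the root hash of a published manifest.
func fetchRange(gateway, rootHash string, from, to int64) ([]byte, error) {
	req, err := http.NewRequest("GET", gateway+"/bzz:/"+rootHash+"/", nil)
	if err != nil {
		return nil, err
	}
	// a range query gives random access into the part of the stream recorded so far
	req.Header.Set("Range", fmt.Sprintf("bytes=%d-%d", from, to))
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	return io.ReadAll(resp.Body)
}
```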

The pyramid chunker keeps building up the chunk tree as the stream is read. When reading from the stream fails with EOF, i.e., when the stream is stopped or paused, the data has a definite length, so the pyramid can be built up to the top and the root hash calculated. Once all the intermediate chunks have been sent to the swarm, the stream can be considered archived. Once the root hash is published, the stream can be accessed just like any other uploaded asset.
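In code, archiving on EOF might look like the following sketch; PyramidChunker and its method names are assumptions standing in for the actual chunker interface.

```go
import "io"

// PyramidChunker stands in for swarm's pyramid chunker; the method names
// here are illustrative, not the real interface.
type PyramidChunker interface {
	ReadChunk(r io.Reader) error // read and chunk the next piece of input
	FinalizeRoot() []byte        // build the remaining tree levels, return the root hash
}

// archive feeds the stream into the chunker until EOF, then builds the
// pyramid up to the top and returns the root hash of the archived stream.
func archive(c PyramidChunker, stream io.Reader) ([]byte, error) {
	for {
		switch err := c.ReadChunk(stream); err {
		case nil:
			continue // keep chunking the live stream
		case io.EOF:
			return c.FinalizeRoot(), nil // stream stopped: length is now known
		default:
			return nil, err
		}
	}
}
```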

If the broadcaster closes the room, the channel connection needs to be terminated. This is indicated downstream by sending a special message (or a zero-size chunk).

The ideas above are fully compatible with obfuscation and/or encryption, so we regard access control and privacy issues as orthogonal to the solution of streaming.
