@murillo128
Last active April 4, 2016 15:26
Single RTP packet stream (SSRC) per RTCRTPSender/RTCRTPReceiver

#Rationale

IMHO it is quite difficult to understand what the RTCRtpSender / RTCRtpReceiver really are.

  1. Overview

In the figure above, the RTCRtpSender (Section 5) encodes the track provided as input, which is transported over a RTCDtlsTransport

5.1 Overview

The RTCRtpSender includes information relating to the RTP sender.

5.1 Overview

An RTCRtpSender instance is associated to a sending MediaStreamTrack and provides RTC related methods to it.

So the sender, for example, is a generic object that takes a media track and generates all sorts of RTP packets to send to another peer. It can host one or several encoders and send one or more SSRCs, supporting simulcast and SVC.

In that regard, it is quite similar to an RTCPeerConnection, but instead of using an SDP blob you pass a kind of JSON version of an m-line, and the RTCRtpSender will do its best to send everything you want to send (provided that you have correctly discovered all the restrictions so as not to cause an InvalidParameters exception).

So instead of matching an RTP-world object, as the DTLS and ICE transports do, it is a kind of catch-all black-box object.

The RTCRtpReceiver shares the same complexity, supporting the reception of multiple SSRC streams and payload types, as sending and receiving share the same RTCRtpParameters dictionary. It was recently suggested (and I agree) that ORTC start off by supporting the WebRTC 1.0 simulcast model, which involves sending multiple streams but receiving only one.

That implies that an RTCRtpReceiver will only receive one RTP packet stream (one SSRC) with one or more payloads (OPUS+DTMF for example). With this change, we can narrow down the definition of an RTCRtpReceiver and describe it as the object that handles the reception of a single RTP packet stream. Again, IMHO, that makes much more sense and maps to the concepts in draft-ietf-rtcweb-rtp-usage.

This proposal takes this idea further and applies the same concept to the RTCRtpSender. Instead of allowing multiple RTP packet streams to be handled by an RTCRtpSender, we only allow one RTCRtpSender to produce a single RTP packet stream (SSRC).

Now we have a one-to-one relationship between an RTCRtpSender, an RTCRtpReceiver and a media RTP packet stream.

MediaTrack ===> RTCRtpSender ========(single RTP packet stream - SSRC)===> RTCRtpReceiver ===> MediaTrack

In that regard, following the m-line analogy, each sender would represent one ssrc-group<media,rtx,fec>.
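To make the one-to-one relationship concrete, here is a minimal sketch using plain JavaScript objects in place of the actual ORTC interfaces; the SSRC value and codec entry are hypothetical, and field names follow the dictionaries proposed below:

```javascript
// Hypothetical parameters under the one-SSRC-per-sender model.
// The SSRC lives at the top level of the parameters, so a sender and
// a receiver pair up on a single RTP packet stream.
const sendParams = {
  muxId: "",
  ssrc: 12345678, // single media SSRC produced by this sender
  encodings: [{ codec: { name: "opus", payloadType: 96, clockRate: 48000 } }]
};

// The receiver side mirrors the same single RTP packet stream.
const recvParams = {
  muxId: "",
  ssrc: 12345678, // same SSRC: one stream, one receiver
  encodings: [{ codec: { name: "opus", payloadType: 96, clockRate: 48000 } }]
};

// MediaTrack -> sender -> (one SSRC) -> receiver -> MediaTrack
console.log(sendParams.ssrc === recvParams.ssrc); // true
```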

Simulcast and SVC are also supported (see below).

#Benefits

  • Improve RTCRtpSender/RTCRtpReceiver definitions
  • Cleaner and simpler APIs
  • Make it harder to have parameter inconsistency
  • Provide a single and straightforward way of using the API. Given a DTLS/ICE/RTP stream architecture, there is only a single way of implementing it in ORTC.

#Proposal

In order to be as little disruptive as possible, we have made only the following changes to the current API:

  • The main change is to move the ssrc, fec and rtx definitions from the encodings to the RTP parameters.
  • Add an RTCRtpCodecRTXParameters dictionary associated to each RTCRtpCodecParameters to solve the rtx apt issue (this change can also be implemented standalone, without the rest of the changes).
  • Remove the codecs sequence from the parameters and move it to the encodings. This change could be dropped, although we believe it is important for the sake of clarity (more on it later).
//New dictionary
dictionary RTCRtpCodecRTXParameters {
             payloadtype               payloadType;
             unsigned long             rtxtime;
};

dictionary RTCRtpCodecParameters {
             DOMString                 name;
             payloadtype               payloadType;  
             unsigned long             clockRate;
             unsigned long             maxptime;
             unsigned long             ptime;
             unsigned long             numChannels;
             sequence<RTCRtcpFeedback> rtcpFeedback;
             Dictionary                parameters;
             RTCRtpCodecRTXParameters  rtx;                       // NEW: rtx.payloadType
};

//Not changed, just added here for completeness
dictionary RTCRtpRtxParameters {
             unsigned long ssrc;
             payloadtype   payloadType;
};

//Not changed, just added here for completeness
dictionary RTCRtpFecParameters {
             unsigned long ssrc;
             DOMString     mechanism;
};

dictionary RTCRtpParameters {
             DOMString                                 muxId = "";
             unsigned long                             ssrc;        //media ssrc         - moved from encodings
             RTCRtpFecParameters                       fec;         //includes fec.ssrc  - moved from encodings    
             RTCRtpRtxParameters                       rtx;         //includes rtx.ssrc  - from encodings
             sequence<RTCRtpHeaderExtensionParameters> headerExtensions;
             sequence<RTCRtpEncodingParameters>        encodings;
             RTCRtcpParameters                         rtcp;
             RTCDegradationPreference                  degradationPreference = "balanced";
             //Removed codecs sequence
};

dictionary RTCRtpEncodingParameters {
             RTCRtpCodecParameters codec;             // Moved from parameters
             RTCPriorityType       priority;
             unsigned long         maxBitrate;
             double                minQuality = 0;
             double                resolutionScale;
             double                framerateScale;
             unsigned long         maxFramerate;
             boolean               active = true;
             DOMString             encodingId;
             sequence<DOMString>   dependencyEncodingIds;
             //Removed ssrc fec rtx
};
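To illustrate the proposed dictionaries, here is what a filled-in parameters object could look like, using plain JavaScript objects in place of the WebIDL types; all SSRC, payload type and rtxtime values are hypothetical:

```javascript
// Hypothetical RTCRtpParameters instance under this proposal:
// ssrc, fec and rtx now live at the parameters level, while the codec
// moved into each encoding. The rtx apt issue is solved per codec.
const params = {
  muxId: "",
  ssrc: 1111,                             // media SSRC (moved from encodings)
  fec: { ssrc: 2222, mechanism: "red" },  // fec.ssrc (moved from encodings)
  rtx: { ssrc: 3333, payloadType: 97 },   // rtx.ssrc (moved from encodings)
  headerExtensions: [],
  encodings: [{
    active: true,
    codec: {                              // moved from the codecs sequence
      name: "VP8",
      payloadType: 96,
      clockRate: 90000,
      rtx: { payloadType: 97, rtxtime: 3000 } // NEW: per-codec rtx apt
    }
  }],
  rtcp: { mux: true },
  degradationPreference: "balanced"
};

// The rtx payload type is now unambiguously tied to its media codec.
console.log(params.encodings[0].codec.rtx.payloadType); // 97
```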

#Impact analysis

Normal use case (1 sender, 1 receiver, 1 media codec)

As we have removed the sequence of RTCRtpCodecParameters from the parameters, it is now required to pass that information in the encodings attribute. So the automatic process that the RTCRtpSender performs internally in the current version for this case is no longer possible:

the browser behaves as though a single encodings[0] entry was provided, with encodings[0].ssrc set to a browser-determined value, encodings[0].active set to "true", encodings[0].codecPayloadType set to codecs[j].payloadType where j is the index of the first codec that is not "cn", "dtmf", "red", "rtx", or a forward error correction codec, and all the other parameters.encodings[0] attributes unset.

However, note that in the specification all the examples use the following helper function, which performs the required steps:

RTCRtpParameters function myCapsToSendParams(RTCRtpCapabilities sendCaps, RTCRtpCapabilities remoteRecvCaps) {
  // Function returning the sender RTCRtpParameters, based on the local sender and remote receiver capabilities.
  // The goal is to enable a single stream audio and video call with minimum fuss.
  //
  // Steps to be followed:
  // 1. Determine the RTP features that the receiver and sender have in common.
  // 2. Determine the codecs that the sender and receiver have in common.
  // 3. Within each common codec, determine the common formats, header extensions and rtcpFeedback mechanisms.
  // 4. Determine the payloadType to be used, based on the receiver preferredPayloadType.
  // 5. Set RTCRtcpParameters such as mux to their default values.
  // 6. Return RTCRtpParameters enabling the jointly supported features and codecs.
}

Note that while filling the encodings with the first supported media codec is straightforward, it is still necessary to process the RTP features (mux, feedback and header extensions) in order to create compatible encoding parameters.
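A minimal sketch of such a helper under the proposed dictionaries, using plain objects: matching is simplified to name/clock-rate codec intersection, the `preferredPayloadType` is taken from the receiver side, and header extension / rtcpFeedback intersection is omitted, so this is an illustration rather than a full implementation:

```javascript
// Simplified sketch of myCapsToSendParams for the proposed API.
// Assumes capabilities carry a `codecs` array with name, clockRate and
// a receiver-side preferredPayloadType; real code must also intersect
// header extensions, rtcpFeedback and codec-specific parameters.
function myCapsToSendParams(sendCaps, remoteRecvCaps) {
  // 1-2. Determine the codecs sender and receiver have in common.
  const common = remoteRecvCaps.codecs.filter(function (rc) {
    return sendCaps.codecs.some(function (sc) {
      return sc.name === rc.name && sc.clockRate === rc.clockRate;
    });
  });
  // Pick the first "real" media codec (skip cn/dtmf/red/rtx/fec entries).
  const skip = ["cn", "telephone-event", "red", "rtx", "ulpfec", "flexfec"];
  const codec = common.filter(function (c) {
    return skip.indexOf(c.name.toLowerCase()) === -1;
  })[0];
  if (!codec) throw new Error("no compatible media codec");
  // 4-6. Build single-stream parameters; the browser would choose the
  // ssrc, so it is left unset here.
  return {
    muxId: "",
    encodings: [{
      active: true,
      codec: {
        name: codec.name,
        payloadType: codec.preferredPayloadType, // from the receiver
        clockRate: codec.clockRate
      }
    }],
    rtcp: { mux: true } // 5. defaults
  };
}

// Usage with hypothetical capabilities:
const p = myCapsToSendParams(
  { codecs: [{ name: "opus", clockRate: 48000 }] },
  { codecs: [{ name: "opus", clockRate: 48000, preferredPayloadType: 100 }] }
);
console.log(p.encodings[0].codec.payloadType); // 100
```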

Simulcast

From RFC 7656

3.6.  Simulcast

   A media source represented as multiple independent encoded streams
   constitutes a simulcast [SDP-SIMULCAST] or Multiple Description
   Coding (MDC) of that media source.  Figure 8 shows an example of a
   media source that is encoded into three separate simulcast streams,
   that are in turn sent on the same media transport flow.  When using
   simulcast, the RTP streams may be sharing an RTP session and media
   transport, or be separated on different RTP sessions and media
   transports, or be any combination of these two.  One major reason to
   use separate media transports is to make use of different quality of
   service (QoS) for the different source RTP streams.  Some
   considerations on separating related RTP streams are discussed in
   Section 3.12.

                            +----------------+
                            |  Media Source  |
                            +----------------+
                     Source Stream  |
             +----------------------+----------------------+
             |                      |                      |
             V                      V                      V
    +------------------+   +------------------+   +------------------+
    |  Media Encoder   |   |  Media Encoder   |   |  Media Encoder   |
    +------------------+   +------------------+   +------------------+
             | Encoded              | Encoded              | Encoded
             | Stream               | Stream               | Stream
             V                      V                      V
    +------------------+   +------------------+   +------------------+
    | Media Packetizer |   | Media Packetizer |   | Media Packetizer |
    +------------------+   +------------------+   +------------------+
             | Source               | Source               | Source
             | RTP                  | RTP                  | RTP
             | Stream               | Stream               | Stream
             +-----------------+    |    +-----------------+
                               |    |    |
                               V    V    V
                          +-------------------+
                          |  Media Transport  |
                          +-------------------+

                Figure 8: Example of Media Source Simulcast

   The simulcast relation between the RTP streams is the common media
   source.  In addition, to be able to identify the common media source,
   a receiver of the RTP stream may need to know which configuration or
   encoding goals lay behind the produced encoded stream and its
   properties.  This enables selection of the stream that is most useful
   in the application at that moment.

The main point to take into consideration is that each simulcast layer is produced by an independent encoder. So, performance-wise, it is irrelevant whether one RTCRtpSender provides two encodings or two RTCRtpSenders provide one encoding each.

So it is possible to cover all the use cases provided by the current spec, for example:

RTCRtpSender (track0)
 |
 +-----encoding[0] = {ssrc1,vp8,pt=96}
 +-----encoding[1] = {ssrc1,vp8,pt=97}
 +-----encoding[2] = {ssrc2,vp8,pt=98}

This will be equivalent to two senders attached to the same media track, each one with the encodings for a single SSRC:

RTCRtpSender (track0,ssrc1)
|
+-----encoding[0] = {vp8,pt=96}
+-----encoding[1] = {vp8,pt=97}

RTCRtpSender (track0,ssrc2)
|
+-----encoding[0] = {vp8,pt=98}

Note that in the first case the payloads, even if on different SSRCs, were required to have different payload types.
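The mapping from the current multi-SSRC layout to one sender per SSRC can be sketched mechanically; the helper below is hypothetical and works on plain objects shaped like the examples above:

```javascript
// Group a current-spec encodings list (each entry carrying its own ssrc)
// into per-SSRC parameter sets, one per proposed RTCRtpSender.
function splitBySsrc(encodings) {
  const bySsrc = {};
  encodings.forEach(function (e) {
    if (!bySsrc[e.ssrc]) bySsrc[e.ssrc] = { ssrc: e.ssrc, encodings: [] };
    // The ssrc moves up to the parameters; the encoding keeps the codec.
    bySsrc[e.ssrc].encodings.push({ codec: e.codec });
  });
  return Object.keys(bySsrc).map(function (k) { return bySsrc[k]; });
}

// The three-encoding simulcast example from above:
const senders = splitBySsrc([
  { ssrc: 1, codec: { name: "VP8", payloadType: 96 } },
  { ssrc: 1, codec: { name: "VP8", payloadType: 97 } },
  { ssrc: 2, codec: { name: "VP8", payloadType: 98 } }
]);
console.log(senders.length);              // 2 senders
console.log(senders[0].encodings.length); // 2 encodings on ssrc1
```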

SVC

Also from RFC 7656


3.7.  Layered Multi-Stream

   Layered Multi-Stream (LMS) is a mechanism by which different portions
   of a layered or scalable encoding of a source stream are sent using
   separate RTP streams (sometimes in separate RTP sessions).  LMSs are
   useful for receiver control of layered media.

   A media source represented as an encoded stream and multiple
   dependent streams constitutes a media source that has layered
   dependencies.  Figure 9 represents an example of a media source that
   is encoded into three dependent layers, where two layers are sent on
   the same media transport using different RTP streams, i.e., SSRCs,
   and the third layer is sent on a separate media transport.

                            +----------------+
                            |  Media Source  |
                            +----------------+
                                    |
                                    |
                                    V
       +---------------------------------------------------------+
       |                      Media Encoder                      |
       +---------------------------------------------------------+
               |                    |                     |
        Encoded Stream       Dependent Stream     Dependent Stream
               |                    |                     |
               V                    V                     V
       +----------------+   +----------------+   +----------------+
       |Media Packetizer|   |Media Packetizer|   |Media Packetizer|
       +----------------+   +----------------+   +----------------+
               |                    |                     |
          RTP Stream           RTP Stream            RTP Stream
               |                    |                     |
               +------+      +------+                     |
                      |      |                            |
                      V      V                            V
                +-----------------+              +-----------------+
                | Media Transport |              | Media Transport |
                +-----------------+              +-----------------+

           Figure 9: Example of Media Source Layered Dependency

   It is sometimes useful to make a distinction between using a single
   media transport or multiple separate media transports when (in both
   cases) using multiple RTP streams to carry encoded streams and
   dependent streams for a media source.  Therefore, the following new
   terminology is defined here:

   SRST:  Single RTP stream on a Single media Transport

   MRST:  Multiple RTP streams on a Single media Transport

   MRMT:  Multiple RTP streams on Multiple media Transports

   MRST and MRMT relations need to identify the common media encoder
   origin for the encoded and dependent streams.  When using different
   RTP sessions (MRMT), a single RTP stream per media encoder, and a
   single media source in each RTP session, common SSRCs and CNAMEs can
   be used to identify the common media source.  When multiple RTP
   streams are sent from one media encoder in the same RTP session
   (MRST), then CNAME is the only currently specified RTP identifier
   that can be used.  In cases where multiple media encoders use
   multiple media sources sharing synchronization context, and thus have
   a common CNAME, additional heuristics or identification need to be
   applied to create the MRST or MRMT relationships between the RTP
   streams.

The main advantage over simulcast is that a single encoder instance is able to serve multiple layers, improving performance compared to having several independent encoders.

This is supported in current spec by using the dependencyEncodingIds which allows the browser to correlate SVC layers so they can be provided by the same encoder:

dependencyEncodingIds of type sequence<DOMString>: The encodingIds on which this layer depends. Within this specification encodingIds are permitted only within the same RTCRtpEncodingParameters sequence. In the future, if MST were to be supported, then if searching within an RTCRtpEncodingParameters sequence did not produce a match, a global search would be carried out.

Note that currently MST is not supported because the dependency search is only done inside the encodings of an RTCRtpSender, and as an RTCRtpSender is attached to a single transport, it is not possible to send layers over different transports.

So in the current version of the ORTC spec, SRST and MRST are supported, but not MRMT. In the new version, only SRST would be supported.

This limitation is artificial: if the encodingIds were globally unique, the search could be done across RTCRtpSenders. That would mean that SRST, MRST and MRMT would all be supported with this proposal.

RTCRtpSender (track0)
 |
 +-----encoding[0] = {ssrc1,vp9,pt=96,encodingId="track0-0"}
 +-----encoding[1] = {ssrc1,vp9,pt=97,encodingId="track0-1",dependencyEncodingIds=["track0-0"]}
 +-----encoding[2] = {ssrc2,vp9,pt=98,encodingId="track0-2",dependencyEncodingIds=["track0-0"]}

This will be equivalent to two senders attached to the same media track, each one with the encodings for a single SSRC:

RTCRtpSender (track0,ssrc1)
|
+-----encoding[0] = {vp9,pt=96,encodingId="track0-0"}
+-----encoding[1] = {vp9,pt=97,encodingId="track0-1",dependencyEncodingIds=["track0-0"]}

RTCRtpSender (track0,ssrc2)
|
+-----encoding[0] = {vp9,pt=98,encodingId="track0-2",dependencyEncodingIds=["track0-0"]}
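With globally unique encodingIds, the dependency search can run across senders instead of within a single one. A minimal illustration with plain objects (the lookup helper is hypothetical):

```javascript
// Resolve a dependencyEncodingId across several senders' encoding lists,
// instead of only within one RTCRtpSender as the current spec mandates.
function findEncodingGlobally(senders, encodingId) {
  for (const sender of senders) {
    for (const enc of sender.encodings) {
      if (enc.encodingId === encodingId) return enc;
    }
  }
  return null;
}

// The two-sender SVC example from above: the ssrc2 layer depends on a
// base layer owned by a different sender.
const svcSenders = [
  { ssrc: 1, encodings: [
      { encodingId: "track0-0" },
      { encodingId: "track0-1", dependencyEncodingIds: ["track0-0"] }
  ]},
  { ssrc: 2, encodings: [
      { encodingId: "track0-2", dependencyEncodingIds: ["track0-0"] }
  ]}
];

// The cross-sender dependency now resolves, enabling MRST and MRMT.
const base = findEncodingGlobally(
  svcSenders, svcSenders[1].encodings[0].dependencyEncodingIds[0]);
console.log(base.encodingId); // "track0-0"
```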