@legastero
Last active January 8, 2020 21:22

Thoughts and Clarifications on Jingle Actions

Given feedback and implementation experience, here are some thoughts filling in parts of the Jingle state machine related to when and how certain actions can be performed. These things are implied by the XEP once various rules are applied, but they aren't explicitly stated.

I've previously written down some other things not stated in the XEP.

session-* actions are two-for-one deals

The XEP does call out that session-accept implicitly also acts as a content-accept. However, that principle also applies to session-initiate and session-terminate, even though the XEP does not make note of it.

This is an important detail for two reasons. First, it has a practical benefit for implementations. Second, it impacts the rules for when other actions may be used.

The practical benefit is that processing the session-* actions should happen in two phases, where one of those phases is simply reusing the matching content-* handler.

Specifically:

  • session-initiate is processed as first 'initiating' a session context, and then as a content-add to add content to the newly created session.
  • session-accept is processed as first 'accepting' the offered session context, and then as a content-accept to accept any pending contents.
  • session-terminate is processed as first applying content-remove (or content-reject, as appropriate) to all existing content, and then 'terminating' the session context.
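The two-phase processing described above can be sketched roughly as follows. This is only an illustration: the Session class, its state names, and its handler names are hypothetical, not anything defined by the XEP, and termination is simplified to always use the content-remove path.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """Hypothetical session context; names and states are assumptions."""
    state: str = "starting"                        # starting -> pending -> active -> ended
    contents: dict = field(default_factory=dict)   # content name -> content state
    log: list = field(default_factory=list)

    # content-* handlers, reused by the session-* handlers below
    def on_content_add(self, names):
        for name in names:
            self.contents[name] = "pending"
            self.log.append(f"content-add {name}")

    def on_content_accept(self, names):
        for name in names:
            self.contents[name] = "active"
            self.log.append(f"content-accept {name}")

    def on_content_remove(self, names):
        for name in list(names):                   # copy: we mutate the dict
            del self.contents[name]
            self.log.append(f"content-remove {name}")

    # session-* handlers: one session phase plus one content phase
    def on_session_initiate(self, names):
        self.state = "pending"                     # phase 1: create the session context
        self.log.append("initiate session")
        self.on_content_add(names)                 # phase 2: implicit content-add

    def on_session_accept(self, names):
        self.state = "active"                      # phase 1: accept the session context
        self.log.append("accept session")
        self.on_content_accept(names)              # phase 2: implicit content-accept

    def on_session_terminate(self):
        self.on_content_remove(self.contents.keys())  # phase 1: drop all contents
        self.state = "ended"                          # phase 2: end the session context
        self.log.append("terminate session")
```

Note that the ordering is mirrored for termination: the content phase runs first, and the session context is torn down last.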

Declining content from a session-initiate

This two-fold nature of session-initiate has an impact on the rules for when a content-reject may be used.

The XEP states that content-reject is only used in reply to a content-add. But how would a responder get rid of an unwanted content that was sent in the session-initiate?

Should content-reject or content-remove be used?

I contend that sending a content-reject is the appropriate action, since it is in response to the implicit content-add contained in the session-initiate.

However, as a pragmatic matter, an implementation would do well to handle receiving either action in this case.

Or not

Alternatively, it is up to the higher level app using Jingle to decide what "declining" means. The responder could choose to still accept the content, but set the content senders to none to deactivate the unwanted content. To do so, it would send a content-modify action setting senders to none, and then proceed with the session-accept.

For example, an audio/video client that is responding to a session-initiate that includes audio and video content might prefer to still accept the video content (after modifying the content senders), even though the user indicated they only want audio. By doing so, it is easier and faster to later activate video if the user decides to enable it, without needing additional negotiation round trips.

Again, this is an implementation choice made above the level of the Jingle engine's concerns, and depends on how the app intends to use the Jingle session to meet its UX needs.
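As a rough sketch of the "accept but deactivate" approach, the following builds the Jingle payload for a content-modify that sets senders to none. The function name is hypothetical, the enclosing iq stanza is omitted, and the sid and content name are whatever the session-initiate used.

```python
import xml.etree.ElementTree as ET

JINGLE_NS = "urn:xmpp:jingle:1"


def content_modify_senders_none(sid, content_name, creator):
    """Build a <jingle/> payload that deactivates one content.

    Hypothetical helper: only the Jingle element is built here; the
    surrounding <iq type='set'/> wrapper is left to the XMPP library.
    """
    jingle = ET.Element(
        "jingle",
        {"xmlns": JINGLE_NS, "action": "content-modify", "sid": sid},
    )
    ET.SubElement(
        jingle,
        "content",
        {"creator": creator, "name": content_name, "senders": "none"},
    )
    return jingle


def to_xml(element):
    """Serialize the payload to a string for inspection."""
    return ET.tostring(element, encoding="unicode")
```

A responder that wants audio only would send this for the video content and then send its session-accept covering both contents.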

But aren't content-remove and content-reject identical?

Seriously, aren't they? Why do we have both? We only have the one session-terminate instead of having a session-reject and a session-end. So why don't we have a single content-terminate?

In all practical terms, an implementation is going to have a single getRidOfThisContent() routine that gets used by the content-remove, content-reject, and session-terminate action handlers.
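A minimal sketch of that shared routine, assuming contents are tracked in a plain dict keyed by content name (the dict and the handle_action dispatcher are illustrative assumptions):

```python
def get_rid_of_this_content(contents, name):
    """Shared teardown path; a real engine would also stop media and
    release the transport here."""
    contents.pop(name, None)   # tolerate names we no longer track


def handle_action(action, contents, names=()):
    """Route all three 'removal' actions through the same routine."""
    if action in ("content-remove", "content-reject"):
        for name in names:
            get_rid_of_this_content(contents, name)
    elif action == "session-terminate":
        for name in list(contents):    # copy: we mutate while iterating
            get_rid_of_this_content(contents, name)
    else:
        raise ValueError(f"not a removal action: {action}")
```

Because content-remove and content-reject share one branch, this also covers the pragmatic advice to accept either action from the other side.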

However, there is a semantic reason for the remove/reject distinction. Jingle is defined so that it can be mapped to the Offer/Answer model, such as used by SIP. And in those terms, a content-remove is always treated as an Offer, and a content-reject is always treated as an Answer.

How valuable does that distinction actually turn out to be in practice? Not very, unless you're working on a gateway translating Jingle to SIP, and you need it to be as stateless as possible. But that's the reason.

Now, to continue clarifying the rules:

  • The side that is the content's creator will always use content-remove.
  • The side that is not the content's creator will use content-reject until the content has been accepted. After that, content-remove will be used.

It is very tempting to state the above in terms of initiator and responder for the content, but the terms initiator and responder have very specific meaning in Jingle, and they are related to the session itself. That is, it can be the case that the content creator is the responder side of the session.
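The two rules can be captured in a small function keyed on the content's creator rather than on the session roles. The function and parameter names are illustrative only.

```python
def removal_action(local_role, content_creator, accepted):
    """Pick the action used to get rid of a content.

    local_role and content_creator are each "initiator" or "responder"
    (session roles, as used in the content's creator attribute);
    accepted says whether the content has already been accepted.
    """
    if local_role == content_creator:
        return "content-remove"        # the creator always removes
    if accepted:
        return "content-remove"        # after acceptance, everyone removes
    return "content-reject"            # non-creator, still pending
```

For example, a content added by the responder mid-session has creator "responder", so it is the initiator that would use content-reject while that content is pending.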

A content-reject or content-remove of the last content should be upgraded to session-terminate

The XEP states that, if after processing an incoming content-reject or content-remove there are no remaining contents, then the receiver should issue a session-terminate. This is due to the stated condition that a session with no content is void.

In other words, the XEP's mandates are meant as a fallback to detect and resolve an invalid state. By implication, it is desirable to avoid creating the invalid state in the first place.

Thus, the side that triggers the last remaining content's removal or rejection should upgrade the action type to session-terminate instead of content-remove/content-reject before signaling the request (along with performing any session shutdown/cleanup work). Otherwise, the sender would be left in an invalid state with no content, relying on the other side to resolve the issue by terminating.

This parallels the implied two stages of what receiving a session-terminate means: content-remove/content-reject remaining contents, followed by session shutdown.
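A sketch of the sender-side upgrade, assuming the engine tracks the names of the current contents in a set (the function name is hypothetical):

```python
def outgoing_action(current_contents, name_to_drop, intended_action):
    """Upgrade a removal of the last content to session-terminate.

    intended_action is "content-remove" or "content-reject".
    """
    if intended_action not in ("content-remove", "content-reject"):
        raise ValueError(f"unexpected action: {intended_action}")
    remaining = set(current_contents) - {name_to_drop}
    if not remaining:
        return "session-terminate"     # never leave a session with no content
    return intended_action
```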

Content disposition is a critical detail

Content disposition at first glance appears to be a very minor curiosity in the XEP. Very little is explicitly stated about how a content's disposition impacts the Jingle engine's behavior. However, the fact that dispositions exist implies a great deal.

By default, a content's disposition is session. But what does that mean?

A session disposition means that the content can only be active inside an active session context.

Take a few moments to reflect on that.

Why is it that we even need a session-accept action at all? Why not just use content-accept?

Because when the content disposition is session, we have to accept the session first, and only then can we content-accept the content.

Remember how session-accept is a two-phase action, first accepting the overall session and then acting as a content-accept. In fact, the XEP mandates that the session-accept must only include contents with a session disposition.

To state it in other terms, it is invalid to content-accept a content with a session disposition before the session has been accepted.

Now, content with a different disposition can certainly get a content-accept before the session is accepted. In fact, that is the purpose of the early-session disposition defined in XEP-0269: Jingle Early Media.

So even though the state diagram in the Jingle XEP shows that you can use content-accept before session-accept, it can only be done for contents that do not have session disposition. And remember, contents have session disposition by default.
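Expressed as a guard, under the reading argued for here (the function name and boolean flag are assumptions for illustration):

```python
def may_content_accept(session_accepted, disposition="session"):
    """Return True if a content-accept is valid for this content right now.

    The disposition defaults to "session", matching the default in Jingle.
    """
    if disposition == "session":
        return session_accepted        # must wait for session-accept
    return True                        # e.g. "early-session" may go first
```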

This has some additional knock-on effects. The session responder certainly can't use content-accept (on content with session disposition... you can see why this qualifier gets elided so often) before accepting the session. But neither can the session initiator.

So consider a session-initiate with one content, for RTP audio. The responder would like to upgrade and include video for the session. How should the responder proceed?

It could send a content-add for the video content first. But the initiator side would have to delay sending a content-accept for it until after the session-accept is received. That works, but puts the initiator in a very awkward holding pattern, especially in terms of how to present UX.

Instead, it would be better for the responder to send the session-accept for just the audio content. And then send the content-add for video.

On the other hand, there is not as much of a problem if the initiator were to content-add a video content. In that case, the responder is able to session-accept the entire lot as if it had been included in the session-initiate. It does, however, present a UX timing issue: a well-timed content-add could make the call acceptance dialog in the responder's UI change as the user attempts to click, and accept video without actual consent.

I would go so far as to advocate that a responder sending a content-add for a session disposition content before session acceptance should be treated as an out-of-order error. I'm still on the fence about whether the same should be true for the initiator.

When there are zero session disposition contents

Session disposition also impacts the rules for when removing content should trigger a session-terminate. We stated earlier that this happens when there are zero contents left.

That is still true, but more accurately it should be when there are zero contents with a session disposition left. A session without any session disposition contents is void, which is implied by the requirement for session-initiate to include at least one.
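The refined check might look like this, assuming the engine keeps a mapping from content name to disposition (an illustrative data structure, not one from the XEP):

```python
def session_is_void(dispositions):
    """True when no content with a session disposition remains.

    dispositions maps content name -> disposition string.
    """
    return not any(d == "session" for d in dispositions.values())


def action_after_removal(dispositions, name_to_drop, intended_action):
    """Upgrade to session-terminate if the removal would void the session."""
    remaining = {n: d for n, d in dispositions.items() if n != name_to_drop}
    return "session-terminate" if session_is_void(remaining) else intended_action
```

With this check, removing the only session content terminates the session even if an early-session content is still present.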

Transport replacements

Either side of a session may attempt to replace the transport for a content, at nearly any time.

Only one replacement at a time

It is only nearly any time because a transport-replace action MUST be responded to by a transport-accept or transport-reject before another transport-replace can be issued.

Why is this?

Jingle operates as a generic framework that treats transports as black boxes, with only a few bits of information for categorizing that black box, but not for identifying instances. Namely, a Jingle engine only knows the name of the transport type, and whether or not it can provide streaming (TCP-like) or datagram (UDP-like) sockets.

More importantly, while a transport replace is typically used to fall back to another connection method, it is still possible to replace one transport black box with another black box of the same type. This can be needed for cases where the original transport offer contains initial parameters that have subsequently expired or changed and need updating.

(Remember that the session can be left in a pending state for an indefinite period of time, only really limited by a human user's patience.)

As such, we need to be certain that a received response was made against the offered parameters. But a Jingle engine is blind to the internals of a transport's black box, so it can't distinguish between instances of the same type. Thus, we MUST wait for a transport-accept / transport-reject before sending a new transport-replace.

Likewise, a content-accept action does not serve as an implicit transport-accept, since we would not reliably know if the transport information in the content-accept was made against the original offer, or the replacement.
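One way to enforce this is a small gate that tracks outstanding replacements. This sketch assumes the restriction is tracked per content name, and the class and method names are made up for illustration.

```python
class TransportReplaceGate:
    """Allow at most one outstanding transport-replace per content."""

    SETTLING_ACTIONS = ("transport-accept", "transport-reject")

    def __init__(self):
        self.pending = set()   # content names awaiting a transport reply

    def send_replace(self, content_name):
        """Record an outgoing transport-replace, or refuse if one is pending."""
        if content_name in self.pending:
            raise RuntimeError(
                f"transport-replace already pending for {content_name!r}"
            )
        self.pending.add(content_name)

    def on_reply(self, content_name, action):
        """Return True if this action settled a pending replacement.

        Only transport-accept / transport-reject settle it; a
        content-accept deliberately does not.
        """
        if action in self.SETTLING_ACTIONS and content_name in self.pending:
            self.pending.remove(content_name)
            return True
        return False
```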

When to replace a transport?

Replacements can happen for two reasons: feature detection failed and interactive negotiation is now required, or connection establishment failed and a new transport is needed.

Use service discovery!

While the content offer recipient is free to perform a transport replacement, this should be avoided as much as possible.

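Needing to initiate transport replacement because the offered transport type is not supported is a last-ditch failure resolution situation. The side making the offer should first use service discovery to learn which transports the other side supports, and only offer those.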

Replace on the side that detects failure

There is a difference between a transport that fails to ever connect and a transport that later loses connectivity. Mere loss of connectivity is not enough for the Jingle engine to blindly trigger a transport replacement.

Some Jingle content applications can survive an interruption in connection, making an immediate replacement unnecessary or even undesirable. For example, an RTP application will survive a connection interruption just fine. Likewise, an XML stream application that has enabled stream management can also resume if the transport manages to reconnect itself.

However, other applications such as file transfer do not have a way to cleanly resume after a connection loss. In those cases, the content needs to be ended entirely. For file transfer, a new content would be created to either start over or attempt a ranged transfer.

Thus, for connectivity loss, it is up to the content application to either wait for reconnection or request a replacement.

For a connection establishment failure, however, an automatic replacement should be done by the side that detects the failure.
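The split in responsibility can be sketched as below. The returned strings and the callback are hypothetical: the point is only that the engine decides by itself for establishment failures and defers to the content application for later connectivity loss.

```python
def react_to_transport_failure(ever_connected, app_decision):
    """Decide what to do when a content's transport fails.

    ever_connected: whether the transport had ever been established.
    app_decision: zero-argument callback supplied by the content
        application, returning "wait", "transport-replace" or "end-content".
    """
    if not ever_connected:
        # Establishment failure: the detecting side replaces automatically.
        return "transport-replace"
    # Connectivity loss: the content application knows whether it can resume.
    decision = app_decision()
    if decision not in ("wait", "transport-replace", "end-content"):
        raise ValueError(f"unknown application decision: {decision}")
    return decision
```

An RTP application might pass a callback returning "wait", while a file transfer application would return "end-content" and then create a new content to restart or attempt a ranged transfer.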
