Given feedback and implementation experience, here are some thoughts filling in parts of the Jingle state machine related to when and how certain actions can be performed. These things are implied by the XEP once various rules are applied, but they aren't explicitly stated.
I've previously written down some other things not stated in the XEP.
The XEP does call out that session-accept implicitly also acts as a content-accept. However, that principle also applies to session-initiate and session-terminate, even though the XEP does not make note of it.
This is an important detail for two reasons. First, it has a practical benefit to implementations. Second, it impacts the rules for when other actions may be used.
The practical benefit is that processing the session-* actions should happen in two phases, where one of those phases is simply reusing the matching content-* handler.
Specifically:
- session-initiate is processed as first 'initiating' a session context, and then as a content-add to add content to the newly created session.
- session-accept is processed as first 'accepting' the offered session context, and then as a content-accept to accept any pending contents.
- session-terminate is processed as first a content-remove (or content-reject, as appropriate) of all existing content, and then 'terminating' the session context.
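As a rough illustration of that two-phase structure, here is a minimal Python sketch. The `Session` class, its state names, and the handler names are all hypothetical and only meant to show how each session-* handler can delegate to the matching content-* handler; a real engine would track far more.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """Hypothetical minimal session context; every name here is illustrative."""
    state: str = "starting"                       # starting -> pending -> active -> ended
    contents: dict = field(default_factory=dict)  # content name -> content state

    # Content-level handlers, reused by the session-level handlers below.
    def on_content_add(self, names):
        for name in names:
            self.contents[name] = "pending"

    def on_content_accept(self, names):
        for name in names:
            self.contents[name] = "active"

    def on_content_remove(self, names):
        for name in names:
            self.contents.pop(name, None)

    # Session-level handlers: one session phase plus one content phase.
    def on_session_initiate(self, names):
        self.state = "pending"         # phase 1: initiate the session context
        self.on_content_add(names)     # phase 2: implicit content-add

    def on_session_accept(self, names):
        self.state = "active"          # phase 1: accept the session context
        self.on_content_accept(names)  # phase 2: implicit content-accept

    def on_session_terminate(self):
        self.on_content_remove(list(self.contents))  # phase 1: drop all content
        self.state = "ended"                         # phase 2: end the session
```

The content phase comes second for initiate and accept, but first for terminate, mirroring the order described in the list.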
This two-fold nature of session-initiate
has an impact on the rules for when a content-reject
may be used.
The XEP states that content-reject
is only used in reply to a content-add
. But how would a responder get rid of an unwanted content that was sent in the session-initiate
?
Should content-reject
or content-remove
be used?
I contend that sending a content-reject
is the appropriate action, since it is in response to the implicit content-add
contained in the session-initiate
.
However, as a pragmatic matter, an implementation would do well to handle receiving either action in this case.
Alternatively, it is up to the higher-level app using Jingle to decide what "declining" means. The responder could choose to still accept the content, but set the content senders to none to deactivate the unwanted content. To do so, it would send a content-modify action setting senders to none, and then proceed with the session-accept.
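For a concrete picture of that content-modify, here is a small Python sketch that builds the payload with the standard library. The helper name `build_content_modify` and the sample sid and content name are made up for illustration; the element names, attributes, and namespace follow the Jingle XEP.

```python
import xml.etree.ElementTree as ET

JINGLE_NS = "urn:xmpp:jingle:1"


def build_content_modify(sid, creator, name, senders):
    """Build a <jingle/> payload for a content-modify action (hypothetical helper)."""
    jingle = ET.Element(
        f"{{{JINGLE_NS}}}jingle",
        {"action": "content-modify", "sid": sid},
    )
    # The content is identified by creator + name; senders carries the new value.
    ET.SubElement(
        jingle,
        f"{{{JINGLE_NS}}}content",
        {"creator": creator, "name": name, "senders": senders},
    )
    return jingle


# Illustrative values only: deactivate the "video" content offered by the initiator.
payload = build_content_modify("a73sjjvkla37jfea", "initiator", "video", "none")
```

The payload would be wrapped in an IQ of type set, and the session-accept would follow once it has been acknowledged.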
For example, an audio/video client that is responding to a session-initiate
that includes audio and video content might prefer to still accept the video content (after modifying the content senders), even though the user indicated they only want audio. By doing so, it is easier and faster to later activate video if the user decides to enable it, without needing additional negotiation round trips.
Again, this is an implementation choice made above the level of the Jingle engine's concerns, and depends on how the app intends to use the Jingle session to meet its UX needs.
Seriously, aren't content-remove and content-reject effectively the same thing? Why do we have both? We only have the one session-terminate instead of having a session-reject and a session-end. So why don't we have a single content-terminate?
In all practical terms, an implementation is going to have a single getRidOfThisContent()
routine that gets used by the content-remove
, content-reject
, and session-terminate
action handlers.
However, there is a semantic reason for the remove/reject distinction. Jingle is defined so that it can be mapped to the Offer/Answer model, such as the one used by SIP. In those terms, a content-remove is always treated as an Offer, and a content-reject is always treated as an Answer.
How valuable does that distinction actually turn out to be in practice? Not a lot unless you're working on a gateway translating Jingle to SIP, and you need it to be as stateless as possible. But that's the reason.
Now, to continue clarifying the rules:
- The side that is the content's creator will always use content-remove.
- The side that is not the content's creator will use content-reject until the content has been accepted. After that, content-remove will be used.
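These two rules fit in a tiny decision function. The sketch below is Python with a hypothetical function name and boolean inputs; it assumes the engine already tracks, per content, whether the local side created it and whether it has been accepted.

```python
def content_removal_action(is_creator: bool, accepted: bool) -> str:
    """Pick the action for getting rid of a content (hypothetical helper)."""
    if is_creator:
        # The creator always removes, whether or not the content was accepted.
        return "content-remove"
    # The non-creator rejects a still-pending offer, and removes afterwards.
    return "content-remove" if accepted else "content-reject"
```

Note that the inputs are about who created the content, not about who initiated the session.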
It is very tempting to state the above in terms of initiator and responder for the content, but the terms initiator
and responder
have very specific meaning in Jingle, and they are related to the session itself. That is, it can be the case that the content creator is the responder
side of the session.
The XEP states that, if after processing an incoming content-reject
or content-remove
there are no remaining contents, then the receiver should issue a session-terminate
. This is due to the stated condition that a session with no content is void.
In other words, the XEP's mandates are meant as a fallback to detect and resolve an invalid state. By implication, it is desirable to avoid creating the invalid state in the first place.
Thus, the side that triggers the last remaining content's removal or rejection should upgrade the action type to session-terminate
instead of content-remove
/content-reject
before signaling the request (along with performing any session shutdown/cleanup work). Otherwise, the sender would be left in an invalid state with no content, relying on the other side to resolve the issue by terminating.
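A minimal Python sketch of that upgrade, assuming the engine keeps a collection of the session's current content names (the function and parameter names are hypothetical):

```python
def outgoing_removal_action(content_names, name, requested="content-remove"):
    """Upgrade a content-remove/content-reject to session-terminate when it
    would leave the session with no content (hypothetical helper)."""
    remaining = [c for c in content_names if c != name]
    return requested if remaining else "session-terminate"
```

For example, rejecting the only content in a session would be signaled as a session-terminate rather than a content-reject.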
This parallels the implied two stages of what receiving a session-terminate
means: content-remove
/content-reject
remaining contents, followed by session shutdown.
Content disposition at first glance appears to be a very minor curiosity in the XEP. Very little is explicitly stated about how a content's disposition impacts the Jingle engine's behavior. However, the fact that dispositions exist implies a great deal.
By default, a content's disposition is session
. But what does that mean?
A session
disposition means that the content can only be active inside an active session context.
Take a few moments to reflect on that.
Why is it that we even need a session-accept
action at all? Why not just use content-accept
?
Because when the content disposition is session
, we have to accept the session first, and only then can we content-accept
the content.
Remember how session-accept
is a two-phase action, first accepting the overall session and then acting as a content-accept
. In fact, the XEP mandates that the session-accept
must only include contents with a session
disposition.
To state it in other terms, it is invalid to content-accept a content with a session disposition before the session has been accepted.
Now, content with a different disposition can certainly get a content-accept
before the session is accepted. In fact, that is the purpose of the early-session
disposition defined in XEP-0269: Jingle Early Media.
So even though the state diagram in the Jingle XEP shows that you can use content-accept
before session-accept
, it can only be done for contents that do not have session
disposition. And remember, contents have session
disposition by default.
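The rule reduces to a one-line check. This Python sketch uses a hypothetical function name and assumes the engine knows whether the session has been accepted and what the content's disposition is:

```python
def may_content_accept(session_accepted: bool, disposition: str = "session") -> bool:
    """Return True if a content-accept is valid right now (hypothetical helper).

    Contents default to the "session" disposition, which requires the
    session itself to have been accepted first.
    """
    return session_accepted or disposition != "session"
```

So an early-session content can be accepted while the session is still pending, but a default-disposition content cannot.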
This has some additional knock-on effects. The session responder certainly can't use content-accept
(on content with session
disposition... you can see why this qualifier gets elided so often) before accepting the session. But neither can the session initiator.
So consider a session-initiate
with one content, for RTP audio. The responder would like to upgrade and include video for the session. How should the responder proceed?
It could send a content-add
for the video content first. But the initiator side would have to delay sending a content-accept
for it until after the session-accept
is received. That works, but puts the initiator in a very awkward holding pattern, especially in terms of how to present UX.
Instead, it would be better for the responder to send the session-accept
for just the audio content. And then send the content-add
for video.
On the other hand, there is not as much of a problem if the initiator were to content-add a video content. In that case, the responder is able to session-accept the entire lot as if it had been included in the session-initiate. It does, however, present a UX timing issue: a well-timed content-add could make the call acceptance dialog in the responder's UI change just as the user attempts to click, resulting in video being accepted without actual consent.
I would go so far as to advocate that the responder sending content-add with a session disposition content before session acceptance should be treated as an out-of-order error. I'm still on the fence about whether the same should be true for the initiator.
Session disposition also impacts the rules for when removing content should trigger a session-terminate
. We stated earlier that this happens when there are zero contents left.
That is still true, but more accurately it should be when there are zero contents with a session
disposition left. A session without any session
disposition contents is void, which is implied by the requirement for session-initiate
to include at least one.
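Here is a short Python sketch of the refined check, assuming contents are tracked as a mapping of content name to disposition. Both function names are hypothetical.

```python
def session_is_void(contents: dict) -> bool:
    """A session with no "session" disposition contents is void.

    `contents` maps content name -> disposition (assumed bookkeeping).
    """
    return not any(d == "session" for d in contents.values())


def action_for_removal(contents: dict, name: str, requested: str = "content-remove") -> str:
    """Upgrade to session-terminate if removing `name` would void the session."""
    remaining = {n: d for n, d in contents.items() if n != name}
    return "session-terminate" if session_is_void(remaining) else requested
```

With one audio content (session disposition) and one early media content, removing the audio terminates the session even though a content technically remains.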
Either side of a session may attempt to replace the transport for a content, at nearly any time.
It is only nearly any time because a transport-replace
action MUST be responded to by a transport-accept
or transport-reject
before another transport-replace
can be issued.
Why is this?
Jingle operates as a generic framework that treats transports as black boxes, with only a few bits of information for categorizing that black box, but not for identifying instances. Namely, a Jingle engine only knows the name of the transport type, and whether or not it can provide streaming (TCP-like) or datagram (UDP-like) sockets.
More importantly, while a transport replace is typically used to fall back to another connection method, it is still possible to replace one transport black box with another black box of the same type. This can be needed for cases where the original transport offer contains initial parameters that have subsequently expired or changed and need updating.
(Remember that the session can be left in a pending state for an indefinite period of time, only really limited by a human user's patience.)
As such, we need to be certain that a received response was made against the offered parameters. But a Jingle engine is blind to the internals of a transport's black box, so it can't distinguish between instances of the same type. Thus, we MUST wait for a transport-accept
/ transport-reject
before sending a new transport-replace
.
Likewise, a content-accept action does not serve as an implicit transport-accept, since we would not reliably know if the transport information in the content-accept was made against the original offer or the replacement.
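One way to enforce this is a simple per-content guard. The Python sketch below is only an illustration with hypothetical class and method names: it refuses a second transport-replace while one is outstanding, and a content-accept deliberately leaves the pending flag alone.

```python
class ContentTransportState:
    """Tracks whether a transport-replace is awaiting a response (hypothetical)."""

    def __init__(self):
        self.replace_pending = False

    def send_transport_replace(self):
        if self.replace_pending:
            raise RuntimeError("previous transport-replace is still unanswered")
        self.replace_pending = True

    def on_transport_response(self, action):
        if action not in ("transport-accept", "transport-reject"):
            raise ValueError(f"not a transport-replace response: {action}")
        self.replace_pending = False

    def on_content_accept(self):
        # Not an implicit transport-accept: the pending replace stays pending.
        pass
```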
Replacements can happen for two reasons: feature detection failed and interactive negotiation is now required, or connection establishment failed and a new transport is needed.
While the content offer recipient is free to perform a transport replacement, this should be avoided as much as possible.
Needing to initiate transport replacement because the offered transport type is not supported is a last-ditch failure resolution situation.
There is a difference between a transport that fails to ever connect and a transport that later loses connectivity. Mere loss of connectivity is not enough for the Jingle engine to blindly trigger a transport replacement.
Some Jingle content applications can survive an interruption in connection, making an immediate replacement unnecessary, or even undesirable. For example, an RTP application will survive a connection interruption just fine. Likewise, an XML stream application that has enabled stream management can also resume if the transport manages to reconnect itself.
However, other applications such as file transfer do not have a way to cleanly resume after a connection loss. In those cases, the content needs to be ended entirely. For file transfer, a new content would be created to either start over or attempt a ranged transfer.
Thus, for connectivity loss, it is up to the content application to either wait for reconnection or request a replacement.
For a connection establishment failure, however, an automatic replacement should be done by the side that detects the failure.
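The split in responsibility can be sketched as follows. This is Python with hypothetical names throughout: `ever_connected` is assumed to be tracked by the engine, and `app_decision` is a callback supplied by the content application that returns its own choice (for example "wait", "replace", or "end-content").

```python
def handle_transport_failure(ever_connected: bool, app_decision) -> str:
    """Decide how to react to a transport failure (hypothetical helper)."""
    if not ever_connected:
        # Establishment failure: the detecting side replaces automatically.
        return "replace"
    # Later connectivity loss: defer to the content application.
    return app_decision()
```

An RTP application might return "wait" from its callback, while a file transfer application might return "end-content" and then create a new content for a ranged transfer.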