@iNPUTmice
Last active October 17, 2023 08:19

A/V calls in Conversations

Notes for users

Where is the call button?

The call button is in the toolbar, to the left of the lock icon. Four conditions must be met for the call button to show up.

  • You and your contact need Conversations 2.8.0+ or another compatible client that supports A/V calls (Siskin, Beagle, Dino, Movim, …)
  • You must have each other in your contact lists (with mutual presence subscription)
  • The contact needs to be online. This is indicated by a colorized (not gray) send button. Depending on the recipient’s setup you might be able to send a regular message first to wake up the recipient’s device. (In the future Conversations might gain the ability to do this automatically. There is experimental support to remember and display the A/V compatibility of the last seen device when the contact is currently offline.)
  • Your contact must not have Tor enabled in Conversations. It is currently not possible to make calls when Tor is enabled as this would leak the IP address. Conversations does not announce the A/V capability when Tor is enabled which in turn hides the call button.

What does ‚App failure‘ mean?

App failure can mean one of two things. Either the XMPP client on the other end isn’t fully compatible with Conversations, or Conversations was unable to set up the native library that powers A/V calls. The former is generally more likely, while the latter might happen if you got the Conversations APK from dubious sources or are running Conversations on an unusual device or emulator. In either case there is not a lot you can do aside from reporting the issue in our community channel.

What does ‚Unable to connect call‘ mean?

A/V calls require a direct connection between the two participating devices (peer-to-peer). ‚Unable to connect call‘ means that one or both participants are in a network that prevents direct connections. Your server might be able to assist your device with establishing that connection regardless. To do that your server needs to support ‚XEP-0215: External Service Discovery‘. You can find out if your server supports that by going into your account details (either by tapping your own avatar or by going through ‚Manage accounts‘), and then selecting ‚Server Info‘ from the overflow menu. If your server doesn’t support this you might want to contact your provider. If you run your own server see the instructions below. Both your server and your contact’s server need to support this.

Notes for server admins

Most calls will require server-side assistance. Depending on the network it might be enough to have a STUN server for Conversations to learn your external IP and punch a hole in the NAT. On some more restrictive networks, however, this isn’t enough and Conversations will need a TURN server. TURN servers are used to proxy the entire (encrypted) traffic through the server. (In my initial testing this was often the case when mobile networks were involved.)

To ensure best possible user experience in all situations server admins should set up both.

Conversations will use XEP-0215: External Service Discovery to learn about server-provided STUN/TURN servers, and, in case of TURN, also get short term, temporary credentials to access the TURN server.
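
How the short-term credentials are derived is up to the server. A minimal Python sketch, assuming the widely used shared-secret scheme (the "TURN REST API" approach that coturn supports with a static auth secret): the username is the expiry timestamp and the password is an HMAC-SHA1 over it. The function name, the secret and the one-hour lifetime are illustrative and not taken from any particular server.

```python
import base64
import hashlib
import hmac
import time


def make_turn_credentials(secret, ttl_seconds=3600, now=None):
    """Return (username, password) for a time-limited TURN login.

    Assumes the shared-secret scheme; a given server may add a user part
    to the username or use a different lifetime.
    """
    if now is None:
        now = int(time.time())
    # The username is the Unix timestamp at which the credentials expire.
    username = str(now + ttl_seconds)
    # The password is the base64-encoded HMAC-SHA1 of the username,
    # keyed with the secret shared by the XMPP and TURN servers.
    digest = hmac.new(secret.encode(), username.encode(), hashlib.sha1).digest()
    password = base64.b64encode(digest).decode()
    return username, password


username, password = make_turn_credentials("not-a-real-secret")
print(username, password)
```

Because the TURN server can recompute the same HMAC from the username alone, no per-user account has to be provisioned on it, and stale credentials expire on their own.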

Instructions for specific servers

ejabberd

ejabberd 20.04 supports XEP-0215 and also has a STUN and TURN server built in. Take a look at the new sample config (particularly the module: ejabberd_stun entry in the listen section, and mod_stun_disco).

Prosody

You will need to install a STUN/TURN server. We recommend coturn or eturnal. Then configure and enable mod_turn_external in Prosody. A full guide can be found on Prosody's site: Audio/video calls with Prosody using a STUN/TURN server.

If the call button shows in the app but calls are unreliable, you can check that your TURN server is working with the prosodyctl check turn command. See the section 'Testing your setup' for more info.

Using a public TURN service

If you don't want to self-host, and are okay with using a third-party service (all call data is encrypted, though the service will see your IP address), you can use the free TURN service provided by openrelay.metered.ca.

-- Enable mod_turn_external and set the following options:
turn_external_host = "staticauth.openrelay.metered.ca"
turn_external_port = 80
turn_external_secret = "openrelayprojectsecret"

~80% solution with public STUN server

We strongly recommend that you set up your own STUN/TURN server (see above). If you can’t do that for whatever reason (firewall, resource constraints, lack of time, …) you can use Prosody’s mod_external_services to point to a public STUN server. That should make A/V calls work in ~80% of cases (usually WiFi to WiFi) with minimal configuration, at the cost of leaking IP addresses to the operator of the STUN server. The configuration looks like this:

modules_enabled = {
    -- other modules ...
    "external_services";
}

external_services = {
    {
        type = "stun";
        transport = "udp";
        host = "stun.conversations.im";
        port = "443";
    }
}

CSI

This is rare, but if you don’t get any notifications for incoming calls make sure that your CSI module is up to date. Some older versions did not recognise incoming call requests as high priority.

OpenFire

You need an external STUN/TURN server like coturn or eturnal and the External Service Discovery plugin. More information can be found here.

UDP vs TCP vs TLS

STUN and TURN can operate over three different protocols: UDP, TCP and TCP/TLS. The latter is indicated by the stuns and turns URI schemes. Using TLS does not increase security as calls are always end-to-end encrypted with DTLS-SRTP. On the downside, using TCP or TLS instead of UDP might negatively impact latency and performance. The only benefit of using TURN over TLS on port 443 is that you have a higher chance of passing through restrictive firewalls. However, this should only be a fallback and not the default connection mechanism.

As a general recommendation we advise you to announce the following services over XEP-0215:

  • STUN over UDP
  • TURN over UDP
  • TURNS over TLS on port 443 (requires extra IP on the server)

Ideally those three variants should exist both on IPv4 and IPv6 for a total of 6 variants.
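
The six variants can be enumerated mechanically. A small Python sketch, assuming placeholder addresses from the documentation ranges and port 3478 for plain STUN/TURN (the usual default, not something this guide prescribes); the field names mirror the Prosody external_services example above, and the helper name is made up for illustration.

```python
from itertools import product

# Hypothetical addresses from the documentation ranges; replace with your own.
HOST_V4 = "192.0.2.1"
HOST_V6 = "2001:db8::1"

# (type, transport, port). Port 3478 is an assumption (the usual STUN/TURN
# default); port 443 for TURNS follows the recommendation above.
VARIANTS = [
    ("stun", "udp", 3478),
    ("turn", "udp", 3478),
    ("turns", "tcp", 443),
]


def recommended_services(hosts=(HOST_V4, HOST_V6)):
    """Return one service entry per (address family, variant) combination."""
    return [
        {"type": kind, "transport": transport, "host": host, "port": port}
        for host, (kind, transport, port) in product(hosts, VARIANTS)
    ]


for service in recommended_services():
    print(service)
```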

Testing

The compliance tester will check whether that discovery mechanism is working; however, that doesn’t necessarily mean that TURN and STUN themselves are set up correctly. To test this you should get two Android phones and put them into different networks; connect one of them to your computer via USB and use adb logcat. Here are the lines to look out for:

Firstly if you grep for ICE with adb -d logcat -v time -s conversations | grep ICE you should see log entries like this:

04-18 12:57:59.373 D/conversations(13867): alice@example.com: discovered ICE Server: [stun:89.238.78.51:443?transport=udp] [:] [TLS_CERT_POLICY_SECURE] [] [null] [null]
04-18 12:57:59.373 D/conversations(13867): alice@example.com:: discovered ICE Server: [turn:89.238.78.51:443?transport=udp] [1587211080:a781616cb9061724:F17BHTfLXyxzOyWSEutjpmzlCrs=] [TLS_CERT_POLICY_SECURE] [] [null] [null]
04-18 12:57:59.374 D/conversations(13867): alice@example.com:: discovered ICE Server: [stun:89.238.78.51:443?transport=tcp] [:] [TLS_CERT_POLICY_SECURE] [] [null] [null]
04-18 12:57:59.374 D/conversations(13867): alice@example.com:: discovered ICE Server: [turn:89.238.78.51:443?transport=tcp] [1587211080:a781616cb9061724:F17BHTfLXyxzOyWSEutjpmzlCrs=] [TLS_CERT_POLICY_SECURE] [] [null] [null]

This means Conversations has been able to discover the servers (you should see at least one line with stun and one line with turn).
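
If you save the logcat output to a file, this check can be automated. A rough Python sketch, assuming the log format shown above stays as it is; the regular expression and function names are illustrative and not part of Conversations.

```python
import re

# Matches the scheme inside the first bracket of a "discovered ICE Server" line.
ICE_SERVER = re.compile(r"discovered ICE Server: \[(stuns?|turns?):[^\]]+\]")


def discovered_schemes(log_text):
    """Return the set of URI schemes (stun, stuns, turn, turns) seen in the log."""
    return {match.group(1) for match in ICE_SERVER.finditer(log_text)}


def has_stun_and_turn(log_text):
    """True if at least one STUN and one TURN server were discovered."""
    schemes = discovered_schemes(log_text)
    return bool(schemes & {"stun", "stuns"}) and bool(schemes & {"turn", "turns"})


# Two of the sample lines from above.
sample = (
    "04-18 12:57:59.373 D/conversations(13867): alice@example.com: discovered ICE Server: "
    "[stun:89.238.78.51:443?transport=udp] [:] [TLS_CERT_POLICY_SECURE] [] [null] [null]\n"
    "04-18 12:57:59.373 D/conversations(13867): alice@example.com:: discovered ICE Server: "
    "[turn:89.238.78.51:443?transport=udp] [1587211080:a781616cb9061724:F17BHTfLXyxzOyWSEutjpmzlCrs=] "
    "[TLS_CERT_POLICY_SECURE] [] [null] [null]\n"
)
print(has_stun_and_turn(sample))
```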

Secondly when the call connects and you grep for candidate with adb -d logcat -v time -s conversations | grep candidate you should see lines like this:

04-18 12:57:59.584 D/conversations(13867): received candidate: audio:0:candidate:2431496480 1 udp 41754623 89.238.78.51 63631 typ relay raddr 94.134.91.66 rport 22965 generation 0 ufrag JRB3::UNKNOWN
04-18 12:57:59.592 D/conversations(13867): received candidate: audio:0:candidate:3731770832 1 udp 24977151 89.238.78.51 61571 typ relay raddr 94.134.91.66 rport 22966 generation 0 ufrag JRB3::UNKNOWN
04-18 12:57:59.592 D/conversations(13867): sending candidate: audio:0:candidate:2765706476 1 udp 2122194687 10.255.12.234 42571 typ host generation 0 ufrag xJI/ network-id 3 network-cost 900::UNKNOWN
04-18 12:57:59.641 D/conversations(13867): sending candidate: audio:0:candidate:842163049 1 udp 1685987071 2.247.248.234 31910 typ srflx raddr 10.255.12.234 rport 42571 generation 0 ufrag xJI/ network-id 3 network-cost 900:stun:89.238.78.51:443:UNKNOWN
04-18 12:57:59.691 D/conversations(13867): sending candidate: audio:0:candidate:826779982 1 tcp 1518283007 2a02:303e:5014:d2d0:3188:ad11:a0db:35b6 9 typ host tcptype active generation 0 ufrag xJI/ network-id 4 network-cost 900::UNKNOWN
04-18 12:57:59.700 D/conversations(13867): sending candidate: audio:0:candidate:2431496480 1 udp 41820159 89.238.78.51 62645 typ relay raddr 2.247.248.234 rport 31910 generation 0 ufrag xJI/ network-id 3 network-cost 900:turn:89.238.78.51:443?transport=udp:UNKNOWN
04-18 12:57:59.742 D/conversations(13867): sending candidate: audio:0:candidate:3731770832 1 udp 25042687 89.238.78.51 62032 typ relay raddr 2.247.248.234 rport 6483 generation 0 ufrag xJI/ network-id 3 network-cost 900:turn:89.238.78.51:443?transport=tcp:UNKNOWN

typ srflx means STUN. typ relay means TURN. If you see entries with relay coming up that is already an OK sign. However the only true tell is if you get lines like:

04-18 12:57:59.846 D/conversations(13867): remote candidate selected: :-1:candidate:842163049 1 udp 1685921535 94.134.91.66 22965 typ srflx raddr 192.168.178.39 rport 50732 generation 0 ufrag JRB3 network-cost 10::UNKNOWN
04-18 12:57:59.846 D/conversations(13867): local candidate selected: :-1:candidate:2431496480 1 udp 41820159 89.238.78.51 62645 typ relay raddr 2.247.248.234 rport 31910 generation 0 ufrag xJI/ network-id 3 network-cost 900::CELLULAR

where at least one of them is a typ relay with your TURN server. If the call connects but you only see host or srflx it just means that you lucked out and your network didn’t need TURN. (In that case you should try changing networks for a better testing environment.)
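
The same kind of check works for the selected candidate pair. A minimal Python sketch, again assuming the log format shown above; the helper names are made up for illustration.

```python
import re

# Captures which side the candidate belongs to and its ICE candidate type.
SELECTED = re.compile(
    r"(local|remote) candidate selected: .*? typ (host|srflx|prflx|relay)\b"
)


def selected_types(log_text):
    """Map 'local' / 'remote' to the type of the selected candidate."""
    return {m.group(1): m.group(2) for m in SELECTED.finditer(log_text)}


def call_used_turn(log_text):
    """True if at least one side of the selected pair is a relay (TURN) candidate."""
    return "relay" in selected_types(log_text).values()


# The two sample lines from above.
sample = (
    "04-18 12:57:59.846 D/conversations(13867): remote candidate selected: "
    ":-1:candidate:842163049 1 udp 1685921535 94.134.91.66 22965 typ srflx "
    "raddr 192.168.178.39 rport 50732 generation 0 ufrag JRB3 network-cost 10::UNKNOWN\n"
    "04-18 12:57:59.846 D/conversations(13867): local candidate selected: "
    ":-1:candidate:2431496480 1 udp 41820159 89.238.78.51 62645 typ relay "
    "raddr 2.247.248.234 rport 31910 generation 0 ufrag xJI/ network-id 3 network-cost 900::CELLULAR\n"
)
print(selected_types(sample), call_used_turn(sample))
```

A result without any relay entry does not prove that TURN is broken; it only means this particular call did not need it.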

If the call doesn’t connect at all it also means that your setup might not be correct.

Testing without Conversations

If you must test without Conversations, the Trickle ICE test in the WebRTC samples might give you some indication. You will probably have to manually send a XEP-0215 services query to your XMPP server to get temporary TURN credentials. For the best (though not perfect) results you should run the Trickle ICE test in Chromium, since Conversations and Chromium use the same WebRTC library and there might be subtle differences between Firefox and Google’s libwebrtc.
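
For reference, the services query is a plain IQ get in the urn:xmpp:extdisco:2 namespace. The Python sketch below only builds the stanza; sending it still requires an authenticated XMPP session (for example the XML console of a desktop client). The function name and the example domain are placeholders.

```python
import uuid
import xml.etree.ElementTree as ET

EXTDISCO_NS = "urn:xmpp:extdisco:2"


def build_services_query(server_domain, request_id=None):
    """Build the XEP-0215 services request addressed to your own server."""
    iq = ET.Element(
        "iq",
        {"type": "get", "to": server_domain, "id": request_id or uuid.uuid4().hex},
    )
    # An empty <services/> element asks for all services the server knows about.
    ET.SubElement(iq, "services", {"xmlns": EXTDISCO_NS})
    return ET.tostring(iq, encoding="unicode")


print(build_services_query("example.com"))
```

The result should list the announced services; TURN entries carry the temporary username and password that you can paste into the Trickle ICE page.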

Prosody 0.12 also has a STUN/TURN testing tool built in.

Notes for developers

WebRTC vs libwebrtc vs Jingle

WebRTC is a stack standardized by the W3C. It's a family of protocols and codecs that all work together to enable peer-to-peer real-time communication. Imagine you wanted to implement A/V calls from scratch: at every level of the stack you’d have the choice between multiple protocols. WebRTC tells you exactly which protocols to use.

libwebrtc is one implementation of this stack. It's the same library used by Google Chrome. There are other libraries and building blocks that can be used to implement WebRTC (pjsip, gstreamer, …).

Jingle is a signaling protocol on top of XMPP. While WebRTC defines what protocols to use, the information of "I’m calling you and I’m listening on this IP address" - "I accept your call and I’m listening on this other IP address" still needs to be exchanged. That’s what Jingle is for. Jingle is somewhat equivalent to SIP.

Note: Jingle is not exclusively used for real time communication. It can also be used to signal: I want to share this file. It can also be used to set up real time communication that uses a different set of protocols than those defined by WebRTC. This means that just having Jingle support doesn’t automatically make two applications compatible.

XMPP

Conversations will display the call button if one of the contact’s connected clients supports all of the following namespaces:

  • urn:xmpp:jingle:1
  • urn:xmpp:jingle:transports:ice-udp:1
  • urn:xmpp:jingle:apps:rtp:1
  • urn:xmpp:jingle:apps:dtls:0
  • urn:xmpp:jingle:apps:rtp:audio and optionally also urn:xmpp:jingle:apps:rtp:video

Note: Conversations requires XEP-0115: Entity Capabilities to detect those features.
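
The rule above boils down to a subset check on the features a resource advertises. A minimal Python sketch of that logic (not the actual Conversations code; the names are illustrative):

```python
# Features that must all be present before any call can be offered.
REQUIRED_FEATURES = frozenset({
    "urn:xmpp:jingle:1",
    "urn:xmpp:jingle:transports:ice-udp:1",
    "urn:xmpp:jingle:apps:rtp:1",
    "urn:xmpp:jingle:apps:dtls:0",
    "urn:xmpp:jingle:apps:rtp:audio",
})
VIDEO_FEATURE = "urn:xmpp:jingle:apps:rtp:video"


def call_capabilities(features):
    """Return the media a resource can be called with: set(), {'audio'} or {'audio', 'video'}."""
    features = set(features)
    if not REQUIRED_FEATURES <= features:
        return set()
    media = {"audio"}
    if VIDEO_FEATURE in features:
        media.add("video")
    return media


print(call_capabilities(REQUIRED_FEATURES | {VIDEO_FEATURE}))
```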

If any of the contact’s available resources announces urn:xmpp:jingle-message:0 as a Disco feature, Conversations will use XEP-0353: Jingle Message Initiation, sent to the bare JID of the contact, to establish a call. If not, it will initiate the Jingle session directly. If you use Jingle Message Initiation make sure that you include the exact same descriptions that your following session-initiate will include as well. For example, if you put only media="audio" in the propose but audio and video in the session-initiate, the call will be rejected.
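
A client developer could guard against this mismatch before sending. The Python sketch below is a simplified check that compares only the media attributes of the RTP descriptions (the requirement above is about the descriptions as a whole); the stanzas are trimmed-down illustrations and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

RTP_DESCRIPTION = "{urn:xmpp:jingle:apps:rtp:1}description"


def media_types(stanza_xml):
    """Return the sorted media types of all RTP descriptions in a stanza."""
    root = ET.fromstring(stanza_xml)
    return sorted(d.get("media", "") for d in root.iter(RTP_DESCRIPTION))


def propose_matches_initiate(propose_xml, initiate_xml):
    """True if the propose and the session-initiate announce the same media."""
    return media_types(propose_xml) == media_types(initiate_xml)


propose = (
    "<propose xmlns='urn:xmpp:jingle-message:0' id='example-id'>"
    "<description xmlns='urn:xmpp:jingle:apps:rtp:1' media='audio'/>"
    "</propose>"
)
initiate = (
    "<jingle xmlns='urn:xmpp:jingle:1' action='session-initiate' sid='example-id'>"
    "<content creator='initiator' name='audio'>"
    "<description xmlns='urn:xmpp:jingle:apps:rtp:1' media='audio'/></content>"
    "<content creator='initiator' name='video'>"
    "<description xmlns='urn:xmpp:jingle:apps:rtp:1' media='video'/></content>"
    "</jingle>"
)
print(propose_matches_initiate(propose, initiate))
```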

You will also need DTLS otherwise your call will be rejected.

Usually Conversations tries to put human-readable text in the termination reasons, so if your call fails make sure to check those out. As far as the UI is concerned, Conversations will display 'Unable to connect call' on network errors and 'application error' when something else goes wrong or the session got terminated. Usually this means the implementations are incompatible (again, check the <text>…</text> in the reason element). The reason itself might also be a hint, like getting security-error when DTLS is not sent.

Obviously, if you are the developer of another client and want to ensure interop, feel free to contact me; you will most likely already have my XMPP address. If not, feel free to stop by the Conversations channel.

Since version 2.9.8 Conversations shows a shield icon in the top left corner if it was able to verify the DTLS fingerprint with a preexisting OMEMO session. This vendor-specific protocol is entirely optional (as in, you don’t have to use it to make a successful call with Conversations) and described here.

oksjd commented Apr 27, 2020

Thanks for implementing this feature! But what about calling in an IPv6-only environment? Previously I've been able to use a/v calling on ATalk (https://github.com/cmeng-git/atalk-android) without any special configuration on ejabberd server. Surely with IPv6 there's no need for STUN/TURN.

@iNPUTmice
Author

It says most calls will require STUN or TURN. Not all. Calls within the same WiFi network or any call where you have globally routed addresses (this isn’t dependent on IPv6) won’t need it.

oksjd commented Apr 27, 2020

Thanks for the speedy reply and all the hard work! Just finished the translation of telephony related strings on Transifex. I'll be testing this when F-Droid version is out.

2561024 commented May 2, 2020

Are calls encrypted?

@iNPUTmice
Author

Are calls encrypted?

yes

2561024 commented May 3, 2020

I have tried to use stuns and turns with ejabberd, but it does not work. Does Conversations support it?
discovered ICE Server: [turns:example.com:5349?transport=tcp] [:*****:***] [TLS_CERT_POLICY_SECURE] [] [null] [null]
discovered ICE Server: [stuns:example.com:5349?transport=tcp] [:] [TLS_CERT_POLICY_SECURE] [] [null] [null]
But if I configure only turns/stuns, without turn/stun, calls do not work.

@iNPUTmice
Author

Conversations up to and including 2.8.1 does not support turns. In general it is recommended to use turn instead of turns. TLS just adds unnecessary round trips and delay for no security gain. The media stream itself is encrypted independently of TLS.

2561024 commented May 3, 2020

Conversations up to and including 2.8.1 does not support turns. In general it is recommended to use turn instead of turns. TLS just adds unnecessary round trips and delay for no security gain. The media stream itself is encrypted independently of TLS.

Thank you. Is turns support planned for the future?

2561024 commented May 3, 2020

Also, TURN over UDP does not work if Conversations and my server are in the same network; with TURN over TCP it is OK!

[ "conversations" ] ------- NAT ------- [ nat net ---- NAT server and ejabberd -- public ip ] -- --- internet, other user "conversations" other server

In this setup, if ejabberd offers only STUN/TURN over UDP, calls do not work; with STUN/TURN over TCP they work.

kousu commented May 5, 2020

This is amazing! Thank you for your work @iNPUTmice. I had no idea this was in the works.

kousu commented May 5, 2020

Will group calling work? MUC chats? Is Jitsi Meet compatibility in the roadmap?

kousu commented May 7, 2020

For any developers coming this way in the future I want to clarify that "supports all of the following namespaces" means XEP 0030 disco#info <feature> tags, and that Conversations also needs you to support XEP 0115 or else it won't query for your disco#info (-- this is actually specified as part of XEP 0115: "Clients should not engage in the older "disco/version flood" behavior and instead should use Entity Capabilities as specified herein"), and also there's a small typo in the list: rn:xmpp:jingle:1 should be urn:xmpp:jingle:1.

@iNPUTmice
Author

Will group calling work? MUC chats? Is Jitsi Meet compatibility in the roadmap?

Group calling would require a server side component like Jitsi Meet and there is no interest on the side of the Jitsi Meet people to make this happen.

kousu commented May 7, 2020 via email

mase76 commented May 8, 2020

How to make sure, that CSI lets the XEP-353 messages through? I have Prosody with mod_csi and mod_csi_battery_saver. There are no config options.

moppman commented May 8, 2020

If your mod_csi_battery_saver is up to date, you're good. It has been patched recently, see https://hg.prosody.im/prosody-modules/rev/19c5bfc3a241

mase76 commented May 8, 2020

Updated the mod. Works now.
Thx!

@zhnikita

https://gist.github.com/iNPUTmice/a28c438d9bbf3f4a3d4c663ffaa224d9#gistcomment-3281944
How audio-video calls encrypt?
Srtp, zrtp, etc?

iNPUTmice - thanks for your work! It's cool!!!

2561024 commented May 11, 2020

https://gist.github.com/iNPUTmice/a28c438d9bbf3f4a3d4c663ffaa224d9#gistcomment-3281944
How audio-video calls encrypt?
Srtp, zrtp, etc?

iNPUTmice - thanks for your work! It's cool!!!

"as calls are always end-to-end encrypted with DTLS-SRTP"

@sajeenthiran95

Audio and video calls work fine when the mobile phones are in the same network,
but when they are in different networks audio and video calls do not work.
We set up our own server using http://help.conversations.im/ as a reference.
We can't find a solution.

What is the solution to this?

ghost commented Jun 23, 2021

I have coturn with prosody, seemingly configured entirely the way this article suggests.

However, with this message in the log, the connection fails:

19: handle_udp_packet: New UDP endpoint: local addr my.turn.ip.address:3478, remote addr 220.196.60.103:37900
19: session 128000000000000001: realm <my.turn.domain.name> user <>: incoming packet BINDING processed, success
19: session 128000000000000001: realm <my.turn.domain.name> user <>: incoming packet message processed, error 401: Unauthorized
19: IPv4. Local relay addr: my.turn.ip.address:55017
19: session 128000000000000001: new, realm=<my.turn.domain.name>, username=<1624544695>, lifetime=600
19: session 128000000000000001: realm <my.turn.domain.name> user <1624544695>: incoming packet ALLOCATE processed, success
20: session 128000000000000001: peer 10.145.142.243 lifetime updated: 300
20: session 128000000000000001: realm <my.turn.domain.name> user <1624544695>: incoming packet CREATE_PERMISSION processed, success
29: session 128000000000000001: realm <my.turn.domain.name> user <1624544695>: incoming packet BINDING processed, success
35: session 128000000000000001: refreshed, realm=<my.turn.domain.name>, username=<1624544695>, lifetime=0
35: session 128000000000000001: realm <my.turn.domain.name> user <1624544695>: incoming packet REFRESH processed, success
36: session 128000000000000001: usage: realm=<my.turn.domain.name>, username=<1624544695>, rp=159, rb=21264, sp=6, sb=624
36: session 128000000000000001: peer usage: realm=<my.turn.domain.name>, username=<1624544695>, rp=80, rb=7680, sp=153, sb=15300
36: session 128000000000000001: closed (2nd stage), user <1624544695> realm <my.turn.domain.name> origin <>, local my.turn.ip.address:3478, remote 220.196.60.103:37900, reason: allocation timeout
36: session 128000000000000001: delete: realm=<my.turn.domain.name>, username=<1624544695>
36: session 128000000000000001: peer 10.145.142.243 deleted

I replaced the public IP of my server and the domain name in the realm for privacy protection, but the realm is exactly the same as the domain name.

220.196.60.103 is my phone's public IP, as discovered by STUN, and this is what the provider NATs the local ip (10..4.31.54) to.
10.145.2.243 is the ip of the person I am calling, apparently, it is inside his provider's NAT zone.

Any ideas why it may fail to connect?

ghost commented Jun 24, 2021

Using a different TURN server (reTurnServer), I am getting almost identical errors:

INFO | 20210624-044033.282 | reTurnServer | RETURN | 0x42533392240 | RequestHandler.cxx:245 | Received Request with no Message Integrity. Sending 401. Sender=[UDP 36.18.235.235:46478]
INFO | 20210624-044033.489 | reTurnServer | RETURN | 0x42533392240 | UdpServer.cxx:141 | UdpServer: received retransmission of request with tid: 1118048801:1113077049:1244362341:859076172
INFO | 20210624-044033.537 | reTurnServer | RETURN | 0x42533392240 | UdpServer.cxx:141 | UdpServer: received retransmission of request with tid: 1118048801:1716012647:1213361519:1848988246
INFO | 20210624-044033.571 | reTurnServer | RETURN | 0x42533392240 | TurnAllocation.cxx:51 | TurnAllocation created: clientLocal=[UDP 0.0.0.0:3478] clientRemote=[UDP 36.18.235.235:46478] allocation=[UDP 0.0.0.0:49154] lifetime=600
INFO | 20210624-044033.572 | reTurnServer | RETURN | 0x42533392240 | TurnAllocation.cxx:121 | TurnAllocation refreshed: clientLocal=[UDP 0.0.0.0:3478] clientRemote=[UDP 36.18.235.235:46478] allocation=[UDP 0.0.0.0:49154] lifetime=600
INFO | 20210624-044033.572 | reTurnServer | RETURN | 0x42533392240 | UdpRelayServer.cxx:30 | UdpRelayServer started.  [0.0.0.0:49154]
INFO | 20210624-044033.822 | reTurnServer | RETURN | 0x42533392240 | UdpServer.cxx:141 | UdpServer: received retransmission of request with tid: 1118048801:1769429337:1816283219:1197044344
INFO | 20210624-044033.857 | reTurnServer | RETURN | 0x42533392240 | TurnAllocation.cxx:62 | TurnAllocation destroyed: clientLocal=[UDP 0.0.0.0:3478] clientRemote=[UDP 36.18.235.235:46478] allocation=[UDP 0.0.0.0:49154]
INFO | 20210624-044033.858 | reTurnServer | RETURN | 0x42533392240 | UdpRelayServer.cxx:36 | ~UdpRelayServer - destroyed.  [0.0.0.0:49154]

bjarkan commented Aug 8, 2021

Yes, I already know Jitsi is only interested in its own silo.

JSXC is also playing with videoconferencing ... maybe the Jitsi group wants to go its own way, but Kurento could be an alternative to Jitsi Videobridge.

To support A/V calls in Conversations, coturn (STUN/TURN) has to be installed, configured and announced to the XMPP server... so why not one more component to allow MUC with A/V? Why think small? Let's take on Jitsi face to face. Or even use Jitsi Videobridge for our own benefit.

@iNPUTmice What do you think about it? It could even be integrated in Snikket, and I'm sure that Prosody, Metronome and Tigase will follow...

bjarkan commented Sep 5, 2021

@anmol1991

Can anyone explain the role of WebRTC in peer-to-peer audio or video sessions in the Conversations Android application?

@abolfazlghanbari23

Does the voice and video call in the application have e2ee? If so, is it possible to disable it in the source code?
