Audio and video calls in XMPP are encrypted end-to-end with DTLS-SRTP, as specified in XEP-0320: Use of DTLS-SRTP in Jingle Sessions.
This protocol replaces XEP-0320 with something that is encrypted with and verified by OMEMO.
Disclaimer: The proper solution is to use OMEMO version 0.5+ and Stanza Content Encryption (SCE) and encrypt the entire Jingle handshake. However, we are still a long way from having OMEMO 0.5+ in general, and from having any implementation experience with SCE for IQ-based protocols in particular. The protocol proposed here is a hack that is hopefully not too dirty.