Monday, June 23, 2014

Verto - WebRTC and FreeSWITCH Get Hitched

Unless you've been hiding under a rock you know that WebRTC is poised to be the next big thing in real time communications.

If you're familiar with the technical details of WebRTC you also know that WebRTC doesn't mandate a signaling protocol - that's left up to, well, whoever. For many of us coming from a SIP/telephony background it has made the most sense to use the signaling protocol we all know - SIP.

WebRTC applications commonly use WebSockets for signaling, and this combined with various JavaScript SIP libraries (sipml5, sipjs, jssip, etc) allows you to do call control using SIP from a browser to a remote system over WebSockets. This assumes, of course, that the remote system supports SIP over WebSocket transport. Then there's also the matter of implementing the various requirements for WebRTC media support such as ICE, STUN, DTLS, SRTP, etc. In short, implementing full WebRTC support is no small task.

Over a year ago I worked with the FreeSWITCH developers to get support for this approach to WebRTC added to FreeSWITCH. As usual, Tony and team delivered a very impressive result - the ability to use the aforementioned JavaScript libraries to call into a FreeSWITCH system using audio, video, or both - all from your browser! Of course once you were connected to a FreeSWITCH system all of the existing functionality was available to you. Conferencing, bridging to existing/legacy endpoints, etc. It wasn't a stretch at all to connect from Chrome using ICE, SRTP, OPUS, and SIP over secure WebSocket and then bridge to an endpoint using SIP over UDP and G729 (or even a PRI)! It boggles the mind to consider what is happening to the audio alone in this scenario - encrypting/decrypting, transcoding, and resampling. All magically and masterfully handled by FreeSWITCH.

One of the most interesting aspects of WebRTC is the ability to develop applications using the full power of the web and the browser environment. For years now we've seen WebRTC sample applications emerge demonstrating just how easy and powerful the peer-to-peer capabilities of WebRTC are. Some of these even demonstrate increasingly complicated "multi-peer" scenarios involving multiple parties.

Where does FreeSWITCH fit with all of these new-fangled web technologies? First, I've already mentioned the ability for FreeSWITCH to bridge to endpoints on different networks. Even though WebRTC shares some standards with these legacy endpoints, direct signaling or even direct media exchange between the two will almost certainly never be possible. Requiring DTLS and SRTP virtually guarantees that.

FreeSWITCH also hosts many powerful applications - including voicemail, conferencing, and even entire custom applications written in Lua or JavaScript. Custom applications can also be developed using ESL (the Event Socket Library) - a powerful event socket interface with accompanying client libraries.

Historically there hasn't been a good way to combine all of this functionality - the power of HTML5/JS/CSS in the browser (via WebRTC) with the power of voice/video applications hosted in the cloud using FreeSWITCH.

Until now.

Verto is an exciting new Javascript library and FreeSWITCH endpoint module. Together they allow web developers to use a single Javascript library for call control and FreeSWITCH event handling and interaction. That's right - call control and ESL in the same endpoint/protocol/library!

The FreeSWITCH endpoint module is configured to listen on a WebSocket or Secure WebSocket (or both). The Javascript library is included and configured to point to the FreeSWITCH instance. With a single library and simple API a web developer can make full use of a remote FreeSWITCH system using WebRTC within minutes!

One more thing - WebRTC (by way of OPUS) supports two-channel audio (stereo). As part of Verto development the FreeSWITCH team decided it was finally time to implement stereo in FreeSWITCH as well!

Make sure to stay tuned for more about WebRTC and Verto but in the meantime - enjoy the demo!

Friday, May 2, 2014

VoIP Users Conference Talks Crypto

I spent about two hours today on the always awesome VoIP Users Conference talking with Olle Johansson, Dan York, Tim Panton, Dave Taht, and others about VoIP security and encryption in general.

If you can handle listening to my voice and seeing my stupid mug it should be pretty educational. I hope you appreciate the Nagel print hanging on my office wall!


Wednesday, March 26, 2014

2013 Kamailio Awards

It has come to my attention that Daniel-Constantin Mierla has selected my blog as a 2013 Kamailio Award Winner! This is a huge honor and I appreciate the recognition from Daniel and the entire Kamailio project.

With only six posts in all of 2013 it seems that (thankfully for me) some people do prefer quality over quantity.

Thanks again Daniel!

Tuesday, March 11, 2014

Securing Real-time Communications - Quickly

This is a post that would have been received completely differently a year ago.

We now live in a “post-Snowden” era where even average members of the general public are at least vaguely aware of things like “wiretaps” and “encryption”. That said, this isn’t another post about those revelations.

Over the past year I’ve been steadily working to improve the security of the various real-time communications platforms I’m involved in. The timing has coincided with the news about the NSA and its various programs. Naturally, with these revelations I’ve been paying extra special attention to key sizes, cipher suites, and the use of elliptic curve cryptography.

As regular readers of my blog might suspect, my primary technical focus is on SIP and RTP. SIP provides for the use of TLS to secure the signaling channel. While TLS isn’t always TLS (TLS versions, cipher suites, etc), there isn’t anything special about the application of TLS to SIP.

TLS is one of those things we just take for granted these days (kids - get off my lawn)! However, there are several well known issues with various TLS implementations currently in use. Even though TLS is used to secure banking and highly sensitive web traffic every day, SSL Labs reports that only 20% of web servers currently support TLS 1.2. Yikes.

Of course TLS isn’t just for web traffic. SIP can be secured with TLS as well. This is a very interesting scenario because, as the name implies, SIP is almost exclusively used as a signaling channel to establish another session (which may or may not be secured). This means that securing SIP with TLS matters for more than the integrity of the SIP session itself: the TLS-protected signaling is often what carries the keys for the media that session establishes. This is the case with real time voice and video communications using RTP, or more specifically the secure profile of RTP (SRTP).

As always, anything SIP related is complicated and security is no exception. To offer a completely encrypted solution the alphabet soup and resulting complexity add up pretty quickly. First, you must establish a secure signaling channel using SIP over TLS. You then use SDES (hopefully) to exchange SRTP key material in the SDP offer and answer, carried in a=crypto attributes. Once the SDP offer/answer model has done its thing you will have negotiated the transport of secure voice and/or video using SRTP.
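To make the SDES step concrete, here is a minimal Python sketch that pulls the SRTP master key and master salt out of an SDP a=crypto attribute (RFC 4568). It only knows the two classic AES_CM suites, ignores the optional lifetime/MKI parameters after the key, and `parse_crypto` is an illustrative name, not part of any SIP stack.

```python
import base64
import re

# Master key / master salt lengths in bytes for the two classic SDES suites
# (RFC 4568): a 128-bit AES key plus a 112-bit salt, base64-encoded together.
SUITE_KEY_SALT = {
    "AES_CM_128_HMAC_SHA1_80": (16, 14),
    "AES_CM_128_HMAC_SHA1_32": (16, 14),
}

# a=crypto:<tag> <suite> inline:<base64 key||salt>[|lifetime][|MKI:length]
CRYPTO_RE = re.compile(
    r"^a=crypto:(?P<tag>\d+) (?P<suite>\S+) inline:(?P<key>[A-Za-z0-9+/=]+)"
)


def parse_crypto(line):
    """Return (tag, suite, master_key, master_salt) from an SDES a=crypto line."""
    m = CRYPTO_RE.match(line.strip())
    if m is None:
        raise ValueError("not an a=crypto line: %r" % line)
    suite = m.group("suite")
    if suite not in SUITE_KEY_SALT:
        raise ValueError("unsupported crypto suite: %s" % suite)
    key_len, salt_len = SUITE_KEY_SALT[suite]
    material = base64.b64decode(m.group("key"))
    if len(material) != key_len + salt_len:
        raise ValueError("expected %d bytes of key material, got %d"
                         % (key_len + salt_len, len(material)))
    # The inline value is the master key immediately followed by the master salt.
    return int(m.group("tag")), suite, material[:key_len], material[key_len:]
```

Note that the key travels in the SDP in the clear, which is exactly why the SIP signaling underneath it has to be protected by TLS.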

Originally defined in RFC 3711, SRTP has now been updated with several follow-ups, errata, and extensions. Looking around in the open source community and commercial VoIP space, however, shows that many implementations have not moved very far beyond this original specification for various reasons.

One of these reasons is due to existing libraries implementing SRTP. Just about all of the open source implementations of SRTP that I’m aware of use some version of the original libsrtp written by David McGrew of Cisco Systems over seven years ago. Since then libsrtp has been forked, patched, and otherwise used by pjsip, FreeSWITCH, Asterisk, the Chromium/Chrome web browsers from Google (for WebRTC), and countless other projects.

For some time now I’ve been interested in replacing the built-in AES and SHA functions in libsrtp with those provided by OpenSSL. This goal led me to the work of John Foley (also of Cisco Systems) and the official libsrtp Github repository. When I first started looking into this John had (recently) completed work on a branch of libsrtp that accomplishes exactly this. He also went two steps further: the use of AES-NI and support for AES-GCM mode in SRTP.

What?

AES-NI is the reason I was first interested in using OpenSSL for SRTP. Long story short, new(ish) Intel CPUs include an instruction set (AES-NI) that greatly increases the performance of AES (especially with AES-GCM and larger payload sizes), and OpenSSL can take advantage of it. This is a good thing and should (hopefully) lead to more widespread adoption of SRTP.
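On x86 Linux the kernel advertises AES-NI as the `aes` flag in /proc/cpuinfo. A small sketch for checking a box before you go looking for the speedup; it assumes x86 Linux, and it only tells you the CPU has the instructions, not that your SRTP library actually uses them.

```python
def cpu_has_aes_ni(cpuinfo_text):
    """Return True if any 'flags' line in /proc/cpuinfo-style text lists 'aes'."""
    for line in cpuinfo_text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip() == "flags":
            # Flags are space-separated; 'aes' is the AES-NI feature flag on x86.
            if "aes" in value.split():
                return True
    return False


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print("AES-NI available:", cpu_has_aes_ni(f.read()))
    except OSError:
        print("/proc/cpuinfo not readable (not Linux?)")
```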

The use of OpenSSL included another perk: support for AES-GCM. While AES-GCM isn’t necessarily new there hadn’t been support for it with SRTP yet. AES-GCM is the current “gold standard” AES block cipher mode of operation, especially for “streaming” operations like VPNs and other packetized data. In fact, Cisco currently considers AES-GCM a “next generation encryption” solution.  AES-GCM is also preferred by the National Security Agency’s (NSA) Suite B crypto specification (for better or worse in a “post-Snowden” world).

AES-GCM is one of those “standing on the shoulders of giants” moments I really appreciate. To think that I’m writing a blog post in 2014 talking about a technology that uses a field named after an early nineteenth century French mathematician (with a really interesting life) is amazing to me. As Isaac Newton said “If I have seen further it is by standing on the shoulders of giants.” In this case the giant was born over 200 years ago and only lived to the age of 20.
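The field in question is the Galois field GF(2^128), named after Évariste Galois. GCM's authentication step (GHASH) is built on multiplying 128-bit blocks in that field. Here is a bit-at-a-time Python sketch of that multiplication following Algorithm 1 of NIST SP 800-38D; it is for illustration only (slow, not constant-time) and is not how libsrtp or OpenSSL implement it.

```python
# GCM's reduction polynomial x^128 + x^7 + x^2 + x + 1, written in GCM's
# bit-reflected convention (the leftmost bit of a block is the x^0 coefficient).
R = 0xE1 << 120

# The multiplicative identity (the polynomial "1") in this bit ordering.
ONE = 1 << 127


def gf128_mul(x, y):
    """Multiply two 128-bit blocks in GF(2^128) the way GHASH does."""
    z = 0
    v = y
    for i in range(127, -1, -1):  # walk x from its leftmost (x^0) bit rightward
        if (x >> i) & 1:
            z ^= v  # addition in GF(2^128) is XOR
        # Multiply v by x: shift toward higher powers and reduce if x^128 fell off.
        if v & 1:
            v = (v >> 1) ^ R
        else:
            v >>= 1
    return z
```

For example, `gf128_mul(ONE, y)` returns `y`, and multiplying x^127 by x wraps around to `R`, i.e. x^7 + x^2 + x + 1.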

Anyway, John Foley (another giant) was able to provide a fork of pjsip that included the feature-openssl branch of libsrtp and had been patched to support AES-GCM mode with SIP (SDP/SDES) and SRTP. He just didn’t have anything else to test it against.

I wrote patches implementing 128-bit AES-GCM mode using the updated libsrtp branch for both FreeSWITCH and Asterisk. Because of limitations in the original SRTP specification both projects would require fairly significant refactoring to include support for 192- and 256-bit GCM modes, “big AES SRTP”, and the use of GCM mode with DTLS (for WebRTC). In fact, both FreeSWITCH and Asterisk implemented support for SRTP by largely differentiating only on authentication tag length.

In any case we were able to use these patches to test 128 bit AES-GCM mode SRTP interoperability between pjsip, Asterisk, and FreeSWITCH. However, there was still work to be done.

I’m happy to report that as of last week the FreeSWITCH master branch not only supports 128-, 192- and 256-bit AES-GCM modes but also includes a very flexible means to control the use of various SRTP crypto suites on both incoming and outgoing channels. Unfortunately this hasn’t been implemented for DTLS use with WebRTC yet, but I plan to advocate for that shortly. I’m also still working with the Asterisk project to clean up my patch for inclusion there.
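This is not FreeSWITCH's code or configuration syntax, just a hypothetical Python sketch of what "controlling crypto suites" boils down to: a local preference list matched against what the peer offers. The table also shows why keying support off authentication tag length alone can't work once GCM enters the picture. Lengths follow RFC 4568, RFC 6188 and RFC 7714 (the GCM suite names were standardized after this post).

```python
# Master key, master salt and authentication tag lengths, in bytes.
SUITES = {
    "AEAD_AES_256_GCM":        {"key": 32, "salt": 12, "tag": 16},
    "AEAD_AES_128_GCM":        {"key": 16, "salt": 12, "tag": 16},
    "AES_256_CM_HMAC_SHA1_80": {"key": 32, "salt": 14, "tag": 10},
    "AES_CM_128_HMAC_SHA1_80": {"key": 16, "salt": 14, "tag": 10},
    "AES_CM_128_HMAC_SHA1_32": {"key": 16, "salt": 14, "tag": 4},
}


def choose_suite(local_preference, offered):
    """Return the first locally preferred suite the remote side offered, or None.

    local_preference: suite names in the order this (hypothetical) endpoint prefers.
    offered: suite names taken from the remote SDP's a=crypto lines.
    """
    offered_set = set(offered)
    for suite in local_preference:
        if suite in SUITES and suite in offered_set:
            return suite
    return None
```

Both GCM suites carry a 16-byte tag but need different key sizes, and the two 80-bit-tag AES_CM suites differ the same way, so an implementation that only looks at the tag cannot tell them apart.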

I think it’s safe to say that with these changes FreeSWITCH now supports state of the art (some may say “next generation”) cryptography with SRTP. However, recall that the keys used for SRTP are still exchanged using SIP over TLS, which, depending on various factors, could still have numerous issues and be fundamentally insecure. How can we be sure that our SIP signaling channel is properly secured?

FreeSWITCH and Kamailio now both also support the explicit specification of TLS version and cipher suite. When using OpenSSL 1.0.1 (which is required by FreeSWITCH) this allows the user to require the use of TLS 1.2 and only the strongest cipher suites supported by OpenSSL. Of course this is great news for FreeSWITCH and Kamailio, but what about the rest of the SIP+SRTP implementations out there?
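Python's ssl module wraps OpenSSL and exposes the same two knobs, so it makes a handy way to illustrate the idea. This is an analogous Python sketch, not FreeSWITCH or Kamailio configuration; the cipher string "ECDHE+AESGCM" is one reasonable choice of "strongest suites", not the one those projects ship.

```python
import ssl


def strict_tls_context(certfile=None, keyfile=None):
    """Server-side TLS context that refuses anything older than TLS 1.2.

    For TLS 1.2 connections only ECDHE key exchange with AES-GCM is allowed.
    (Newer OpenSSL builds configure TLS 1.3 suites separately; all are AEAD.)
    """
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.set_ciphers("ECDHE+AESGCM")
    if certfile:
        ctx.load_cert_chain(certfile, keyfile)
    return ctx
```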

Unfortunately support for these standards is bleeding edge at this point and not widely deployed. Any use of SIP over TLS and SRTP is certainly preferable to none, but for both performance and security reasons I’m pushing for support of TLS 1.2 and AES-GCM everywhere possible.

Hopefully I’ll be able to offer more on that soon but for now enjoy the faster and more secure real time communications made possible using FreeSWITCH and Kamailio!

Tuesday, December 10, 2013

High Quality Entropy



This is a (relatively) short post that I’ve been meaning to write for a while.  A recent Ars Technica story (and the resulting HN discussion) prompted me to finally sit down and write this.

I’ve been aware of software defined radios for quite some time.  The entire concept seemed very cool (especially because I’ve always been interested in radio).  Unfortunately they were also somewhat expensive.

The RTL-SDR series of dongles changed all of that.  For less than $20 you could enter the exciting world of software defined radio - using GNURadio and other programs to apply open source and programming to radio.  It’s all very cool stuff but I struggled to find a practical application.  Then I found rtl_entropy.

This time last year you would never hear words like “encryption”, “cryptography”, and “entropy” on news broadcasts like CNN. Then Edward Snowden started revealing the reach of the National Security Agency’s various programs, and since then we’ve been bombarded with new revelations on an almost weekly basis.

Ideas that would have seemed crazy and paranoid just a year ago now seem remarkably likely.  Backdoors in closed source programs.  Submarines tapping undersea cables.  Data collection on a massive scale.  Tampering of hardware by various state agencies (the NSA and counterparts in China, for example).

Just today Ars Technica posted a story about the FreeBSD developers no longer explicitly trusting hardware RNG modules from Intel (RDRAND) and Via (padlock). Note that I say explicitly because, unlike Linux (which has always mixed all available sources of entropy), FreeBSD used a single hardware source directly when one was available. With all of the questions surrounding the (potential) involvement of the NSA in these designs this probably wasn’t the best idea. In FreeBSD 10 that’s going to change.

However, RDRAND and padlock were (supposedly) always very good sources of LOTS of entropy. With their use being called into question, what are we to do for applications that require large amounts of system entropy from sources that are readily available and (hopefully) free from tampering?

That’s where rtl_entropy comes in. I first found this project back in August. rtl_entropy uses atmospheric radio noise from an RTL-SDR dongle to generate entropy. Because the RTL-SDR hardware is able to sample at about 3.2 MS/s this translates into about 3.4 Mbit/s of entropy. That’s a lot, especially when you consider what some high priced (yet still proprietary and unknown) hardware entropy sources cost.

rtl_entropy is open source and available for inspection.  It runs with cheap, mass-produced radio dongles which are (presumably) difficult to “cook” for this purpose.  As the author states “it samples atmospheric noise, does Von-Neumann debiasing, runs it through the FIPS 140-2 tests, optionally (-e) does Kaminsky debiasing if it passes the FIPS tests, then writes to the output. It can be run as a daemon which by default writes to a FIFO, which can be read by rngd to add entropy to the system pool”.  All of this for less than $20.
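rtl_entropy's own code isn't reproduced here; this is a generic Python sketch of the textbook Von Neumann debiasing step the quote mentions. It assumes the input bits are independent (with some fixed bias), which is the condition under which the output is unbiased.

```python
def von_neumann_debias(bits):
    """Von Neumann extractor: read bits in non-overlapping pairs.

    01 -> 0, 10 -> 1, and 00/11 are discarded. If the input bits are
    independent with a fixed bias, the output bits are unbiased, at the
    cost of throwing away at least half of the input.
    """
    out = []
    for i in range(0, len(bits) - 1, 2):
        a, b = bits[i], bits[i + 1]
        if a != b:
            out.append(a)  # keep the first bit of an unequal pair
    return out
```

For example, `von_neumann_debias([0, 1, 1, 0, 0, 0, 1, 1])` keeps only the two unequal pairs and returns `[0, 1]`.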

When compared to the hardware random number generators listed in Wikipedia's comparison table we can see just how revolutionary this is. With proprietary hardware (and in some cases software) that is still unavailable for inspection, the next cheapest source of high quality entropy is $235 per Mbit/s. With a $15-20 RTL-SDR dongle, rtl_entropy works out to roughly $4-6 per Mbit/s, and the entire software stack is open source. As said before, it’s also ready to use on just about any Linux system via rngd.

Also, because Linux always mixes the sources of available entropy rtl_entropy can be safely combined with various other mechanisms available - RDRAND, haveged, aed, etc.

I’m proud my company and I have been able to sponsor Paul and his work on rtl_entropy.  This is what open source is all about!

Monday, September 23, 2013

Learning "Stupid NAT Tricks" from Apple

In my last post I spent some time (and many words) describing the what, how, and why behind some of the changes Apple has made to the Facetime protocol over the past three years.

Needless to say I was impressed. There are several things those of us in the open source world can learn from that analysis:


- SIP, STUN, RTP, RTCP port multiplexing (while not including the a=rtcp-mux attribute)
- Compact SIP headers
- SDP minimization (removing rtpmap lines, etc)
- SDP compression

The IETF is currently working on port multiplexing (mainly via WebRTC) but I have yet to see a proposal that includes multiplexing signaling and media on the same port (with the architecture of WebRTC that's not a surprise).  Either way I'll consider this "in progress".  I guess there's always IAX ;).

Compact SIP headers are already supported by a variety of open source applications.  We'll consider this one done.

SDP minimization is something that can't generally be done in a standards compliant way.  You'll note I labelled these optimizations as a "slight protocol violation" in my previous post.  We'll table this one for now.

SDP compression, however, is very interesting. SIP bears more than a passing resemblance to HTTP, and HTTP compression has been standard and well supported for quite some time. With that in mind I see no reason why it can't become more widely supported for SIP applications as well. It's also important to point out that while I'm calling this "SDP compression" it's really SIP body compression that could be used for any MIME type or combination of MIME types (multipart).
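Here is a hypothetical Python sketch of the sending side (it is not Kamailio's gzcompress module or any other project's code): the body is compressed with zlib, labelled with Content-Encoding: deflate (compact form "e"), and Content-Length counts the compressed bytes rather than the original text.

```python
import zlib


def build_compressed_message(start_line, headers, body):
    """Build a SIP message whose body is sent with Content-Encoding: deflate.

    'deflate' as used by HTTP/1.1 is zlib-wrapped DEFLATE data (RFC 1950 around
    RFC 1951), which is exactly what zlib.compress() produces.
    headers: list of (name, value) pairs placed before the body headers.
    """
    payload = zlib.compress(body.encode("utf-8"), 9)
    lines = [start_line] + ["%s: %s" % (name, value) for name, value in headers]
    lines += [
        "Content-Type: application/sdp",
        "Content-Encoding: deflate",
        "Content-Length: %d" % len(payload),  # length of the compressed body
    ]
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + payload
```

At compression level 9 the body starts with the zlib header bytes 0x78 0xDA, the same two bytes visible (byte-swapped by hexdump) at the start of the Facetime INVITE body in my previous post.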

As we saw from my last post this is no easy task.  The parsing of plaintext SDPs is so easy and commonplace today that anything else is a significant undertaking.  For compressed SIP bodies to be transparent to me in my everyday life they would need to be supported in at least the following projects/products:

- Kamailio
- FreeSWITCH
- PJSIP
- Wireshark
- HOMER
- Ixia IXLoad (probably not going to happen)

Fortunately I have good news!  Shortly after my last post Daniel-Constantin Mierla from Kamailio contacted me and let me know that he had written the gzcompress module for "the fun of a quick coding in the evening".  There's still work to be done on this module but this was an awesome (and unexpected) start!

On the same day I was also contacted by ashdnazg from the Pyreshark project. First of all, for a protocol junkie like myself the ability to write Wireshark protocol dissectors on the fly in Python is very, very exciting. I can't believe I hadn't heard of this project before! Anyway, within hours of receiving the Facetime packet capture ashdnazg was able to provide me with a Pyreshark-based Wireshark dissector (in 26 lines of Python, no less)! My local (Ubuntu) instance of Wireshark now looks like this when loading a Facetime packet capture:



Oh yeah!

This is another marvel of open source - within 24 hours of my last post the leaders of two open source projects from across the globe were able to collaborate with me to support the functionality I described here.  Amazing.

Of course we're not done yet.  I created a bounty for FreeSWITCH to support this functionality as well.  Anthony and the boys are very busy but we'll see if I can get any traction on this one.

That leaves PJSIP and HOMER.  Seeing as HOMER is based on Kamailio it's probably not that difficult to implement this.  I'd also like to think (based on the use case and project goals) that they may be more interested in this than anyone.

Stay tuned!

Friday, September 20, 2013

Apple's new Facetime - a SIP Perspective

A colleague of mine recently sent over a PCAP file containing an Apple Facetime session between two iPhone devices running the just-released iOS 7. Knowing I’m a protocol junkie, he thought I might be interested in seeing it. Clearly that’s the case!

As always (because I’m lazy) I looked around the internet to see what other research people had done on the “Facetime protocol”.  I found several excellent (though now somewhat dated) articles.  Without giving it away just yet it looks like Apple has been very busy over the last three years (like you needed me to tell you that).  As we usually do here, let’s start looking at packets!

At first glance Facetime 2013 resembles any normal SIP capture:





Then some interesting details begin to emerge…  What are these unknown packets?  What’s up with this INVITE?

Let’s talk about the unknown UDP packets.  We’ll select one and force Wireshark to decode it as RTP:





That worked and we can see this is RTP using a dynamic payload type. Our RTP packets are now correctly dissected, but Wireshark has also mistaken the SIP and STUN packets for RTP. Why?
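"Decode As RTP" essentially means "interpret the first 12 bytes as the fixed RTP header from RFC 3550". A minimal Python sketch of that interpretation (the function name is illustrative):

```python
import struct


def parse_rtp_header(packet):
    """Decode the fixed 12-byte RTP header defined in RFC 3550."""
    if len(packet) < 12:
        raise ValueError("too short for an RTP header")
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": b0 >> 6,           # should be 2 for real RTP
        "padding": bool(b0 & 0x20),
        "extension": bool(b0 & 0x10),
        "csrc_count": b0 & 0x0F,
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,    # e.g. 104, a dynamic payload type
        "sequence": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }
```

Nothing stops a STUN or SIP packet from "decoding" this way too: a SIP INVITE starts with 'I' (0x49), which yields version 1, and STUN starts with 0x00, which yields version 0. Forcing the decoder onto every packet in the stream produces exactly this kind of nonsense.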

Forcing Wireshark to decode a given stream (where “stream” is a SRC IP:PORT pair and a DST IP:PORT pair) attempts to force a decode of that protocol type on all packets belonging to that stream.  So what happened here?

If we look closer at the UDP layer we can see that STUN, SIP, and RTP all appear to be using the same port number on each endpoint (16402 in this case).  We also have no idea what codec payload type 104 is (we’d need to see the SDP for that).  Now’s probably a good time to look at the SIP signaling a bit closer.





Apple is using compact SIP headers (interesting).  The User-Agent uses the codename for Facetime and GK more than likely implies the use of Apple’s GameKit.  That SDP, however, looks a little strange.  Certainly not the simple, printable ASCII we’re used to seeing in SIP bodies!

However, not all hope is lost. SIP compact header “e” maps to Content-Encoding. Encoding “deflate” is the standard HTTP/1.1 content-coding: DEFLATE-compressed data (RFC 1951) wrapped in the zlib format (RFC 1950). Wireshark clearly doesn’t support this with SIP so we’ll have to do a little more work…
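For reference, RFC 3261 defines ten single-letter compact header names (see its section 7.3.3 and the individual header definitions). A small Python lookup sketch that expands them and passes every other header name through unchanged:

```python
# Compact header forms defined in RFC 3261.
COMPACT_FORMS = {
    "c": "Content-Type",
    "e": "Content-Encoding",
    "f": "From",
    "i": "Call-ID",
    "k": "Supported",
    "l": "Content-Length",
    "m": "Contact",
    "s": "Subject",
    "t": "To",
    "v": "Via",
}


def expand_header_name(name):
    """Map a compact SIP header name to its long form; other names pass through."""
    name = name.strip()
    return COMPACT_FORMS.get(name.lower(), name)
```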

I saved the SIP message bodies (SDPs) from the INVITE and 200 OK into separate files.  Here’s what hexdump had to say about the INVITE:


0000000 da78 4f75 6e41 3083 bc10 f123 3f07 4da0
0000010 36bc e021 0f95 2886 d46d 4224 8a26 057a
0000020 554a 20d4 9010 bfbc a6eb 7352 d641 3d6a
0000030 3b33 f5e3 734d ebdf cbf4 b9db aa6b fd3a
0000040 a62a 1ebc 746e 9c65 eece 76c8 c059 1620
0000050 080b 85a3 b090 8800 6f7c 6dd4 3657 da97
0000060 2af7 373d 6a53 2b93 39c1 4fe5 c29a af7c
0000070 dbd0 8e7d 6b67 c714 bdc3 6a59 00a0 28dd
0000080 721e 0cf5 3fb8 4c59 624d 4c52 92ad dfb8
0000090 323a ee23 0555 ac68 960a 49f2 032e b77c
00000a0 22e8 8737 bac4 9a9e f7cc 5d5a 3f5c 8e9a
00000b0 1841 c170 29ec 9a5b c673 d380 7c7a 1545
00000c0 98b2 057e b082 5410 9ce0 54c3 eaf5 e1d7
00000d0 67d0 f53b 98ca e594 db45 ea5f ab31 e487
00000e0 55d2 2cdf f888 ba7d 6ddf 8494 8e28 5ae5
00000f0 d2c4 c571 8555 75ab effc 2f77 e58e e580
0000100 f3a3 594f 2acd 1b21 0aeb 5467 97da 07d4
0000110 770c 03fc 5b01 3c74                    
0000118

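Oh, you can’t read that? Yeah, I can’t either. We’ll have to inflate these. Unfortunately no standard utility (gunzip, etc) seemed to want to uncompress these binary blobs. gunzip expects a gzip header (RFC 1952), but these blobs begin with the zlib header bytes 0x78 0xDA (hexdump’s default 16-bit little-endian output shows them swapped as “da78”), so I had to hack up some PHP (first thing I could find with Google, whatever):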

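#!/usr/bin/php
<?php
// Inflate a SIP body saved with Content-Encoding: deflate (zlib format, RFC 1950)
$filename = "./my-sdp";
$contents = file_get_contents($filename);
if ($contents === false) {
    fwrite(STDERR, "Could not read $filename\n");
    exit(1);
}
// gzuncompress() handles zlib-wrapped data; gunzip expects a gzip header instead
$uncompressed = gzuncompress($contents);
if ($uncompressed === false) {
    fwrite(STDERR, "$filename is not valid zlib data\n");
    exit(1);
}
echo $uncompressed;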

Which gave me a legible SDP offer from the INVITE:

v=0
o=GKVoiceChatService 0 0 IN IP4 192.168.231.118
s=mobile
c=IN IP4 192.168.231.118
b=AS:2000
t=0 0
a=FLS;VRA:0;MVRA:0;RVRA1:1;AS:2;MS:-1;LTR;CABAC;CR:3;LF:-1;PR;CH:4;AR:4/3,3/4;XR;
a=DMBR
a=CAP
m=audio 16402 RTP/AVP 104 105 106 9 0 124 122 121
a=rtcp:16402
a=fmtp:AAC SamplesPerBlock 480
a=rtpID:3189937293
a=au:65792
a=fmtp:104 sbr;block 480
a=fmtp:105 sbr;block 480
a=fmtp:106 sec;sbr;block 480
a=fmtp:122 sec
a=fmtp:121 sec

Now we can begin to analyze what’s actually happening here. Our INVITE contains a perfectly valid audio offer advertising support for PCMU (payload type 0), 16kHz G722 (payload type 9), and six dynamic payload types. The IANA has defined static payload types for RTP. Simply put, we know payload type 0 is PCMU, but any dynamic payload type (96-127) SHOULD (per RFC 4566) have a corresponding rtpmap line mapping the RTP payload type to a “media encoding name”. Looking at this trace alone I don’t know what these dynamic payload types are.

However, it’s very likely that the payload types have stayed the same even though Apple has now removed the suggested rtpmap lines.  More than likely Apple has “hardcoded” these payload type codec maps internally.  We’ll get to why in just a bit.
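A quick Python sketch that finds dynamic payload types offered on m= lines without a matching a=rtpmap. It is a simplification: it treats the whole SDP as one pool of rtpmap lines rather than scoping them per media section, which is fine for an audio-only offer like this one.

```python
def unmapped_dynamic_payloads(sdp):
    """List dynamic RTP payload types (96-127) on m= lines that have no a=rtpmap."""
    offered, mapped = [], set()
    for line in sdp.splitlines():
        if line.startswith("m="):
            # m=<media> <port> <proto> <fmt> ...; payload types start at field 4
            offered.extend(int(pt) for pt in line.split()[3:] if pt.isdigit())
        elif line.startswith("a=rtpmap:"):
            mapped.add(int(line[len("a=rtpmap:"):].split()[0]))
    return [pt for pt in offered if 96 <= pt <= 127 and pt not in mapped]
```

Run against the Facetime offer above, it reports all six dynamic payload types (104, 105, 106, 124, 122, 121) as unmapped.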

The other iPhone 5 running iOS 7 responds with the following SDP answer in the 200 OK:

v=0
o=GKVoiceChatService 0 0 IN IP4 192.168.231.100
s=mobile
c=IN IP4 192.168.231.100
b=AS:2000:2000
t=0 0
a=FLS;VRA:0;MVRA:0;RVRA1:1;AS:2;MS:-1;LTR;CABAC;CR:3;LF:-1;PR;CH:4;AR:4/3,3/4;XR;
a=DMBR
a=CAP
m=audio 16402 RTP/AVP 104 106 121 122
a=rtcp:16402
a=fmtp:AAC SamplesPerBlock 480
a=rtpID:3770747611
a=au:65792
a=fmtp:104 sbr;block 480
a=fmtp:106 sec;sbr;block 480
a=fmtp:121 sec
a=fmtp:122 sec

Using what we know from the 2010 analysis (complete with rtpmap lines) it seems we have agreed to use the AAC_ELD codec at 24kHz and 16kHz sample rates.  However, absent analysis of an RTP stream with payloads 121 or 122 (and missing rtpmap lines) it’s hard for me to say what those other payload types represent.

A casual reader may have read this far and thought to themselves: at the end of the day we’re ending up with 24kHz AAC_ELD audio just like we were in 2010.  Apple has gone through all of this work for nothing.  Oh and by the way, how/why are SIP, STUN, RTP, and RTCP using the same UDP port?

It’s called port multiplexing and it has become all the rage these days (it’s standard in WebRTC, for example - although not to this extent).  Unfortunately I’m not able to determine if port multiplexing is new to Facetime 2013 (I’m sure someone will chime in here).  Either way it’s an important technical distinction.  Typically a SIP session of this type would require at least three UDP ports:

- SIP signaling
- RTP for audio
- RTCP for well, RTCP

What’s wrong with three ports?  Three ports make it much more difficult (if not impossible) to cross some types of NAT devices and firewalls.  Three ports and three possibly bad interactions with some firewall or other device.  Three times as much exposure to Murphy's law.  With port multiplexing Apple has greatly increased the chances that Facetime will work through challenging network environments.  With this aggressive (I’ve never seen it before) use of UDP multiplexing for all of these standard protocols Apple has virtually guaranteed that if the signalling works the media and everything else will too.
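How does an endpoint tell these protocols apart on a single port? Apple hasn't documented its method, but the standard trick looks at the first byte. This Python sketch uses the ranges from RFC 5764 (STUN 0-3, DTLS 20-63, RTP/RTCP 128-191) and the RTCP check from RFC 5761; treating a leading uppercase ASCII letter as SIP is my own assumption for illustration.

```python
def classify_first_bytes(packet):
    """Guess which protocol a datagram on a multiplexed port carries."""
    if not packet:
        return "unknown"
    b0 = packet[0]
    if b0 <= 3:
        return "STUN"  # STUN messages start with two zero bits
    if 20 <= b0 <= 63:
        return "DTLS"  # TLS/DTLS record content types
    if 128 <= b0 <= 191:
        # RTP version 2. RTCP packet types (192-223) sit in the second byte,
        # where RTP keeps the marker bit and payload type.
        if len(packet) > 1 and 192 <= packet[1] <= 223:
            return "RTCP"
        return "RTP"
    if 65 <= b0 <= 90:
        # 'A'-'Z': SIP request methods and "SIP/2.0" responses (assumption)
        return "SIP"
    return "unknown"
```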

UPDATE: The always on-point Olle Johansson has postulated that Apple might be multiplexing everything over a single port for greater compatibility with carrier grade NAT (CGN) implementations, which place specific limits on the number of ports used per client.  Thanks Olle!

Any SIP engineer will tell you NAT is the bane of our existence.  That’s why (even from 2010) Apple has made use of protocols and features such as STUN, ICE, TURN, etc.  These are discussed all over the internet so I won’t get into them here but in summary they are all technologies used to traverse NAT devices and network firewalls.

It is very clear Apple has made Facetime (finally) ready for primetime.  Let me explain why and how I came to that conclusion.  First let’s talk strategy.

Without going into all of the details it is my opinion that Skype became as popular as it is because (like many successes) “it just works”.  The main reason Skype “just works” is its almost-magical NAT traversal.  The creators of Skype learned a lot from their previous gig defeating firewalls for peer to peer music sharing with Kazaa.  Both Skype and Kazaa have an almost legendary reputation for NAT and firewall traversal.  If there is a way through a NAT device or firewall they will probably figure it out.  It’s with this technology (and timing, codecs, etc) that Skype became so popular.  When nothing else worked, Skype would (and still does).  It just works (and as we’ve seen that’s hugely valuable).

The more endpoints that “just work” the more Metcalfe’s law comes into effect.  If Apple can succeed in making as many endpoints as possible “just work” they have a hugely valuable real-time communications network with Facetime.
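For reference, Metcalfe's law in its usual form: with n reachable endpoints the number of possible pairwise calls grows quadratically, so the value of the network is modelled as proportional to n squared. Ten endpoints allow 45 distinct pairs; a hundred allow 4,950.

```latex
\text{possible endpoint pairs} = \binom{n}{2} = \frac{n(n-1)}{2},
\qquad V(n) \propto n^{2}
```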

A skeptic might point out that Apple is already using STUN, TURN, and ICE. They’ve already got NAT “figured out”.  Generally speaking, yes.  However, they’ve now taken some extraordinary steps to take their NAT handling (and by extension the “value” of Facetime) to the next level.  Apple wants Facetime to work on as many networks as possible and they’ve spent a lot of time making sure of that.  Let’s look at the changes from Facetime of 2010 to now:

- SIP, STUN, RTP, RTCP port multiplexing (while not including the a=rtcp-mux attribute)
- Compact SIP headers
- SDP minimization (removing rtpmap lines, etc)
- SDP compression

This clearly isn’t an off the shelf SIP and RTP based solution anymore, but how does it all add up?  Also, why are three of these efforts focused on minimizing packet size (even if it means violating standards)?

I know this has been a long and winding road.  Hopefully you’re still with me!  To understand the value of minimizing packet size you need to peek into another little-known area of the internet: IP fragments.  I’m going to butcher a lot of this but at this point you just need to get the broad strokes.

Each network link type is configured for a maximum transmission unit (MTU). For Ethernet this is typically 1500 bytes, meaning the largest IP packet a single Ethernet frame can carry is 1500 bytes. In many cases, however, the various links and links inside of links (encapsulation), etc mean that the effective end-to-end MTU is significantly smaller than that (we won’t get into ATM, etc). There are two ways this can be addressed:

- PMTU (Path MTU discovery)
- IP fragmentation

With these two technologies who cares about packet size? Firewalls and NAT devices, that’s who. First, many firewall administrators or vendors carelessly block all ICMP packets. Path MTU discovery depends on ICMP messages (“Fragmentation Needed” in IPv4) coming back from routers along the path, so that means PMTU discovery is out. If Apple Facetime sent packets larger than the end-to-end MTU and depended on functioning PMTU discovery there would be many instances where it would not work.

IP fragmentation poses another problem.  Once again, many network vendors and firewall administrators outright block IP fragments.  To add insult to injury (or is that injury to insult?) many of these devices have broken and/or buggy support for IP fragment reassembly.  This especially goes for UDP (which to be fair is not a “connection oriented” protocol).  I have personally witnessed many instances where firewall devices could successfully reassemble IP+TCP fragments but not IP+UDP fragments.
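To put numbers on the 1500-byte Ethernet MTU: after a 20-byte IPv4 header (no options) and an 8-byte UDP header, one unfragmented packet carries at most 1472 bytes of UDP payload. One byte more and the datagram is split into two IP fragments, each of which has to survive the devices described above. A small Python sketch of that arithmetic:

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
UDP_HEADER = 8    # bytes


def ipv4_fragments_for_udp(payload_len, mtu=1500):
    """Number of IPv4 packets needed to carry one UDP datagram of payload_len bytes.

    Every fragment except the last must carry a multiple of 8 data bytes, so each
    fragment holds at most floor((mtu - 20) / 8) * 8 bytes of the UDP datagram.
    """
    datagram = payload_len + UDP_HEADER
    per_fragment = (mtu - IPV4_HEADER) // 8 * 8
    return -(-datagram // per_fragment)  # integer ceiling division
```

With the default MTU, `ipv4_fragments_for_udp(1472)` is 1 and `ipv4_fragments_for_udp(1473)` is 2.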

This leads to an almost unwinnable situation.  Apple could use TCP for the signalling and UDP for media but then they’d lose the benefits of single port multiplexing.  They could use TCP over a single port for everything but that would just be crazy.  In either case TCP has a larger header anyway (larger header = larger packet for the same amount of data).

This has posed such a longstanding problem that some network operators and academics study the behavior of MTU interactions and IP fragmentation on the internet.  It came up most recently on NANOG a few weeks ago.  Let’s look at the results of Emile Aben’s mini-study:

Results:
size = ICMP packet size, add 20 for IPv4 packet size
fail% = % of vantage points where 5 packets were sent, 0 were received.
#size   fail%   vantage points
100     0.88    2963
300     0.77    3614
500     0.88    1133
700     1.07    3258
900     1.13    3614
1000    1.04    770
1100    2.04    3525
1200    1.91    3303
1300    1.76    681
1400    2.06    3014
1450    2.53    3597
1470    3.01    2192
1470    3.12    3592
1473    4.96    3566
1475    4.96    3387
1480    6.04    679
1480    4.93    3492 [*]
1481    9.86    3489
1482    9.81    3567
1483    9.94    3118

From these results we can see that the failure rate rises above 1% once the packet size reaches about 700 bytes.

The Facetime 2010 INVITE packet was 1093 bytes.  The Facetime 2013 INVITE packet is 714 bytes.  In any case there is a greater chance that the 714 byte packet will reach its intended destination.  From these results we can also see that you start to run into real trouble once you reach 1400 bytes.  Sure that’s twice the size of our current packet but who knows what the future holds for the capabilities of Facetime?  Screen sharing, multi-party conferencing, file transfer, etc all depend on larger SDP descriptions.  With the changes Apple has made they’ve not only increased the robustness of Facetime today, they’ve given themselves “room to grow” in the future.

Apple has spent some time in the trenches over the last few years and found out how difficult real-time communications can be in the real world. Facetime isn't playing around anymore, and Apple is becoming a serious networking (services?) company that's poised to take on current best-of-breed solutions.

If you're interested to see how this might be implemented in various open source projects, check out the update!