Wednesday, January 7, 2009

Heads up!

UPDATE: Any developments on this and other SIP/RTP issues can be found here.

Some serious issues for all of those of you in SIP land:

There is a pretty serious RTP problem with Sonus equipment that has been making the rounds...

Simply put, Sonus equipment will not accept two RTP packets with the same timestamp, even if the sequence number has been properly incremented. According to various RFCs (namely 1889 and 2833) this is perfectly valid and in some cases (like video) desired.
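If you want to check whether your own gear sends the pattern described above, here is a rough sketch that scans raw RTP packets for back-to-back packets sharing a timestamp. The function names are mine and purely illustrative, it assumes you already have the UDP payloads in hand (from a capture, for instance), and it only looks at the 12-byte fixed RTP header.

```python
import struct

def parse_rtp_header(packet):
    """Decode the 12-byte fixed RTP header from a raw UDP payload."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the 12-byte fixed header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": b0 >> 6,
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,
        "seq": seq,
        "timestamp": ts,
        "ssrc": ssrc,
    }

def repeated_timestamps(packets):
    """Return (prev_seq, seq, timestamp) for consecutive packets in the same
    stream whose sequence number advanced by one but whose timestamp did not."""
    hits = []
    prev = None
    for raw in packets:
        cur = parse_rtp_header(raw)
        if (
            prev is not None
            and cur["ssrc"] == prev["ssrc"]
            and cur["timestamp"] == prev["timestamp"]
            and cur["seq"] == (prev["seq"] + 1) & 0xFFFF  # 16-bit wraparound
        ):
            hits.append((prev["seq"], cur["seq"], cur["timestamp"]))
        prev = cur
    return hits
```

Any hit is legal RTP, but it is exactly the kind of packet pair that is reported to upset the Sonus gear.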

A few slight problems... Many implementations (including Asterisk AND FreeSWITCH) send (or rather sent, more on this later) RFC 2833 DTMF events with the same timestamp as the last voice RTP packet. This is perfectly valid according to the RFCs mentioned above.

It appears (after my own testing) that Sonus will actually drop BOTH the voice RTP packet and the event packet. After some testing against Sonus gear it was pretty clear that no audio was being passed for as long as the DTMF event occurred. This makes sense because, per RFC 2833, a variable-length DTMF event must keep the same timestamp every time it is resent, while incrementing the sequence number and increasing the duration. DO NOT change the timestamp. Oh Sonus.
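To make the RFC 2833 behavior concrete, here is a minimal sketch of how the update packets for one DTMF event are built: same timestamp, incrementing sequence number, growing duration. The helper names, payload type 101, the volume value, and the 160-sample step (20 ms at 8 kHz) are assumptions for illustration, and the sketch leaves out the retransmission of the final end packet that real stacks normally do.

```python
import struct

def rtp_packet(seq, timestamp, ssrc, payload_type, payload, marker=False):
    """Build an RTP packet: version 2, no padding, no extension, no CSRCs."""
    byte0 = 2 << 6
    byte1 = (0x80 if marker else 0) | (payload_type & 0x7F)
    header = struct.pack("!BBHII", byte0, byte1,
                         seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc)
    return header + payload

def dtmf_payload(event, duration, end=False, volume=10):
    """Telephone-event payload: event, E bit + volume, duration (in samples)."""
    flags = (0x80 if end else 0) | (volume & 0x3F)
    return struct.pack("!BBH", event, flags, duration & 0xFFFF)

def dtmf_event_packets(event, start_seq, start_ts, ssrc,
                       updates=3, step=160, payload_type=101):
    """Packets for ONE event: the timestamp never changes, only seq and duration."""
    packets = []
    for i in range(updates):
        payload = dtmf_payload(event, duration=step * (i + 1),
                               end=(i == updates - 1))
        packets.append(rtp_packet(start_seq + i, start_ts, ssrc,
                                  payload_type, payload, marker=(i == 0)))
    return packets
```

Calling dtmf_event_packets(5, 100, 8000, 0x1234) yields three packets with sequence numbers 100 to 102 that all carry timestamp 8000, which is precisely the stream a receiver that rejects repeated timestamps will choke on.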

Both Asterisk and FreeSWITCH have implemented workarounds to address this. They are similar but there is one key difference. Asterisk now (as of SVN 12/15/2008 or so) will always use a unique timestamp for every RTP packet. I guess that solves that problem. FreeSWITCH is slightly smarter about it (as of SVN about the same time, interestingly enough) but I'm worried...

FreeSWITCH will parse the SDP to find the originator line (o=). If it is equal to "Sonus_UAC" FreeSWITCH activates a specific workaround to always send RTP packets with different timestamps. This seems more elegant but I am worried they will have to expand this hack for other equipment in the future (requiring a code change and recompile).
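Roughly, the detection idea looks like the sketch below. This is NOT FreeSWITCH's actual code: the function names are made up, and I am assuming the comparison is against the username field (the first field) of the o= line, which the real implementation may or may not do.

```python
def sdp_origin_username(sdp):
    """Return the first field of the SDP o= line, or None if there is none."""
    for line in sdp.splitlines():
        if line.startswith("o="):
            fields = line[2:].split()
            return fields[0] if fields else None
    return None

def needs_unique_timestamps(sdp, quirky_origins=("Sonus_UAC",)):
    """True if the far end identifies itself as gear known to need the workaround.

    quirky_origins is a hypothetical list; a hardcoded match like this is
    exactly what would need a code change whenever another vendor shows up.
    """
    return sdp_origin_username(sdp) in quirky_origins
```

Keeping the list of origins in configuration rather than in code would avoid the recompile, at the cost of one more knob to maintain.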

One could argue that Sonus has gotten this far with their current implementation and expected behavior. While it is valid (per the RFCs) to use the same timestamp, it is more /compatible/ to always use different timestamps. That appears to be what most equipment does.

This issue is what (apparently) caused so many problems for Teliax a while back when they switched from Asterisk to FreeSWITCH. At least that's what I heard. What doesn't make any sense is that Asterisk had the same behavior as FreeSWITCH: they both sent voice and event RTP packets with identical timestamps.

Also, one would like to think that when you provide voice services (which are pretty important to your customers) you would *test* something like DTMF when you were completely switching platforms. I discovered these issues while testing Star2Star with Level(3), for example. I'm glad I was paying attention. Our customers would have been upset with broken DTMF while we updated all of our Asterisk machines (several hundred).

I'm surprised no one noticed this until mid-December or so. It will be interesting to see what other things pop out of this mess...

5 comments:

Michael S Collins said...

K,

Thanks for bringing this to light! The issue with Sonus is a very serious one because it underscores one of the key shortcomings in SIP/RTP/VoIP: a critical lack of true standards. If the RFCs are suggestions as opposed to standards then one big vendor can mess with everyone else and force all of these nasty hacks to be introduced for the sake of interop.

I have to give the FreeSWITCH devs props for having the guts to engineer the software properly and tell *any* vendor that they are doing something against the RFC. How many devs roll over and integrate hacks just to avoid headaches?

I can think of several lessons to be learned from this episode:
#1 Make sure your tests emulate properly what your customers will see when they upgrade! The Teliax guys did a lot of testing but none of them had a Sonus in between the SIP phone and FreeSWITCH, thus the issue was discovered by paying customers. Ouch.

#2 Service providers and their vendors still don't care about the little guys.

#3 Always listen to Anthony Minessale when talking telecom. Dude just is never wrong. If every big equipment vendor would let their engineers spend an hour with Tony and Co. then the world would be a better place.

Anyway, thanks for giving FreeSWITCH a fair shake and thanks for asking great questions that will help us all make our OSS telephony platforms better.

-Michael S Collins

Kristian Kielhofner said...

Michael,

RFCs are technically (per the IETF) not standards. However, when you purchase a product or service that says it complies with RFC XXXX on the spec sheet, you are given the power to enforce that RFC as a "standard". There does have to be /some/ truth in advertising...

I myself have used this against several vendors - notably Cisco and Adtran. When faced with blatantly RFC incorrect or ignorant behavior both vendors fixed their implementations. Most of the time it helped them with a sale so everyone wins.

Service providers get this treatment too. Most of the time when the configuration is under their control (like routing SIP requests based off of To: instead of the request URI) I've gotten them to change their configuration/behavior.

Sometimes (like with this Sonus issue) the carrier is caught in the middle. Someone, somewhere, somehow, at Level(3) decided to go with Sonus. Sonus does not provide a configuration option to change this behavior. Level(3) is basically stuck in the middle.

Level(3) should take a page out of my book and go after Sonus. After all, they're just another customer to Sonus (albeit a big one, probably). Unfortunately, Level(3) is really just too big, slow, and bureaucratic to coordinate this.

I do agree with you about Tony. While no one (not even Tony) can be right all the time, he really knows what he is talking about. Maybe we can arrange for a "spend a day with Tony" program! ;)

FreeSWITCH rocks (more on this coming soon, believe me)!

Diego Viola said...

FreeSWITCH rocks!

Paul (paul@timmins.net) said...

Thanks for this research. So is the asterisk fix in SVN or in a current release?

Kristian Kielhofner said...

Paul - The timestamp fix is in Asterisk 1.4.23.