It seems that my Sonus rant has crossed over into some other arenas...
Yahoo! Financial Discussion
I've signed up for an account (kristian.kielhofner -at- yahoo.com) to participate in this thread. Should be interesting.
I created AstLinux but I write and rant about a lot of other things here. Mostly rants about SIP and the other various technologies I deal with on a daily basis.
Friday, February 13, 2009
Monday, February 9, 2009
More on FreeSWITCH
I've hinted at it for some time:
I've been playing with FreeSWITCH.
Anyone who is reading this should already know what FreeSWITCH is and why someone (such as myself) would be so interested in it. I'm not going to go over all of that again; there are plenty of rave reviews all over the internet. I don't need to write another one (although I probably will some day).
Here's an update on what I've done so far:
1) FreeSWITCH support in AstLinux. Still coming along but much progress has already been made. It compiles cleanly (one more hack for sqlite) and appears to work. More testing soon but I was pleasantly surprised - the build system seems to be just as well designed as the rest of the project. They've done a great job!
2) I hate transcoding. Long, long ago I led an effort to re-record and convert all of the Asterisk prompts to various native file formats to avoid transcoding. More than two years later I'm doing it again for FreeSWITCH, although this time I don't have to pay to re-record them all! Luckily they are already made available in various sample rates. I just had to update the script and do the converting. Big thanks to sox, Asterisk/res_convert and FreeSWITCH/mod_native_file.
3) This one is barely worth mentioning but I've started (SVN branch, that's about it) working to re-implement recqual using FreeSWITCH to place calls. I've got some big plans for this. Let's see how much time I actually have to work on it. Don't expect much progress anytime soon.
4) Various production and consulting projects. Obviously.
As always, expect more to come!
Thursday, February 5, 2009
The update you've been waiting for...
UPDATE: Any updates for this and other SIP/RTP issues can be found here.
In my last post over one month ago, I ranted on and on (big surprise, right) about some issues with Sonus equipment we were experiencing. After learning more I should elaborate on "Sonus equipment".
Like many other manufacturers Sonus has multiple products. We'll be talking about their NBS SBC. Many providers use the NBS SBC in conjunction with GSX gateways and PSX route servers. I have no comments about GSX gateways or PSX route servers; this equipment is largely transparent to us "end users". My gripes are with the NBS SBC.
Providers that use Sonus NBS:
- Level(3) (w/ GSX)
- XO (w/ PSX & GSX)
- Global Crossing
- Broadvox
- Many others
If you are using these carriers for SIP services, be aware.
Last time I was talking about timestamps. This time it's far more insidious...
Apparently (as relayed to me from Level(3) engineers) Sonus has a DSP buffer limitation for RTP packet handling. If there is ever a gap of more than 100ms in RTP (my experience has shown the real threshold to be much lower), Sonus will, in technical terms, "freak out".
We have now identified four RTP interop issues with Sonus equipment:
1) Sonus requires all RTP packets (events or voice) to have unique timestamps. The RFCs specifically state that not only is it valid to use the same timestamp for various RTP packets, it is ideal in some cases (like events, for example).
2) The RFC 2833 events generated by Sonus equipment are goofy, to put it lightly. The event duration increments do not match the packetization of the voice stream as stated in RFC 2833 and elaborated on in RFC 4733. Specifically, Sonus equipment increments RFC 2833 duration 80 samples at a time as if the voice stream is 10ms (regardless of what it actually is). I don't know of any other implementations that do this. Even when the audio stream is *clearly* 20ms (in the SDP, too) Sonus will continue to increment 80 samples at a time.
3) The most recent (and biggest) problem has been caused by the Sonus (seemingly arbitrary) requirement that there never be a gap of more than 100ms in RTP. This is inherently broken behavior for robustness in IP networks.
4) Sonus has yet another issue with RTP timing and sequencing... If a call is brought up with an endpoint that clocks its own RTP stream (IVR server, for example) everything will be fine, until the IVR server (or whatever) bridges that channel to another device that also clocks its own RTP. Sonus (probably related to #3 above) will lose sync and drop audio for up to several seconds while it catches up to the new RTP stream. This requires those of us that work with Sonus equipment to rewrite all timestamps and sequence numbers on our equipment, which has the adverse effect of less than optimal jitter buffering (which should ideally be done at each far endpoint).
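The arithmetic behind #2 is easy to check by hand. Here is a minimal sketch, assuming the usual 8000 Hz RTP clock for narrowband audio (the clock rate and the function names are my assumptions; the 80-sample, 10ms and 20ms figures are the ones from the list above):

```python
def samples_per_packet(ptime_ms: int, clock_rate_hz: int = 8000) -> int:
    """RTP timestamp units covered by one packet of ptime_ms milliseconds."""
    return clock_rate_hz * ptime_ms // 1000


def event_durations(ptime_ms: int, packets: int, clock_rate_hz: int = 8000) -> list[int]:
    """Duration field values for successive packets of a single telephone-event."""
    step = samples_per_packet(ptime_ms, clock_rate_hz)
    return [step * n for n in range(1, packets + 1)]


if __name__ == "__main__":
    # What a 20ms voice stream implies for the event duration field.
    print(event_durations(20, 4))  # [160, 320, 480, 640]
    # What you get when stepping as if the stream were 10ms.
    print(event_durations(10, 4))  # [80, 160, 240, 320]
```

In other words, a duration that grows by 80 per packet only lines up with the voice stream if that stream really is 10ms.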
Asterisk is largely ok with all of these issues, believe it or not. The one that still causes problems is #3. If you are using Asterisk and Sonus gateways, make DAMN SURE that you are using Packet2Packet bridging and that your devices (whatever they may be) implement RFC 2833 the Sonus way. If not...
NO DTMF FOR YOU!
If you are not using Packet2Packet bridging and your events need to traverse the Asterisk core (for features, fixup, or anything else) there will be a variable length RTP gap that often exceeds the Sonus DSP buffer requirement. With gaps in RTP...
NO DTMF FOR YOU!
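If you want to know whether your own equipment produces gaps like this, you can check packet arrival times pulled from a capture. A rough sketch, assuming you already have a list of arrival times in milliseconds for one RTP stream (the function name and input format are hypothetical; the 100ms default is the figure relayed above, and the real threshold may well be lower):

```python
def find_rtp_gaps(arrival_times_ms, max_gap_ms=100):
    """Return (packet_index, gap_ms) for every inter-packet gap above max_gap_ms."""
    gaps = []
    for i in range(1, len(arrival_times_ms)):
        gap_ms = arrival_times_ms[i] - arrival_times_ms[i - 1]
        if gap_ms > max_gap_ms:
            gaps.append((i, gap_ms))
    return gaps


if __name__ == "__main__":
    # Steady 20ms packets, then a 150ms hole before packet 3.
    print(find_rtp_gaps([0, 20, 40, 190, 210]))  # [(3, 150)]
```

Lowering `max_gap_ms` is the obvious experiment if DTMF still fails with no gaps reported at 100.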
FreeSWITCH is also ok as long as you avoid #4. FreeSWITCH provides the configuration option to rewrite timestamps and break jitter buffering. If you are using Sonus gateways you should enable it, otherwise...
NO DTMF FOR YOU!
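To make "rewrite timestamps" concrete, here is a deliberately naive sketch of the idea. It is not FreeSWITCH's code, and the class name and defaults are hypothetical. It assumes a fixed 12-byte RTP header with no CSRC list and a constant 160 samples per packet (20ms at 8000 Hz). Whatever sequence number and timestamp arrive, the far end sees one continuous stream:

```python
import struct


class RtpRewriter:
    """Re-stamp outgoing RTP so sequence numbers and timestamps never jump."""

    def __init__(self, first_seq=0, first_ts=0, samples_per_packet=160):
        self.next_seq = first_seq & 0xFFFF
        self.next_ts = first_ts & 0xFFFFFFFF
        self.step = samples_per_packet

    def rewrite(self, packet: bytes) -> bytes:
        if len(packet) < 12:
            raise ValueError("an RTP header is at least 12 bytes")
        # Bytes 2-3 carry the sequence number, bytes 4-7 the timestamp.
        stamped = struct.pack("!HI", self.next_seq, self.next_ts)
        out = packet[:2] + stamped + packet[8:]
        # Advance with wraparound at 16 and 32 bits respectively.
        self.next_seq = (self.next_seq + 1) & 0xFFFF
        self.next_ts = (self.next_ts + self.step) & 0xFFFFFFFF
        return out
```

The cost is visible right in the code: the original timestamps are thrown away, so the receiver's jitter buffer no longer knows when the audio was actually generated.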
All of this makes me wish I was around back in the old days when there was one telco and all DTMF was inband!
Wednesday, January 7, 2009
Heads up!
UPDATE: Any developments on this and other SIP/RTP issues can be found here.
Some serious issues for all of those of you in SIP land:
There is a pretty serious RTP problem with Sonus equipment that has been making the rounds...
Simply put, Sonus equipment will not accept two RTP packets with the same timestamp, even if the sequence number has been properly incremented. According to various RFCs (namely 1889 and 2833) this is perfectly valid and in some cases (like video) desired.
A few slight problems... Many implementations (including Asterisk AND FreeSWITCH) will (or did; more on this later) send out RFC 2833 DTMF events with the same timestamp as the last voice RTP packet. This is perfectly valid according to the RFCs mentioned above.
It appears (after my own testing) that Sonus will actually drop BOTH the voice RTP packet and the event packet. After some testing against Sonus gear it was pretty clear that no audio was being passed as long as the DTMF event occurred. This makes sense because per RFC 2833 a variable length DTMF event must use the same timestamp, increment the sequence counter and increase the duration when it is resent - DO NOT change the timestamp. Oh Sonus.
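For anyone who hasn't stared at these packets in a capture, here is a minimal sketch of what one DTMF event looks like on the wire: the timestamp stays fixed, the sequence number increments, and the duration grows. The function name, payload type 101, the volume value and the 160-sample step (20ms at 8000 Hz) are all assumptions for illustration. Real senders also retransmit the final packet, which is left out here:

```python
import struct


def dtmf_event_packets(event, start_seq, timestamp, ssrc,
                       packets=4, step=160, payload_type=101, volume=10):
    """Build the RTP packets for one telephone-event (RFC 2833 style)."""
    out = []
    for n in range(packets):
        marker = 0x80 if n == 0 else 0x00          # marker bit on the first packet only
        end = 0x80 if n == packets - 1 else 0x00   # E bit on the last packet only
        header = struct.pack("!BBHII",
                             0x80,                      # version 2, no padding/extension/CSRC
                             marker | payload_type,
                             (start_seq + n) & 0xFFFF,  # sequence number increments
                             timestamp,                 # timestamp does NOT change
                             ssrc)
        payload = struct.pack("!BBH", event, end | volume, step * (n + 1))
        out.append(header + payload)
    return out


if __name__ == "__main__":
    for pkt in dtmf_event_packets(event=5, start_seq=100, timestamp=8000, ssrc=0x1234):
        _, _, seq, ts, _ = struct.unpack("!BBHII", pkt[:12])
        _, _, duration = struct.unpack("!BBH", pkt[12:])
        print(seq, ts, duration)
```

Every packet printed shares timestamp 8000, which is exactly the pattern Sonus gear rejects.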
Both Asterisk and FreeSWITCH have implemented workarounds to address this. They are similar but there is one key difference. Asterisk now (as of SVN 12/15/2008 or so) will always use a unique timestamp for every RTP packet. I guess that solves that problem. FreeSWITCH is slightly smarter about it (as of SVN about the same time, interestingly enough) but I'm worried...
FreeSWITCH will parse the SDP to find the originator line (o=). If it is equal to "Sonus_UAC" FreeSWITCH activates a specific workaround to always send RTP packets with different timestamps. This seems more elegant but I am worried they will have to expand this hack for other equipment in the future (requiring a code change and recompile).
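The detection itself is simple. This is a sketch of the idea in Python, not FreeSWITCH's actual code. It assumes the comparison is made against the username field (the first field) of the o= line, and the function names are hypothetical:

```python
def sdp_origin_username(sdp: str):
    """Return the username field of the SDP o= line, or None if there is none."""
    for line in sdp.splitlines():
        if line.startswith("o="):
            fields = line[2:].split()
            return fields[0] if fields else None
    return None


def needs_unique_timestamps(sdp: str) -> bool:
    """True when the remote side identifies itself as Sonus in the origin line."""
    return sdp_origin_username(sdp) == "Sonus_UAC"


if __name__ == "__main__":
    example = (
        "v=0\r\n"
        "o=Sonus_UAC 12345 67890 IN IP4 192.0.2.1\r\n"
        "s=SIP Media Capabilities\r\n"
        "c=IN IP4 192.0.2.1\r\n"
        "t=0 0\r\n"
        "m=audio 20000 RTP/AVP 0 101\r\n"
    )
    print(needs_unique_timestamps(example))  # True
```

The hard-coded string is the part that worries me: every additional vendor with the same quirk means another comparison like this one.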
One could argue that Sonus has gotten this far with their current implementation and expected behavior. While it is valid (per the RFCs) to use the same timestamp, it is more /compatible/ to always use different timestamps. That appears to be what most equipment does.
This issue is what (apparently) caused so many problems for Teliax a while back when they switched from Asterisk to FreeSWITCH. At least that's what I heard. What doesn't make any sense is that Asterisk had the same behavior as FreeSWITCH - they both sent voice and event RTP packets with identical timestamps.
Also, one would like to think that when you provide voice services (which are pretty important to your customers) you would *test* something like DTMF when you were completely switching platforms. I discovered these issues while testing Star2Star with Level(3), for example. I'm glad I was paying attention. Our customers would have been upset with broken DTMF while we updated all of our Asterisk machines (several hundred).
I'm surprised no one noticed this until mid-December or so. It will be interesting to see what other things pop out of this mess...
Monday, December 22, 2008
Introducing Recqual
I've been waiting to talk about this one for a while.
Several months ago Star2Star was having problems with one of our upstream SIP carriers. We were starting to notice a large increase in the number of one way audio calls our customers were reporting.
When most people think of one way calls their first reaction is to blame SIP. Must be NAT! Must be a firewall! SIP sucks! Etc, etc.
I knew that wasn't the case. I just had to prove it.
I was convinced the problem wasn't SIP/UDP/IP related at all. We had multiple pcaps where we were sending RTP to the appropriate gateway. It just wasn't getting to the PSTN. Where was it going? When was this happening? Which gateways (out of hundreds) were the most problematic? We needed to know and we needed to know quickly.
I came up with and "wrote" recqual over a couple of days. After a few runs we were noticing patterns with problematic RTP endpoint IP addresses. Long story short, once these were identified we worked with the carrier to replace various bits of equipment (DSPs, line cards, etc). The one way audio problem has largely disappeared and we continue to run recqual. If this starts happening again we should know /BEFORE/ our customers do.
Of course I'm using Asterisk to place the calls. The best part of using Asterisk is its multi-protocol flexibility. You should be able to test just about any combination of voice technologies - G.729a, G.711, GSM, SIP, IAX, PRI, FXS, FXO, gtalk/jabber/jingle, skype, etc. The possibilities boggle the mind.
I've just been too busy to get it together and release this to the community - until now.
Tarball with instructions here.
Questions? Comments? Suggestions? Drop me a line.
Sunday, December 21, 2008
Consulting Time Available
I haven't blogged in a while but there is some good news...
I have made some time available for consulting work!
You might not be as excited about it as I am but this is a good thing. I'm looking for interesting projects, people, and companies to work with.
If you or anyone you know might be interested please contact me. Resume, references, etc available on request.
I'll be offering bonus time, discounts, and a few other potential incentives to anyone that lets me blog about my projects and/or release any work under a liberal (read: FOSS) license.
Between my change in schedule and some (hopefully) fun new projects you can all expect to see much more frequent blogging soon!
Monday, November 17, 2008
SBCs are Killing SIP
Wow... Over a month since my last post! My how time flies.
No time to reminisce or catch up. I've got a rant that needs to get out - NOW.
SBCs (Session Border Controllers) are killing SIP. Breaking SIP. Smothering SIP. Especially when used by "carriers". Carriers and their SBCs I tells ya.
SBCs, technically, are pretty cool devices. While I certainly understand their purpose they tend to be overused, misconfigured, and misunderstood. Many entities deploy SBCs without any idea of the other components (I'm looking at you, proxies) that make up a well designed SIP network.
Why do I hate SBCs so much?
1) SIP is cool because it is end to end and designed with intelligent endpoints in mind (endpoints that can think for themselves).
2) SIP is very flexible, especially with regards to handling media.
3) Ubiquity.
SBCs (especially when misconfigured) break many of these features:
1) SBCs (by design) hide endpoints from one another. Both endpoints support G.722? The SBC doesn't and it's going to rewrite the SDP with its capabilities. Too bad.
2) SBCs (by design) handle media. While this can be good often times it isn't and there are other, less drastic ways to ensure quality of media.
3) When the only tool you have is a hammer, every problem starts to look like a nail.
My biggest concerns with SBCs relate to the last point. I swear, there are many providers, enterprises, etc that have deployed SIP in some capacity using ONLY SBCs and simple UACs and UASs. They've never heard of a proxy. Or a registrar. Heck, I'd even go for a signalling-only B2BUA and call it a compromise. Chances are they've never heard of that either.
I have dealt with several devices that break down, utterly fall apart when used with a proxy. I've covered it on this blog before. I'm just too mad to look up the link now. Again, $MANUFACTURER designs and markets a SIP device. They only test it against SBCs and they've (apparently) never heard of a proxy. Guess what happens...
Some poor soul like myself tries to deploy said device in what I consider to be a well designed SIP network. Unfortunately for me, this call path might not involve an SBC. Guess what happens? The device doesn't understand traversing proxies (Record-Route, Via, etc) and does something silly like parse the Contact header when trying to send a response. Call failure and all kinds of brokenness ensue.
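For reference, a response belongs wherever the topmost Via says, not wherever Contact points. Here is a rough sketch of that lookup, assuming a well-formed request with IPv4 addresses or hostnames only (no IPv6 brackets). The function name and the example message are hypothetical:

```python
def response_destination(request: str):
    """Return (host, port) taken from the topmost Via of a SIP request."""
    for line in request.splitlines()[1:]:      # skip the request line
        if line == "":
            break                              # blank line ends the headers
        name, _, value = line.partition(":")
        if name.strip().lower() in ("via", "v"):
            top = value.split(",")[0].strip()  # first value is the topmost Via
            _proto, _, rest = top.partition(" ")
            parts = [p.strip() for p in rest.strip().split(";")]
            sent_by, params = parts[0], parts[1:]
            host, _, port = sent_by.partition(":")
            kv = dict(p.partition("=")[::2] for p in params)
            host = kv.get("received") or host  # received overrides the sent-by host
            port = kv.get("rport") or port or "5060"
            return host, int(port)
    return None


EXAMPLE = (
    "INVITE sip:bob@192.0.2.50 SIP/2.0\r\n"
    "Via: SIP/2.0/UDP proxy.example.com:5060;branch=z9hG4bKaaa\r\n"
    "Via: SIP/2.0/UDP 192.0.2.20:5062;branch=z9hG4bKbbb\r\n"
    "Contact: <sip:alice@192.0.2.20:5062>\r\n"
    "\r\n"
)

if __name__ == "__main__":
    print(response_destination(EXAMPLE))  # ('proxy.example.com', 5060)
```

A device that answers to 192.0.2.20 here because that is what Contact says will appear to work behind an SBC and fall over the moment a proxy is in the path.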
So... I talk to $MANUFACTURER and get the standard "We've deployed this device thousands of times and never seen this problem before". Let's assume that's true. I don't know what's more depressing: the fact that they skipped over multiple sections of a basic SIP RFC like 3261 or the fact that no one noticed it for this long because (apparently) no one uses proxies anymore. Ugh. Gross.
It's not just device manufacturers. Carriers do this too. Often times the actual issue lies with their SBC. Many carriers (especially those using ACME SBCs, it seems) parse To: instead of the Request-URI. Probably because their customers are using SBCs too and Request-URI and To: match. Not so with a proxy. I don't blame the carrier's use of an SBC. This makes sense. That's what they were designed for. However, please test your device and configuration against something other than another SBC.
What happens if your Request-URI and To: don't match? They send a 404! Yet another RFC 3261 violation. Section 8.2.2.1 allows for a UAS to route based off To (although it doesn't sound preferred). However, for the love of God, if you are going to deny a request because of the content of a To header, please send a 403 as specified in the RFC. Your 404s are confusing and ignorant. Was it really not found, or are you just routing based off To instead of the Request-URI? Once again I blame SBCs and a world where it's becoming common for SBCs to talk to each other (and nothing else).
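The decision I'm asking for fits in a few lines. This is a sketch of my reading of section 8.2.2.1, with hypothetical names throughout. The Request-URI decides whether the target exists (404 if not), and a rejection based purely on To gets a 403:

```python
def rejection_status(request_uri, to_uri, known_uris, enforce_to=False):
    """Return the status code to reject with, or None to keep processing."""
    if request_uri not in known_uris:
        return 404   # the Request-URI really does not identify anything we serve
    if enforce_to and to_uri not in known_uris:
        return 403   # we are choosing to refuse because of the To header
    return None


if __name__ == "__main__":
    known = {"sip:bob@example.com"}
    # A proxy retargeted the request: Request-URI is ours, To is the dialed number.
    print(rejection_status("sip:bob@example.com", "sip:8005551212@carrier.example",
                           known, enforce_to=True))   # 403
    print(rejection_status("sip:bob@example.com", "sip:8005551212@carrier.example",
                           known))                    # None
```

With `enforce_to` off, which is the behavior the RFC recommends, the mismatched To is simply accepted.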
This is yet another situation where assumptions are made based on the behavior of SBCs. It's bad. Please stop.