Just a quick one today (and some shameless self-promotion on my part)... Star2Star has been recognized on a few "lists" this year. Check it out:
Inc 500
Forbes 100 "Most Promising"
I'm lucky enough to tell people the same story all of the time - when I was a little kid I played with all of this stuff because I thought it was fun and I loved it. Only later did I realize that one day I'd be getting paid for it. I certainly never thought it could come to this!
Ok, enough of that for now. I'll be getting back to some tech stuff soon...
I created AstLinux but I write and rant about a lot of other things here. Mostly rants about SIP and the other various technologies I deal with on a daily basis.
Friday, December 2, 2011
Tuesday, November 15, 2011
Building a Startup (the right way)
(Continued from Building a Startup)
Our way wasn’t working. To put it mildly our “business grade” solution didn’t perform much better than Vonage. We came to exemplify VoIP - jittery calls, dropped calls, one way calls, etc, etc, etc. Most of this was because of the lack of quality ITSPs at that time. Either way our customers didn’t care. It was us. If we went to market with what we had the first time around we were going to lose.
The problem was the other predominant architecture at the time was “hosted”. Someone hosts a PBX for you and ships you some phones. You plug them in behind your router and magically you have a phone system. They weren’t doing much better. Sure, their sales looked good but even then it was becoming obvious customer churn was quite high. People didn’t like hosted either, and for good reason. Typically they have less control over the call than we do.
As I’ve alluded to before I thought there was a better way. We needed to host the voice applications where it made the most “sense”. We were primarily using Asterisk and with a little creative provisioning, a kick-ass SIP proxy, and enough Asterisk machines we could build the perfect business PBX - even if that meant virtually none of it existed at the customer premise. Or maybe all of it did. That flexibility was key. After a lot of discussions, whiteboard sessions, and late nights everyone agreed. We needed a do-over.
So we got to work and slowly our new architecture began to take shape. We added a kick-ass SIP proxy (OpenSER). OpenSER would power the core routing between various Asterisk servers each meeting different needs - IVR/Auto Attendant, Conferencing, Voicemail, remote phones (for “hosted” phones/softphones), etc. The beauty was the SIP proxy could route between all of these different systems including the original AstLinux system at the customer premise. Customer needs to call voicemail? No problem - the AstLinux system at the CPE fires an INVITE off to the proxy and the proxy figures out where their voicemail server is. The call is connected and the media goes directly between the two endpoints. Same thing for calls between any two points on the network - AstLinux CPE to AstLinux CPE, PSTN to voicemail, IVR to conference.
This is a good time to take a break and acknowledge what really made this all possible - OpenSER. While it’s difficult to explain the exact history and family tree with any piece of SER software I can tell you one thing - this company would not be possible without it. There is no question in my mind. It’s now 2011 and whether you select Kamailio or OpenSIPS for your SIP project you will not be sorry. Even after five years you will not find a more capable, flexible, scalable piece of SIP server software. It was one of the best decisions we ever made.
Need to add another server to meet demand for IVR? No problem, bring another server online, add the IP to a table and presto - you’re now taking calls on your new IVR. Eventually a new IVR led to several new IVRs, voicemail servers, conference systems, web portals, mail servers, various monitoring systems, etc.
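To make the "add the IP to a table" idea concrete, here is a minimal sketch in Python. It is not our actual OpenSER configuration; the `ServicePool` class, the service names, and the IP addresses are all made up for illustration. It only shows the general shape: a table of servers per service, and a proxy-side lookup that rotates through them.

```python
class ServicePool:
    """Hypothetical routing table: service name -> list of server IPs."""

    def __init__(self):
        self._servers = {}   # e.g. {"ivr": ["10.0.0.11", "10.0.0.12"]}
        self._counter = {}   # per-service round-robin position

    def add_server(self, service, ip):
        # "Add the IP to a table": the new server is eligible immediately.
        self._servers.setdefault(service, []).append(ip)

    def pick(self, service):
        # Choose the next server for this service in round-robin order.
        servers = self._servers.get(service)
        if not servers:
            raise LookupError("no servers registered for " + service)
        index = self._counter.get(service, 0) % len(servers)
        self._counter[service] = index + 1
        return servers[index]


pool = ServicePool()
pool.add_server("ivr", "10.0.0.11")
first = pool.pick("ivr")             # only one IVR so far
pool.add_server("ivr", "10.0.0.12")  # bring a second IVR online
second = pool.pick("ivr")
third = pool.pick("ivr")
```

The point of the design is that capacity is a data change rather than a configuration rewrite: the routing logic never changes, only the table does.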
What about infrastructure? Given our small scale, regional footprint, and focus on quality we began buying our own PRIs and running them on a couple of Cisco AS5350XM gateways. This got us past our initial issues with questionable ITSPs. Bandwidth was becoming another problem... We had an excellent colocation provider that provided blended bandwidth but still we needed more control. Here came BGP, ARIN, AS numbers, a pair of Cisco 7206VXRs w/ G2s, iBGP, multiple upstream providers, etc.
At times I would wonder - whatever happened to spending my time worrying about cross compilers? Looking back I’m not sure which was worse - GNU autoconf cross-compiling hell or SIP interop, BGP, etc. It’s fairly safe to say I’m a sadomasochist either way.
Even with all of the pain, missteps, and work we finally had an architecture to take to market. It would be the architecture that would serve us well for several years. Of course there was more work to be done...
Wednesday, November 2, 2011
Building a Startup
(Continued from Starting a Startup)
After several days of meetings in Sarasota we determined:
1) I was moving to Sarasota to start a company with Norm and Joe.
2) We were going to utilize open source software wherever possible (including AstLinux, obviously).
3) The Internet was the only ubiquitous, high quality network to build a nationwide platform.
4) The Internet was only getting more ubiquitous, more reliable, and faster in coming months/years/decades/etc.
5) We were going to take advantage of as much of this as possible.
These were some pretty lofty goals. Remember, this is early 2006. Gmail was still invitation-only beta. Google Docs didn’t exist. Amazon EC2 didn’t exist. “Cloud computing” hadn’t come back into fashion yet. The term itself didn’t exist. The Internet was considered (by many) to be “best effort”, “inherently unreliable”, and “unsuitable” for critical communications (such as real time business telephony). There were many naysayers who were confident this would be a miserable failure. As it turns out, they were almost right.
We thought the “secret sauce” to business grade voice over the internet was monitoring and management. If one could monitor and manage the internet connection business grade voice should be possible. Of course this is very ambiguous but it led to several great hires. We hired
Joe had already deployed several embedded Asterisk systems to various businesses in the Sarasota area. They used an embedded version of Linux he patched together and a third party (unnamed) “carrier” to connect to the PSTN. The first step was upgrading these machines and getting them on AstLinux. Once this was accomplished we felt confident enough to proceed with our plan. This was Star2Star Communications and in the beginning of 2006 it looked something like this:
1) Soekris net4801 machines running AstLinux on the customer premise.
2) Grandstream GXP-2000 phones at each desk.
3) Connectivity to a third party “ITSP”.
4) Management/monitoring systems (check IP connectivity, phone availability, ITSP reliability, local LAN, etc).
5) Central provisioning of AstLinux systems, phones, etc.
This was Star2Star and there was something I really liked about it - it was simple. Anyone who knows me or knows of my projects (AstLinux, for example) has to know I favor simplicity whenever possible. Keep it simple, keep it simple, keep it simple (stupid).
As time went on we started to learn that maybe this was too simple. We didn’t have enough control. Our monitoring wasn’t as mature as it should be. We didn’t pick the right IP phones. These could be easily fixed. However, we soon realized our biggest mistake was architecture (or lack thereof). This wasn’t going to be an easy fix.
We couldn’t find an ITSP that offered a level of quality we considered to be acceptable. Very few ITSPs had any more experience with VoIP, SIP, and the internet than we did. More disturbing, however, was an almost complete lack of focus on quality and reliability. No process.
What we (quickly) discovered is the extremely low barrier to entry for ITSPs, especially back then. Virtually anyone could install Asterisk on a $100/mo box in a colo somewhere, buy dialtone from someone (who knows) and call themselves an ITSP. After going through several of these we discovered we needed to do it ourselves.
Even assuming we could solve the PSTN connectivity problem we discovered yet another issue. All of the monitoring and management in the world cannot make up for a terrible last mile. If the copper in the ground is rotting and the DSL modem can only negotiate 128kbps/128kbps that’s all you’re going to get. To make matters worse in the event of a cut or outage the customer would be down completely. While that may have always happened with the PSTN and an on premise PBX we considered this to be unacceptable.
So then, in the eleventh hour, just before launch I met with the original founders and posed a radical idea - scrap almost everything. There was a better way.
(Continued in Building a Startup (the right way))
Tuesday, October 25, 2011
Starting a Startup
I know I’ve apologized for being quiet in the past. This is not one of those times because (as you’ll soon find out) I’ve been hard at work and only now can I finally talk about it.
Six years ago I was spending most of my time working with Asterisk and AstLinux. I spent a lot of time promoting both - working the conference circuit, blogging, magazines, books, etc. Conferences are a great way to network and meet new people. I did just that. With each conference I attended came new business opportunities. Sure, not all of them were a slam dunk and eventually I started to pick and choose which conferences I considered worthy of the time and investment.
For anyone involved with Asterisk Astricon is certainly worthy of your time and energy - the mecca of the Asterisk community. Astricon was always a whirlwind and 2005 was no exception. We were in Anaheim, California and embedded Asterisk was starting to really heat up. I announced my port of AstLinux to Gumstix and announced the “World’s Smallest PBX”, leading to an interview and story in LinuxDevices. I worked a free community booth (thanks Astricon) with Dave Taht and was introduced to Captain Crunch (that’s another post for another day).
It was at Astricon in 2005 that I also met one of my soon to be business partners (although I certainly didn’t know it at the time). While I was promoting embedded Asterisk and AstLinux I met a man from Florida named Joe Rhem. Joe had come up with the idea of using embedded Asterisk systems as the cornerstone of a new way to provide business grade telephone services. Joe and I met for a few minutes and discussed the merits of embedded Asterisk. Unfortunately (and everyone already knows this) I don’t remember meeting with Joe. Like I said Astricon was always a whirlwind and I had these conversations with dozens if not hundreds of people at each show. I made my way through Astricon, made a pit stop in Santa Clara for (the now defunct) ISPCon and then returned home to Lake Geneva, WI with a stack of business cards, a few new stories, and a lot of work to finish (or start, depending on your perspective).
A couple of months later I received an e-mail from Joe Rhem discussing how he’d like to move forward with what we discussed in Anaheim. Joe had recruited another partner to lead the new venture. Norm Worthington was a successful serial entrepreneur and his offer to lead the company was the equivalent of “having General Patton lead your war effort”. After some catch up I was intrigued with Joe’s idea. A few hours on the phone later everyone was pretty comfortable with how this could work.
Now I just needed to fly to Sarasota, FL (where’s that - sounds nice, I thought) to meet with everyone, discuss terms, plan a relocation, and (most importantly) start putting the company, product, and technology together.
A short time later I found myself arriving in Sarasota. It was early January and, coming from Wisconsin, I couldn’t believe how nice it was. Looking back on it I’m sure Norm and Joe were very confident I’d be joining them in Sarasota. Working with technology I love “in paradise”, how could I resist?
(Continued in Building a Startup)
Tuesday, October 26, 2010
Breaking RFC compliance to improve monitoring
A colleague came to me today and had a troubling issue. He's using sipsak and nagios to monitor some SIP endpoints. Pretty standard so far, right? He noticed that when using UDP and checking on an endpoint that was completely offline sipsak would take over 30 seconds to finally return with an error. Meanwhile Nagios would block and wait for sipsak to return...
Without a simple command line option in sipsak that appeared to change this behavior, we had to enter the semi-complicated world of SIP timers. I feared that to change this behavior we'd have to do some things that might not necessarily be RFC compliant...
What's this? For once I'm actually suggesting you do something against the better advice of an RFC?
That's right, I am.
RFC3261 defines multiple timers and timeouts for messages and transactions. It says things like:
"If there is no final response for the original request in 64*T1 seconds"
"The UAC core considers the INVITE transaction completed 64*T1 seconds after the reception of the first 2xx response."
"The 2xx response is passed to the transport with an interval that starts at T1 seconds and doubles for each retransmission until it reaches T2 seconds"
Without even knowing what "T1" is you can start to see that it's a pretty important timing parameter and (more or less) serves as the father of all timeouts in SIP. Let's look at section 17 to find out what T1 is:
"The default value for T1 is 500 ms. T1 is an estimate of the RTT between the client and server transactions. Elements MAY (though it is NOT RECOMMENDED) use smaller values of T1 within closed, private networks that do not permit general Internet connection. T1 MAY be chosen larger, and this is RECOMMENDED if it is known in advance (such as on high latency access links) that the RTT is larger. Whatever the value of T1, the exponential backoffs on retransmissions described in this section MUST be used."
T1 is essentially a variable for RTT between two endpoints that serves as a multiplier for other timeouts. Unless we know better T1 should default to 500ms, which is quite high. Some implementations (such as Asterisk with the SIP peer qualify option) automatically send OPTIONS requests to endpoints in an effort to better determine RTT instead of using the RFC default of 500ms.
In reading through the sipsak source code it appeared to be RFC compliant for timing, using a default T1 value of 500ms and a transaction timeout value of 64*T1. This is why it was taking over 30 seconds (32 seconds to be exact) for sipsak to finally time out and return the status code to nagios. This comes directly from the RFC:
"For any transport, the client transaction MUST start timer B with a value of 64*T1 seconds (Timer B controls transaction timeouts)."
This is all well and good but what happens when you don't have a way to dynamically determine T1 and you can't wait T1*64 (32s) for your results like my sipsak/nagios check earlier? Simple: you go renegade, throw out the RFC, and hack the sipsak source yourself!
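To see where the 32 seconds comes from, here is a rough sketch in Python of the RFC 3261 defaults for a non-INVITE request (an OPTIONS check, say) over UDP. The function name and parameters are my own for illustration, not anything from sipsak: the retransmit interval starts at T1 and doubles until it is capped at T2 (4 seconds by default), and the transaction gives up at 64*T1.

```python
def retransmit_schedule(t1_ms=500, t2_ms=4000, timeout_factor=64):
    """Return (send_times_ms, give_up_ms) for a non-INVITE request over UDP.

    Illustrative only: models the doubling retransmit interval capped
    at T2 and the overall transaction timeout of timeout_factor * T1.
    """
    give_up_ms = timeout_factor * t1_ms
    send_times = [0]          # the initial request goes out at t = 0
    interval = t1_ms
    next_send = interval
    while next_send < give_up_ms:
        send_times.append(next_send)
        interval = min(interval * 2, t2_ms)  # exponential backoff, capped at T2
        next_send += interval
    return send_times, give_up_ms


sends, give_up = retransmit_schedule()
print(sends)    # times (ms) at which the request is sent or resent
print(give_up)  # 32000 ms with the RFC defaults
```

With the defaults that is eleven transmissions spread over 32 seconds before anything reports a failure, which is a long time for a monitoring check to block.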
So I had three options:
1) Change the default value of T1.
2) Change the transaction timeout (Timer B, 64*T1) by changing the multiplier or setting a static timeout.
3) Some combination of both.
I decided to go with option #3 (RFC be damned). Why?
1) 500ms is crazy high for most of our endpoints. At a glance 100ms would be fine for ~90% of them. I'll pick 150ms.
2) I don't need that many retransmits. If the latency and/or packet loss is that bad I'm not going to wait (my RTP certainly isn't) and I just want to know about it that much quicker.
So I ended up with a quick easy patch to sipsak:
diff -urN sipsak-0.9.6.orig/sipsak.h sipsak-0.9.6/sipsak.h
--- sipsak-0.9.6.orig/sipsak.h 2006-01-28 16:11:50.000000000 -0500
+++ sipsak-0.9.6/sipsak.h 2010-10-26 18:38:45.000000000 -0400
@@ -102,11 +102,7 @@
# define FQDN_SIZE 100
#endif
-#ifdef HAVE_CONFIG_H
-# define SIP_T1 DEFAULT_TIMEOUT
-#else
-# define SIP_T1 500
-#endif
+#define SIP_T1 150
#define SIP_T2 8*SIP_T1
diff -urN sipsak-0.9.6.orig/transport.c sipsak-0.9.6/transport.c
--- sipsak-0.9.6.orig/transport.c 2006-01-28 16:11:34.000000000 -0500
+++ sipsak-0.9.6/transport.c 2010-10-26 18:38:51.000000000 -0400
@@ -286,7 +286,7 @@
}
}
senddiff = deltaT(&(srt->starttime), &(srt->recvtime));
- if (senddiff > (float)64 * (float)SIP_T1) {
+ if (senddiff > inv_final) {
if (timing == 0) {
if (verbose>0)
printf("*** giving up, no final response after %.3f ms\n", senddiff);
This changes the value of T1 to 150ms (more reasonable for most networks) and allows you to specify the number of retransmits (and thus the total timeout) using -D on the sipsak command line:
kkmac:sipsak-0.9.6-build kris$ ./sipsak -p 10.16.0.3 -s sip:ext_callqual@asterisk -D1 -v
** timeout after 150 ms**
*** giving up, no final response after 150.334 ms
kkmac:sipsak-0.9.6-build kris$ ./sipsak -p 10.16.0.3 -s sip:ext_callqual@asterisk -D2 -v
** timeout after 150 ms**
** timeout after 300 ms**
*** giving up, no final response after 460.612 ms
kkmac:sipsak-0.9.6-build kris$ ./sipsak -p 10.16.0.3 -s sip:ext_callqual@asterisk -D4 -v
** timeout after 150 ms**
** timeout after 300 ms**
** timeout after 600 ms**
*** giving up, no final response after 1071.137 ms
kkmac:sipsak-0.9.6-build kris$
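The numbers in that output can be reproduced with a small Python model. This is a sketch of the behavior as I read it from the patch and the output above, not the sipsak code itself: it assumes the value passed to -D acts as a multiplier on T1 for the give-up threshold, that the wait doubles after each timeout up to T2 (8*T1 in the header), and that the give-up check runs after each timeout. The function and variable names are made up.

```python
def patched_timeouts(factor, t1_ms=150, t2_ms=8 * 150):
    """Return (waits_ms, total_ms) for an unreachable endpoint.

    Idealized model: no clock jitter, so the check uses >= where the
    real code compares a measured time (slightly over) with >.
    """
    limit_ms = factor * t1_ms      # assumed meaning of the -D value
    elapsed = 0
    interval = t1_ms
    waits = []
    while True:
        elapsed += interval        # wait out one timeout
        waits.append(interval)
        if elapsed >= limit_ms:    # past the threshold: give up
            return waits, elapsed
        interval = min(interval * 2, t2_ms)  # back off before retrying


for factor in (1, 2, 4):
    print(factor, patched_timeouts(factor))
```

The model gives totals of 150, 450, and 1050 ms for factors 1, 2, and 4. The measured values above (150.334, 460.612, and 1071.137 ms) run a little higher because of real-world scheduling and send overhead.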
Needless to say our monitoring situation is much improved.
Without a simple command line option in sipsak that appeared to change this behavior, we had to enter the semi-complicated world of SIP timers. I feared that to change this behavior we'd have to do some things that might not necessarily be RFC compliant...
What's this? For once I'm actually suggesting you do something against the better advice of an RFC?
That's right, I am.
RFC3261 defines multiple timers and timeouts for messages and transactions. It says things like:
"If there is no final response for the original request in 64*T1 seconds"
"The UAC core considers the INVITE transaction completed 64*T1 seconds after the reception of the first 2xx response."
"The 2xx response is passed to the transport with an interval that starts at T1 seconds and doubles for each retransmission until it reaches T2 seconds"
Without even knowing what "T1" is you can start to see that it's a pretty important timing parameter and (more or less) serves as the father of all timeouts in SIP. Let's look at section 17 to find out what T1 is:
"The default value for T1 is 500 ms. T1 is an estimate of the RTT between the client and server transactions. Elements MAY (though it is NOT RECOMMENDED) use smaller values of T1 within closed, private networks that do not permit general Internet connection. T1 MAY be chosen larger, and this is RECOMMENDED if it is known in advance (such as on high latency access links) that the RTT is larger. Whatever the value of T1, the exponential backoffs on retransmissions described in this section MUST be used."
T1 is essentially a variable for RTT between two endpoints that serves as a multiplier for other timeouts. Unless we know better T1 should default to 500ms, which is quite high. Some implementations (such as Asterisk with the SIP peer qualify option) automatically send OPTIONS requests to endpoints in an effort to better determine RTT instead of using the RFC default of 500ms.
In reading through the sipsak source code it appeared to be RFC compliant for timing, using a default T1 value of 500ms and a transaction timeout value of 64*T1. This is why it was taking over 30 seconds (32 seconds to be exact) for sipsak to finally timeout and return the status code to nagios. This comes directly from the RFC:
"For any transport, the client transaction MUST start timer B with a value of 64*T1 seconds (Timer B controls transaction timeouts)."
This is all well and good but what happens when you don't have a way to dynamically determine T1 and you can't wait T1*64 (32s) for your results like my sipsak/nagios check earlier? Simple: you go renegade, throw out the RFC, and hack the sipsak source yourself!
So I had three options:
1) Change the default value of T1.
2) Change the value of T2 by changing the multiplier or setting a static timeout.
3) Some combination of both.
I decided to go with option #3 (RFC be damned). Why?
1) 500ms is crazy high for most of our endpoints. At a glance 100ms would be fine for ~90% of them. I'll pick 150ms.
2) I don't need that many retransmits. If the latency and/or packet loss is that bad I'm not going to wait (my RTP certainly isn't) and I just want to know about it that much quicker.
So I ended up with a quick easy patch to sipsak:
diff -urN sipsak-0.9.6.orig/sipsak.h sipsak-0.9.6/sipsak.h
--- sipsak-0.9.6.orig/sipsak.h 2006-01-28 16:11:50.000000000 -0500
+++ sipsak-0.9.6/sipsak.h 2010-10-26 18:38:45.000000000 -0400
@@ -102,11 +102,7 @@
# define FQDN_SIZE 100
#endif
-#ifdef HAVE_CONFIG_H
-# define SIP_T1 DEFAULT_TIMEOUT
-#else
-# define SIP_T1 500
-#endif
+#define SIP_T1 150
#define SIP_T2 8*SIP_T1
diff -urN sipsak-0.9.6.orig/transport.c sipsak-0.9.6/transport.c
--- sipsak-0.9.6.orig/transport.c 2006-01-28 16:11:34.000000000 -0500
+++ sipsak-0.9.6/transport.c 2010-10-26 18:38:51.000000000 -0400
@@ -286,7 +286,7 @@
}
}
senddiff = deltaT(&(srt->starttime), &(srt->recvtime));
- if (senddiff > (float)64 * (float)SIP_T1) {
+ if (senddiff > inv_final) {
if (timing == 0) {
if (verbose>0)
printf("*** giving up, no final response after %.3f ms\n", senddiff);
This changes the value of T1 to 150ms (more reasonable for most networks) and allows you to specify the number of retransmits (and thus the total timeout) using -D on the sipsak command line:
kkmac:sipsak-0.9.6-build kris$ ./sipsak -p 10.16.0.3 -s sip:ext_callqual@asterisk -D1 -v
** timeout after 150 ms**
*** giving up, no final response after 150.334 ms
kkmac:sipsak-0.9.6-build kris$ ./sipsak -p 10.16.0.3 -s sip:ext_callqual@asterisk -D2 -v
** timeout after 150 ms**
** timeout after 300 ms**
*** giving up, no final response after 460.612 ms
kkmac:sipsak-0.9.6-build kris$ ./sipsak -p 10.16.0.3 -s sip:ext_callqual@asterisk -D4 -v
** timeout after 150 ms**
** timeout after 300 ms**
** timeout after 600 ms**
*** giving up, no final response after 1071.137 ms
kkmac:sipsak-0.9.6-build kris$
Needless to say our monitoring situation is much improved.
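The give-up times in the sipsak output above fit a simple model, sketched here in Python. The model is inferred from that output and not confirmed against the sipsak source: it assumes -D sets the give-up threshold to D*T1, that the retransmit interval starts at T1 and doubles after each timeout, and that the interval is capped at SIP_T2 (8*SIP_T1 in sipsak.h). The function name give_up_schedule is hypothetical.

```python
def give_up_schedule(t1_ms, factor, t2_ms=None):
    """Return (elapsed ms at give-up, list of timeout intervals in ms)."""
    if t2_ms is None:
        t2_ms = 8 * t1_ms            # assumption: mirrors "#define SIP_T2 8*SIP_T1"
    threshold_ms = factor * t1_ms    # assumption: -D is a multiplier of T1
    elapsed_ms = 0
    interval_ms = t1_ms
    intervals = []
    while True:
        elapsed_ms += interval_ms
        intervals.append(interval_ms)
        # Real runs overshoot each interval slightly, so ">=" in this
        # idealized model stands in for the patched "senddiff > inv_final".
        if elapsed_ms >= threshold_ms:
            return elapsed_ms, intervals
        interval_ms = min(2 * interval_ms, t2_ms)

for factor in (1, 2, 4):
    elapsed, intervals = give_up_schedule(150, factor)
    print(f"-D{factor}: timeouts {intervals}, gives up at ~{elapsed} ms")
```

Under those assumptions the model gives up at roughly 150, 450 and 1050 ms for -D1, -D2 and -D4, which lines up with the measured 150.334, 460.612 and 1071.137 ms once a little per-packet overhead is added.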
Thursday, August 5, 2010
A ClueCon Update
ClueCon is going very well this year... I spoke the first day and have spent the rest of my time here enjoying the presentations and interacting with the community.
A few highlights:
- Perfect wireless provided by Meraki. I've never been to a tech conference where the wifi has kept up with the crowd. Well done.
- The Trump Tower. Phenomenal.
- FreeSWITCH HA support in Sofia! This is worthy of its own post and it will have one when I get back and play with it. In the meantime my guy Jay Binks has been working to document this exciting new feature.
- Chicago. I just LOVE this town.
Friday, May 21, 2010
A ClueCon Preview...
A while back I saw a preview for the new A-Team movie. While the movie itself looks horrible I was reminded of the original TV series with its many interesting characters and catch phrases. Among my personal favorites?
I love it when a plan comes together.
That's exactly how I feel with one of my "pet projects" from the past couple of months. Much like Hannibal and the A-Team I was up against formidable issues in trying to accomplish my task: implementing a flexible (very flexible), reasonably high performance LCR server that could be added to my existing architecture.
First I needed to select an LCR "engine". Multiple possibilities were considered but I left the final recommendation up to the DB and billing teams I work with. They selected mod_lcr from FreeSWITCH. While I was certain droute from OpenSIPS (or something similar) would have higher performance I accepted their recommendation. After playing with mod_lcr a bit I can also see its potential.
So now the question was: can FreeSWITCH respond with the proper SIP signaling (300 Multiple Choices)? Using the redirect application from mod_dptools it could not. I created a bounty to add multiple Contact/300 Multiple Choices functionality to FreeSWITCH. Tony had it implemented that day.
With the ability to respond properly I now had to get the data. Mod_lcr looked nice but it certainly wasn't designed for this application. All of the default syntax, tables, etc showed it being used with FreeSWITCH for FreeSWITCH. The tables and code used several bridge specific syntax examples. I hacked mod_lcr to return data to mod_dptools/redirect properly. I created a JIRA issue with my patch and a couple of days later Rupa had it committed.
So now FreeSWITCH could be a route server. All I needed to do was make sure OpenSIPS could route from what FreeSWITCH returned. Turns out it could not. RFC 3261 (section 21.3.1) states "...the SIP response MAY contain several Contact fields or a list of addresses in a Contact field." The Sofia stack from FreeSWITCH used multiple Contact headers, each with its own URI. OpenSIPS would only parse the first one returned. Sofia couldn't be changed easily so OpenSIPS would need to be changed (it was non-compliant anyway). Without this change there is no ability to handle multiple contacts and only the first would be used. It could be worse but obviously this wasn't good enough.
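To make the two forms the RFC allows concrete, here is a minimal Python sketch that pulls every Contact URI out of a 3xx response whether the URIs arrive as several Contact headers or as one comma-separated header. It is an illustration only, not the OpenSIPS or Sofia parser: the function names and the example.com gateways are made up, and it ignores header folding and escaped quotes in display names.

```python
def split_contacts(value):
    """Split a Contact header value on commas outside quotes and <...>."""
    parts, buf, in_quotes, depth = [], [], False, 0
    for ch in value:
        if ch == '"':
            in_quotes = not in_quotes
        elif not in_quotes:
            if ch == '<':
                depth += 1
            elif ch == '>':
                depth = max(depth - 1, 0)
            elif ch == ',' and depth == 0:
                parts.append(''.join(buf).strip())
                buf = []
                continue
        buf.append(ch)
    tail = ''.join(buf).strip()
    if tail:
        parts.append(tail)
    return parts


def contact_uris(response):
    """Return all Contact URIs from a raw SIP response, in order."""
    uris = []
    for line in response.splitlines():
        if not line.strip():
            break  # blank line ends the headers
        name, sep, value = line.partition(':')
        if not sep or name.strip().lower() not in ('contact', 'm'):
            continue  # "m" is the compact form of Contact
        for item in split_contacts(value):
            start = item.find('<')
            end = item.find('>', start)
            if start != -1 and end != -1:
                uris.append(item[start + 1:end])
            else:
                uris.append(item.split(';', 1)[0].strip())
    return uris


# Hypothetical responses: the same two routes, expressed both ways.
multi_header = (
    "SIP/2.0 300 Multiple Choices\r\n"
    "Contact: <sip:+15551230000@gw1.example.com>;q=0.9\r\n"
    "Contact: <sip:+15551230000@gw2.example.com>;q=0.5\r\n"
    "\r\n"
)
single_header = (
    "SIP/2.0 300 Multiple Choices\r\n"
    "Contact: <sip:+15551230000@gw1.example.com>;q=0.9, "
    "<sip:+15551230000@gw2.example.com>;q=0.5\r\n"
    "\r\n"
)
print(contact_uris(multi_header) == contact_uris(single_header))
```

A parser that stops after the first Contact header would see only the gw1 route in the multi-header response, which is the behavior described above.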
I contacted Bogdan from OpenSIPS to see what it would take to update the parser to handle multiple Contact headers. He indicated it would take four hours or so. Once he got back to me I had an OpenSIPS system that would handle multiple contact headers and create new branches from a failure route as desired.
So how did it all turn out? Well, you have two ways to hear the end of this story:
1) Attend ClueCon at the Trump Hotel in Chicago, IL in early August.
2) Wait until mid-August for an update here.
I'll make sure to post all of my materials - conference presentation, sipp scenarios for testing, OpenSIPS configuration, FreeSWITCH configuration, DB tweaks, etc.
Too late to make it to ClueCon this year? Just make sure to register next year, I'm sure I'll be there.