Monday, February 13, 2012
As someone who often finds himself "in the trenches" dealing with extremely nerdy technical nuances, it's easy to miss the larger picture. I guess Mark Spencer was right when he said "Not many people get excited about telephones but the ones who do get REALLY excited about telephones".
As someone whose natural inclination is to get stuck in the details I certainly understand this. Some of you might be right there with me. At this point I've gotten so specialized I'm next to useless on some pretty basic "computer things". Think of the aunt or other relative/friend who lights up when they find out you're a "computer guy", then inevitably pulls you aside at a wedding to ask for help with their printer or "some box that pops up in Windows". I'm happy to not have to feign ignorance any longer: I truly am ignorant on issues like these.
I primarily use a Mac because it just works - for me. That's not the point I'm trying to make here, though. A Mac works so well for me because I use just two applications: a terminal (iTerm2, to be exact) and Google Chrome. OK, OK, every once in a while I whip up some crazy Wireshark display filter but that's another post for another day. For the most part, when I need to work I take out my Mac, it comes out of sleep, connects to a network, and I start working. It's a tool.
As far as anything with a GUI goes, that's the extent of my "expertise". If my aunt wanted to ask me about my bash_profile, screenrc settings, IPv4 address exhaustion, or SIP network architecture I may have something to say. Other than that you'll find me speaking in vague generalities that may lead the more paranoid to suspect I'm secretly working for the CIA or some international crime syndicate. I wish I were kidding; this has actually happened before, although "international crime syndicate" usually gets loosely translated to "drug dealer". How else does a supposed "computer guy" not understand what's wrong with my printer?!?!
As usual there's a point to all of this. My hyperspecialization makes it easy to forget what is really going on all around me: a shakeup in the 100-year-old industry I find myself in and a change in the way we communicate.
The evolution of the telephone is a strange thing. It is a device and service that has remained largely unchanged for 100 years. I'm not kidding. To this day, in some parts of the United States, the only telephone service available could have been installed by Alexander Graham Bell himself. Sure, there have been many advances since the early 1900s but they've been incremental improvements at best - digital services with the same voice bandwidth (dating to 1972), various capacity and engineering changes, and of course - the cell phone.
In the end, however, we're left with a service that isn't much different than what my grandparents had. You still have to phonetically spell various upper-frequency consonants ("S as in Sam, P as in Paul, T as in Tom") because the upper limit of the voice bandwidth on these services is ridiculously low (3.1 kHz). Straining to hear the party at the remote end of a call has only gotten worse with the various digital compression standards in use today - EVRC, AMR, G.729, etc. I love to compare the "pin drop" Sprint commercials of the 80s and 90s to the Verizon Wireless "CAN YOU HEAR ME NOW?" campaign over 20 years later. We still dial by randomly assigned 10-digit numbers. This is supposedly progress?
One thing that has changed - the network has gotten bigger. Much bigger. My grandparents may not have had much use for their party line because they didn't have anyone of interest to talk to on the other end. In that sense the network has exploded - and it has exploded using the same standards that have been in place for the past 100 years. I can directly dial a cell phone on the other side of the world and be connected in seconds.
Meanwhile, there has been another network explosion - IP networks and the internet. The internet, of course, needs no introduction. While I'd love to spend some time talking about IP, that's time I don't have at this point. Let's just look at a couple of ways IP has been extremely disruptive for this 100-year-old franchise.
Not many people outside of telecom noticed it at the time, but back in 2009 AT&T (THE AT&T) petitioned the FCC to decommission the legacy PSTN (copper pairs and what-not). Just over two years later we're starting to see some results, and AT&T is realizing some ancillary benefits.
As someone who has spent some time (not a lot, thankfully) in these central offices, the maze of patch cables, wiring blocks, DC battery banks, etc. makes you really appreciate the analysis in this report. Normally networks are completely faceless - you go to www.google.com or dial 8005551212 without ever seeing the equipment that gets you to the other end. The fact that SBC reclaimed as much as 250 MILLION square feet by eliminating this legacy equipment is incredible.
That's all well and good, but what has AT&T done for us, the users? The answer is, unfortunately, both good and bad. AT&T, like many physical, trench-digging network providers, has realized they are in the business of providing IP connectivity. They don't have much of a product anymore, and the product they do have is becoming more and more of a commodity every day.
Getting out of the way is the smartest thing they could be doing. Speaking of AT&T, remember the Apple iPhone deal? At the time a cell phone was a cell phone - AT&T provided an IP network and got some minutes but Apple built an application platform and changed the way people view the devices they carry with them everywhere they go. Huge.
Watch any sci-fi movie from the past 50 years and one almost ubiquitous "innovation" is the video phone. Did AT&T or some other 100-year-old company provide the video phone that beams baby's first steps to Grandma across the country? No - Apple did it with FaceTime and a little company from Estonia (Skype) did it over the internet. Thanks to these companies and IP networks we finally have video conferencing (maybe they'll release a 30th anniversary edition of Blade Runner to celebrate).
Unfortunately, there will always be people who cling to the technologies of days past. The new network works well for the applications designed for it; meanwhile, older technologies are being shoehorned in with disastrous results. Has anyone noticed faxing has actually gotten LESS reliable over the past several years? That's what happens when you try to use decades-old modem designs on a completely different network. You might as well try to burn diesel in your gasoline engine.
The future is the network and the network (regardless of physical access medium) is IP.
And now, for good measure, here are some random links for further reading:
Graves on SOHO Technology - An early advocate of HD Voice, etc.
The Voice Communication Exchange - Wants to push the world to HD Voice by 2018.
Tuesday, December 20, 2011
Performance Testing (Part 1)
Over the past few years (like many other people in this business) I’ve needed to do performance testing. Open source software is great, but this is one place where you need to do your own leg work. This conundrum first presented itself in the Asterisk community. There are literally thousands of variables that can affect the performance of Asterisk, FreeSWITCH, or any other software solution. In no particular order:
- Configuration. Which modules do you have loaded? How are they configured? If you’re using Kamailio, do you do hundreds of huge, slow, nasty DB queries for each call setup? How is your logging configured? Maybe you use Asterisk or FreeSWITCH and make several system calls, DB lookups, Lua scripts, etc.? Even the slightest misstep in configuration (synchronous syslogging with Kamailio, for example) can reduce your performance by 90%.
- Features in use. Paging groups (unicast) are notorious for destroying performance on standard hardware - every call needs to be setup individually, you need to handle RTP, and some audio mixing is involved. Hardware that can’t do 10 members in a page group using Asterisk or FreeSWITCH may be capable of hundreds of sessions using Kamailio with no media.
- Standard performance metrics. “Thousands of calls” you say? How many calls per second? Are you transcoding? Maybe you’re not handling any media at all? What is the delay in call setup?
- Hardware. This may seem obvious (MORE HERTZ) but even then there are issues... If you’re handling RTP, what are you using for timing? If you have lots of RTP, which network card are you using? Does it and your kernel support MSI or MSI-X for better interrupt handling? Can you load balance IRQs across cores (see the example after this list)? How efficient (or buggy) is the driver (Realtek, I’m looking at you)?!?
- The “guano” effect. As features are added to the underlying toolkit (Asterisk, FreeSWITCH, etc.) and to your configuration, how is performance affected over time? Add a feature here, and a feature there - and repeat. Over the months and years (even with faster hardware) you may find that each “little” feature reduced call capacity by 5%. Or maybe your calls per second went down by two each time. Not a big deal individually, yet it compounds - with no other optimizations, ten “minor” changes at 5% each leave you with 0.95^10, or roughly 60%, of your original call capacity. It adds up - it really does.
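As a concrete example of the hardware point above, on Linux you can inspect and pin NIC interrupts by hand. This is just a sketch - the IRQ number (24) and interface name are hypothetical, so check /proc/interrupts on your own hardware:

grep eth0 /proc/interrupts            # find the IRQ(s) your NIC is using
echo 2 > /proc/irq/24/smp_affinity    # CPU bitmask: pin (hypothetical) IRQ 24 to CPU1

Spreading RTP interrupt load across cores this way (or with the irqbalance daemon) can make a real difference at high packet rates.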
Even after pointing out all of these issues, you’d be surprised how often I’m faced with the question “Well yeah, but how many calls can I handle on my dual core Dell server?”.
In almost every case the best answer is “Get your hardware, develop your solution, run sipp against it and see what happens”. That’s really about as good as we can do.
SIPP is a great example of a typical, high quality open source tool. In true “Unix philosophy” it does one thing and it does it well: SIP performance testing. SIPP can be configured to initiate (or receive) just about any conceivable SIP scenario - from simple INVITE call handling to full SIMPLE test cases. In these tests SIPP will tell you call setup time, messages received, successful dialogs, etc.
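For the curious, a basic load test with SIPP’s built-in UAC scenario looks something like this (the target address is hypothetical):

sipp -sn uac 10.16.0.3:5060 -r 10 -l 100 -m 1000

Here -r sets the call rate in calls per second, -l caps the number of simultaneous calls, and -m stops the test after a total number of calls. Real tests usually swap the built-in scenario for a custom scenario XML file (-sf) that matches your actual call flow.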
SIPP even goes a step further and includes some support for RTP. SIPP has the ability to echo RTP from the remote end or even replay RTP from a PCAP file you have saved to disk. This is where SIPP starts to show some deficiencies. Again, you can’t blame SIPP because SIPP is a SIP performance testing tool - it does that and it does it well. RTP testing leaves a lot to be desired. First of all, you’re on your own when it comes to manipulating any of the PCAP parameters. Length, content, codec, payload types, etc, etc need to be configured separately. This isn’t a problem, necessarily, as there are various open source tools to help you with some of these tasks. I won’t get into all of them here but they too leave something to be desired.
What about analyzing the quality of the RTP streams? SIPP provides mechanisms to measure various SIP “quality” metrics - SIP response times, SIP retransmits, etc. With RTP you’re on your own. Once again, sure, you could set up tshark on a SPAN port (or something) to do RTP stream analysis on every stream, but this would be tedious and (once again) subject you to some of the harsh realities of processing a tremendous number of small packets in software on commodity hardware.
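For reference, the offline version of that analysis is a one-liner against a saved capture (filename hypothetical):

tshark -q -r calls.pcap -z rtp,streams

This prints per-stream packet counts, loss, and jitter - fine for a handful of streams, but exactly the kind of thing that falls over at thousands of concurrent streams.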
Let’s face it - for a typical B2BUA handling RTP the numbers add up very quickly. Assume 20ms packetization for the following:
Single RTP stream = 50 packets per second (pps)
Bi-directional RTP stream = 100 pps
A-leg bi-directional RTP stream = 100 pps
B-leg bi-directional RTP stream = 100 pps
A leg + B leg = 200 pps PER CALL
What does this look like with 10,000 channels (using g711u)?
952 mbit/s (close to Gigabit wire speed) in each direction
1,000,000 (total) packets per second
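Here’s a quick back-of-envelope Python script that reproduces those numbers. I’m assuming “channels” means bidirectional RTP streams, and I’m counting full Ethernet framing (14-byte header + 4-byte FCS + 20 bytes preamble/interframe gap) on top of the IP packet:

# G.711u at 20ms packetization
payload = 160                 # bytes of audio per packet
ip_headers = 12 + 8 + 20      # RTP + UDP + IPv4
ethernet = 38                 # L2 framing overhead on the wire
pps_per_stream = 50           # 1000ms / 20ms
streams = 10000               # bidirectional RTP streams ("channels")

wire_bytes = payload + ip_headers + ethernet           # 238 bytes/packet
mbps = streams * pps_per_stream * wire_bytes * 8 / 1e6
total_pps = streams * pps_per_stream * 2               # both directions

print(mbps)       # 952.0 Mbit/s in each direction
print(total_pps)  # 1,000,000 packets per second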
Open source software is great - it provides us with the tools to (ultimately) build services and businesses. Many of us choose what to focus on (our core competency). At Star2Star we provide business grade communication services and we spend a lot of time and energy to build these services because it’s what we do. We don’t sell, manufacture, or support testing platforms.
At this point some of you may be getting an idea... Why don’t I build/design an open source testing solution? It’s a good question and while I don’t want to crush your dreams there are some harsh realities:
1) This gets insanely complicated, quickly. Anyone who follows this blog knows SIP itself is complicated enough.
2) Scaling becomes a concern (as noted above).
3) Who would use it?
The last question is probably the most serious - who really needs the ability to initiate 10,000 SIP channels at 100 calls per second while monitoring RTP stream quality, etc? SIP carriers? SIP equipment manufacturers? A few SIP software developers? How large is the market? What kind of investment would be required to even get the project off the ground? What does the competition look like? While I don’t have the answers to most of these questions I can answer the last one.
Commercial SIP testing equipment is available from a few vendors:
Spirent
Empirix
Ixia
...and I’m sure others. We evaluated a few of these solutions and I’ll be talking more about them in a follow-up post in the near future.
Stay tuned because this series is going to be good!
Friday, December 2, 2011
Star2Star Gets Noticed
Just a quick one today (and some shameless self-promotion on my part)... Star2Star has been recognized on a few "lists" this year, check it out:
Inc 500
Forbes 100 "Most Promising"
I'm lucky enough to tell people the same story all of the time - when I was a little kid I played with all of this stuff because I thought it was fun and I loved it. Only later did I realize that one day I'd be getting paid for it. I certainly never thought it could come to this!
Ok, enough of that for now. I'll be getting back to some tech stuff soon...
Tuesday, November 15, 2011
Building a Startup (the right way)
(Continued from Building a Startup)
Our way wasn’t working. To put it mildly, our “business grade” solution didn’t perform much better than Vonage. We came to exemplify VoIP’s reputation - jittery calls, dropped calls, one-way audio, etc, etc, etc. Most of this was because of the lack of quality ITSPs at the time. Either way, our customers didn’t care. It was us. If we went to market with what we had the first time around, we were going to lose.
The problem was that the other predominant architecture at the time was “hosted”. Someone hosts a PBX for you and ships you some phones. You plug them in behind your router and magically you have a phone system. They weren’t doing much better. Sure, their sales looked good, but even then it was becoming obvious customer churn was quite high. People didn’t like hosted either, and for good reason. Typically a hosted provider has even less control over the call than we did.
As I’ve alluded to before, I thought there was a better way. We needed to host the voice applications where it made the most “sense”. We were primarily using Asterisk, and with a little creative provisioning, a kick-ass SIP proxy, and enough Asterisk machines we could build the perfect business PBX - even if that meant virtually none of it existed at the customer premise. Or maybe all of it did. That flexibility was key. After a lot of discussions, whiteboard sessions, and late nights everyone agreed. We needed a do-over.
So we got to work and slowly our new architecture began to take shape. We added a kick-ass SIP proxy (OpenSER). OpenSER would power the core routing between various Asterisk servers each meeting different needs - IVR/Auto Attendant, Conferencing, Voicemail, remote phones (for “hosted” phones/softphones), etc. The beauty was the SIP proxy could route between all of these different systems including the original AstLinux system at the customer premise. Customer needs to call voicemail? No problem - the AstLinux system at the CPE fires an INVITE off to the proxy and the proxy figures out where their voicemail server is. The call is connected and the media goes directly between the two endpoints. Same thing for calls between any two points on the network - AstLinux CPE to AstLinux CPE, PSTN to voicemail, IVR to conference.
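To make that concrete, here’s a rough sketch of what the core of such a routing config looks like in OpenSER/Kamailio. This is illustrative only - the dispatcher set and reply text are hypothetical, not our production config:

route {
    # a registered phone (hosted/remote)? relay straight to it
    if (lookup("location")) {
        t_relay();
        exit;
    }
    # otherwise pick an application server (voicemail, IVR,
    # conferencing, CPE) from dispatcher set "1", round robin
    if (!ds_select_dst("1", "4")) {
        sl_send_reply("404", "No destination");
        exit;
    }
    t_relay();
}

The real logic is considerably more involved (authentication, NAT handling, failover), but the idea is the same: the proxy only routes signaling, and the media flows directly between endpoints.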
This is a good time to take a break and acknowledge what really made this all possible - OpenSER. While it’s difficult to explain the exact history and family tree of any piece of SER-derived software, I can tell you one thing - this company would not be possible without it. There is no question in my mind. It’s now 2011, and whether you select Kamailio or OpenSIPS for your SIP project you will not be sorry. Even after five years you will not find a more capable, flexible, scalable piece of SIP server software. It was one of the best decisions we ever made.
Need to add another server to meet demand for IVR? No problem - bring another server online, add the IP to a table and presto, you’re now taking calls on your new IVR. Eventually a new IVR led to several new IVRs, voicemail servers, conference systems, web portals, mail servers, various monitoring systems, etc.
What about infrastructure? Given our small scale, regional footprint, and focus on quality, we began buying our own PRIs and running them on a couple of Cisco AS5350XM gateways. This got us past our initial issues with questionable ITSPs. Bandwidth was becoming another problem... We had an excellent colocation provider that provided blended bandwidth, but we still needed more control. Here came BGP, ARIN, AS numbers, a pair of Cisco 7206VXRs w/ G2s, iBGP, multiple upstream providers, etc.
At times I would wonder - whatever happened to spending my time worrying about cross compilers? Looking back I’m not sure which was worse - GNU autoconf cross-compiling hell or SIP interop, BGP, etc. It’s fairly safe to say I’m a sadomasochist either way.
Even with all of the pain, missteps, and work we finally had an architecture to take to market. It would be the architecture that would serve us well for several years. Of course there was more work to be done...
Wednesday, November 2, 2011
Building a Startup
(Continued from Starting a Startup)
After several days of meetings in Sarasota we determined:
1) I was moving to Sarasota to start a company with Norm and Joe.
2) We were going to utilize open source software wherever possible (including AstLinux, obviously).
3) The Internet was the only ubiquitous, high quality network on which to build a nationwide platform.
4) The Internet was only getting more ubiquitous, more reliable, and faster in coming months/years/decades/etc.
5) We were going to take advantage of as much of this as possible.
These were some pretty lofty goals. Remember, this is early 2006. Gmail was still invitation-only beta. Google docs didn’t exist. Amazon EC2 didn’t exist. “Cloud computing” hadn’t come back into fashion yet. The term itself didn’t exist. The Internet was considered (by many) to be “best effort”, “inherently unreliable”, and “unsuitable” for critical communications (such as real time business telephony). There were many naysayers who were confident this would be a miserable failure. As it turns out, they were almost right.
We thought the “secret sauce” to business grade voice over the internet was monitoring and management. If one could monitor and manage the internet connection, business grade voice should be possible. Of course this is very ambiguous, but it led to several great hires.
Joe had already deployed several embedded Asterisk systems to various businesses in the Sarasota area. They used an embedded version of Linux he patched together and a third party (unnamed) “carrier” to connect to the PSTN. The first step was upgrading these machines and getting them on AstLinux. Once this was accomplished we felt confident enough to proceed with our plan. This was Star2Star Communications and in the beginning of 2006 it looked something like this:
1) Soekris net4801 machines running AstLinux on the customer premise.
2) Grandstream GXP-2000 phones at each desk.
3) Connectivity to a third party “ITSP”.
4) Management/monitoring systems (check IP connectivity, phone availability, ITSP reliability, local LAN, etc).
5) Central provisioning of AstLinux systems, phones, etc.
This was Star2Star and there was something I really liked about it - it was simple. Anyone who knows me or knows of my projects (AstLinux, for example) has to know I favor simplicity whenever possible. Keep it simple, keep it simple, keep it simple (stupid).
As time went on we started to learn that maybe this was too simple. We didn’t have enough control. Our monitoring wasn’t as mature as it should have been. We didn’t pick the right IP phones. These could be easily fixed. However, we soon realized our biggest mistake was architecture (or the lack thereof). This wasn’t going to be an easy fix.
We couldn’t find an ITSP that offered a level of quality we considered to be acceptable. Very few ITSPs had any more experience with VoIP, SIP, and the internet than we did. More disturbing, however, was an almost complete lack of focus on quality and reliability. No process.
What we (quickly) discovered was the extremely low barrier to entry for ITSPs, especially back then. Virtually anyone could install Asterisk on a $100/mo box in a colo somewhere, buy dialtone from someone (who knows), and call themselves an ITSP. After going through several of these we discovered we needed to do it ourselves.
Even assuming we could solve the PSTN connectivity problem, we discovered yet another issue. All of the monitoring and management in the world cannot make up for a terrible last mile. If the copper in the ground is rotting and the DSL modem can only negotiate 128kbps/128kbps, that’s all you’re going to get. To make matters worse, in the event of a cut or outage the customer would be down completely. While that may have always happened with the PSTN and an on-premise PBX, we considered this to be unacceptable.
So then, in the eleventh hour, just before launch I met with the original founders and posed a radical idea - scrap almost everything. There was a better way.
(Continued in Building a Startup (the right way))
Tuesday, October 25, 2011
Starting a Startup
I know I’ve apologized for being quiet in the past. This is not one of those times because (as you’ll soon find out) I’ve been hard at work and only now can I finally talk about it.
Six years ago I was spending most of my time working with Asterisk and AstLinux. I spent a lot of time promoting both - working the conference circuit, blogging, magazines, books, etc. Conferences are a great way to network and meet new people. I did just that. With each conference I attended came new business opportunities. Sure, not all of them were a slam dunk, and eventually I started to pick and choose which conferences I considered worthy of the time and investment.
For anyone involved with Asterisk, Astricon is certainly worthy of your time and energy - it’s the mecca of the Asterisk community. Astricon was always a whirlwind and 2005 was no exception. We were in Anaheim, California, and embedded Asterisk was really starting to heat up. I announced my port of AstLinux to the Gumstix - the “World’s Smallest PBX” - leading to an interview and story in LinuxDevices. I worked a free community booth (thanks Astricon) with Dave Taht and was introduced to Captain Crunch (that’s another post for another day).
It was at Astricon in 2005 that I also met one of my soon-to-be business partners (although I certainly didn’t know it at the time). While I was promoting embedded Asterisk and AstLinux I met a man from Florida named Joe Rhem. Joe had come up with the idea of using embedded Asterisk systems as the cornerstone of a new way to provide business grade telephone services. Joe and I met for a few minutes and discussed the merits of embedded Asterisk. Unfortunately (and everyone already knows this) I don’t remember meeting with Joe. Like I said, Astricon was always a whirlwind, and I had these conversations with dozens if not hundreds of people at each show. I made my way through Astricon, made a pit stop in Santa Clara for (the now defunct) ISPCon, and then returned home to Lake Geneva, WI with a stack of business cards, a few new stories, and a lot of work to finish (or start, depending on your perspective).
A couple of months later I received an e-mail from Joe Rhem discussing how he’d like to move forward with what we discussed in Anaheim. Joe had recruited another partner to lead the new venture. Norm Worthington was a successful serial entrepreneur, and his offer to lead the company was the equivalent of “having General Patton lead your war effort”. After some catch-up I was intrigued by Joe’s idea. A few hours on the phone later, everyone was pretty comfortable with how this could work.
Now I just needed to fly to Sarasota, FL (where’s that - sounds nice, I thought) to meet with everyone, discuss terms, plan a relocation, and (most importantly) start putting the company, product, and technology together.
A short time later I found myself arriving in Sarasota. It was early January and, coming from Wisconsin, I couldn’t believe how nice it was. Looking back on it I’m sure Norm and Joe were very confident I’d be joining them in Sarasota. Working with technology I love “in paradise” - how could I resist?
(Continued in Building a Startup)
Tuesday, October 26, 2010
Breaking RFC compliance to improve monitoring
A colleague came to me today with a troubling issue. He's using sipsak and Nagios to monitor some SIP endpoints. Pretty standard so far, right? He noticed that when using UDP and checking on an endpoint that was completely offline, sipsak would take over 30 seconds to finally return with an error. Meanwhile Nagios would block and wait for sipsak to return...
Without a simple command line option in sipsak that appeared to change this behavior, we had to enter the semi-complicated world of SIP timers. I feared that to change this behavior we'd have to do some things that might not necessarily be RFC compliant...
What's this? For once I'm actually suggesting you do something against the better advice of an RFC?
That's right, I am.
RFC3261 defines multiple timers and timeouts for messages and transactions. It says things like:
"If there is no final response for the original request in 64*T1 seconds"
"The UAC core considers the INVITE transaction completed 64*T1 seconds after the reception of the first 2xx response."
"The 2xx response is passed to the transport with an interval that starts at T1 seconds and doubles for each retransmission until it reaches T2 seconds"
Without even knowing what "T1" is you can start to see that it's a pretty important timing parameter and (more or less) serves as the father of all timeouts in SIP. Let's look at section 17 to find out what T1 is:
"The default value for T1 is 500 ms. T1 is an estimate of the RTT between the client and server transactions. Elements MAY (though it is NOT RECOMMENDED) use smaller values of T1 within closed, private networks that do not permit general Internet connection. T1 MAY be chosen larger, and this is RECOMMENDED if it is known in advance (such as on high latency access links) that the RTT is larger. Whatever the value of T1, the exponential backoffs on retransmissions described in this section MUST be used."
T1 is essentially a variable for the RTT between two endpoints that serves as a multiplier for other timeouts. Unless we know better, T1 defaults to 500ms, which is quite high. Some implementations (such as Asterisk with the SIP peer qualify option) automatically send OPTIONS requests to endpoints in an effort to better determine the RTT instead of using the RFC default of 500ms.
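For example, in Asterisk's sip.conf this is just a per-peer option (the peer name and host here are hypothetical):

[branch-pbx]
type=peer
host=10.16.0.3
qualify=yes    ; send periodic OPTIONS and track round-trip time

With qualify enabled, Asterisk also marks the peer unreachable when the OPTIONS response doesn't come back in time - the same basic idea as the sipsak check this post is about.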
In reading through the sipsak source code, it appeared to be RFC compliant for timing, using a default T1 value of 500ms and a transaction timeout of 64*T1. This is why it was taking over 30 seconds (32 seconds to be exact) for sipsak to finally time out and return the status code to Nagios. This comes directly from the RFC:
"For any transport, the client transaction MUST start timer B with a value of 64*T1 seconds (Timer B controls transaction timeouts)."
This is all well and good, but what happens when you don't have a way to dynamically determine T1 and you can't wait 64*T1 (32s) for your results like my sipsak/Nagios check earlier? Simple: you go renegade, throw out the RFC, and hack the sipsak source yourself!
So I had three options:
1) Change the default value of T1.
2) Change the value of T2 by changing the multiplier or setting a static timeout.
3) Some combination of both.
I decided to go with option #3 (RFC be damned). Why?
1) 500ms is crazy high for most of our endpoints. At a glance 100ms would be fine for ~90% of them. I'll pick 150ms.
2) I don't need that many retransmits. If the latency and/or packet loss is that bad I'm not going to wait (my RTP certainly isn't) and I just want to know about it that much quicker.
So I ended up with a quick easy patch to sipsak:
diff -urN sipsak-0.9.6.orig/sipsak.h sipsak-0.9.6/sipsak.h
--- sipsak-0.9.6.orig/sipsak.h 2006-01-28 16:11:50.000000000 -0500
+++ sipsak-0.9.6/sipsak.h 2010-10-26 18:38:45.000000000 -0400
@@ -102,11 +102,7 @@
# define FQDN_SIZE 100
#endif
-#ifdef HAVE_CONFIG_H
-# define SIP_T1 DEFAULT_TIMEOUT
-#else
-# define SIP_T1 500
-#endif
+#define SIP_T1 150
#define SIP_T2 8*SIP_T1
diff -urN sipsak-0.9.6.orig/transport.c sipsak-0.9.6/transport.c
--- sipsak-0.9.6.orig/transport.c 2006-01-28 16:11:34.000000000 -0500
+++ sipsak-0.9.6/transport.c 2010-10-26 18:38:51.000000000 -0400
@@ -286,7 +286,7 @@
}
}
senddiff = deltaT(&(srt->starttime), &(srt->recvtime));
- if (senddiff > (float)64 * (float)SIP_T1) {
+ if (senddiff > inv_final) {
if (timing == 0) {
if (verbose>0)
printf("*** giving up, no final response after %.3f ms\n", senddiff);
This changes the value of T1 to 150ms (more reasonable for most networks) and allows you to specify the number of retransmits (and thus the total timeout) using -D on the sipsak command line:
kkmac:sipsak-0.9.6-build kris$ ./sipsak -p 10.16.0.3 -s sip:ext_callqual@asterisk -D1 -v
** timeout after 150 ms**
*** giving up, no final response after 150.334 ms
kkmac:sipsak-0.9.6-build kris$ ./sipsak -p 10.16.0.3 -s sip:ext_callqual@asterisk -D2 -v
** timeout after 150 ms**
** timeout after 300 ms**
*** giving up, no final response after 460.612 ms
kkmac:sipsak-0.9.6-build kris$ ./sipsak -p 10.16.0.3 -s sip:ext_callqual@asterisk -D4 -v
** timeout after 150 ms**
** timeout after 300 ms**
** timeout after 600 ms**
*** giving up, no final response after 1071.137 ms
kkmac:sipsak-0.9.6-build kris$
Needless to say our monitoring situation is much improved.