Tuesday, December 20, 2011

Performance Testing (Part 1)

Over the past few years (like many other people in this business) I’ve needed to do performance testing.  Open source software is great but this is one place where you need to do your own leg work.  This conundrum first presented itself in the Asterisk community.  There are literally thousands of variables that can affect the system performance of Asterisk, FreeSWITCH, or any other software solution.  In no particular order:

- Configuration.  Which modules do you have loaded?  How are they configured?  If you’re using Kamailio, do you do hundreds of huge, slow, nasty DB queries for each call setup?  How is your logging configured?  Maybe you use Asterisk or FreeSWITCH and do several system calls, DB lookups, Lua scripts, etc?  Even the slightest misstep in configuration (synchronous syslogging with Kamailio, for example) can reduce your performance by 90%.

- Features in use.  Paging groups (unicast) are notorious for destroying performance on standard hardware - every call needs to be set up individually, you need to handle RTP, and some audio mixing is involved.  Hardware that can’t do 10 members in a page group using Asterisk or FreeSWITCH may be capable of hundreds of sessions using Kamailio with no media.

- Standard performance metrics.  “Thousands of calls” you say?  How many calls per second?  Are you transcoding?  Maybe you’re not handling any media at all?  What is the delay in call setup?

- Hardware.  This may seem obvious (MORE HERTZ) but even then there are issues...  If you’re handling RTP, what are you using for timing?  If you have lots of RTP, which network card are you using?  Does it and your kernel support MSI or MSI-X for better interrupt handling?  Can you load balance IRQs across cores?  How efficient (or buggy) is the driver (Realtek I’m looking at you)?!?

- The “guano” effect.  As features are added to the underlying toolkit (Asterisk, FreeSWITCH, etc) and to your configuration, how is performance affected over time?  Add a feature here, and a feature there - and repeat.  Over the months and years (even with faster hardware) you may find that each “little” feature reduced call capacity by 5%.  Or maybe your calls per second went down by two each time.  Not a big deal overall yet over time this adds up - assuming no other optimizations your call capacity could be down by 50% after ten “minor” changes.  It adds up - it really does.
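To put rough numbers on the “guano” effect, here is a minimal sketch of the arithmetic.  The 5% figure and the ten changes come from the bullet above; the baseline of 1,000 calls and the function names are made up for illustration.  It compares two readings of “5% per feature”: each change costing 5% of the original capacity (which gives the 50% figure) and each change costing 5% of whatever capacity is left (which compounds to a loss of roughly 40%).

```python
def capacity_additive(start, loss_per_change, changes):
    """Each change removes a fixed share of the ORIGINAL capacity."""
    return start * (1 - loss_per_change * changes)


def capacity_compounding(start, loss_per_change, changes):
    """Each change removes a share of whatever capacity REMAINS."""
    return start * (1 - loss_per_change) ** changes


start_calls = 1000   # hypothetical baseline, not a measured number
loss = 0.05          # 5% per "little" feature, as in the text
changes = 10         # ten "minor" changes, as in the text

additive = capacity_additive(start_calls, loss, changes)
compounding = capacity_compounding(start_calls, loss, changes)

print(f"additive:    {additive:.0f} calls left")
print(f"compounding: {compounding:.0f} calls left")
```

Either way you have lost somewhere between 40% and 50% of your capacity without ever making a change that looked expensive on its own.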

Even when pointing out all of these issues you’d still be surprised how often one is faced with the question “Well yeah but how many calls can I handle on my dual core Dell server?”.

In almost every case the best answer is “Get your hardware, develop your solution, run sipp against it and see what happens”.  That’s really about as good as we can do.

SIPP is a great example of a typical, high quality open source tool.  In true “Unix philosophy” it does one thing and it does it well: SIP performance testing.  SIPP can be configured to initiate (or receive) just about any conceivable SIP scenario - from simple INVITE call handling to full SIMPLE test cases.  In these tests SIPP will tell you call setup time, messages received, successful dialogs, etc.

SIPP even goes a step further and includes some support for RTP.  SIPP has the ability to echo RTP from the remote end or even replay RTP from a PCAP file you have saved to disk.  This is where SIPP starts to show some deficiencies.  Again, you can’t blame SIPP because SIPP is a SIP performance testing tool - it does that and it does it well.  RTP testing leaves a lot to be desired.  First of all, you’re on your own when it comes to manipulating any of the PCAP parameters.  Length, content, codec, payload types, etc, etc need to be configured separately.  This isn’t a problem, necessarily, as there are various open source tools to help you with some of these tasks.  I won’t get into all of them here but they too leave something to be desired.

What about analyzing the quality of the RTP streams?  SIPP provides mechanisms to measure various SIP “quality” metrics - SIP response times, SIP retransmits, etc.  With RTP you’re on your own.  Once again, sure, you could set up tshark on a SPAN port (or something) to do RTP stream analysis on every stream but this would be tedious and (once again) subject you to some of the harsh realities of processing a tremendous amount of small packets in software on commodity hardware.

Let’s face it - for a typical B2BUA handling RTP the numbers add up very quickly - let’s assume 20ms packetization for the following:

Single RTP stream = 50 packets per second (pps)
Bi-directional RTP stream = 100 pps
A-leg bi-directional RTP stream = 100 pps
B-leg bi-directional RTP stream = 100 pps

A leg + B leg = 200 pps PER CALL

What does this look like with 10,000 channels (using g711u)?

952 Mbit/s (close to Gigabit wire speed) in each direction
1,000,000 (total) packets per second
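
For anyone who wants to check the math, here is a small sketch that reproduces those two figures.  It is one set of assumptions that happens to land on the same numbers, not necessarily how they were originally worked out: G.711 u-law with 160 bytes of payload per 20ms packet, IPv4 with no options, untagged Ethernet, and the preamble and inter-frame gap counted as time on the wire.  The constant and function names are mine.

```python
# Per-packet sizes in bytes for G.711 u-law at 20 ms packetization.
PAYLOAD = 160          # 8000 samples/s * 0.020 s * 1 byte per sample
RTP_HEADER = 12        # fixed RTP header, no CSRCs or extensions
UDP_HEADER = 8
IPV4_HEADER = 20       # assumes no IP options
ETHERNET_FRAME = 18    # 14-byte header + 4-byte FCS (assumes no VLAN tag)
ETHERNET_WIRE = 20     # 8-byte preamble/SFD + 12-byte inter-frame gap

PACKETS_PER_SECOND = 50  # one packet every 20 ms, per direction


def wire_bytes_per_packet():
    """Bytes of wire time consumed by a single RTP packet."""
    return (PAYLOAD + RTP_HEADER + UDP_HEADER + IPV4_HEADER
            + ETHERNET_FRAME + ETHERNET_WIRE)


def load(channels):
    """Return (Mbit/s in each direction, total packets per second)."""
    bits_per_stream = wire_bytes_per_packet() * 8 * PACKETS_PER_SECOND
    mbit_each_direction = channels * bits_per_stream / 1_000_000
    total_pps = channels * PACKETS_PER_SECOND * 2  # both directions
    return mbit_each_direction, total_pps


mbit, pps = load(10_000)
print(f"{mbit:.0f} Mbit/s each direction, {pps:,} packets per second")
```

That works out to 238 bytes on the wire for every 160 bytes of audio, which is why small-packet workloads like RTP are so hard on commodity hardware.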

Open source software is great - it provides us with the tools to (ultimately) build services and businesses.  Many of us choose what to focus on (our core competency).  At Star2Star we provide business grade communication services and we spend a lot of time and energy to build these services because it’s what we do.  We don’t sell, manufacture, or support testing platforms.

At this point some of you may be getting an idea...  Why don’t I build/design an open source testing solution?  It’s a good question and while I don’t want to crush your dreams there are some harsh realities:

1)  This gets insanely complicated, quickly.  Anyone who follows this blog knows SIP itself is complicated enough.
2)  Scaling becomes a concern (as noted above).
3)  Who would use it?

The last question is probably the most serious - who really needs the ability to initiate 10,000 SIP channels at 100 calls per second while monitoring RTP stream quality, etc?  SIP carriers?  SIP equipment manufacturers?  A few SIP software developers?  How large is the market?  What kind of investment would be required to even get the project off the ground?  What does the competition look like?  While I don’t have the answers to most of these questions I can answer the last one.

Commercial SIP testing equipment is available from a few vendors:

Spirent
Empirix
Ixia

...and I’m sure others.  We evaluated a few of these solutions and I’ll be talking more about them in a follow-up post in the near future.
 
Stay tuned because this series is going to be good!

Friday, December 2, 2011

Star2Star Gets Noticed

Just a quick one today (and some shameless self promotion on my part)...  Star2Star has been recognized on a few "lists" this year, check it out:

Inc 500
Forbes 100 "Most Promising"

I'm lucky enough to tell people the same story all of the time - when I was a little kid I played with all of this stuff because I thought it was fun and I loved it.  Only later did I realize that one day I'd be getting paid for it.  I certainly never thought it could come to this!

Ok, enough of that for now.  I'll be getting back to some tech stuff soon...

Tuesday, November 15, 2011

Building a Startup (the right way)

(Continued from Building a Startup)
 
Our way wasn’t working.  To put it mildly our “business grade” solution didn’t perform much better than Vonage.  We came to exemplify VoIP - jittery calls, dropped calls, one way calls, etc, etc, etc.  Most of this was because of the lack of quality ITSPs at that time.  Either way our customers didn’t care.  It was us.  If we went to market with what we had the first time around we were going to lose.

The problem was the other predominant architecture at the time was “hosted”.  Someone hosts a PBX for you and ships you some phones.  You plug them in behind your router and magically you have a phone system.  They weren’t doing much better.  Sure, their sales looked good but even then it was becoming obvious customer churn was quite high.  People didn’t like hosted either, and for good reason.  Typically they have less control over the call than we do.

As I’ve alluded to before I thought there was a better way.  We needed to host the voice applications where it made the most “sense”.  We were primarily using Asterisk and with a little creative provisioning, a kick-ass SIP proxy, and enough Asterisk machines we could build the perfect business PBX - even if that meant virtually none of it existed at the customer premise.  Or maybe all of it did.  That flexibility was key.  After a lot of discussions, whiteboard sessions, and late nights everyone agreed.  We needed a do-over.

So we got to work and slowly our new architecture began to take shape.  We added a kick-ass SIP proxy (OpenSER).  OpenSER would power the core routing between various Asterisk servers each meeting different needs - IVR/Auto Attendant, Conferencing, Voicemail, remote phones (for “hosted” phones/softphones), etc.  The beauty was the SIP proxy could route between all of these different systems including the original AstLinux system at the customer premise.  Customer needs to call voicemail?  No problem - the AstLinux system at the CPE fires an INVITE off to the proxy and the proxy figures out where their voicemail server is.  The call is connected and the media goes directly between the two endpoints.  Same thing for calls between any two points on the network - AstLinux CPE to AstLinux CPE, PSTN to voicemail, IVR to conference.

This is a good time to take a break and acknowledge what really made this all possible - OpenSER.  While it’s difficult to explain the exact history and family tree with any piece of SER software I can tell you one thing - this company would not be possible without it.  There is no question in my mind.  It’s now 2011 and whether you select Kamailio or OpenSIPS for your SIP project you will not be sorry.  Even after five years you will not find a more capable, flexible, scalable piece of SIP server software.  It was one of the best decisions we ever made.

Need to add another server to meet demand for IVR?  No problem, bring another server online, add the IP to a table and presto - you’re now taking calls on your new IVR.  Eventually a new IVR led to several new IVRs, voicemail servers, conference systems, web portals, mail servers, various monitoring systems, etc.
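
To illustrate the “add the IP to a table” idea, here is a toy sketch in Python.  It is not OpenSER configuration and not how our proxy was actually written; the class name, the round-robin selection, and the IP addresses (taken from the documentation range) are all assumptions made for the example.  The point is only that the routing decision is driven by data, so adding capacity is an insert rather than a code change.

```python
class ServiceTable:
    """Toy stand-in for a proxy routing table: service name -> server IPs."""

    def __init__(self):
        self._servers = {}  # service name -> list of server IPs
        self._next = {}     # service name -> index of the next server to use

    def add_server(self, service, ip):
        """Register one more server for a service (the 'add a row' step)."""
        self._servers.setdefault(service, []).append(ip)
        self._next.setdefault(service, 0)

    def route(self, service):
        """Pick a destination for a new call, rotating through the servers."""
        servers = self._servers.get(service)
        if not servers:
            raise LookupError(f"no server registered for {service!r}")
        index = self._next[service] % len(servers)
        self._next[service] = index + 1
        return servers[index]


table = ServiceTable()
table.add_server("ivr", "192.0.2.10")
table.add_server("voicemail", "192.0.2.20")
print(table.route("ivr"))               # only one IVR exists so far

table.add_server("ivr", "192.0.2.11")   # bring a second IVR online
print([table.route("ivr") for _ in range(4)])
```

Once the second row is in the table, new IVR calls start alternating between the two servers without touching anything else.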

What about infrastructure?  Given our small scale, regional footprint, and focus on quality we began buying our own PRIs and running them on a couple of Cisco AS5350XM gateways.  This got us past our initial issues with questionable ITSPs.  Bandwidth was becoming another problem...  We had an excellent colocation provider that provided blended bandwidth but still we needed more control.  Here came BGP, ARIN, AS numbers, a pair of Cisco 7206VXRs w/ G2s, iBGP, multiple upstream providers, etc.

At times I would wonder - whatever happened to spending my time worrying about cross compilers?  Looking back I’m not sure which was worse - GNU autoconf cross-compiling hell or SIP interop, BGP, etc.  It’s fairly safe to say I’m a sadomasochist either way.

Even with all of the pain, missteps, and work we finally had an architecture to take to market.  It would be the architecture that would serve us well for several years.  Of course there was more work to be done...

Wednesday, November 2, 2011

Building a Startup

(Continued from Starting a Startup)
After several days of meetings in Sarasota we determined:

1)  I was moving to Sarasota to start a company with Norm and Joe.
2)  We were going to utilize open source software wherever possible (including AstLinux, obviously).
3)  The Internet was the only ubiquitous, high quality network to build a nationwide platform.
4)  The Internet was only getting more ubiquitous, more reliable, and faster in coming months/years/decades/etc.
5)  We were going to take advantage of as much of this as possible.

These were some pretty lofty goals.  Remember, this is early 2006.  Gmail was still invitation-only beta.  Google Docs didn’t exist.  Amazon EC2 didn’t exist.  “Cloud computing” hadn’t come back into fashion yet.  The term itself didn’t exist.  The Internet was considered (by many) to be “best effort”, “inherently unreliable”, and “unsuitable” for critical communications (such as real time business telephony).  There were many naysayers who were confident this would be a miserable failure.  As it turns out, they were almost right.

We thought the “secret sauce” to business grade voice over the internet was monitoring and management.  If one could monitor and manage the internet connection business grade voice should be possible.  Of course this is very ambiguous but it led to several great hires.  We hired

Joe had already deployed several embedded Asterisk systems to various businesses in the Sarasota area.  They used an embedded version of Linux he patched together and a third party (unnamed) “carrier” to connect to the PSTN.  The first step was upgrading these machines and getting them on AstLinux.  Once this was accomplished we felt confident enough to proceed with our plan.  This was Star2Star Communications and in the beginning of 2006 it looked something like this:

1)  Soekris net4801 machines running AstLinux on the customer premise.
2)  Grandstream GXP-2000 phones at each desk.
3)  Connectivity to a third party “ITSP”.
4)  Management/monitoring systems (check IP connectivity, phone availability, ITSP reliability, local LAN, etc).
5)  Central provisioning of AstLinux systems, phones, etc.

This was Star2Star and there was something I really liked about it - it was simple.  Anyone who knows me or knows of my projects (AstLinux, for example) has to know I favor simplicity whenever possible.  Keep it simple, keep it simple, keep it simple (stupid).

As time went on we started to learn that maybe this was too simple.  We didn’t have enough control.  Our monitoring wasn’t as mature as it should be.  We didn’t pick the right IP phones.  These could be easily fixed.  However, we soon realized our biggest mistake was architecture (or lack thereof).  This wasn’t going to be an easy fix.

We couldn’t find an ITSP that offered a level of quality we considered to be acceptable.  Very few ITSPs had any more experience with VoIP, SIP, and the internet than we did.  More disturbing, however, was an almost complete lack of focus on quality and reliability.  No process.

What we (quickly) discovered is the extremely low barrier to entry for ITSPs, especially back then.  Virtually anyone could install Asterisk on a $100/mo box in a colo somewhere, buy dialtone from someone (who knows) and call themselves an ITSP.  After going through several of these we discovered we needed to do it ourselves.

Even assuming we could solve the PSTN connectivity problem we discovered yet another issue.  All of the monitoring and management in the world cannot make up for a terrible last mile.  If the copper in the ground is rotting and the DSL modem can only negotiate 128kbps/128kbps that’s all you’re going to get.  To make matters worse in the event of a cut or outage the customer would be down completely.  While that may have always happened with the PSTN and an on premise PBX we considered this to be unacceptable.

So then, in the eleventh hour, just before launch I met with the original founders and posed a radical idea - scrap almost everything.  There was a better way.
(Continued in Building a Startup (the right way))

Tuesday, October 25, 2011

Starting a Startup


I know I’ve apologized for being quiet in the past.  This is not one of those times because (as you’ll soon find out) I’ve been hard at work and only now can I finally talk about it.

Six years ago I was spending most of my time working with Asterisk and AstLinux.  I spent a lot of time promoting both - working the conference circuit, blogging, magazines, books, etc.  Conferences are a great way to network and meet new people.  I did just that.  With each conference I attended came new business opportunities.  Sure, not all of them were a slam dunk and eventually I started to pick and choose which conferences I considered worthy of the time and investment.

For anyone involved with Asterisk, Astricon is certainly worthy of your time and energy - the mecca of the Asterisk community.  Astricon was always a whirlwind and 2005 was no exception.  We were in Anaheim, California and embedded Asterisk was starting to really heat up.  I announced my port of AstLinux to Gumstix and announced the “World’s Smallest PBX”, leading to an interview and story in LinuxDevices.  I worked a free community booth (thanks Astricon) with Dave Taht and was introduced to Captain Crunch (that’s another post for another day).

It was at Astricon in 2005 that I also met one of my soon to be business partners (although I certainly didn’t know it at the time).  While I was promoting embedded Asterisk and AstLinux I met a man from Florida named Joe Rhem.  Joe had come up with the idea of using embedded Asterisk systems as the cornerstone of a new way to provide business grade telephone services.  Joe and I met for a few minutes and discussed the merits of embedded Asterisk.  Unfortunately (and everyone already knows this) I don’t remember meeting with Joe.  Like I said Astricon was always a whirlwind and I had these conversations with dozens if not hundreds of people at each show.  I made my way through Astricon, made a pit stop in Santa Clara for (the now defunct) ISPCon and then returned home to Lake Geneva, WI with a stack of business cards, a few new stories, and a lot of work to finish (or start, depending on your perspective).

A couple of months later I received an e-mail from Joe Rhem discussing how he’d like to move forward with what we discussed in Anaheim.  Joe had recruited another partner to lead the new venture.  Norm Worthington was a successful serial entrepreneur and his offer to lead the company was the equivalent of “having General Patton lead your war effort”.  After some catch up I was intrigued with Joe’s idea.  A few hours on the phone later everyone was pretty comfortable with how this could work.

Now I just needed to fly to Sarasota, FL (where’s that - sounds nice, I thought) to meet with everyone, discuss terms, plan a relocation, and (most importantly) start putting the company, product, and technology together.

A short time later I found myself arriving in Sarasota.  It was early January and coming from Wisconsin I couldn’t believe how nice it was.  Looking back on it I’m sure Norm and Joe were very confident I’d be joining them in Sarasota.  Working with technology I love “in paradise”, how could I resist?

(Continued in Building a Startup)