Monday, March 24, 2008

Q.931 Problems - Solved, sort of

It's been a long day so I'll keep this one short... Upgrading to Asterisk (libpri|zaptel) 1.4 solved my Cisco AS5350XM problem. However, now I've got a new one.

Zaptel 1.4 required a new version of Wanpipe. The new version of Wanpipe does not seem to work with mixed Ts. My T1 happens to have 12 DS0s for PRI and 12 for PPP.

I'm currently working with Sangoma to get this resolved so I can have it all!

Friday, March 21, 2008

Why I (and presumably other people) hate telecom

Q.931.

That's it. Q-freaking-9-3-1. I suppose it's not bad considering it was developed so long ago, but really for me this protocol (and all it stands for) is why so many people (myself included) get frustrated with telecom.

I spent two hours today trying to get a call up (over PRI) from an Asterisk system to a Cisco gateway. I covered all of the basics:

Asterisk - pri_net - check
Cisco - pri_cpe - check
Asterisk - master clocking - check
T1 params - B8ZS, ESF - check
Switchtype - national/NI2 - check
D chan - 24 - check
T1 crossover cable - check

Voila! D channel up in no time (seriously, five minutes). Try to send a call - SURE, everything looks good (including caller id). The call gets setup and once there is about 1 sec of audio it gets torn down. Hmmm... Granted I am running SIP on the other side of the Cisco gateway (AS5350XM) so I start there. Let's face, if you're reading this blog you know I deal with SIP quite a bit. I'd much rather look at it than the alternative - Q.931...

Sure enough everything looks good on the SIP side. Perfect, actually. Ok fine, I guess we're going to have to go Q.931. I enable Q.931 debugging:

Asterisk:
asterisk -r
pri debug span 3

Cisco:
debug isdn q931
term mon (to log to my SSH session)

Try the call again. Same thing - one second of audio, disconnect. Here is what I get:

Cisco debug:

Mar 21 23:13:45.307: ISDN Se3/0:23 Q931: Applying typeplan for sw-type 0xD is 0x2 0x1, Calling num 5043223199
Mar 21 23:13:45.311: ISDN Se3/0:23 Q931: Applying typeplan for sw-type 0xD is 0x2 0x1, Called num 19412340001
Mar 21 23:13:45.311: ISDN Se3/0:23 Q931: TX -> SETUP pd = 8 callref = 0x00A3
Bearer Capability i = 0x8090A2
Standard = CCITT
Transfer Capability = Speech
Transfer Mode = Circuit
Transfer Rate = 64 kbit/s
Channel ID i = 0xA18381
Preferred, Channel 1
Display i = 'Pcom2'
Calling Party Number i = 0x2180, '5043223199'
Plan:ISDN, Type:National
Called Party Number i = 0xA1, '19412340001'
Plan:ISDN, Type:National
Mar 21 23:13:45.323: ISDN Se3/0:23 Q931: RX <- CALL_PROC pd = 8 callref = 0x80A3
Channel ID i = 0xA98381
Exclusive, Channel 1
Mar 21 23:13:46.371: ISDN Se3/0:23 Q931: RX <- CONNECT pd = 8 callref = 0x80A3
Channel ID i = 0xA98381
Exclusive, Channel 1
Progress Ind i = 0x8182 - Destination address is non-ISDN
Mar 21 23:13:46.371: ISDN Se3/0:23 Q931: TX -> CONNECT_ACK pd = 8 callref = 0x00A3
Mar 21 23:13:46.379: ISDN Se3/0:23 Q931: RX <- STATUS pd = 8 callref = 0x80A3
Cause i = 0x80E2 - Message not compatible with call state or not implemented
Call State i = 0x0A
Mar 21 23:13:46.379: ISDN Se3/0:23 Q931: TX -> RELEASE pd = 8 callref = 0x00A3
Cause i = 0x80E408 - Invalid information element contents
Mar 21 23:13:46.403: ISDN Se3/0:23 Q931: RX <- RELEASE_COMP pd = 8 callref = 0x80A3
Cause i = 0x8190 - Normal call clearing

Asterisk debug:
< Protocol Discriminator: Q.931 (8) len=50
< Call Ref: len= 2 (reference 164/0xA4) (Originator)
< Message type: SETUP (5)
< [04 03 80 90 a2]
< Bearer Capability (len= 5) [ Ext: 1 Q.931 Std: 0 Info transfer capability: Speech (0)
< Ext: 1 Trans mode/rate: 64kbps, circuit-mode (16)
< Ext: 1 User information layer 1: u-Law (34)
< [18 03 a1 83 81]
< Channel ID (len= 5) [ Ext: 1 IntID: Implicit, PRI Spare: 0, Preferred Dchan: 0
< ChanSel: Reserved
< Ext: 1 Coding: 0 Number Specified Channel Type: 3
< Ext: 1 Channel: 1 ]
< [28 05 50 63 6f 6d 32]
< Display (len= 5) [ Pcom2 ]
< [6c 0c 21 80 35 30 34 33 32 32 33 31 39 39]
< Calling Number (len=14) [ Ext: 0 TON: National Number (2) NPI: ISDN/Telephony Numbering Plan (E.164/E.163) (1)
< Presentation: Presentation permitted, user number not screened (0) '5043223199' ]
< [70 0c a1 31 39 34 31 32 33 34 30 30 30 31]
< Called Number (len=14) [ Ext: 1 TON: National Number (2) NPI: ISDN/Telephony Numbering Plan (E.164/E.163) (1) '19412340001' ]
-- Making new call for cr 164
-- Processing Q.931 Call Setup
-- Processing IE 4 (cs0, Bearer Capability)
-- Processing IE 24 (cs0, Channel Identification)
-- Processing IE 40 (cs0, Display)
-- Processing IE 108 (cs0, Calling Party Number)
-- Processing IE 112 (cs0, Called Party Number)
> Protocol Discriminator: Q.931 (8) len=10
> Call Ref: len= 2 (reference 164/0xA4) (Terminator)
> Message type: CALL PROCEEDING (2)
> [18 03 a9 83 81]
> Channel ID (len= 5) [ Ext: 1 IntID: Implicit, PRI Spare: 0, Exclusive Dchan: 0
> ChanSel: Reserved
> Ext: 1 Coding: 0 Number Specified Channel Type: 3
> Ext: 1 Channel: 1 ]
-- Accepting call from '5043223199' to '19412340001' on channel 0/1, span 3
-- Executing Wait("Zap/49-1", "1") in new stack
-- Executing Answer("Zap/49-1", "") in new stack
> Protocol Discriminator: Q.931 (8) len=14
> Call Ref: len= 2 (reference 164/0xA4) (Terminator)
> Message type: CONNECT (7)
> [18 03 a9 83 81]
> Channel ID (len= 5) [ Ext: 1 IntID: Implicit, PRI Spare: 0, Exclusive Dchan: 0
> ChanSel: Reserved
> Ext: 1 Coding: 0 Number Specified Channel Type: 3
> Ext: 1 Channel: 1 ]
> [1e 02 81 82]
> Progress Indicator (len= 4) [ Ext: 1 Coding: CCITT (ITU) standard (0) 0: 0 Location: Private network serving the local user (1)
> Ext: 1 Progress Description: Called equipment is non-ISDN. (2) ]
-- Executing MusicOnHold("Zap/49-1", "") in new stack
-- Started music on hold, class 'default', on channel 'Zap/49-1'
< Protocol Discriminator: Q.931 (8) len=5
< Call Ref: len= 2 (reference 164/0xA4) (Originator)
< Message type: CONNECT ACKNOWLEDGE (15)
> Protocol Discriminator: Q.931 (8) len=12
> Call Ref: len= 2 (reference 164/0xA4) (Terminator)
> Message type: STATUS (125)
> [08 02 80 e2]
> Cause (len= 4) [ Ext: 1 Coding: CCITT (ITU) standard (0) 0: 0 Location: User (0)
> Ext: 1 Cause: Wrong message (98), class = Protocol Error (6) ]
> [14 01 0a]
> Call State (len= 3) [ Ext: 0 Coding: CCITT (ITU) standard (0) Call state: Active (10)
< Protocol Discriminator: Q.931 (8) len=10
< Call Ref: len= 2 (reference 164/0xA4) (Originator)
< Message type: RELEASE (77)
< [08 03 80 e4 08]
< Cause (len= 5) [ Ext: 1 Coding: CCITT (ITU) standard (0) 0: 0 Location: User (0)
< Ext: 1 Cause: Invalid information element contents (100), class = Protocol Error (6) ]
< Cause data 1: 08 (8)
-- Processing IE 8 (cs0, Cause)
-- Channel 0/1, span 3 got hangup
-- Stopped music on hold on Zap/49-1
== Spawn extension (pri-in, 19412340001, 3) exited non-zero on 'Zap/49-1'
NEW_HANGUP DEBUG: Calling q931_hangup, ourstate Null, peerstate Release Request
> Protocol Discriminator: Q.931 (8) len=9
> Call Ref: len= 2 (reference 164/0xA4) (Terminator)
> Message type: RELEASE COMPLETE (90)
> [08 02 81 90]
> Cause (len= 4) [ Ext: 1 Coding: CCITT (ITU) standard (0) 0: 0 Location: Private network serving the local user (1)
> Ext: 1 Cause: Normal Clearing (16), class = Normal Event (1) ]
NEW_HANGUP DEBUG: Calling q931_hangup, ourstate Null, peerstate Null
NEW_HANGUP DEBUG: Destroying the call, ourstate Null, peerstate Null
-- Hungup 'Zap/49-1'
s2s-srq-co*CLI>

Everytime I have to look at Q931 I just cringe. I MEAN CRINGE. I hate it. Look at it - all of these goofy messages, number plans, and the worst - IEs (Information Elements). Compare this to SIP (there are plenty of examples on this blog). I guess I just take it for granted. I can fire up ngrep on a network interface, turn on interpretation of carriage returns/linefeeds and go to town with something that just makes more sense. Sure to some people it probably still looks like gibberish but anyone that complains certainly hasn't seen much Q.931!

Let's look at my current problem... It appears that once the call is setup, Asterisk sends a CONNECT message, to which the Cisco quickly acknowledges with a CONNECT ACKNOWLEDGMENT (big surprise there). Here's where things get a little strange... The next message is from Asterisk (STATUS) complaining with "Message not compatible with call state or not implemented". That is pretty helpful, I'll give them that. However, what's not compatible about a standard, simple CONNECT ACK?!?!

This is wear I start to get angry. I'd like to see what's going on a little bit more. I know I have some flexibility with my IEs, for example. If I turn up the debug high enough in Asterisk I can see which IEs Asterisk identifies. I just can't see the data. From a debug standpoint, Cisco appears to give me even less. Cisco does, however, give me some pretty decent control of newer and less than standard IEs to send (and to a lesser extent, receive). That's gotta have something to do with it, I'm sure. It's even telling me "invalid information element contents". Too bad I can't actually see the IE content... Being severely limited with the tools at hand, I began to cycle through all of my IE, number plan, etc options on both sides. I even got lazy at one point and tried to set my switchtype to dms100! No dice.

Want to know the craziest part about all of this? I've done this several times before. I've brought up many a PRI to carriers, with all kinds of switchtypes, b/d channel configs, into all kinds of equipment (including Asterisk and Cisco). I've gone from Asterisk PRI -> Cisco PRI, Cisco PRI -> Cisco PRI, Asterisk PRI -> Asterisk PRI and every other combination you could imagine (although I think I covered all of them). I've never really had problems, although whenever there is a problem it means going through lines and lines of less-than-helpful Q.931 messaging to identify the problem, which gets me back to my point.

That's just the problem with telecom. It's old, slow, and inflexible. If this were a problem with SIP I could grab the packet stream with tcpdump and load it into Wireshark if I got really desperate (I'm rarely that desperate). I could watch it in real time with ngrep, complete with regex matching on payload and BPF syntax. I could try multiple SIP libraries and multiple clients. I could even tap into the WEALTH of information about SIP, including the various RFCs. Sure I know there are a lot of them but hey, at least you know where to look. I could even try different hardware very easily because hey, you don't need a $500 T1 card to play with SIP. Heck you don't even need a network. VmWare or even good ol' fashioned lo0 work just fine.

How can I do this with ISDN? Buy a T-Berd? No thanks. So here it is, Friday night, and I'm obsessing over the PRI that kicked my ass today. Anyone have any ideas?

Wednesday, March 19, 2008

More OpenSER Work

I've been working with OpenSER a lot over the last couple of weeks. I upgraded to 1.2.3, enabled TLS support, and implemented some really cool new features:

- Serial forking class
- English error messages
- Numerous code cleanups
- Much more

I'll try to find out how much of this I can share and get back to you. In the meantime I'm having a problem calling next_gw() from the LCR module. I suspect it's a db problem but I can't be sure yet...