Summary: | ASTERISK-23460: ooh323 channel stuck if call is placed directly and gatekeeper is not available | ||
Reporter: | Dmitry Melekhov (slesru) | Labels: | |
Date Opened: | 2014-03-11 03:43:23 | Date Closed: | 2014-03-28 13:03:36 |
Priority: | Major | Regression? | |
Status: | Closed/Complete | Components: | Addons/chan_ooh323 |
Versions: | 11.7.0 | Frequency of Occurrence | |
Related Issues: | |||
Environment: | Centos 6/x86-64 | Attachments: | ( 0) ASTERISK-23460.patch ( 1) ASTERISK-23460-1.patch ( 2) ASTERISK-23460-2.patch ( 3) ASTERISK-23460-stacklookup.patch ( 4) DebugLog ( 5) h323_log ( 6) h323_log |
Description: | Hello!
This issue is related to ASTERISK-23033 , but is worse. Just had power outage, and I turned on asterisk server before server with gnugk. I got stuck channel with direct, i.e. to peer , not registered in gatekeeper call: OOH323/neftisa-4 6973@h323:1 Down (None) DAHDI/i1/6401-14 116973@dahdi:9 Ring Dial(OOH323/6973@neftisa) I was not able to place call , only solution is to restart asterisk... Looks like the same problem was solved for calls to peers registered in gatekeeper, but not for direct calls. Thank you! | ||
Comments: | By: Rusty Newton (rnewton) 2014-03-11 18:38:01.552-0500 Can you reproduce this reliably by making the gatekeeper unavailable and placing a call directly to it? Or did it only happen during your power outage the one time? By: Dmitry Melekhov (slesru) 2014-03-11 23:02:38.152-0500 Hello! It is not easy to reproduce this , because of ASTERISK-23033, which makes impossible to place call directly if gatekeeper is not available :-( btw, this is what I have in log: 12:02:35:138 Error:Transport failure while reading Q931 message (incoming, ooh323c_1) 12:10:26:512 Error:Transport failure while reading Q931 message (incoming, ooh323c_3) 12:21:18:032 Error:Gatekeeper not responding to ARQ 12:21:18:032 Error: Gatekeeper error. Either Gk not responding or Gk sending invalid messages 12:21:18:032 Error: Gatekeeper error detected. Closing GkClient as Gk mode is UseSpecifcGatekeeper By: Alexander Anikin (may213) 2014-03-13 12:10:15.907-0500 Hello, Looks like to i known where is problem, will try to make patch few later. By: Alexander Anikin (may213) 2014-03-13 17:12:11.793-0500 Dmitry, please test attached patch when you do it. I'm not sure that is solve and I could not reproduce problem at the moment but will do more testing. By: Dmitry Melekhov (slesru) 2014-03-13 22:47:41.463-0500 Hello! Thank you! I'll ,hopefully, do test in monday, just because I can do such tests only at low load time.... By: Dmitry Melekhov (slesru) 2014-03-16 22:37:39.832-0500 Hello! With this patch there is no complete stack lockup, but call still is in stale state. How I reproduced this: I blocked gatekeeper ip by iptables, then placed calls. first call was dropped by asterisk after about 30 seconds ( I guess tcp timeout), second call stalled, this is what I got after some time: asterisk*CLI> core show channels Channel Location State Application(Data) DAHDI/i1/6401-1 116973@dahdi:9 Ring Dial(OOH323/6973@neftisa) ... OOH323/neftisa-0 6973@h323:1 Down (None) then I removed iptables filter and copied h323 log, I'll attach it. And even after I hanged up from pri side call still "exists" asterisk*CLI> core show channels Channel Location State Application(Data) DAHDI/i1/6401-1 116973@dahdi:9 Ring Dial(OOH323/6973@neftisa) OOH323/neftisa-0 6973@h323:1 Down (None) but other h323 calls can be placed, this is good, but, I think it's better to fix these stalled calls. Thank you! By: Dmitry Melekhov (slesru) 2014-03-16 22:38:34.869-0500 here is stalled call log By: Alexander Anikin (may213) 2014-03-17 03:11:34.060-0500 Hello Dmitry, Thank you, will analyze log. By: Dmitry Melekhov (slesru) 2014-03-17 03:30:25.817-0500 Thank you! Log contains only last call, if I remember correctly, not two as I wrote. I turned tracelevel=6 and restarted asterisk after first call... By: Alexander Anikin (may213) 2014-03-17 08:23:21.191-0500 Dmitry, the problem is same as ASTERISK-21960 i.e. asterisk channel lockup if there is no channel create in ooh323 stack. Please try attached patch ASTERISK-23460-stacklookup.patch, it change infinite waiting to time limited. But it's not final patch here, i will do more correct way with call back from ooh323 stack to asterisk channel if there is error on channel creation in the stack. By: Dmitry Melekhov (slesru) 2014-03-17 22:25:58.042-0500 Hello! You are right about problem cause- this patch makes problem disappear- channels always hangs up after some time. Waiting for final patch and ready to test :-) Thank you! By: Alexander Anikin (may213) 2014-03-26 13:27:44.301-0500 Hello, Dmitry please try attached ASTERISK-23460-1.patch. It send signal on locked condition variable from call clearing callback that cause immediately if call can't go without GK. Patch is for clean 11 branch without previous patches from here. By: Dmitry Melekhov (slesru) 2014-03-26 22:37:30.129-0500 Hello! Thank you very much, I hope to test at tomorrow early morning, or, unfortunately, I guess , more realistic, at Monday morning... By: Dmitry Melekhov (slesru) 2014-03-26 23:47:09.478-0500 Oops! I forgot I have server which is still not in production :-) So I tested on it, I can use it only from sip, but there is no difference- I still can reproduce problem on 11.8.1 with only ASTERISK-23460-1.patch applied. I blocked gatekeeper : iptables -A INPUT -s 192.168.22.254 -j DROP then restarted asterisk and placed call: -- Executing [6401@sipphones:5] Dial("SIP/6052-00000000", "OOH323/6401@cisco ast-blk*CLI> core show channels Channel Location State Application(Data) SIP/6052-00000000 6401@sipphones:5 Ring Dial(OOH323/6401@cisco) OOH323/cisco-0 6401@h323:1 Down (None) 2 active channels 1 active call 1 call processed and it stucks forever. I'll attach h323 log with debug level =6 . Thank you! By: Dmitry Melekhov (slesru) 2014-03-26 23:49:59.756-0500 sip to h323 stuck call By: Alexander Anikin (may213) 2014-03-27 02:56:58.144-0500 It's interesting that there is no any call logs in h323_log only gk ras packets. Dmitry, could you attach asterisk console log for the call with ooh323 set debug enabled? By: Dmitry Melekhov (slesru) 2014-03-27 03:37:49.988-0500 Hello! Here is log including console log... By: Dmitry Melekhov (slesru) 2014-03-27 03:40:30.737-0500 As I said, I'm doing call directly to peer... btw, if I remove iptables filter and do call it looks in console as: --- find_peer "cisco" comparing with "192.168.22.253" found matching peer +++ find_peer "cisco" --- ooh323_new - cisco +++ h323_new --- onNewCallCreated 7f4d0c000d18: ooh323c_o_1 .... By: Alexander Anikin (may213) 2014-03-27 06:56:20.960-0500 Dmitry, please try attached ASTERISK-23460-2.patch. As previous it must be applied to clean asterisk. I forget remove limitation on ooh323 stack command processing in there is no registered gk. By: Dmitry Melekhov (slesru) 2014-03-27 07:39:57.447-0500 Thank you! Now all looks good, I get hangup if asterisk was registered in gatekeeper -- Called OOH323/6401@cisco Restart stopped gatekeeper client == Everyone is busy/congested at this time (1:0/0/1) -- Executing [6401@sipphones:6] Hangup("SIP/6052-00000012", "") in new stack == Spawn extension (sipphones, 6401, 6) exited non-zero on 'SIP/6052-00000012' and immediately if not: -- Called OOH323/6401@cisco == Everyone is busy/congested at this time (1:0/0/1) -- Executing [6401@sipphones:6] Hangup("SIP/6052-00000013", "") in new stack == Spawn extension (sipphones, 6401, 6) exited non-zero on 'SIP/6052-00000013' I'll apply patch to production system tomorrow morning and test :-) Thank you! By: Dmitry Melekhov (slesru) 2014-03-27 22:21:49.646-0500 Good morning! :-) Just tested on production- can't reproduce problem with ASTERISK-23460-2.patch So, problem is fixed. Thank you! And it still will be good to call peers directly when gatekeeper is not available ( ASTERISK-23033 ) , but, anyway, system is in working state in such situation and this is excellent :-) By: Alexander Anikin (may213) 2014-03-28 08:00:34.108-0500 Dmitry, thank you for testing, fine result. I will close this issue and go to switch to 23033 ;) By: Alexander Anikin (may213) 2014-03-28 13:03:36.609-0500 close as resolved |