| Summary: | ASTERISK-21193: IAX/2 fails to destroy channel on max retransmits exceeded | ||
| Reporter: | Jaco Kroon (jkroon) | Labels: | |
| Date Opened: | 2013-02-28 14:32:32.000-0600 | Date Closed: | 2013-04-10 15:00:39 | 
| Priority: | Major | Regression? | |
| Status: | Closed/Complete | Components: | Channels/chan_iax2 | 
| Versions: | 11.2.1 | Frequency of Occurrence | Occasional | 
| Related Issues: | |||
| Environment: | Attachments: | ||
| Description: | Hi, Every now and again an IAX/2 channel will get locked up, with the following being sent to the CLI and logs repeatedly: {noformat} [Feb 28 21:38:47] WARNING[23040]: chan_iax2.c:3552 __attempt_transmit: Max retries exceeded to host 105.236.192.85 on IAX2/arctica-3307 (type = 6, subclass = 11, ts=17269334, seqno=249) [Feb 28 21:38:57] WARNING[23045]: chan_iax2.c:3552 __attempt_transmit: Max retries exceeded to host 105.236.192.85 on IAX2/arctica-3307 (type = 6, subclass = 11, ts=17279334, seqno=250) [Feb 28 21:39:00] WARNING[23047]: chan_iax2.c:3552 __attempt_transmit: Max retries exceeded to host 105.236.192.85 on IAX2/arctica-3307 (type = 6, subclass = 2, ts=17282693, seqno=251) [Feb 28 21:39:07] WARNING[23039]: chan_iax2.c:3552 __attempt_transmit: Max retries exceeded to host 105.236.192.85 on IAX2/arctica-3307 (type = 6, subclass = 11, ts=17289333, seqno=252) {noformat} Currently running iax2 show peer arctica shows the following: {noformat} morpheus*CLI> iax2 show peer arctica * Name : arctica Description : Secret : <Set> Context : uls-from-trunk Parking lot : Mailbox : Dynamic : Yes Callnum limit: 1024 Calltoken req: Yes Trunk : Yes Encryption : No Callerid : "" <> Expire : 3212988 ACL : No Addr->IP : 105.236.192.85 Port 4569 Defaddr->IP : 0.0.0.0 Port 4569 Username : ulsvoip Codecs : (gsm|g729) Codec Order : (g729|gsm) Status : OK (9 ms) Qualify : every 60000ms when OK, every 10000ms when UNREACHABLE (sample smoothing Off) {noformat} The specific channel from core show channels concise: {noformat} IAX2/arctica-3307!uls-from-trunk!0761916763!9!Ring!Dial!Local/0799048876@uls-place-call/n!0873514879!arctica!!2!17643!(None)!morpheus-1362063026.53899 {noformat} Some logs from the channel creation (came in from remote peer, I did filter for the string arctica and then worked from there - there is a LOT of information in the logs around this, total log is about 270M atm): {noformat} [Feb 28 16:50:26] VERBOSE[2711][C-0000317a] pbx.c: -- Executing [0761916763@uls-from-trunk:1] NoCDR("IAX2/arctica-3307", "") in new stack [Feb 28 16:50:26] VERBOSE[2711][C-0000317a] pbx.c: -- Executing [0761916763@uls-from-trunk:2] Set("IAX2/arctica-3307", "_BLINDTRANSFER=") in new stack [Feb 28 16:50:26] VERBOSE[2711][C-0000317a] pbx.c: -- Executing [0761916763@uls-from-trunk:3] GotoIf("IAX2/arctica-3307", "0?godial") in new stack [Feb 28 16:50:26] VERBOSE[2711][C-0000317a] pbx.c: -- Executing [0761916763@uls-from-trunk:4] Set("IAX2/arctica-3307", "_reqip=105.236.192.85") in new stack [Feb 28 16:50:26] VERBOSE[2711][C-0000317a] pbx.c: -- Executing [0761916763@uls-from-trunk:5] Set("IAX2/arctica-3307", "_callref=") in new stack [Feb 28 16:50:26] VERBOSE[2711][C-0000317a] pbx.c: -- Executing [0761916763@uls-from-trunk:6] GotoIf("IAX2/arctica-3307", "1?godial") in new stack [Feb 28 16:50:26] VERBOSE[2711][C-0000317a] pbx.c: -- Executing [0761916763@uls-from-trunk:9] Dial("IAX2/arctica-3307", "Local/0761916763@uls-place-call/n") in new stack [Feb 28 16:50:26] VERBOSE[2712][C-0000317a] pbx.c: -- Executing [0761916763@uls-place-call:1] NoOp("Local/0761916763@uls-place-call-0000377b;2", "Trunk call request from "" <0873514879> to 0761916763 (RDNIS=, BLINDTRANSFER=, realuser=arctica)") in new stack [Feb 28 16:50:26] VERBOSE[2712][C-0000317a] pbx.c: -- Executing [0761916763@uls-place-call:2] GotoIf("Local/0761916763@uls-place-call-0000377b;2", "1?loaduser") in new stack {noformat} ... more dialplan ... lot's of it ... and off to just before the max retransmits ... {noformat} [Feb 28 16:50:26] VERBOSE[2712][C-0000317a] pbx.c: -- Executing [0761916763@uls-callprovider:19] Set("Local/0761916763@uls-place-call-0000377b;2", "c=1") in new stack [Feb 28 16:50:26] VERBOSE[2712][C-0000317a] pbx.c: -- Executing [0761916763@uls-callprovider:20] Dial("Local/0761916763@uls-place-call-0000377b;2", "IAX2/uls-node-r2d2/prov*is*0761916763,,") in new stack [Feb 28 16:50:26] VERBOSE[2712][C-0000317a] app_dial.c: -- IAX2/uls-node-r2d2-17667 is making progress passing it to Local/0761916763@uls-place-call-0000377b;2 [Feb 28 16:50:26] VERBOSE[2711][C-0000317a] app_dial.c: -- Local/0761916763@uls-place-call-0000377b;1 is making progress passing it to IAX2/arctica-3307 [Feb 28 16:50:31] VERBOSE[2723] chan_iax2.c: -- Seeding 'arctica' at 105.236.192.85:4569 for 60 [Feb 28 16:50:31] VERBOSE[2723] pbx.c: -- Registered extension context 'dial-trunk-arctica'; registrar: pbx_config [Feb 28 16:50:31] VERBOSE[2723] pbx.c: -- Added extension '_X!' priority 1 to dial-trunk-arctica [Feb 28 16:50:31] VERBOSE[2723] pbx.c: -- Added extension '_X!' priority 2 to dial-trunk-arctica [Feb 28 16:50:31] VERBOSE[2723] pbx.c: -- Registered extension context 'speeddial-tr-arctica'; registrar: pbx_config [Feb 28 16:50:31] VERBOSE[2723] pbx.c: -- Including context 'speeddial-cl-ULS-A13' in context 'speeddial-tr-arctica' [Feb 28 16:50:31] VERBOSE[2723] pbx.c: -- Registered extension context 'chanvars-arctica'; registrar: pbx_config [Feb 28 16:50:31] VERBOSE[2723] pbx.c: -- Added extension 's' priority 1 to chanvars-arctica [Feb 28 16:50:31] VERBOSE[2723] pbx.c: -- Added extension 's' priority 2 to chanvars-arctica [Feb 28 16:50:31] VERBOSE[2723] pbx.c: -- Added extension 's' priority 3 to chanvars-arctica [Feb 28 16:50:31] VERBOSE[2723] pbx.c: -- Added extension 's' priority 4 to chanvars-arctica [Feb 28 16:50:31] VERBOSE[2723] pbx.c: -- Added extension 's' priority 5 to chanvars-arctica [Feb 28 16:50:31] VERBOSE[2723] pbx.c: -- Added extension 's' priority 6 to chanvars-arctica [Feb 28 16:50:31] VERBOSE[2723] pbx.c: -- Added extension 's' priority 7 to chanvars-arctica [Feb 28 16:50:31] VERBOSE[2723] pbx.c: -- Added extension 's' priority 8 to chanvars-arctica [Feb 28 16:50:31] VERBOSE[2723] pbx.c: -- Registered extension context 'cli-arctica-client'; registrar: pbx_config [Feb 28 16:50:31] VERBOSE[2723] pbx.c: -- Including context 'cli-arctica-global' in context 'cli-arctica-client' [Feb 28 16:50:31] VERBOSE[2723] pbx.c: -- Registered extension context 'cli-arctica-global'; registrar: pbx_config [Feb 28 16:50:31] VERBOSE[2723] pbx.c: -- Added extension '0873534134' priority 1 to cli-arctica-global [Feb 28 16:50:31] VERBOSE[2723] pbx.c: -- Added extension 'arctica' priority 1 to uls-trunks [Feb 28 16:50:33] WARNING[23039] channel.c: Exceptionally long voice queue length queuing to IAX2/arctica-3307 [Feb 28 16:50:33] WARNING[2712][C-0000317a] channel.c: Exceptionally long voice queue length queuing to Local/0761916763@uls-place-call-0000377b;1 [Feb 28 16:50:34] WARNING[23037] channel.c: Exceptionally long voice queue length queuing to IAX2/arctica-3307 [Feb 28 16:50:34] WARNING[2712][C-0000317a] channel.c: Exceptionally long voice queue length queuing to Local/0761916763@uls-place-call-0000377b;1 [Feb 28 16:50:35] WARNING[23039] channel.c: Exceptionally long voice queue length queuing to IAX2/arctica-3307 [Feb 28 16:50:35] WARNING[2712][C-0000317a] channel.c: Exceptionally long voice queue length queuing to Local/0761916763@uls-place-call-0000377b;1 [Feb 28 16:50:36] WARNING[23041] channel.c: Exceptionally long voice queue length queuing to IAX2/arctica-3307 [Feb 28 16:50:36] WARNING[2712][C-0000317a] channel.c: Exceptionally long voice queue length queuing to Local/0761916763@uls-place-call-0000377b;1 [Feb 28 16:50:36] VERBOSE[2712][C-0000317a] app_dial.c: -- IAX2/uls-node-r2d2-17667 answered Local/0761916763@uls-place-call-0000377b;2 [Feb 28 16:50:38] WARNING[2712][C-0000317a] channel.c: Exceptionally long voice queue length queuing to Local/0761916763@uls-place-call-0000377b;1 [Feb 28 16:50:39] WARNING[2712][C-0000317a] channel.c: Exceptionally long voice queue length queuing to Local/0761916763@uls-place-call-0000377b;1 [Feb 28 16:50:40] WARNING[2712][C-0000317a] channel.c: Exceptionally long voice queue length queuing to Local/0761916763@uls-place-call-0000377b;1 [Feb 28 16:50:42] WARNING[2712][C-0000317a] channel.c: Exceptionally long voice queue length queuing to Local/0761916763@uls-place-call-0000377b;1 [Feb 28 16:50:43] WARNING[2712][C-0000317a] channel.c: Exceptionally long voice queue length queuing to Local/0761916763@uls-place-call-0000377b;1 [Feb 28 16:50:44] NOTICE[2712][C-0000317a] src/cdr_uls.C: cdr_uls: cdr entry: clid=0873514879,src=0873514879,dst=0761916763,dcontext=uls-callprovider,channel=Local/0761916763@uls-place-call-0000377b;2,dstchannel=IAX2/uls-node-r2d2-17667,lastapp=Dial,lastdata=IAX2/uls-node-r2d2/prov*is*0761916763,,,start=2013-02-28 16:50:26,answer=2013-02-28 16:50:36,end=2013-02-28 16:50:44,duration=18,billsec=8,disposition=ANSWERED,amaflags=DOCUMENTATION,accountcode=arctica,uniqueid=morpheus-1362063026.53901,userfield=0873534134,dialed=0761916763,ulstype=normal,peerip=105.236.192.85,destnetwork=Vodacom (Pty) Ltd,calldesc=Vodacom (Pty) Ltd,circuit=quality,provider=is,cli=yes,dialed=0761916763,ulstype=normal,peerip=105.236.192.85 {noformat} ... This indicates that the call got hung up ... {noformat} [Feb 28 16:50:44] VERBOSE[2712][C-0000317a] pbx.c: == Spawn extension (uls-callprovider, 0761916763, 20) exited non-zero on 'Local/0761916763@uls-place-call-0000377b;2' [Feb 28 16:51:18] WARNING[23044] chan_iax2.c: Max retries exceeded to host 105.236.192.85 on IAX2/arctica-3307 (type = 6, subclass = 11, ts=20015, seqno=6) [Feb 28 16:51:19] WARNING[23045] chan_iax2.c: Max retries exceeded to host 105.236.192.85 on IAX2/arctica-3307 (type = 6, subclass = 2, ts=21017, seqno=7) {noformat} and from here on this is all I see. Looking at the code for some reason the channel never gets destroyed. At the location where the message is output (channels/chan_iax2.c, lines 3550-3569) it would seem that as long as iaxs[callno]=>owner is not NULL that the channel will never get destroyed and we will infinitely keep retransmitting a HANGUP request - which is likely to be ignored by the remote end due to the callid not existing. Looking at frame.h it would seem that type=6 => AST_FRAME_IAX, and subclass=2 and 11 represents AST_CONTROL_RING and AST_CONTROL_OPTION respectively. The logs on the remote server (not under my direct control) shows the following for the same call (time is offset by about an hour): {noformat} [2013-02-28 15:58:20] VERBOSE[23968] pbx.c: -- Executing [s@macro-dialout-trunk:20] Dial("SIP/100-000003ea", "IAX2/ulsvoip/0761916763,300,") in new stack [2013-02-28 15:58:20] VERBOSE[23968] app_dial.c: -- Called ulsvoip/0761916763 [2013-02-28 15:58:20] VERBOSE[23969] app_mixmonitor.c: == Begin MixMonitor Recording SIP/100-000003ea [2013-02-28 15:58:20] VERBOSE[2996] chan_iax2.c: -- Call accepted by 196.37.63.68 (format g729) [2013-02-28 15:58:20] VERBOSE[2996] chan_iax2.c: -- Format for call is g729 [2013-02-28 15:58:20] VERBOSE[23968] app_dial.c: -- IAX2/ulsvoip-16511 is making progress passing it to SIP/100-000003ea [2013-02-28 15:58:31] VERBOSE[23968] chan_iax2.c: -- Hungup 'IAX2/ulsvoip-16511' [2013-02-28 15:58:31] VERBOSE[23968] app_macro.c: == Spawn extension (macro-dialout-trunk, s, 20) exited non-zero on 'SIP/100-000003ea' in macro 'dialout-trunk' [2013-02-28 15:58:31] VERBOSE[23968] pbx.c: == Spawn extension (from-internal, 0761916763, 5) exited non-zero on 'SIP/100-000003ea' [2013-02-28 15:58:31] VERBOSE[23968] pbx.c: -- Executing [h@from-internal:1] Hangup("SIP/100-000003ea", "") in new stack [2013-02-28 15:58:31] VERBOSE[23968] pbx.c: == Spawn extension (from-internal, h, 1) exited non-zero on 'SIP/100-000003ea' {noformat} So I'm not sure what to make of this ... | ||
| Comments: | By: Rusty Newton (rnewton) 2013-03-06 15:50:23.834-0600 Sounds like a bug, but still need further information. 1.Is it only calls going through the "arctica" peer that get locked up? 1A. Does this peer handle only inbound, or both inbound and outbound calls? If so, do only one or the other lock up? That could lead us to a dialplan path that affects the issue. 2.It may not be possible considering you probably don't know which channel will lock up, but I should ask anyway - can you provide a PCAP of a complete call where its locked up, plus a comparison with a working call? 3.Can you provide the same log info, but along with the DEBUG message type? 4. Can you provide the same log information, but for a working call and attach to the issue? 5. can you provide the iax.conf configuration for the "arctica" peer? along with the [general] config? By: Rusty Newton (rnewton) 2013-03-25 13:04:30.913-0500 Jaco, going to let this one sit for another two weeks so you have time to gather the debug if possible. By: Jaco Kroon (jkroon) 2013-04-03 07:46:10.665-0500 Hi, 1. No, other IAX/2 peers also does this from time to time. 1A. Bi-directional call directions, usually only one side. 2. The most logical place for me to make this capture would require around 30-40GB of storage for pcap purposes per day. 3. Busy looking into this now. Need this for a SIP lockup in 11.3.0 too. Odly enough, this bug hasn't triggered in a whilst, but I can leave my switches running with debug messages on (only around 2GB of logs currently per day, so this will hopefully not push it over the limit). 4. 99% of the calls work, if I get the data for "3" I'll provide the full logs for a period around the error period. 5. See below: {noformat} [general] bindport=4569 iaxcompat=no delayreject=yes language=en tos=ef autokill=5000 iaxthreadcount=10 iaxmaxthreadcount=500 ;jitterbuffer=yes ;forcejitterbuffer=yes ;maxjitterbuffer=1000 ;maxjitterinterps=10 ;resyncthreshold=1000 codecpriority=host disallow=all allow=g729 ;allow=speex allow=gsm trunkfreq=10 trunktimestamps=yes trunkmtu=1240 calltokenoptional=0.0.0.0/0.0.0.0 maxcallnumbers=256 ; You may specify a global default AMA flag for iaxtel calls. It must be ; one of 'default', 'omit', 'billing', or 'documentation'. These flags ; are used in the generation of call detail records. ; amaflags=billing authdebug=no transfer=no regcontext=uls-trunks [arctica] type=friend context=uls-from-trunk secret=no_your_not_going_to_get_that_lucky username=ulsvoip host=dynamic qualify=yes accountcode=arctica trunk=yes requirecalltoken=auto maxcallnumbers=1024 transfer=no setvar=_realuser=arctica setvar=_codecannounce=1 {noformat} The config the client is using is a standard freepbx install in this case with the following (not my preferred setup, and we've seen this to other peers before too): {noformat} [general] disallow=all allow=ulaw allow=alaw allow=gsm mailboxdetail=yes tos=ef trunkfreq=10 trunktimestamps=yes register=arctica:no_your_not_going_to_get_that_lucky@voip.uls.co.za [ulsvoip] disallow=all username=arctica type=friend secret=no_your_not_going_to_get_that_lucky qualify=yes ; possibly make this a value like 200 host=voip.uls.co.za trunk=yes allow=g729 context=from-trunk {noformat} I must add that we've had to add a few other patches to IAX/2 to fix other issues (patches was all uploaded to issues.asterisk.org) this seems to have at the very least reduced and we haven't seen a case where it keeps escalating indefinitely again. By: Rusty Newton (rnewton) 2013-04-09 19:03:16.063-0500 Okay - we'll need the additional DEBUG info that corresponds with PCAPS for a call in question to really see what is happening here unless we have a way to reproduce. I'll put this in Waiting on Feedback for now. By: Jaco Kroon (jkroon) 2013-04-10 08:51:48.583-0500 Rusty, I'm not seeing this happen any more. And I hate that. I really do not like it if problems just go away. Don't get me wrong, I'm thankful that the problem is gone, but it means there is an underlying problem sitting in limbo somewhere just waiting to trigger and bite. For now I cannot see how we can possibly troubleshoot this, so let's close. If I get more information I'll log a new issue. By: Rusty Newton (rnewton) 2013-04-10 15:00:20.789-0500 Roger that. I hear you. That's crazy frustrating. Closing this for now. By: Bobby Hakimi (bobbymc) 2015-10-27 02:06:28.053-0500 this happens to me every day, i cant reproduce it on demand but it happens at random times during the day, its a production system we have and im at a loss, i dont even mind paying someone to help us fix this. can anyone please help or direct me to a place we can pay to fix this? i am getting very desperate and we are losing clients due to this | ||