Summary: | ASTERISK-06298: The Busy (or the Congestion) command can switch the channel in a zombie state | ||
Reporter: | TÓTH, Csaba (tcsaba) | Labels: | |
Date Opened: | 2006-02-12 21:49:42.000-0600 | Date Closed: | 2011-06-07 14:03:13 |
Priority: | Major | Regression? | No |
Status: | Closed/Complete | Components: | Core/General |
Versions: | Frequency of Occurrence | ||
Related Issues: | |||
Environment: | Attachments: | ||
Description: | I run a call center on Asterisk 1.2.3. The call center is always under load: about 25 operators use it all day to make outgoing calls. After a few weeks there are 100-200 zombie channels in the system, each of them frozen in the Busy() application. I checked this at night, when nobody was using the call center.

****** ADDITIONAL INFORMATION ******

I tried to explore this bug in the running system with GDB. I attached to a frozen thread and found that it was inside the libc poll() function. The poll timeout was -1, which means poll() never returns if the channel shows no activity. I found a guard against this problem in the source, but it cannot correct this situation. I admit I don't understand exactly what this code does (channel.c line 1574):

    if (sizeof(int) == 4) {
        do {
            int kbrms = rms;
            if (kbrms > 600000)
                kbrms = 600000;
            res = poll(pfds, max, kbrms);
            if (!res)
                rms -= kbrms;
        } while (!res && (rms > 0));
    } else {
        res = poll(pfds, max, rms);
    }

On a 32-bit system this code does not guard against the infinite timeout, and on a 64-bit system there is no protective code at all. I have an idea for a workaround. The condition would be the following:

    if ( (kbrms > 600000) || (kbrms < 0 ) ) kbrms = 600000;

My running system is Asterisk 1.2.3, but judging from the source the problem is probably in version 1.2.4 too.

--------------------------------------
GDB stack trace at the specified thread:

    (gdb) thread 50
    [Switching to thread 50 (Thread -1257985104 (LWP 5738))]#0 0xffffe410 in ?? ()
    (gdb) info stack
    #0 0xffffe410 in ?? ()
    #1 0xb5043de8 in ?? ()
    #2 0xffffffff in ?? ()
    #3 0x00000003 in ?? ()
    #4 0x4a674a5c in poll () from /lib/libc.so.6
    #5 0x08064583 in ast_waitfor_nandfds (c=0xb5043ef0, n=1, fds=0x0, nfds=0, exception=0x0, outfd=0x0, ms=0xb5043ef4) at channel.c:1580
    #6 0x08064a8d in ast_waitfor (c=0x8c51378, ms=-1) at channel.c:1657
    #7 0x08082d55 in wait_for_hangup (chan=0x8c51378, data=Variable "data" is not available.
    ) at pbx.c:5363
    #8 0x08082e15 in pbx_builtin_busy (chan=0x8c51378, data=0xb5047fe8) at pbx.c:5397
    #9 0x0808d103 in pbx_extension_helper (c=0x8c51378, con=Variable "con" is not available.
    ) at pbx.c:544
    #10 0x0808e5a4 in __ast_pbx_run (c=0x8c51378) at pbx.c:2218
    #11 0x0808f1ac in pbx_thread (data=0x8c51378) at pbx.c:2505
    #12 0x4a739b80 in start_thread () from /lib/libpthread.so.0
    #13 0x4a67e9ce in clone () from /lib/libc.so.6
    (gdb) frame 5
    #5 0x08064583 in ast_waitfor_nandfds (c=0xb5043ef0, n=1, fds=0x0, nfds=0, exception=0x0, outfd=0x0, ms=0xb5043ef4) at channel.c:1580
    warning: Source file is more recent than executable.
    1580 res = poll(pfds, max, kbrms);
    (gdb) info locals
    kbrms = -1
    res = Variable "res" is not available.
    (gdb)
| ||
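The workaround proposed in the description can be sketched as a small standalone helper. This is only an illustration of the reporter's idea, not code from Asterisk: the names `clamp_poll_slice`, `poll_bounded`, and `MAX_POLL_SLICE_MS` are hypothetical, and the loop simply mirrors the quoted channel.c fragment with the extra `< 0` check added.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>

/* 600000 ms is the slice length used in the quoted channel.c fragment. */
#define MAX_POLL_SLICE_MS 600000

/* Hypothetical helper: turn a negative (infinite) or oversized timeout
 * into one bounded slice, as the reporter's condition would. */
static int clamp_poll_slice(int rms)
{
    if (rms > MAX_POLL_SLICE_MS || rms < 0)
        return MAX_POLL_SLICE_MS;
    return rms;
}

/* Hypothetical wrapper mirroring the quoted loop. With rms == -1 it
 * polls for one slice and then reports a timeout (returns 0) instead
 * of blocking forever. */
static int poll_bounded(struct pollfd *pfds, nfds_t nfds, int rms)
{
    int res;

    assert(nfds == 0 || pfds != NULL);
    do {
        int kbrms = clamp_poll_slice(rms);

        res = poll(pfds, nfds, kbrms);
        if (res == 0)
            rms -= kbrms;   /* slice expired, account for it */
    } while (res == 0 && rms > 0);

    return res;
}
```

Note the change in meaning: a caller that passes -1 expecting to wait indefinitely would instead see a timeout result after at most ten minutes, and would have to decide itself whether to wait again or hang up.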
Comments: | By: BJ Weschke (bweschke) 2006-02-13 21:25:24.000-0600

Busy([timeout]): This application will indicate the busy condition to the calling channel. If the optional timeout is specified, the calling channel will be hung up after the specified number of seconds. Otherwise, this application will wait until the calling channel hangs up.

If you don't specify a timeout, then an infinite timeout is appropriate, because we're waiting indefinitely for the calling channel to hang up, and when it does, that will break the poll statement. If it never does, then it will hang there indefinitely. Congestion behaves the same way. If you feel that this bug was closed in error and there was something I missed, please let me know and we'll take a second look. Thanks.

By: TÓTH, Csaba (tcsaba) 2006-02-21 14:05:57.000-0600

I accept that this isn't the Busy application's problem, but the problem does exist. I got the following output from Asterisk after the `show channels` command:

    callcenter*CLI> show channels
    Channel        Location              State  Application(Data)
    SIP/2008-c927  0659352212@localpool  Busy   Busy()
    SIP/2010-ea97  0612842703@localpool  Busy   Busy()
    SIP/2008-7155  0612350788@localpool  Busy   Busy()
    SIP/2008-9ad4  0614107576@localpool  Busy   Busy()
    SIP/2010-649d  0694440314@localpool  Busy   Busy()
    SIP/2010-3f4b  0626375099@localpool  Busy   Busy()
    SIP/2010-1795  0652232979@localpool  Busy   Busy()
    SIP/1005-5035  0642460280@localpool  Busy   Busy()
    SIP/2008-fd70  0696216318@localpool  Busy   Busy()
    ...

A closer look at one channel:

    callcenter*CLI> show channel SIP/2008-c927
    -- General -->
    Name: SIP/2008-c927
    Type: SIP
    UniqueID: 1140539055.70826
    Caller ID: (N/A)
    Caller ID Name: (N/A)
    DNID Digits: (N/A)
    State: Busy (7)
    Rings: 0
    NativeFormat: 2
    WriteFormat: 64
    ReadFormat: 2
    1st File Descriptor: 136
    Frames in: 177
    Frames out: 144
    Time to Hangup: 0
    Elapsed Time: 4h28m5s <<<<<< !!!
    Direct Bridge: <none>
    Indirect Bridge: <none>
    -- PBX --
    Context: localpool
    Extension: 0659352212
    Priority: 106
    Call Group: 0
    Pickup Group: 0
    Application: Busy
    Data: (Empty)
    Blocking in: ast_waitfor_nandfds
    Variables:
    DIALSTATUS=BUSY
    MIXMONITOR_FILENAME=/mnt/data/monitor/2006-02-21/20060221_172415_0659352212.WAV
    SYSTEMSTATUS=SUCCESS
    monitorDirectoryName=2006-02-21
    monitorFileName=20060221_172415_0659352212
    operatorId=125
    operatorName=S29uZG9yb3NpIEzDoXN6bMOzbsOpIFphbGFlZ2Vyc3plZw==
    SIPCALLID=1726469d4a450e2a6acc2c4742f98a02@192.168.0.20
    CDR Variables:
    level 1: dst=0659352212
    level 1: dcontext=localpool
    level 1: channel=SIP/2008-c927
    level 1: dstchannel=IAX2/iax2trunk-2
    level 1: lastapp=Busy
    level 1: start=2006-02-21 17:24:17
    level 1: answer=2006-02-21 17:24:17
    level 1: end=2006-02-21 17:24:17
    level 1: duration=0
    level 1: billsec=0
    level 1: disposition=ANSWERED
    level 1: amaflags=DOCUMENTATION
    level 1: uniqueid=1140539055.70826

We are using softphones from Xten (the light version). Thanks for your help...

By: Tilghman Lesher (tilghman) 2006-03-10 22:47:18.000-0600

This isn't Asterisk's problem, either. If the existence of these hanging channels bothers you, I suggest that you use the timeout parameter to the Busy application, followed by a Hangup. |
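The suggestion in the last comment might look roughly like this in extensions.conf. This is a hedged sketch, not the reporter's actual dialplan: the context name and priority 106 are taken from the channel dump in this ticket, while the `_X.` pattern and the 10-second timeout are assumptions chosen only for illustration.

```ini
[localpool]
; Hypothetical fragment: priority 106 is where the stuck channels were sitting.
exten => _X.,106,Busy(10)    ; indicate busy, stop waiting after 10 seconds (arbitrary value)
exten => _X.,107,Hangup()    ; then release the channel explicitly
```

With a timeout given, Busy() no longer waits indefinitely for the caller to hang up, so a softphone that never sends its hangup cannot leave the channel blocked.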