Summary: ASTERISK-06298: The Busy (or Congestion) application can leave the channel in a zombie state
Reporter: TÓTH, Csaba (tcsaba)
Date Opened: 2006-02-12 21:49:42.000-0600
Date Closed: 2011-06-07 14:03:13
Description: I have a call center running Asterisk 1.2.3.
The call center is always under load. About 25 operators use it all day to make outgoing calls.
After a few weeks there are 100-200 zombie channels in the system. Each of them is frozen in the Busy() application. I checked this at night, when nobody was using the call center.


I tried to investigate this bug in the running system with GDB.
I attached to a frozen thread and found that it was blocked in the libc poll() function.
The poll timeout was -1, which means poll() never returns unless there is activity on the channel.
I found code that seems intended to guard against this, but it does not handle this situation.
I admit I don't understand exactly what this code does (channel.c, line 1574):

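if (sizeof(int) == 4) {
        /* int is 4 bytes: wait in slices of at most 600000 ms (10 minutes),
           subtracting each expired slice from the remaining time rms */
        do {
                int kbrms = rms;
                if (kbrms > 600000)
                        kbrms = 600000;
                /* rms == -1 (wait forever) is not capped and reaches poll() unchanged */
                res = poll(pfds, max, kbrms);
                if (!res)
                        rms -= kbrms;
        } while (!res && (rms > 0));
} else {
        res = poll(pfds, max, rms);
}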

When sizeof(int) == 4, as on my 32-bit system, this loop does not protect against an infinite timeout: kbrms = -1 is passed unchanged to poll(). In the else branch there is no protective code at all.

I have an idea for a workaround. The condition would become:

if ((kbrms > 600000) || (kbrms < 0)) kbrms = 600000;
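The effect of the proposed condition can be checked by simulating the loop from channel.c outside Asterisk. This is a minimal sketch under stated assumptions: fake_poll() is a hypothetical stand-in for poll() on a channel that never shows any activity, and instead of blocking forever it returns the sentinel value -2 when given a negative timeout. wait_chunked() is also hypothetical; it copies the loop body, and its patched flag selects the proposed condition.

```c
/* Counters recorded by the hypothetical fake_poll() stand-in. */
static int poll_calls;
static long waited_ms;

/* Hypothetical stand-in for poll() on a channel with no activity:
   a non-negative timeout always expires (returns 0); a negative timeout
   would block forever in a real poll(), so it returns the sentinel -2. */
static int fake_poll(int timeout_ms)
{
    poll_calls++;
    if (timeout_ms < 0)
        return -2;
    waited_ms += timeout_ms;
    return 0;
}

/* Same loop as the sizeof(int) == 4 branch in channel.c.
   patched != 0 applies the proposed condition (negative kbrms is capped too). */
static int wait_chunked(int rms, int patched)
{
    int res;

    poll_calls = 0;
    waited_ms = 0;
    do {
        int kbrms = rms;
        if ((kbrms > 600000) || (patched && kbrms < 0))
            kbrms = 600000;
        res = fake_poll(kbrms);
        if (!res)
            rms -= kbrms;
    } while (!res && (rms > 0));
    return res;
}
```

With the original condition, wait_chunked(-1, 0) hands -1 to poll() on the first call (the sentinel here, an endless block in Asterisk). With the patch, wait_chunked(-1, 1) makes a single 600000 ms call, after which rms is negative and the loop ends with res == 0, so an unbounded wait becomes a 10-minute one. A finite timeout such as 1500000 ms is split into 600000 + 600000 + 300000 ms either way.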

My running system is Asterisk 1.2.3, but judging from the source, the problem is probably present in 1.2.4 as well.

GDB stack trace of the affected thread:

(gdb) thread 50
[Switching to thread 50 (Thread -1257985104 (LWP 5738))]#0  0xffffe410 in ?? ()
(gdb) info stack
#0  0xffffe410 in ?? ()
#1  0xb5043de8 in ?? ()
#2  0xffffffff in ?? ()
#3  0x00000003 in ?? ()
#4  0x4a674a5c in poll () from /lib/libc.so.6
#5  0x08064583 in ast_waitfor_nandfds (c=0xb5043ef0, n=1, fds=0x0, nfds=0,
   exception=0x0, outfd=0x0, ms=0xb5043ef4) at channel.c:1580
#6  0x08064a8d in ast_waitfor (c=0x8c51378, ms=-1) at channel.c:1657
#7  0x08082d55 in wait_for_hangup (chan=0x8c51378, data=Variable "data" is not available.
) at pbx.c:5363
#8  0x08082e15 in pbx_builtin_busy (chan=0x8c51378, data=0xb5047fe8)
   at pbx.c:5397
#9  0x0808d103 in pbx_extension_helper (c=0x8c51378, con=Variable "con" is not available.
) at pbx.c:544
#10 0x0808e5a4 in __ast_pbx_run (c=0x8c51378) at pbx.c:2218
#11 0x0808f1ac in pbx_thread (data=0x8c51378) at pbx.c:2505
#12 0x4a739b80 in start_thread () from /lib/libpthread.so.0
#13 0x4a67e9ce in clone () from /lib/libc.so.6

(gdb) frame 5
#5  0x08064583 in ast_waitfor_nandfds (c=0xb5043ef0, n=1, fds=0x0, nfds=0,
   exception=0x0, outfd=0x0, ms=0xb5043ef4) at channel.c:1580
warning: Source file is more recent than executable.

1580                            res = poll(pfds, max, kbrms);
(gdb) info locals
kbrms = -1
res = Variable "res" is not available.
Comments:

By: BJ Weschke (bweschke) 2006-02-13 21:25:24.000-0600

Busy([timeout]): This application will indicate the busy condition to the calling channel. If the optional timeout is specified, the calling channel will be hung up after the specified number of seconds. Otherwise, this application will wait until the calling channel hangs up.

If you don't specify a timeout, then an infinite timeout is appropriate, because we're waiting indefinitely for the calling channel to hang up, and when it does, poll() will return. If it never does, the channel will wait there indefinitely. Congestion behaves the same way. If you feel that this bug was closed in error and there was something I missed, please let me know and we'll take a second look.
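The point about the infinite timeout can be illustrated with plain POSIX poll(): a timeout of -1 blocks until the descriptor becomes active, so the wait ends as soon as the other side does something. This is a minimal sketch, not Asterisk code; wait_for_activity() is a hypothetical helper that maps an optional timeout in seconds onto the poll() timeout the way the Busy([timeout]) description above suggests, and a pipe stands in for the channel.

```c
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper modelled on the Busy([timeout]) description:
   timeout_sec > 0 waits at most that many seconds; timeout_sec == 0 means
   "no timeout", so poll() gets -1 and returns only on activity on fd. */
static int wait_for_activity(int fd, int timeout_sec)
{
    struct pollfd pfd;
    int ms = (timeout_sec > 0) ? timeout_sec * 1000 : -1;

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;
    /* Returns 0 on timeout, 1 when fd is ready, -1 on error. */
    return poll(&pfd, 1, ms);
}
```

With an empty pipe, wait_for_activity(fds[0], 1) returns 0 after one second. Once a byte has been written to fds[1], wait_for_activity(fds[0], 0) returns 1 immediately despite the infinite timeout; closing the write end would likewise make poll() return, which is the analogue of the caller hanging up.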


By: TÓTH, Csaba (tcsaba) 2006-02-21 14:05:57.000-0600

I accept that it isn't a problem in the Busy application, but the problem does exist.
Here is the output of the `show channels` command:

callcenter*CLI> show channels
Channel              Location             State   Application(Data)
SIP/2008-c927        0659352212@localpool Busy    Busy()
SIP/2010-ea97        0612842703@localpool Busy    Busy()
SIP/2008-7155        0612350788@localpool Busy    Busy()
SIP/2008-9ad4        0614107576@localpool Busy    Busy()
SIP/2010-649d        0694440314@localpool Busy    Busy()
SIP/2010-3f4b        0626375099@localpool Busy    Busy()
SIP/2010-1795        0652232979@localpool Busy    Busy()
SIP/1005-5035        0642460280@localpool Busy    Busy()
SIP/2008-fd70        0696216318@localpool Busy    Busy()

A closer look at one channel:

callcenter*CLI> show channel SIP/2008-c927
-- General -->
          Name: SIP/2008-c927
          Type: SIP
      UniqueID: 1140539055.70826
     Caller ID: (N/A)
Caller ID Name: (N/A)
   DNID Digits: (N/A)
         State: Busy (7)
         Rings: 0
  NativeFormat: 2
   WriteFormat: 64
    ReadFormat: 2
1st File Descriptor: 136
     Frames in: 177
    Frames out: 144
Time to Hangup: 0
  Elapsed Time: 4h28m5s                        <<<<<< !!!
 Direct Bridge: <none>
Indirect Bridge: <none>
--   PBX   --
       Context: localpool
     Extension: 0659352212
      Priority: 106
    Call Group: 0
  Pickup Group: 0
   Application: Busy
          Data: (Empty)
   Blocking in: ast_waitfor_nandfds

 CDR Variables:
level 1: dst=0659352212
level 1: dcontext=localpool
level 1: channel=SIP/2008-c927
level 1: dstchannel=IAX2/iax2trunk-2
level 1: lastapp=Busy
level 1: start=2006-02-21 17:24:17
level 1: answer=2006-02-21 17:24:17
level 1: end=2006-02-21 17:24:17
level 1: duration=0
level 1: billsec=0
level 1: disposition=ANSWERED
level 1: amaflags=DOCUMENTATION
level 1: uniqueid=1140539055.70826

We use softphones from Xten (the X-Lite version).

Thanks for your help...

By: Tilghman Lesher (tilghman) 2006-03-10 22:47:18.000-0600

This isn't Asterisk's problem, either.  If the existence of these hanging channels bothers you, I suggest that you use the timeout parameter to the Busy application, followed by a Hangup.