
Summary: ASTERISK-13119: AGI Leaves zombies behind it
Reporter: Eldad Ran (eldadran)
Labels:
Date Opened: 2008-11-25 08:01:22.000-0600
Date Closed: 2009-01-25 14:31:24.000-0600
Priority: Minor
Regression? No
Status: Closed/Complete
Components: Resources/res_agi
Versions:
Frequency of Occurrence:
Related Issues:
Environment:
Attachments:
( 0) 20081125__bug13968.diff.txt
( 1) 20090114__bug13968.diff.txt
( 2) 20090115__bug13968__1.4.22.1.diff.txt
( 3) bt_thread.txt
( 4) res_agi.c.rej
Description: AGI scripts written in PHP leave zombie processes behind.
It started after switching to 1.4.18, and it still happens after upgrading to 1.4.22.
The PHP version is 5.1.6.
Comments:
By: Vadim Sherbakov (vinsik) 2008-11-25 16:19:34.000-0600

The same thing happens to me too. I get a lot of zombies.
It seems to be related to some type of call transfer,
and it's very hard to reproduce.

I will try to log this somehow and post here.

Cheerz.

By: Tilghman Lesher (tilghman) 2008-11-25 18:18:26.000-0600

If your process intercepts the SIGHUP and does any cleanup at all, it's possible for the zombie to stick around until all AGI processes are gone (which could be a long time, depending).  This patch should fix it.  Please test and give feedback on your results.
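The situation described above can be reproduced outside Asterisk. The following is a minimal sketch, not the code from the patch: a child installs a SIGHUP handler and spends some time on cleanup before exiting, so a single wait attempt right after the signal would find nothing to reap. The parent therefore polls with `waitpid(..., WNOHANG)`. The names `spawn_child_with_cleanup`, `try_reap`, and `sleep_ms` are hypothetical helpers invented for this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Set by the child's SIGHUP handler. */
static volatile sig_atomic_t got_hup = 0;

static void on_hup(int sig) { (void)sig; got_hup = 1; }

/* Sleep for roughly ms milliseconds (may return early if a signal arrives). */
void sleep_ms(long ms)
{
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

/* Hypothetical stand-in for an AGI script: the child catches SIGHUP and
 * then spends cleanup_ms milliseconds "cleaning up" before it exits.
 * Returns the child's pid in the parent, or -1 if fork() failed. */
pid_t spawn_child_with_cleanup(long cleanup_ms)
{
    sigset_t block, old;
    pid_t pid;

    assert(cleanup_ms >= 0);

    /* Block SIGHUP across fork() so the child cannot be killed by the
     * default action before its handler is installed. */
    sigemptyset(&block);
    sigaddset(&block, SIGHUP);
    sigprocmask(SIG_BLOCK, &block, &old);

    pid = fork();
    if (pid == 0) {
        struct sigaction sa;
        sa.sa_handler = on_hup;
        sigemptyset(&sa.sa_mask);
        sa.sa_flags = 0;
        sigaction(SIGHUP, &sa, NULL);
        sigprocmask(SIG_SETMASK, &old, NULL);

        while (!got_hup)          /* wait for the hangup */
            sleep_ms(10);
        sleep_ms(cleanup_ms);     /* simulated cleanup work */
        _exit(0);
    }

    sigprocmask(SIG_SETMASK, &old, NULL);
    return pid;
}

/* Non-blocking reap attempt: 1 if the child was reaped, 0 if it is still
 * running, -1 on error (for example, it was already reaped). */
int try_reap(pid_t pid)
{
    int status;
    pid_t r = waitpid(pid, &status, WNOHANG);

    if (r == pid)
        return 1;
    return (r == 0) ? 0 : -1;
}
```

A caller would send `kill(pid, SIGHUP)` and then call `try_reap(pid)` repeatedly until it returns 1. If the caller tries only once and gives up, the child becomes a zombie as soon as its cleanup finishes and stays one until something else waits on it.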

By: Vadim Sherbakov (vinsik) 2008-11-26 00:56:31.000-0600

Great, I will test this ASAP and report back.

By: Eldad Ran (eldadran) 2008-11-27 05:01:41.000-0600

The patch just crashed Asterisk:
(gdb) bt
#0  0x00f32402 in __kernel_vsyscall ()
#1  0x00352ba0 in raise () from /lib/libc.so.6
#2  0x003544b1 in abort () from /lib/libc.so.6
#3  0x00388dfb in __libc_message () from /lib/libc.so.6
#4  0x00390aa6 in _int_free () from /lib/libc.so.6
#5  0x00393fc0 in free () from /lib/libc.so.6
#6  0x0088f09b in grim_reaper (data=0x0) at res_agi.c:2209
#7  0x080fe2eb in dummy_start (data=0x8501360) at utils.c:912
#8  0x004a045b in start_thread () from /lib/libpthread.so.0
#9  0x003f824e in clone () from /lib/libc.so.6
(gdb) bt full
#0  0x00f32402 in __kernel_vsyscall ()
No symbol table info available.
#1  0x00352ba0 in raise () from /lib/libc.so.6
No symbol table info available.
#2  0x003544b1 in abort () from /lib/libc.so.6
No symbol table info available.
#3  0x00388dfb in __libc_message () from /lib/libc.so.6
No symbol table info available.
#4  0x00390aa6 in _int_free () from /lib/libc.so.6
No symbol table info available.
#5  0x00393fc0 in free () from /lib/libc.so.6
No symbol table info available.
#6  0x0088f09b in grim_reaper (data=0x0) at res_agi.c:2209
       __list_next = (struct zombie *) 0x467120
       __list_prev = (struct zombie *) 0x0
       __new_prev = <value optimized out>
       cur = (struct zombie *) 0x86b7060
       status = 1
#7  0x080fe2eb in dummy_start (data=0x8501360) at utils.c:912
       __cancel_buf = {__cancel_jmp_buf = {{__cancel_jmp_buf = {139469016, 0, -1209324656, -1209326648, -981143866, 2053123938},
     __mask_was_saved = 0}}, __pad = {0xb7eb2480, 0x0, 0x0, 0x0}}
       __cancel_arg = (void *) 0xb7eb2b90
       not_first_call = <value optimized out>
       ret = <value optimized out>
#8  0x004a045b in start_thread () from /lib/libpthread.so.0
No symbol table info available.
#9  0x003f824e in clone () from /lib/libc.so.6
No symbol table info available.

By: Eldad Ran (eldadran) 2008-11-27 05:09:10.000-0600

At runtime I got this:
[root@localhost asterisk]# *** glibc detected *** asterisk: double free or corruption (!prev): 0x086b7060 ***
======= Backtrace: =========
/lib/libc.so.6[0x390aa6]
/lib/libc.so.6(cfree+0x90)[0x393fc0]
/usr/lib/asterisk/modules/res_agi.so[0x88f09b]
asterisk[0x80fe2eb]
/lib/libpthread.so.0[0x4a045b]
/lib/libc.so.6(clone+0x5e)[0x3f824e]
======= Memory map: ========
00110000-00135000 r-xp 00000000 fd:00 19709510   /usr/lib/libk5crypto.so.3.1
00135000-00136000 rwxp 00025000 fd:00 19709510   /usr/lib/libk5crypto.so.3.1
00136000-00171000 r-xp 00000000 fd:00 8651832    /lib/libsepol.so.1
00171000-00172000 rwxp 0003a000 fd:00 8651832    /lib/libsepol.so.1
00172000-0017c000 rwxp 00172000 00:00 0
0017c000-0018a000 r-xp 00000000 fd:00 20447631   /usr/lib/asterisk/modules/res_features.so
0018a000-0018c000 rwxp 0000d000 fd:00 20447631   /usr/lib/asterisk/modules/res_features.so
0018c000-0018f000 r-xp 00000000 fd:00 20447672   /usr/lib/asterisk/modules/app_talkdetect.so
0018f000-00190000 rwxp 00002000 fd:00 20447672   /usr/lib/asterisk/modules/app_talkdetect.so
00190000-00191000 r-xp 00000000 fd:00 20447702   /usr/lib/asterisk/modules/func_base64.so
00191000-00192000 rwxp 00000000 fd:00 20447702   /usr/lib/asterisk/modules/func_base64.so
00192000-00194000 r-xp 00000000 fd:00 20447703   /usr/lib/asterisk/modules/func_callerid.so
00194000-00195000 rwxp 00001000 fd:00 20447703   /usr/lib/asterisk/modules/func_callerid.so
00195000-00197000 r-xp 00000000 fd:00 20447675   /usr/lib/asterisk/modules/app_userevent.so
00197000-00198000 rwxp 00001000 fd:00 20447675   /usr/lib/asterisk/modules/app_userevent.so
00198000-00199000 r-xp 00000000 fd:00 20447713   /usr/lib/asterisk/modules/func_language.so
00199000-0019a000 rwxp 00000000 fd:00 20447713   /usr/lib/asterisk/modules/func_language.so
0019a000-001a0000 r-xp 00000000 fd:00 20447721   /usr/lib/asterisk/modules/func_strings.so
001a0000-001a1000 rwxp 00005000 fd:00 20447721   /usr/lib/asterisk/modules/func_strings.so
001a1000-001a3000 r-xp 00000000 fd:00 19714274   /usr/lib/libtonezone.so.1.0
001a3000-001d0000 rwxp 00002000 fd:00 19714274   /usr/lib/libtonezone.so.1.0
001d0000-001d2000 r-xp 00000000 fd:00 20447664   /usr/lib/asterisk/modules/app_read.so
001d2000-001d3000 rwxp 00002000 fd:00 20447664   /usr/lib/asterisk/modules/app_read.so
001d3000-001d4000 r-xp 00000000 fd:00 20447735   /usr/lib/asterisk/modules/app_cdr.so
001d4000-001d5000 rwxp 00000000 fd:00 20447735   /usr/lib/asterisk/modules/app_cdr.so
001d5000-001d7000 r-xp 00000000 fd:00 20447655   /usr/lib/asterisk/modules/app_dumpchan.so
001d7000-001d8000 rwxp 00001000 fd:00 20447655   /usr/lib/asterisk/modules/app_dumpchan.so
001d8000-001e6000 r-xp 00000000 fd:00 20447642   /usr/lib/asterisk/modules/pbx_config.so
001e6000-001e8000 rwxp 0000d000 fd:00 20447642   /usr/lib/asterisk/modules/pbx_config.so
001e8000-0020d000 r-xp 00000000 fd:00 19714275   /usr/lib/libpri.so.1.4
0020d000-00212000 rwxp 00024000 fd:00 19714275   /usr/lib/libpri.so.1.4
00212000-00214000 r-xp 00000000 fd:00 20447669   /usr/lib/asterisk/modules/app_setcallerid.so
00214000-00215000 rwxp 00001000 fd:00 20447669   /usr/lib/asterisk/modules/app_setcallerid.so
00215000-00218000 r-xp 00000000 fd:00 20447725   /usr/lib/asterisk/modules/chan_features.so
00218000-00219000 rwxp 00002000 fd:00 20447725   /usr/lib/asterisk/modules/chan_features.so
00219000-0021c000 r-xp 00000000 fd:00 20447698   /usr/lib/asterisk/modules/format_wav.so
0021c000-0021d000 rwxp 00002000 fd:00 20447698   /usr/lib/asterisk/modules/format_wav.so
0021d000-0021e000 r-xp 00000000 fd:00 20447711   /usr/lib/asterisk/modules/func_global.so
0021e000-0021f000 rwxp 00000000 fd:00 20447711   /usr/lib/asterisk/modules/func_global.so
0021f000-00220000 r-xp 00000000 fd:00 20447684   /usr/lib/asterisk/modules/codec_alaw.so
00220000-00221000 rwxp 00001000 fd:00 20447684   /usr/lib/asterisk/modules/codec_alaw.so
00221000-00223000 r-xp 00000000 fd:00 20447722   /usr/lib/asterisk/modules/func_timeout.so
00223000-00224000 rwxp 00001000 fd:00 20447722   /usr/lib/asterisk/modules/func_timeout.so
/usr/sbin/safe_asterisk: line 42: 27033 Aborted                 (core dumped) asterisk ${CLIARGS} ${ASTARGS} >&/dev/${TTY} </dev/${TTY}
Asterisk ended with exit status 134
Asterisk exited on signal 6.
Automatically restarting Asterisk.

By: Tilghman Lesher (tilghman) 2009-01-14 14:47:15.000-0600

d'oh!  Silly memory initialization error.

By: Eldad Ran (eldadran) 2009-01-15 01:47:07.000-0600

The patch failed to apply to the newest release, 1.4.22.1:
[root@localhost asterisk]# wget 'http://bugs.digium.com/file_download.php?file_id=21230&type=bug' -O - | patch -p0
--07:44:48--  http://bugs.digium.com/file_download.php?file_id=21230&type=bug
Resolving bugs.digium.com... 76.164.171.226
Connecting to bugs.digium.com|76.164.171.226|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2767 (2.7K) [text/plain]
Saving to: `STDOUT'

100%[========================================================================>] 2,767       --.-K/s   in 0s    

07:44:49 (161 MB/s) - `-' saved [2767/2767]

patching file res/res_agi.c
Hunk #1 succeeded at 104 (offset -1 lines).
Hunk #2 FAILED at 120.
Hunk #3 succeeded at 1961 (offset -28 lines).
Hunk #4 succeeded at 2224 (offset -1 lines).
1 out of 4 hunks FAILED -- saving rejects to file res/res_agi.c.rej

Adding the .rej file to the attachments.

By: Tilghman Lesher (tilghman) 2009-01-15 12:34:20.000-0600

Patch updated for 1.4.22.1.  The original patch is against 1.4 SVN.

By: Steve Poirier (mousepad99) 2009-01-19 21:38:41.000-0600

Patched 1.4.22.1. It still leaves zombies until no AGIs are running, which never happens on a busy Asterisk server. We hit a record of 3800 zombie processes yesterday. We can't go back to previous versions. The only alternative is to use DeadAGI and check the dial status all the time. I'm ready to pay a bounty to get this fixed.

By: Eldad Ran (eldadran) 2009-01-20 01:40:45.000-0600

I still have zombies, but far fewer, and the ones I do have are cleared in less than 10 minutes. It is a busy setup (about 20K calls a day), so I can say it's not perfect, but it's working.
Before the patch I used to have 25K zombies after 7 days of load; now, after 3 days, I have none (apart from the temporary ones).

By: Steve Poirier (mousepad99) 2009-01-20 01:43:17.000-0600

Will test further and see whether they are now only temporary, as reported by eldadran. Will report back.

By: Eldad Ran (eldadran) 2009-01-25 04:00:18.000-0600

A week has passed and it's working and stable: I had 180K calls and no zombies on the system. Can this patch be pushed to the next release?

By: Digium Subversion (svnbot) 2009-01-25 14:30:08.000-0600

Repository: asterisk
Revision: 171120

U   branches/1.4/res/res_agi.c

------------------------------------------------------------------------
r171120 | tilghman | 2009-01-25 14:30:07 -0600 (Sun, 25 Jan 2009) | 8 lines

Add thread to kill zombies, when child processes don't die immediately on
SIGHUP.
(closes issue ASTERISK-13119)
Reported by: eldadran
Patches:
      20090114__bug13968.diff.txt uploaded by Corydon76 (license 14)
Tested by: eldadran

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=171120
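
For readers who want a feel for the approach named in the commit message, here is a rough sketch of a reaper built around a list of pending children. It is not the committed code. The backtrace earlier in this issue shows that the real `grim_reaper` in res_agi.c walks a list of `struct zombie` entries, but the fields, the plain singly linked list, and the helpers `zombie_add`, `reap_pass`, and `grim_reaper_sketch` below are assumptions made for illustration (Asterisk uses its own list macros). Each node is zero-filled on allocation and unlinked before it is freed, so it cannot be freed twice.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* One entry per child that did not exit promptly after SIGHUP.
 * The layout is assumed; only the struct name appears in the backtrace. */
struct zombie {
    pid_t pid;
    struct zombie *next;
};

static struct zombie *zombies = NULL;
static pthread_mutex_t zombies_lock = PTHREAD_MUTEX_INITIALIZER;
static int reaper_stop = 0;   /* set under zombies_lock to end the thread */

/* Queue a child for later reaping. Returns 0 on success, -1 if out of memory. */
int zombie_add(pid_t pid)
{
    struct zombie *z;

    assert(pid > 0);
    z = calloc(1, sizeof(*z));   /* zero-filled: no uninitialized fields */
    if (!z)
        return -1;
    z->pid = pid;

    pthread_mutex_lock(&zombies_lock);
    z->next = zombies;
    zombies = z;
    pthread_mutex_unlock(&zombies_lock);
    return 0;
}

/* One non-blocking pass over the list. Returns how many children are
 * still running and therefore remain queued. */
int reap_pass(void)
{
    int pending = 0;
    struct zombie **link;

    pthread_mutex_lock(&zombies_lock);
    link = &zombies;
    while (*link) {
        struct zombie *cur = *link;
        int status;

        if (waitpid(cur->pid, &status, WNOHANG) != 0) {
            /* Reaped, or waitpid failed (e.g. ECHILD): unlink the node
             * first, then free it exactly once. */
            *link = cur->next;
            free(cur);
        } else {
            pending++;
            link = &cur->next;
        }
    }
    pthread_mutex_unlock(&zombies_lock);
    return pending;
}

/* Thread body with the signature pthread_create() expects: reap once
 * per second until reaper_stop is set. */
void *grim_reaper_sketch(void *unused)
{
    struct timespec nap = { 1, 0 };

    (void)unused;
    for (;;) {
        int stop;

        pthread_mutex_lock(&zombies_lock);
        stop = reaper_stop;
        pthread_mutex_unlock(&zombies_lock);
        if (stop)
            break;
        reap_pass();
        nanosleep(&nap, NULL);
    }
    return NULL;
}
```

In this sketch the AGI code would call `zombie_add(pid)` whenever a child is still alive after being sent SIGHUP, and a single thread started with `pthread_create(&tid, NULL, grim_reaper_sketch, NULL)` would keep the list drained. The one-second interval is arbitrary.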

By: Digium Subversion (svnbot) 2009-01-25 14:31:23.000-0600

Repository: asterisk
Revision: 171121

_U  trunk/

------------------------------------------------------------------------
r171121 | tilghman | 2009-01-25 14:31:23 -0600 (Sun, 25 Jan 2009) | 14 lines

Blocked revisions 171120 via svnmerge

........
 r171120 | tilghman | 2009-01-25 14:30:41 -0600 (Sun, 25 Jan 2009) | 8 lines
 
 Add thread to kill zombies, when child processes don't die immediately on
 SIGHUP.
 (closes issue ASTERISK-13119)
  Reported by: eldadran
  Patches:
        20090114__bug13968.diff.txt uploaded by Corydon76 (license 14)
  Tested by: eldadran
........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=171121