Summary: | ASTERISK-13119: AGI Leaves zombies behind it | ||
Reporter: | Eldad Ran (eldadran) | Labels: | |
Date Opened: | 2008-11-25 08:01:22.000-0600 | Date Closed: | 2009-01-25 14:31:24.000-0600 |
Priority: | Minor | Regression? | No |
Status: | Closed/Complete | Components: | Resources/res_agi |
Versions: | Frequency of Occurrence | ||
Related Issues: | |||
Environment: | Attachments: | ( 0) 20081125__bug13968.diff.txt ( 1) 20090114__bug13968.diff.txt ( 2) 20090115__bug13968__1.4.22.1.diff.txt ( 3) bt_thread.txt ( 4) res_agi.c.rej | |
Description: | AGI using PHP leaves zombie processes behind. It started when switching to 1.4.18, and it still happens after upgrading to 1.4.22. The PHP version is 5.1.6. | ||
Comments: | By: Vadim Sherbakov (vinsik) 2008-11-25 16:19:34.000-0600
The same thing happens to me too. I get a lot of zombies. It seems to be related to some type of call transfer, and it's very hard to reproduce. I will try to log this somehow and post here. Cheerz.

By: Tilghman Lesher (tilghman) 2008-11-25 18:18:26.000-0600
If your process intercepts the SIGHUP and does any cleanup at all, it's possible for the zombie to stick around until all AGI processes are gone (which could be a long time, depending). This patch should fix it. Please test and give feedback on your results.

By: Vadim Sherbakov (vinsik) 2008-11-26 00:56:31.000-0600
Great, I will test this ASAP and report back.

By: Eldad Ran (eldadran) 2008-11-27 05:01:41.000-0600
The patch just crashed asterisk:
(gdb) bt
#0 0x00f32402 in __kernel_vsyscall ()
#1 0x00352ba0 in raise () from /lib/libc.so.6
#2 0x003544b1 in abort () from /lib/libc.so.6
#3 0x00388dfb in __libc_message () from /lib/libc.so.6
#4 0x00390aa6 in _int_free () from /lib/libc.so.6
#5 0x00393fc0 in free () from /lib/libc.so.6
#6 0x0088f09b in grim_reaper (data=0x0) at res_agi.c:2209
#7 0x080fe2eb in dummy_start (data=0x8501360) at utils.c:912
#8 0x004a045b in start_thread () from /lib/libpthread.so.0
#9 0x003f824e in clone () from /lib/libc.so.6
(gdb) bt full
#0 0x00f32402 in __kernel_vsyscall ()
No symbol table info available.
#1 0x00352ba0 in raise () from /lib/libc.so.6
No symbol table info available.
#2 0x003544b1 in abort () from /lib/libc.so.6
No symbol table info available.
#3 0x00388dfb in __libc_message () from /lib/libc.so.6
No symbol table info available.
#4 0x00390aa6 in _int_free () from /lib/libc.so.6
No symbol table info available.
#5 0x00393fc0 in free () from /lib/libc.so.6
No symbol table info available.
#6 0x0088f09b in grim_reaper (data=0x0) at res_agi.c:2209
__list_next = (struct zombie *) 0x467120
__list_prev = (struct zombie *) 0x0
__new_prev = <value optimized out>
cur = (struct zombie *) 0x86b7060
status = 1
#7 0x080fe2eb in dummy_start (data=0x8501360) at utils.c:912
__cancel_buf = {__cancel_jmp_buf = {{__cancel_jmp_buf = {139469016, 0, -1209324656, -1209326648, -981143866, 2053123938}, __mask_was_saved = 0}}, __pad = {0xb7eb2480, 0x0, 0x0, 0x0}}
__cancel_arg = (void *) 0xb7eb2b90
not_first_call = <value optimized out>
ret = <value optimized out>
#8 0x004a045b in start_thread () from /lib/libpthread.so.0
No symbol table info available.
#9 0x003f824e in clone () from /lib/libc.so.6
No symbol table info available.

By: Eldad Ran (eldadran) 2008-11-27 05:09:10.000-0600
In real time I got this:
[root@localhost asterisk]# *** glibc detected *** asterisk: double free or corruption (!prev): 0x086b7060 ***
======= Backtrace: =========
/lib/libc.so.6[0x390aa6]
/lib/libc.so.6(cfree+0x90)[0x393fc0]
/usr/lib/asterisk/modules/res_agi.so[0x88f09b]
asterisk[0x80fe2eb]
/lib/libpthread.so.0[0x4a045b]
/lib/libc.so.6(clone+0x5e)[0x3f824e]
======= Memory map: ========
00110000-00135000 r-xp 00000000 fd:00 19709510 /usr/lib/libk5crypto.so.3.1
00135000-00136000 rwxp 00025000 fd:00 19709510 /usr/lib/libk5crypto.so.3.1
00136000-00171000 r-xp 00000000 fd:00 8651832 /lib/libsepol.so.1
00171000-00172000 rwxp 0003a000 fd:00 8651832 /lib/libsepol.so.1
00172000-0017c000 rwxp 00172000 00:00 0
0017c000-0018a000 r-xp 00000000 fd:00 20447631 /usr/lib/asterisk/modules/res_features.so
0018a000-0018c000 rwxp 0000d000 fd:00 20447631 /usr/lib/asterisk/modules/res_features.so
0018c000-0018f000 r-xp 00000000 fd:00 20447672 /usr/lib/asterisk/modules/app_talkdetect.so
0018f000-00190000 rwxp 00002000 fd:00 20447672 /usr/lib/asterisk/modules/app_talkdetect.so
00190000-00191000 r-xp 00000000 fd:00 20447702 /usr/lib/asterisk/modules/func_base64.so
00191000-00192000 rwxp 00000000 fd:00 20447702 /usr/lib/asterisk/modules/func_base64.so
00192000-00194000 r-xp 00000000 fd:00 20447703 /usr/lib/asterisk/modules/func_callerid.so
00194000-00195000 rwxp 00001000 fd:00 20447703 /usr/lib/asterisk/modules/func_callerid.so
00195000-00197000 r-xp 00000000 fd:00 20447675 /usr/lib/asterisk/modules/app_userevent.so
00197000-00198000 rwxp 00001000 fd:00 20447675 /usr/lib/asterisk/modules/app_userevent.so
00198000-00199000 r-xp 00000000 fd:00 20447713 /usr/lib/asterisk/modules/func_language.so
00199000-0019a000 rwxp 00000000 fd:00 20447713 /usr/lib/asterisk/modules/func_language.so
0019a000-001a0000 r-xp 00000000 fd:00 20447721 /usr/lib/asterisk/modules/func_strings.so
001a0000-001a1000 rwxp 00005000 fd:00 20447721 /usr/lib/asterisk/modules/func_strings.so
001a1000-001a3000 r-xp 00000000 fd:00 19714274 /usr/lib/libtonezone.so.1.0
001a3000-001d0000 rwxp 00002000 fd:00 19714274 /usr/lib/libtonezone.so.1.0
001d0000-001d2000 r-xp 00000000 fd:00 20447664 /usr/lib/asterisk/modules/app_read.so
001d2000-001d3000 rwxp 00002000 fd:00 20447664 /usr/lib/asterisk/modules/app_read.so
001d3000-001d4000 r-xp 00000000 fd:00 20447735 /usr/lib/asterisk/modules/app_cdr.so
001d4000-001d5000 rwxp 00000000 fd:00 20447735 /usr/lib/asterisk/modules/app_cdr.so
001d5000-001d7000 r-xp 00000000 fd:00 20447655 /usr/lib/asterisk/modules/app_dumpchan.so
001d7000-001d8000 rwxp 00001000 fd:00 20447655 /usr/lib/asterisk/modules/app_dumpchan.so
001d8000-001e6000 r-xp 00000000 fd:00 20447642 /usr/lib/asterisk/modules/pbx_config.so
001e6000-001e8000 rwxp 0000d000 fd:00 20447642 /usr/lib/asterisk/modules/pbx_config.so
001e8000-0020d000 r-xp 00000000 fd:00 19714275 /usr/lib/libpri.so.1.4
0020d000-00212000 rwxp 00024000 fd:00 19714275 /usr/lib/libpri.so.1.4
00212000-00214000 r-xp 00000000 fd:00 20447669 /usr/lib/asterisk/modules/app_setcallerid.so
00214000-00215000 rwxp 00001000 fd:00 20447669 /usr/lib/asterisk/modules/app_setcallerid.so
00215000-00218000 r-xp 00000000 fd:00 20447725 /usr/lib/asterisk/modules/chan_features.so
00218000-00219000 rwxp 00002000 fd:00 20447725 /usr/lib/asterisk/modules/chan_features.so
00219000-0021c000 r-xp 00000000 fd:00 20447698 /usr/lib/asterisk/modules/format_wav.so
0021c000-0021d000 rwxp 00002000 fd:00 20447698 /usr/lib/asterisk/modules/format_wav.so
0021d000-0021e000 r-xp 00000000 fd:00 20447711 /usr/lib/asterisk/modules/func_global.so
0021e000-0021f000 rwxp 00000000 fd:00 20447711 /usr/lib/asterisk/modules/func_global.so
0021f000-00220000 r-xp 00000000 fd:00 20447684 /usr/lib/asterisk/modules/codec_alaw.so
00220000-00221000 rwxp 00001000 fd:00 20447684 /usr/lib/asterisk/modules/codec_alaw.so
00221000-00223000 r-xp 00000000 fd:00 20447722 /usr/lib/asterisk/modules/func_timeout.so
00223000-00224000 rwxp 00001000 fd:00 20447722 /usr/lib/asterisk/modules/func_timeout.so
/usr/sbin/safe_asterisk: line 42: 27033 Aborted (core dumped) asterisk ${CLIARGS} ${ASTARGS} >&/dev/${TTY} </dev/${TTY}
Asterisk ended with exit status 134
Asterisk exited on signal 6.
Automatically restarting Asterisk.

By: Tilghman Lesher (tilghman) 2009-01-14 14:47:15.000-0600
d'oh! Silly memory initialization error.

By: Eldad Ran (eldadran) 2009-01-15 01:47:07.000-0600
Failed to patch on the newest release, 1.4.22.1:
[root@localhost asterisk]# wget 'http://bugs.digium.com/file_download.php?file_id=21230&type=bug' -O - | patch -p0
--07:44:48-- http://bugs.digium.com/file_download.php?file_id=21230&type=bug
Resolving bugs.digium.com... 76.164.171.226
Connecting to bugs.digium.com|76.164.171.226|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2767 (2.7K) [text/plain]
Saving to: `STDOUT'
100%[========================================================================>] 2,767 --.-K/s in 0s
07:44:49 (161 MB/s) - `-' saved [2767/2767]
patching file res/res_agi.c
Hunk #1 succeeded at 104 (offset -1 lines).
Hunk #2 FAILED at 120.
Hunk #3 succeeded at 1961 (offset -28 lines).
Hunk #4 succeeded at 2224 (offset -1 lines).
1 out of 4 hunks FAILED -- saving rejects to file res/res_agi.c.rej
Adding the rej file to the attachments.

By: Tilghman Lesher (tilghman) 2009-01-15 12:34:20.000-0600
Patch updated for 1.4.22.1. The original patch is against 1.4 SVN.

By: Steve Poirier (mousepad99) 2009-01-19 21:38:41.000-0600
Patched 1.4.22.1. It still leaves zombies until no AGIs are up, which never happens on a busy Asterisk server. Record of 3800 zombie processes yesterday. Can't go back to previous versions. The only alternative is to use DeadAGI and check the dial status all the time. Ready to pay a bounty to get this fixed.

By: Eldad Ran (eldadran) 2009-01-20 01:40:45.000-0600
I still have zombies, but far fewer, and the ones I do have are cleared in less than 10 minutes. It is a busy setup (about 20K calls a day), so I can say it's not perfect but it's working. I used to have 25K zombies after 7 days of load before the patch, but now after 3 days I have none (apart from the temporary ones).

By: Steve Poirier (mousepad99) 2009-01-20 01:43:17.000-0600
Will test further and see if they are now only temporary, as reported by eldadran. Will report back.

By: Eldad Ran (eldadran) 2009-01-25 04:00:18.000-0600
A week has passed and it's working and stable. I had 180K calls and no zombies on the system. Can this patch be pushed to the next release?

By: Digium Subversion (svnbot) 2009-01-25 14:30:08.000-0600
Repository: asterisk
Revision: 171120
U branches/1.4/res/res_agi.c
------------------------------------------------------------------------
r171120 | tilghman | 2009-01-25 14:30:07 -0600 (Sun, 25 Jan 2009) | 8 lines
Add thread to kill zombies, when child processes don't die immediately on SIGHUP.
(closes issue ASTERISK-13119)
Reported by: eldadran
Patches: 20090114__bug13968.diff.txt uploaded by Corydon76 (license 14)
Tested by: eldadran
------------------------------------------------------------------------
http://svn.digium.com/view/asterisk?view=rev&revision=171120

By: Digium Subversion (svnbot) 2009-01-25 14:31:23.000-0600
Repository: asterisk
Revision: 171121
_U trunk/
------------------------------------------------------------------------
r171121 | tilghman | 2009-01-25 14:31:23 -0600 (Sun, 25 Jan 2009) | 14 lines
Blocked revisions 171120 via svnmerge
........
r171120 | tilghman | 2009-01-25 14:30:41 -0600 (Sun, 25 Jan 2009) | 8 lines
Add thread to kill zombies, when child processes don't die immediately on SIGHUP.
(closes issue ASTERISK-13119)
Reported by: eldadran
Patches: 20090114__bug13968.diff.txt uploaded by Corydon76 (license 14)
Tested by: eldadran
........
------------------------------------------------------------------------
http://svn.digium.com/view/asterisk?view=rev&revision=171121 |
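
Note: The r171120 commit message describes the fix as a thread that cleans up children which "don't die immediately on SIGHUP". The attached patch is not reproduced on this page, so the following is only a minimal sketch of that general technique, not the code that went into res_agi.c. It assumes a POSIX system, and every name in it (`pending_child`, `pending_add`, `pending_reap_once`, `reaper_thread`, the 5-second poll interval) is hypothetical. The idea: remember the PID of each child that has not exited yet, and have a background thread poll the list with a non-blocking `waitpid()` so that exited children are reaped instead of lingering as zombies.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical bookkeeping node: one entry per child not yet reaped. */
struct pending_child {
    pid_t pid;
    struct pending_child *next;
};

static struct pending_child *pending_head = NULL;
static pthread_mutex_t pending_lock = PTHREAD_MUTEX_INITIALIZER;

/* Remember a child that has been signalled but may still be running.
 * Returns 0 on success, -1 if the node could not be allocated. */
int pending_add(pid_t pid)
{
    struct pending_child *node;

    assert(pid > 0);
    node = calloc(1, sizeof(*node)); /* zero-filled, so next starts out NULL */
    if (!node)
        return -1;
    node->pid = pid;

    pthread_mutex_lock(&pending_lock);
    node->next = pending_head;
    pending_head = node;
    pthread_mutex_unlock(&pending_lock);
    return 0;
}

/* One pass over the list. Children that have exited are reaped and their
 * entries unlinked and freed exactly once. Returns how many are still pending. */
int pending_reap_once(void)
{
    struct pending_child **link;
    int remaining = 0;

    pthread_mutex_lock(&pending_lock);
    link = &pending_head;
    while (*link) {
        struct pending_child *cur = *link;
        int status;
        pid_t r = waitpid(cur->pid, &status, WNOHANG); /* never blocks */

        if (r == 0 || (r == -1 && errno == EINTR)) {
            /* Still running (or interrupted): keep the entry for the next pass. */
            link = &cur->next;
            remaining++;
        } else {
            /* Reaped (r == pid) or no longer our child (e.g. ECHILD): drop it. */
            *link = cur->next;
            free(cur);
        }
    }
    pthread_mutex_unlock(&pending_lock);
    return remaining;
}

/* Thread body: poll forever. The 5-second interval is an arbitrary choice. */
void *reaper_thread(void *unused)
{
    (void) unused;
    for (;;) {
        pending_reap_once();
        sleep(5);
    }
    return NULL;
}
```

In a real program `reaper_thread` would be started once with `pthread_create()`, and `pending_add()` would be called right after sending SIGHUP to a child. Polling with `WNOHANG` keeps the thread from blocking on a single slow child, at the cost of zombies surviving for up to one poll interval, which is consistent with the short-lived "temporary" zombies reported above.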