
Summary: ASTERISK-12018: [patch] Asterisk leaves zombie agi processes when running under linux 2.6
Reporter: gkloepfer (gkloepfer)
Date Opened: 2008-05-14 12:06:17
Date Closed: 2008-05-14 16:33:58
Priority: Minor
Regression? No
Status: Closed/Complete
Components: Resources/res_agi
Attachments: (0) 20080514__bug12648.diff.txt
             (1) zombie-issue.patch
Description: [Note: re-open, duplicate of Problem ID 0005238]

I have actually found a way to reproduce this bug. If one user is running a long-running AGI script and another user hangs up on an AGI script, a zombie process is left behind. This can be reproduced with the following dialplan code:

exten => _X.,1,Answer
exten => _X.,n,AGI(time-request.agi)
exten => _X.,n,Hangup

(time-request.agi is an AGI script that says the time every 10 seconds until the user hangs up or presses a key on the keypad; any long-running AGI script like this should work.)
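For anyone who wants a concrete long-running script to reproduce with, here is a minimal sketch of something like time-request.agi, written in C. It assumes the standard AGI conventions (an "agi_*" header terminated by a blank line on stdin, commands written to stdout, "200 result=N" replies). The function names are hypothetical and this is not the reporter's actual script.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <time.h>
#include <unistd.h>

/* Asterisk starts every AGI session by writing "agi_*: value" lines to
 * the script's stdin, terminated by a blank line. Returns the number of
 * header lines read, or -1 if stdin hit EOF first. */
int agi_skip_env(FILE *in)
{
    char line[512];
    int count = 0;

    while (fgets(line, sizeof line, in) != NULL) {
        if (line[0] == '\n')
            return count;              /* blank line: header is over */
        count++;
    }
    return -1;
}

/* Send one SAY TIME command, letting any key interrupt playback, and parse
 * the "200 result=N" reply. N is 0 when playback finished, or the ASCII code
 * of the key pressed. Returns -1 on EOF (caller gone) or an unexpected reply. */
int agi_say_time(FILE *in, FILE *out, time_t when)
{
    char reply[512];
    int result;

    fprintf(out, "SAY TIME %ld \"0123456789*#\"\n", (long)when);
    fflush(out);
    if (fgets(reply, sizeof reply, in) == NULL)
        return -1;
    if (sscanf(reply, "200 result=%d", &result) != 1)
        return -1;
    return result;
}

/* Hypothetical time-request loop: say the time every `interval` seconds
 * until a key is pressed (result > 0) or the session ends (result < 0). */
int run_time_request(FILE *in, FILE *out, unsigned interval)
{
    int result;

    if (agi_skip_env(in) < 0)
        return -1;
    for (;;) {
        result = agi_say_time(in, out, time(NULL));
        if (result != 0)
            return result;
        sleep(interval);
    }
}
```

A real script would call run_time_request(stdin, stdout, 10) from main(). The loop keeps the AGI process alive for as long as the caller stays on the line, which is what makes it useful for triggering the hangup path.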

****** ADDITIONAL INFORMATION ******

When the SIGCHLD handler is eventually handed back to the "reaper", the zombie goes away. However, this can take a long time on a heavily loaded PBX with many AGI scripts running, because the "reaper" (child_handler) is not reinstated until ALL running AGI applications have completed.
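The reaper arrangement described above can be modelled roughly as follows. This is a sketch under assumptions, not Asterisk's code: reaper_install(), reaper_suspend() and reaper_resume() are hypothetical names, and the hold counter stands in for however Asterisk actually tracks running AGIs. What it illustrates is that the reaper only comes back when the last hold is released, so a child nobody explicitly waits for stays a zombie until then.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>

/* The "reaper": collect every child that has already exited. WNOHANG makes
 * waitpid() return 0 instead of blocking once none are left. */
static void child_reaper(int sig)
{
    int saved_errno = errno;          /* waitpid() may clobber errno */

    (void)sig;
    while (waitpid(-1, NULL, WNOHANG) > 0)
        ;
    errno = saved_errno;
}

static void set_sigchld(void (*handler)(int))
{
    struct sigaction sa;

    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    sigaction(SIGCHLD, &sa, NULL);
}

static pthread_mutex_t reaper_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned reaper_holds;         /* callers currently waiting on their own child */

void reaper_install(void)
{
    set_sigchld(child_reaper);
}

/* Move the reaper out of the way so the caller can waitpid() its own child.
 * With SIG_DFL, exited children are NOT auto-reaped; they stay zombies.
 * Only the first holder actually swaps the handler. */
void reaper_suspend(void)
{
    pthread_mutex_lock(&reaper_lock);
    if (reaper_holds++ == 0)
        set_sigchld(SIG_DFL);
    pthread_mutex_unlock(&reaper_lock);
}

/* Release a hold. The reaper is reinstalled only when the LAST holder
 * releases; it then sweeps up anything that exited in the meantime. */
void reaper_resume(void)
{
    pthread_mutex_lock(&reaper_lock);
    if (reaper_holds > 0 && --reaper_holds == 0) {
        set_sigchld(child_reaper);
        child_reaper(SIGCHLD);
    }
    pthread_mutex_unlock(&reaper_lock);
}
```

With many overlapping AGI sessions the hold count rarely drops to zero, which is the situation the report describes on a busy PBX.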

The problem is that waitpid() is not called in run_agi() (res_agi.c) when a caller hangs up on an AGI script.
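To see why the missing waitpid() matters, the sketch below (assuming Linux or another POSIX system that provides waitid() with WNOWAIT) starts a child that exits immediately and inspects it without reaping it. spawn_short_lived_child(), child_is_zombie() and wait_until_exited() are hypothetical helpers made up for this illustration.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stand-in for launching an AGI script: fork a child that exits at once. */
pid_t spawn_short_lived_child(int exit_code)
{
    pid_t pid = fork();

    if (pid == 0)
        _exit(exit_code);      /* child */
    return pid;                /* parent: child pid, or -1 if fork() failed */
}

/* Report on `pid` without reaping it (WNOWAIT leaves its status in place):
 *   1  exited but never waited for, i.e. a zombie
 *   0  still running
 *  -1  not our child any more, e.g. already reaped (errno == ECHILD) */
int child_is_zombie(pid_t pid)
{
    siginfo_t info;

    info.si_pid = 0;           /* stays 0 if WNOHANG finds nothing */
    if (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOHANG | WNOWAIT) == -1)
        return -1;
    return info.si_pid == pid;
}

/* Block until `pid` has exited, but leave it unreaped. */
int wait_until_exited(pid_t pid)
{
    siginfo_t info;

    return waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT);
}
```

The child stays in the "exited but never waited for" state until some waitpid() call collects it; only then does it disappear from the process table.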

I was able to prevent the zombies by patching agi_exec_full() in res_agi.c to always call waitpid(), even when that call is redundant, to make sure the child's exit status is collected. This is not the proper fix, though; I will attach the patch for reference.
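A minimal sketch of the "always call waitpid()" idea, assuming the AGI child's pid is known when the session ends. This is not the attached zombie-issue.patch; start_fake_agi() and reap_agi_child() are hypothetical names.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the AGI child: exits with `code` right away. */
pid_t start_fake_agi(int code)
{
    pid_t pid = fork();

    if (pid == 0)
        _exit(code);
    return pid;
}

/* Unconditionally collect the AGI child once the session is over.
 * Blocks until the child exits and retries if a signal interrupts the call.
 * Returns 0 with *status filled in, or -1 if there was nothing to collect
 * (errno == ECHILD: it was already reaped, so the call was redundant but
 * harmless). */
int reap_agi_child(pid_t pid, int *status)
{
    pid_t r;

    do {
        r = waitpid(pid, status, 0);
    } while (r == -1 && errno == EINTR);
    return r == pid ? 0 : -1;
}
```

One drawback of a blocking call like this: if the script keeps running after the caller hangs up, the channel thread sits in waitpid() until the script finally exits.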
Comments:

By: Tilghman Lesher (tilghman) 2008-05-14 14:57:44

Here's an alternate approach. What do you think about this? It still won't catch everything (basically, any process that keeps running after the SIGHUP), but it should collect all zombies while still allowing such processes to continue running in parallel.

By: gkloepfer (gkloepfer) 2008-05-14 15:05:30

That seems to work, and I definitely think it's a better fix than mine since it actually addresses (rather than hides) the problem.

The only downside I can think of is if the AGI script traps SIGHUP and spends more than 1ms doing something before exiting. I can't think of many AGI scripts that would do this, and we really wouldn't want that kind of thing holding an Asterisk thread open anyway...

By: Tilghman Lesher (tilghman) 2008-05-14 16:07:12

Actually, the AGI script could keep doing something for up to 100ms (the length of a standard Unix scheduler time slice). usleep(1) is a trick we use that simply means "yield the processor and don't come back immediately".
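The hangup path discussed in these comments (send SIGHUP, yield briefly with usleep(1), then collect the child without blocking) might look roughly like the sketch below. It is an illustration only, not the committed patch; hangup_agi_child() is a hypothetical name.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Tell the AGI process its channel is gone, yield the CPU, then try to
 * reap it without blocking. Returns 1 if the child was reaped (*status is
 * valid), 0 if it has not exited yet, or -1 on error. */
int hangup_agi_child(pid_t pid, int *status)
{
    pid_t r;

    if (kill(pid, SIGHUP) == -1) {
        fprintf(stderr, "unable to send SIGHUP to AGI process %ld: %s\n",
                (long)pid, strerror(errno));
    } else {
        usleep(1);   /* yield the processor; gives the child a chance to die */
    }
    r = waitpid(pid, status, WNOHANG);   /* never blocks the channel thread */
    if (r == pid)
        return 1;
    return r == 0 ? 0 : -1;
}
```

If the child has not exited by the time waitpid() runs (because it trapped SIGHUP, or because on a multi-CPU machine it simply has not finished yet), the function returns 0 and the child is left for a later reaper pass rather than holding up the thread.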

By: gkloepfer (gkloepfer) 2008-05-14 16:24:00

Just keep multiprocessor issues in mind. The author of sysvinit assumed that a process context switch would give the child some time to do its work, but on machines with more than one CPU both processes ran simultaneously, which broke that assumption (http://bugs.gentoo.org/show_bug.cgi?id=188262).

The good news is that in the worst case we'll just have a few zombie processes that will eventually get cleaned up.

By: Digium Subversion (svnbot) 2008-05-14 16:32:10

Repository: asterisk
Revision: 116466

U   branches/1.4/res/res_agi.c

------------------------------------------------------------------------
r116466 | tilghman | 2008-05-14 16:32:09 -0500 (Wed, 14 May 2008) | 7 lines

Avoid zombies when the channel exits before the AGI.
(closes issue ASTERISK-12018)
Reported by: gkloepfer
Patches:
      20080514__bug12648.diff.txt uploaded by Corydon76 (license 14)
Tested by: gkloepfer

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=116466

By: Digium Subversion (svnbot) 2008-05-14 16:33:05

Repository: asterisk
Revision: 116467

_U  trunk/
U   trunk/res/res_agi.c

------------------------------------------------------------------------
r116467 | tilghman | 2008-05-14 16:33:04 -0500 (Wed, 14 May 2008) | 15 lines

Merged revisions 116466 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r116466 | tilghman | 2008-05-14 16:38:09 -0500 (Wed, 14 May 2008) | 7 lines

Avoid zombies when the channel exits before the AGI.
(closes issue ASTERISK-12018)
Reported by: gkloepfer
Patches:
      20080514__bug12648.diff.txt uploaded by Corydon76 (license 14)
Tested by: gkloepfer

........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=116467

By: Digium Subversion (svnbot) 2008-05-14 16:33:58

Repository: asterisk
Revision: 116468

_U  branches/1.6.0/
U   branches/1.6.0/res/res_agi.c

------------------------------------------------------------------------
r116468 | tilghman | 2008-05-14 16:33:57 -0500 (Wed, 14 May 2008) | 23 lines

Merged revisions 116467 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

................
r116467 | tilghman | 2008-05-14 16:39:06 -0500 (Wed, 14 May 2008) | 15 lines

Merged revisions 116466 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r116466 | tilghman | 2008-05-14 16:38:09 -0500 (Wed, 14 May 2008) | 7 lines

Avoid zombies when the channel exits before the AGI.
(closes issue ASTERISK-12018)
Reported by: gkloepfer
Patches:
      20080514__bug12648.diff.txt uploaded by Corydon76 (license 14)
Tested by: gkloepfer

........

................

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=116468