|Summary:||ASTERISK-05098: Asterisk leaves zombie AGI processes when running under Linux 2.6|
|Reporter:||Ivan Tikhonov (ivan tikhonov)||Labels:|
|Date Opened:||2005-09-16 09:16:08||Date Closed:||2011-06-07 14:00:30|
|Description:||Under the 2.4 kernel, threads are by nature separate processes (each with a different PID), but under 2.6 threads are native: they share the same PID and other process attributes.|
As a side effect, under 2.4 AGI zombies are cleaned up when the parent process (the Asterisk thread) exits. Under 2.6 this never happens: all threads run inside a single process, and that process does not exit while Asterisk is running.
Another side effect: under 2.4, threads are easily distinguished by PID in the logs, but under 2.6 every record is marked with the main PID.
Each zombie process holds some file descriptors. When the OS or the process runs out of available file descriptors (1024 per process on my Debian system), service is blocked because no new sockets can be allocated and no files can be opened.
(See also ASTERISK-3763794, and put this guy's karma back, lol.)
****** ADDITIONAL INFORMATION ******
Gentlemen reap zombies by explicitly calling wait() or waitpid().
To fix it, simply add a call to waitpid(pid, NULL, 0) after kill() in the function run_agi (res/res_agi.c).
Also, the if(pid > -1) checks in this function do not look right.
if(pid > 0) would probably be better: kill() with a PID of 0 sends the signal to every process in the caller's process group, which is a bad thing.
Don't forget to add #include <sys/wait.h>, or cc will complain.
I'm using CVSv1-0, but as far as I can see, the code in the latest CVS is missing this too.
|Comments:||By: Tilghman Lesher (tilghman) 2005-09-16 10:16:40|
We already wait for all children with wait4(2) in asterisk.c:child_handler()
By: Ivan Tikhonov (ivan tikhonov) 2005-09-18 06:04:50
Sorry, you are right. Zombie children are left behind only when the number of open files exceeds the rlimit. ulimit -n 5000 solved this problem.
By: Ivan Tikhonov (ivan tikhonov) 2005-09-21 03:38:37
I reverted my changes and got a lot of zombies without the waitpid() in run_agi :(
By: Ivan Tikhonov (ivan tikhonov) 2005-09-21 12:39:06
Looks like Asterisk doesn't like NPTL (Native POSIX Thread Library).
Exporting LD_ASSUME_KERNEL=2.4.1 before starting it solves my problems.
By: Kevin P. Fleming (kpfleming) 2005-09-25 21:39:01
Asterisk runs fine on plenty of systems using NPTL, and I'm sure many of them are using AGI. In fact, that should make no difference here at all, since the AGI child is a standalone process, and threading libraries should not be an issue.
Corydon is correct, we already have a child handler in place to handle reaping of child processes, although there was recently a bug fix in CVS HEAD to correct a situation where System() (and related) apps could leave the child handler disabled.
Please update your system to the latest CVS HEAD to see if this issue is still present.
By: Matt O'Gorman (mogorman) 2005-10-04 12:15:10
Friendly reminder, Ivan Tikhonov: is this still an issue?
By: Mark Spencer (markster) 2005-10-12 01:49:46
Suspending from lack of user input.