|Summary:||ASTERISK-02215: [report + patch] System() application troubles.|
|Date Opened:||2004-08-12 16:17:00||Date Closed:||2008-01-15 15:05:29.000-0600|
|Environment:||Attachments:||( 0) 20040820__proper_sigchld_handling.diff.txt|
( 1) asterisk.diff
|Description:||I had this in * dialplan:|
exten = s,1,System(ls /my/file/name)
to check whether or not /my/file/name exists. In nearly 80%
of the cases when /my/file/name did not exist, System() returned 0 meaning that the file does exist.
Further research showed that "wait4" call in ast_safe_system returns with ECHILD and asterisk is unable
to get the return code of the child process.
This is apparently a race condition between child to complete "ls" call and parent to execute "wait4"
call. Does anybody know the right way to fix it? Is there a way on Linux to - may be - create a child
in a suspended state and then somehow release it inside of wait4 call? Or may be a SIGCHLD handler can
|Comments:||By: constfilin (constfilin) 2004-08-12 16:44:51|
I replaced "wait4" with "waitpid". This should take care of the bug
By: constfilin (constfilin) 2004-08-12 17:21:57
It turns out that SIGCHLD handler needs to be installed or else we get ECHILD on own own children... A new patch is tested and submitted.
By: Tilghman Lesher (tilghman) 2004-08-12 23:54:58
1. Why are you excluding BSD with this fix?
2. Instead of creating an empty handler, why not simply set the signal handler to SIG_IGN for the brief time that you need to intercept the return status of the child process?
3. The manpage specifies identical behavior for wait4(2) and waitpid(2) when rusage is set to 0. In fact, waitpid(2) is *implemented* using wait4(2), so there is _zero_ difference between using the two of them.
By: constfilin (constfilin) 2004-08-13 12:35:16
1. wait4 is a BSD thing and not to break anything on BSD (when waitpid is
unavailable or something) I left the BSD code intact.
2. This is a quote from my manpage for waitpid:
ECHILD if the process specified in pid does not exist or is not a child
of the calling process. (This can happen for oneÃ¢s own child if
the action for SIGCHLD is set to SIG_IGN. See also the LINUX
NOTES section about threads.)
I am on Fedora Core 1.
3. On another machine (
Linux version 2.4.18-3smp (firstname.lastname@example.org) (gcc version 2.96 20000731 (Red Hat Linux 7.3 2.96-110)) #1 SMP Thu Apr 18 07:27:31 EDT 2002)
I saw wait4 returning errno ECHILD when child finished before wait4 is
called. The fix I provided fixed it. So there is evidence that wait4 and
waitpid are not identical.
By: Tilghman Lesher (tilghman) 2004-08-13 13:26:35
1. waitpid(2) and wait4(2) can both be found on BSD. You don't need to worry about compatibility here. And you didn't leave the BSD code intact; you _introduced_ an exclusion for BSD. Unless you have a concrete reason for adding exclusion code (i.e. you tested it and waitpid(2) doesn't work on BSD), you shouldn't be adding exclusion code. Let the BSD porters add that if necessary.
3. There is no such evidence. I specifically looked up the KERNEL SOURCE CODE and the implementation for waitpid(2) _CALLS_ wait4(2) with rusage set to 0. This is true on both Linux and BSD. Go check the kernel source code if you don't believe me. (Actually, the kernel both implements waitpid(2), as well as refers you to libc. Libc also implements waitpid(2) in exactly the same way.) I suspect that the code you added to change the child handler is what fixed Redhat 7.3.
By: Mark Spencer (markster) 2004-08-13 15:16:24
What if we just usleep(1) before we wait4?
By: Mark Spencer (markster) 2004-08-13 15:23:48
sorry, i mean in the child portion.
By: constfilin (constfilin) 2004-08-13 16:09:28
I probably have an old Linux kernel on the system I saw the bug at, but
I cannot talk to my sysadmins to renew the kernel.
I dont' mind usleep(1) in the child portion, but this is not bulletproof.
Since waitpid and wait4 are both available on BSD, why don't we remove
"#ifdef BSD" and leave only with waitpid. At least this will make asterisk System() work on old kernels..
What do you think?
By: Tilghman Lesher (tilghman) 2004-08-20 08:48:47
What kernel version has this behavior?
By: constfilin (constfilin) 2004-08-20 12:17:15
The kernel version was
Linux version 2.4.18-3smp (email@example.com) (gcc version 2.96 20000731 (Red Hat Linux 7.3 2.96-110)) 1 SMP Thu Apr 18 07:27:31 EDT 2002
I think that it was a 2 processor machine.
By: Tilghman Lesher (tilghman) 2004-08-20 12:45:42
[root@dual600 src]# tail linux-2.4.18/kernel/exit.c
* sys_waitpid() remains for compatibility. waitpid() should be
* implemented by calling sys_wait4() from libc.a.
asmlinkage long sys_waitpid(pid_t pid,unsigned int * stat_addr, int options)
return sys_wait4(pid, stat_addr, options, NULL);
By: Tilghman Lesher (tilghman) 2004-08-20 12:56:37
Patch updated to fix just the bug, not this wait4/waitpid nonsense.
By: Mark Spencer (markster) 2004-08-21 15:16:26
How exactly does setting up this empty signal handler make a difference?
By: Tilghman Lesher (tilghman) 2004-08-22 14:12:44
The return status of SIGCHLD currently cannot be returned from System by virtue of the fact that the current SIGCHLD handler looks for and collects each child's return status, as each child terminates, but does not save that
return status for acquisition by ast_safe_system(). By temporarily suspending
the indiscriminate reaping of child exit statuses with an empty signal handler, it becomes possible for ast_safe_system() to reliably return the exit status of the child process.
By: Mark Spencer (markster) 2004-08-22 14:32:20
Understood and applied to CVS...
By: Digium Subversion (svnbot) 2008-01-15 15:05:29.000-0600
r3633 | markster | 2008-01-15 15:05:28 -0600 (Tue, 15 Jan 2008) | 2 lines
Fix sigchld handling (bug ASTERISK-2215)