Summary: ASTERISK-01238: app_system not working under Fedora FC1
Reporter: revk (revk)
Date Opened: 2004-03-18 06:21:20.000-0600
Date Closed: 2004-09-25 02:18:20
Priority: Major
Regression?: No
Status: Closed/Complete
Components: Core/General
Description: app_system appears to work only very intermittently; mostly it does not work at all. Part of the problem appears to be that app_system's handling of system() assumes it gets the command's exit code rather than a wait(2)-style status, so it never checks WIFEXITED(res) and friends anyway. However, the specific problem is that system() normally returns -1 with errno set to ECHILD. This appears to be a known return from wait(2), which system() uses internally, and it can happen when SIGCHLD is caught: system() calls wait(2) to collect the exit status, but the SIGCHLD handler has already reaped the child. This is supported by the listener reporting an interrupted system call just before app_system reports its failure.

****** ADDITIONAL INFORMATION ******

It makes app_system completely unusable. The code needs changing to handle system() returning -1 or a wait(2)-style status, and somehow we need to stop the SIGCHLD handler from collecting the child's exit status if we want to see the result of the system() call. If system() were replaced with an explicit fork(), and the signal handling changed so that SIGCHLD is not trapped, then it could work. I can write the change if you like.
Comments:

By: revk (revk) 2004-03-18 07:00:07.000-0600

OK, reading the man page for system(3), it says that SIGCHLD is blocked during execution, which means that the ECHILD error from system() makes no sense. Maybe system(3) is broken in FC1... Damn annoying nonetheless, but possibly not actually a bug in Asterisk, then.

By: revk (revk) 2004-03-18 07:42:57.000-0600

Running strace on a program that uses system() shows it calling sigprocmask(). There are various comments around on how this behaves in a multithreaded environment: it looks like it may only block the signal for the calling thread, which is no use if another thread can receive the SIGCHLD and reap the child. I built Asterisk with no SIGCHLD handler and the problem went away. This proves it is a signal issue, but it is not a fix, as it leaves dead (zombie) child processes lurking... I'm not sure of the answer to this one; maybe system() is unsafe in a multithreaded environment where SIGCHLD is being handled. I am trying to see if I can come up with any sort of workaround.

By: twisted (twisted) 2004-03-21 23:54:21.000-0600

This seems familiar... from bug ASTERISK-1255 perhaps...   Maybe we should try using ast_safe_system() in app_system.c as well.  

Try this, see if it works:

--- app_system.c        2004-03-04 16:03:28.000000000 -0600
+++ app_system.c        2004-03-21 22:51:36.000000000 -0600
@@ -53,7 +53,7 @@
       }
       LOCAL_USER_ADD(u);
       /* Do our thing here */
-       res = system((char *)data);
+       res = ast_safe_system((char *)data);
       if ((res < 0) && (errno != ECHILD)) {
               ast_log(LOG_WARNING, "Unable to execute '%s'\n", (char *)data);
               res = -1;

edited on: 03-21-04 22:52

By: James Golovich (jamesgolovich) 2004-03-22 12:41:56.000-0600

The call to ast_safe_system() from app_system.c has been committed to CVS HEAD.

By: twisted (twisted) 2004-03-22 15:27:20.000-0600

Good. revk, can you tell us whether or not this fixes your issue?

By: twisted (twisted) 2004-03-23 04:35:11.000-0600

Reminder sent to revk

CVS has been updated with the ast_safe_system() call. Please report in the bug tracker whether this resolves the issue, so I can close it if it does, or so we can investigate further if it does not.

By: Mark Spencer (markster) 2004-03-23 16:59:34.000-0600

I've also just made an improvement to ast_safe_system() so that it returns the command's exit status accurately.

By: Mark Spencer (markster) 2004-03-31 03:16:45.000-0600

Well, this should be fixed in CVS HEAD, and we haven't heard anything more from the bug poster, so I'm going to mark this resolved; they can re-open it if it's not.