Summary: ASTERISK-04598: [patch] Signal handling on Solaris loses handler after one signal
Reporter: Lee Essen (essele)
Labels:
Date Opened: 2005-07-15 05:17:33
Date Closed: 2011-06-07 14:10:40
Versions:
Frequency of
Environment:
Attachments: (0) signal.patch
Description: Signal handlers on Solaris get reset each time a signal is received, so they need to be re-installed inside the handlers; otherwise you only get one shot at trapping each signal.

Looking through the asterisk.c code, SIGURG is already reset, however SIGHUP and SIGCHLD are not. I'm not sure what the implications are for ast_safe_system, but a cursory look makes me think that it doesn't need altering.

I've been testing AGI and noticing a load of zombies, basically caused by the SIGCHLD handler not being re-installed.

Sample patch attached (apologies if it's not the right format).

Comments:

By: Kevin P. Fleming (kpfleming) 2005-07-15 18:05:52

I've committed a version of your patch that actually compiles, thanks.

In the future, please review the bug posting guidelines carefully as to patch format and testing, and if you post a patch of any significant size the disclaimer will not be 'N/A' :-)

By: Lee Essen (essele) 2005-08-01 10:42:50

I've just managed to re-test this on Solaris and it's still leaving defunct processes around.

The signal handler is now being re-installed, which is fine, but delving a bit deeper it seems that wait4() has different semantics on Solaris (-1 for pid has a different meaning).

Looking at the OpenSolaris.org code it seems that a negative number for pid waits for anything from that particular process group. Using 0 is the right approach for Solaris, so I think this will require an #ifdef.

    /*
     * Emulate undocumented 4.x semantics for 1186845
     */
    if (pid < 0) {
        pid = -pid;
        idtype = P_PGID;
    } else if (pid == 0)
        idtype = P_ALL;
    else
        idtype = P_PID;
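
The mapping in the excerpt above can be modelled as a small standalone function, which makes it easy to see what happens to a pid of -1. This is only a sketch: the SKETCH_* constants and the sketch_map_pid name are hypothetical stand-ins, not the real Solaris P_PID, P_PGID and P_ALL definitions.

```c
/* Illustrative stand-ins for the Solaris idtype values (NOT the real ones). */
enum sketch_idtype { SKETCH_P_PID, SKETCH_P_PGID, SKETCH_P_ALL };

struct sketch_target {
    enum sketch_idtype idtype; /* what kind of id is being waited on */
    long id;                   /* the pid or process group id, 0 for "all" */
};

/* Mirrors the branch structure of the quoted excerpt. */
static struct sketch_target sketch_map_pid(long pid)
{
    struct sketch_target t;

    if (pid < 0) {
        /* Any negative pid, including -1, selects a process group. */
        t.idtype = SKETCH_P_PGID;
        t.id = -pid;
    } else if (pid == 0) {
        /* Zero selects every child. */
        t.idtype = SKETCH_P_ALL;
        t.id = 0;
    } else {
        /* A positive pid selects exactly that child. */
        t.idtype = SKETCH_P_PID;
        t.id = pid;
    }
    return t;
}
```

Under this model, sketch_map_pid(-1) yields a wait on process group 1, while sketch_map_pid(0) yields a wait on all children.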



By: Mark Spencer (markster) 2005-08-07 10:43:51

The value of pid can be one of:

      < -1   which means to wait for any child process whose  process  group
             ID is equal to the absolute value of pid.

      -1     which  means  to wait for any child process; this is equivalent
             to calling wait3.

      0      which means to wait for any child process whose  process  group
             ID is equal to that of the calling process.

      > 0    which  means to wait for the child whose process ID is equal to
             the value of pid.

That is what is listed for Linux. Is that accurate for Solaris as well?

By: Lee Essen (essele) 2005-08-08 04:08:17

No. The semantics between Solaris and Linux appear to be different (see the code section I posted earlier from OpenSolaris.org).

Basically the usage in Asterisk is waiting for any child (i.e. using a pid of -1); on Solaris this waits for any child whose process group is 1 -- completely the wrong behaviour.

For Solaris, to wait for any child you need to use a pid of 0.  This would probably also work for Linux given that our child processes should be in our process group, but the safest bet would probably be an #ifdef to alter the pid value accordingly.

By: Michael Jerris (mikej) 2005-08-24 01:54:26

Can you please provide an appropriately #ifdef'ed patch to implement the correct semantics for Solaris? What other platforms does this affect? BSD? If this also affects other supported platforms, I would prefer the patch to #ifdef for them as well, if you are able. Thanks.

By: Olle Johansson (oej) 2005-08-25 14:39:57

...and a confirmation of the disclaimer.

By: Michael Jerris (mikej) 2005-09-15 06:43:33

essele: Have you lost interest in this, or is this still in the works? Someone who can test this on Solaris needs to produce and test a patch, or we will be unable to fix this issue.

By: Kevin P. Fleming (kpfleming) 2005-09-15 10:24:30

This seems to be completely wrong behavior on Solaris' part. The semantics of wait4() are from 4.3BSD, which is ancient history. There is no reason whatsoever for them to be changing the semantics at this point... Grrr.

By: Kevin P. Fleming (kpfleming) 2005-09-29 00:21:21

Suspending due to lack of response; hopefully in the near future we will have a Solaris machine here in the lab and we will be able to reproduce the problem.

By: Digium Subversion (svnbot) 2008-01-15 15:41:37.000-0600

Repository: asterisk
Revision: 6144

U   trunk/asterisk.c

r6144 | kpfleming | 2008-01-15 15:41:37 -0600 (Tue, 15 Jan 2008) | 2 lines

re-enable SIGHUP and SIGCHLD after they fire on platforms that require it (bug ASTERISK-4598)