
Summary: ASTERISK-06123: asterisk multiple dead processes
Reporter: Christian Benke (christianbee)
Labels:
Date Opened: 2006-01-18 10:49:23.000-0600
Date Closed: 2006-02-15 08:00:39.000-0600
Priority: Trivial
Regression?: No
Status: Closed/Complete
Components: Core/General
Versions:
Frequency of Occurrence:
Related Issues:
Environment:
Attachments: (0) disa_script, (1) list
Description: *Note: "zombie process" in this report is not meant as the technical term for a dead child process but as my own wording for the dead asterisk processes that don't seem to be part of the running, live Asterisk instance.

I upgraded my production server from 1.2.0 to 1.2.1 (svn checkout of Asterisk 1.2.1, branch 1.2) two weeks ago, together with a kernel update from 2.6.14-gentoo-r4 to r5. Since then I see strange behaviour when I run top -U asterisk: instead of the single asterisk process there always used to be, there are several zombie* processes whose number grows over time (but to no more than approx. 20 after a few days). When I stop Asterisk cleanly ('stop now' or 'stop when convenient'), the main process is terminated, but the zombie* processes are left behind and can only be killed with 'kill -s 9'.
Since I wanted to make sure that the problem is not related to the svn checkout, I installed 1.2.2 from the tar archive today, but the issue reappeared after 2 hours:

top - 17:43:33 up  2:41,  3 users,  load average: 0.22, 0.11, 0.08
Tasks:  77 total,   1 running,  76 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.3% us,  0.5% sy,  0.0% ni, 99.1% id,  0.0% wa,  0.0% hi,  0.2% si
Mem:   2075856k total,   674788k used,  1401068k free,    29956k buffers
Swap:  3903784k total,        0k used,  3903784k free,   555524k cached

 PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                                
5947 asterisk -11   0 27892  12m 4432 S  3.3  0.6   4:07.84 asterisk                                                                                                
5998 asterisk -11   0  5608 4380 4196 S  0.0  0.2   0:01.07 mpg123                                                                                                  
5999 asterisk -11   0  4332 1840 1656 S  0.0  0.1   0:00.56 mpg123                                                                                                  
6000 asterisk -11   0 11852 5824 5640 S  0.0  0.3   0:00.56 mpg123                                                                                                  
6001 asterisk -11   0  3716  420  248 S  0.0  0.0   0:00.00 mpg123                                                                                                  
6004 asterisk -11   0  3712  416  248 S  0.0  0.0   0:00.00 mpg123                                                                                                  
6005 asterisk -11   0  3716  428  248 S  0.0  0.0   0:00.00 mpg123                                                                                                  
14326 asterisk -11   0 23632 6788  396 S  0.0  0.3   0:00.00 asterisk                                                                                                
14362 asterisk -11   0 23632 6788  396 S  0.0  0.3   0:00.00 asterisk                                                                                                
15651 asterisk -11   0 24408 7292  396 S  0.0  0.4   0:00.01 asterisk

I have a second machine with nearly the same hardware: the RAID firmware version differs and it has a Sangoma ISDN card instead of the wct410p in the first machine; everything else is identical.
The software base is not exactly the same (system software versions may differ by 2 weeks), but nothing serious (IMHO). The Asterisk version is the same; I also upgraded it to the 1.2.2 tar version today.
I have never had this problem on that machine.

I know there could be many reasons for this problem, but I hope that someone recognizes this phenomenon...

****** STEPS TO REPRODUCE ******

Happens a few hours after restarting Asterisk.
Comments:

By: Tilghman Lesher (tilghman) 2006-01-19 01:34:58.000-0600

What makes you think these are zombie processes?  They don't show a zombie state in your process table.

By: Christian Benke (christianbee) 2006-01-19 01:52:38.000-0600

I mean "zombie processes" not as the technical term but in my own wording, since they are dead processes. When I stop Asterisk cleanly, these processes are not stopped and can only be killed with 'kill -s 9'.
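
For readers following the distinction drawn in this exchange: on Linux the state letter that top shows in its S column can also be read from /proc/<pid>/stat, where a true zombie is reported as Z and a sleeping process as S. The following is only a minimal sketch in Python, not part of the original report; the function names are hypothetical, and it assumes a Linux /proc filesystem.

```python
from pathlib import Path


def parse_stat(stat_line: str) -> tuple[int, str, str, int]:
    """Split one /proc/<pid>/stat line into (pid, comm, state, ppid)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the LAST ')' instead of splitting on whitespace.
    head, _, tail = stat_line.rpartition(")")
    pid_text, _, comm = head.partition("(")
    fields = tail.split()
    # After comm, the first field is the state letter, the second the parent PID.
    return int(pid_text), comm, fields[0], int(fields[1])


def is_zombie(stat_line: str) -> bool:
    """True only for a real zombie: exited, but not yet reaped by its parent."""
    return parse_stat(stat_line)[2] == "Z"


def processes_named(name: str, proc_root: str = "/proc"):
    """Yield (pid, state, ppid) for every process whose comm equals name."""
    for stat_file in Path(proc_root).glob("[0-9]*/stat"):
        try:
            pid, comm, state, ppid = parse_stat(stat_file.read_text())
        except (OSError, ValueError, IndexError):
            continue  # the process went away while we were reading it
        if comm == name:
            yield pid, state, ppid


if __name__ == "__main__":
    # List every process called "asterisk" with its state and parent PID.
    for pid, state, ppid in processes_named("asterisk"):
        print(pid, state, ppid)
```

Printing the parent PID next to the state is a deliberate choice: it shows whether an extra asterisk entry is a child of the main process or was started independently, which the top output alone does not reveal.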

By: Christian Benke (christianbee) 2006-01-19 03:29:29.000-0600

I turned off the first Asterisk server (the one with the problems) a few hours ago, so the second server now gets all the load (before, it only received SIP calls from the first server, which was half the load of the first server). Now the second server also shows an additional, dead asterisk process:

top - 11:27:33 up 19:24,  1 user,  load average: 0.10, 0.99, 0.72
Tasks:  65 total,   1 running,  64 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.3% us,  0.1% sy,  0.0% ni, 92.9% id,  0.0% wa,  6.4% hi,  0.3% si
Mem:   2075856k total,  2023836k used,    52020k free,    26728k buffers
Swap:  3903784k total,      604k used,  3903180k free,  1896992k cached

 PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                                
6548 asterisk -11   0 27592  13m 4568 S  1.3  0.7  18:36.53 asterisk                                                                                                
6556 asterisk -11   0  5608 4376 4196 S  0.0  0.2   0:00.45 mpg123                                                                                                  
6557 asterisk -11   0  4332 1840 1656 S  0.0  0.1   0:00.16 mpg123                                                                                                  
6558 asterisk -11   0 11848 5820 5640 S  0.0  0.3   0:00.57 mpg123                                                                                                  
6559 asterisk -11   0  3712  412  244 S  0.0  0.0   0:00.01 mpg123                                                                                                  
6560 asterisk -11   0  3716  844  672 S  0.0  0.0   0:00.00 mpg123                                                                                                  
6561 asterisk -11   0  3712  420  248 S  0.0  0.0   0:00.00 mpg123                                                                                                  
19845 asterisk -11   0 26184 8868  660 S  0.0  0.4   0:00.00 asterisk

I haven't noticed any telephony problems related to these multiple processes; however, I think they are an indicator that something is wrong...

By: Kevin P. Fleming (kpfleming) 2006-02-14 13:14:08.000-0600

Yes, clearly something is wrong on your server, as there is no reason that a second Asterisk process should be able to be running at the same time.

Notice that it also appears not to be using any CPU time at all... How are you running Asterisk: manually, safe_asterisk, some other script?

By: Christian Benke (christianbee) 2006-02-15 02:18:33.000-0600

I've been able to track the issue down to the AGI scripts I call for 99% of the calls. It seems to be a deadlock problem. I set up a test dialplan that calls the AGIs in a loop, so we get several hundred calls per second. When I start the loop with one call, it works without problems; when I start a second loop with a second call, the extra asterisk processes appear (< 10 in one hour). This seems to be a problem in my (basic) bash script when two processes try to access the same file at the same time (which doesn't happen too often). Since I will soon move the query from the file to a database, I don't really care about the problem anymore.
Thanks for your attention!
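
The reporter's script is not shown, so the exact cause of the contention is unknown. As a general illustration only, concurrent access to one shared file from several script instances is commonly serialized with an advisory lock. The sketch below uses Python's fcntl.flock instead of the reporter's bash; the file name, record format, and function names are all hypothetical.

```python
import fcntl
import os
import tempfile


def locked_append(path: str, line: str) -> None:
    """Append one line while holding an exclusive advisory lock."""
    with open(path, "a") as handle:
        # Blocks until no other process holds a lock on this file.
        fcntl.flock(handle, fcntl.LOCK_EX)
        try:
            handle.write(line + "\n")
            handle.flush()
            os.fsync(handle.fileno())
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)


def locked_read(path: str) -> list[str]:
    """Read all lines under a shared lock, so no writer is mid-update."""
    with open(path, "r") as handle:
        fcntl.flock(handle, fcntl.LOCK_SH)
        try:
            return handle.read().splitlines()
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)


def demo() -> list[str]:
    """Write two hypothetical records to a temporary file and read them back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "shared.txt")  # hypothetical file name
        locked_append(path, "caller=1001")
        locked_append(path, "caller=1002")
        return locked_read(path)


if __name__ == "__main__":
    print(demo())
```

The lock is advisory: it only helps if every process that touches the file takes it. It is released when the file is closed or the holding process exits, so a script that dies does not leave the lock held.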

By: Tilghman Lesher (tilghman) 2006-02-15 08:00:39.000-0600

Reporter lost interest.