ASTERISK-06123: asterisk multiple dead processes


Summary: ASTERISK-06123: asterisk multiple dead processes

Reporter: Christian Benke (christianbee)
Labels:

Date Opened: 2006-01-18 10:49:23.000-0600
Date Closed: 2006-02-15 08:00:39.000-0600

Priority: Trivial
Regression? No

Status: Closed/Complete
Components: Core/General

Versions:
Frequency of Occurrence:

Related Issues:

Environment:
Attachments: (0) disa_script
(1) list

Description: *Note: "zombie process" in this report is not meant in the technical sense (a child process that has exited but has not yet been reaped by its parent); it is my term for the dead asterisk processes that don't seem to be part of the running, live Asterisk instance.

I upgraded my production server from 1.2.0 to 1.2.1 (an SVN checkout of Asterisk 1.2.1 from the 1.2 branch) two weeks ago, together with a kernel update from 2.6.14-gentoo-r4 to r5. Since then I have seen strange behaviour when I run 'top -U asterisk': instead of a single asterisk process, as there always used to be, there are several zombie* processes whose number grows over time (but to no more than about 20 after a few days). When I stop Asterisk cleanly ('stop now' or 'stop when convenient'), the main process exits but the zombie processes are left behind and can only be killed with '-s 9' (SIGKILL).
To make sure it is not a problem with the SVN checkout, I installed 1.2.2 from the release tarball today, but the issue reappeared after two hours:

top - 17:43:33 up 2:41, 3 users, load average: 0.22, 0.11, 0.08
Tasks: 77 total, 1 running, 76 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.3% us, 0.5% sy, 0.0% ni, 99.1% id, 0.0% wa, 0.0% hi, 0.2% si
Mem: 2075856k total, 674788k used, 1401068k free, 29956k buffers
Swap: 3903784k total, 0k used, 3903784k free, 555524k cached

  PID USER      PR  NI  VIRT   RES  SHR S %CPU %MEM    TIME+ COMMAND
 5947 asterisk -11   0 27892   12m 4432 S  3.3  0.6  4:07.84 asterisk
 5998 asterisk -11   0  5608  4380 4196 S  0.0  0.2  0:01.07 mpg123
 5999 asterisk -11   0  4332  1840 1656 S  0.0  0.1  0:00.56 mpg123
 6000 asterisk -11   0 11852  5824 5640 S  0.0  0.3  0:00.56 mpg123
 6001 asterisk -11   0  3716   420  248 S  0.0  0.0  0:00.00 mpg123
 6004 asterisk -11   0  3712   416  248 S  0.0  0.0  0:00.00 mpg123
 6005 asterisk -11   0  3716   428  248 S  0.0  0.0  0:00.00 mpg123
14326 asterisk -11   0 23632  6788  396 S  0.0  0.3  0:00.00 asterisk
14362 asterisk -11   0 23632  6788  396 S  0.0  0.3  0:00.00 asterisk
15651 asterisk -11   0 24408  7292  396 S  0.0  0.4  0:00.01 asterisk
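The top output shows the extra asterisk entries but not which process spawned them. As a hypothetical diagnostic (not something used in this report), the Python sketch below lists every process with a given command name together with its parent PID and one-letter state, so one could check whether the extra entries are children of the main instance (PID 5947 above) and, after the main process exits, whether the survivors have typically been re-parented to init (PID 1). It is Linux-only because it parses /proc/<pid>/stat; the standard command 'ps -u asterisk -o pid,ppid,stat,comm' gives similar information.

```python
import os


def proc_stat(pid):
    """Return (comm, state, ppid) parsed from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split around the first '(' and the LAST ')'.
    comm = data[data.index("(") + 1:data.rindex(")")]
    rest = data[data.rindex(")") + 2:].split()
    state, ppid = rest[0], int(rest[1])  # fields 3 and 4 of the stat line
    return comm, state, ppid


def find_processes(name, uid=None):
    """Yield (pid, ppid, state) for each process whose command name is `name`.

    If `uid` is given, only processes owned by that user are reported,
    similar in spirit to 'top -U <user>'.
    """
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/cpuinfo
        pid = int(entry)
        try:
            if uid is not None and os.stat(f"/proc/{pid}").st_uid != uid:
                continue
            comm, state, ppid = proc_stat(pid)
        except OSError:
            continue  # the process exited while we were scanning
        if comm == name:
            yield pid, ppid, state


if __name__ == "__main__":
    # Hypothetical usage: list every process named "asterisk" with its parent.
    for pid, ppid, state in sorted(find_processes("asterisk")):
        print(f"pid={pid:>6} ppid={ppid:>6} state={state}")
```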

I have a second machine with nearly the same hardware: the RAID firmware version differs, and it has a Sangoma ISDN card instead of the wct410p in the first machine; everything else is identical.
The software base is not exactly the same (system software versions may differ by about two weeks), but nothing serious, IMHO. The Asterisk version is the same; I also upgraded that machine to the 1.2.2 tarball today.
I have never had this problem on that machine.

I know there could be many causes for this problem, but I hope someone has seen this phenomenon before...

****** STEPS TO REPRODUCE ******

Happens a few hours after restarting Asterisk.

Comments:

By: Tilghman Lesher (tilghman) 2006-01-19 01:34:58.000-0600

What makes you think these are zombie processes? They don't show a zombie state in your process table.
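To illustrate the distinction: a zombie in the technical sense has already exited and only waits for its parent to collect its exit status with wait()/waitpid(); ps and top show it with state Z, whereas every process in the listings in this report is in state S (sleeping). The Python sketch below (Linux-only, since it reads /proc) creates a real zombie and checks its state before and after the parent reaps it. It illustrates the terminology only and does not reproduce this bug.

```python
import os
import time


def process_state(pid):
    """Return the one-letter state from /proc/<pid>/stat, or None if it is gone."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            data = f.read()
    except OSError:
        return None
    # The state is the first field after the parenthesised command name.
    return data[data.rindex(")") + 2]


def zombie_demo():
    """Create a real zombie and return its state before and after reaping."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit immediately, leaving only an exit status
    before = None
    for _ in range(200):  # poll for up to ~2 s until the child has exited
        before = process_state(pid)
        if before == "Z":
            break
        time.sleep(0.01)
    os.waitpid(pid, 0)  # parent reaps the child; the kernel frees its entry
    after = process_state(pid)
    return before, after


if __name__ == "__main__":
    print(zombie_demo())  # prints ('Z', None)
```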
By: Christian Benke (christianbee) 2006-01-19 01:52:38.000-0600

I don't mean "zombie processes" as the technical term; it's just my word for these dead processes. When I stop Asterisk cleanly, these processes are not stopped and can only be killed with '-s 9'.
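In general, a process that survives a plain 'kill' but dies with '-s 9' is one that catches or ignores SIGTERM (kill's default signal), whereas SIGKILL (signal 9) cannot be caught, blocked or ignored. This report does not establish why these particular processes ignored SIGTERM; the Python sketch below (POSIX-only, uses os.fork) merely demonstrates the signal semantics with a hypothetical child that sets SIGTERM to SIG_IGN.

```python
import os
import signal
import time


def sigterm_vs_sigkill():
    """Fork a child that ignores SIGTERM, then try SIGTERM and SIGKILL on it.

    Returns (alive_after_sigterm, terminating_signal).
    """
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            signal.signal(signal.SIGTERM, signal.SIG_IGN)  # ignore plain 'kill'
            os.write(w, b"x")  # tell the parent the disposition is set
            while True:
                time.sleep(1)
        finally:
            os._exit(1)  # never fall back into the parent's code path
    os.close(w)
    os.read(r, 1)  # wait until the child is ignoring SIGTERM
    os.close(r)

    os.kill(pid, signal.SIGTERM)  # what 'kill <pid>' sends by default
    time.sleep(0.2)
    finished, _ = os.waitpid(pid, os.WNOHANG)  # (0, 0) while still running
    alive_after_sigterm = finished == 0

    os.kill(pid, signal.SIGKILL)  # 'kill -s 9': cannot be caught or ignored
    _, status = os.waitpid(pid, 0)
    sig = os.WTERMSIG(status) if os.WIFSIGNALED(status) else None
    return alive_after_sigterm, sig


if __name__ == "__main__":
    print(sigterm_vs_sigkill())
```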
By: Christian Benke (christianbee) 2006-01-19 03:29:29.000-0600

I turned off the first Asterisk server (the one with the problems) a few hours ago, so the second server now gets all the load (before, it only received SIP calls from the first server, about half of the first server's load). Now the second server also shows an additional dead asterisk process:

top - 11:27:33 up 19:24, 1 user, load average: 0.10, 0.99, 0.72
Tasks: 65 total, 1 running, 64 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.3% us, 0.1% sy, 0.0% ni, 92.9% id, 0.0% wa, 6.4% hi, 0.3% si
Mem: 2075856k total, 2023836k used, 52020k free, 26728k buffers
Swap: 3903784k total, 604k used, 3903180k free, 1896992k cached

  PID USER      PR  NI  VIRT   RES  SHR S %CPU %MEM    TIME+ COMMAND
 6548 asterisk -11   0 27592   13m 4568 S  1.3  0.7 18:36.53 asterisk
 6556 asterisk -11   0  5608  4376 4196 S  0.0  0.2  0:00.45 mpg123
 6557 asterisk -11   0  4332  1840 1656 S  0.0  0.1  0:00.16 mpg123
 6558 asterisk -11   0 11848  5820 5640 S  0.0  0.3  0:00.57 mpg123
 6559 asterisk -11   0  3712   412  244 S  0.0  0.0  0:00.01 mpg123
 6560 asterisk -11   0  3716   844  672 S  0.0  0.0  0:00.00 mpg123
 6561 asterisk -11   0  3712   420  248 S  0.0  0.0  0:00.00 mpg123
19845 asterisk -11   0 26184  8868  660 S  0.0  0.4  0:00.00 asterisk

I haven't noticed any telephony problems related to these extra processes; however, I think they indicate that something is wrong...
By: Kevin P. Fleming (kpfleming) 2006-02-14 13:14:08.000-0600

Yes, clearly something is wrong on your server, as there is no reason for a second Asterisk process to be running at the same time.

Notice also that it does not appear to be using any CPU time at all... How are you running Asterisk: manually, with safe_asterisk, or with some other script?
By: Christian Benke (christianbee) 2006-02-15 02:18:33.000-0600

I've been able to track the issue down to the AGI scripts I call for 99% of the calls. It seems to be a deadlock problem. I set up a test dialplan that calls the AGIs in a loop, giving several hundred calls per second. When I start the loop with one call, it works without problems; when I start a second loop with a second call, the extra asterisk processes appear (fewer than 10 in one hour). This seems to be a problem in my (basic) bash script when two processes try to access the same file at the same time (which doesn't happen very often). Since I will soon move the query from the file to a database, I don't really care about the problem anymore.
Thanks for your attention!
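The bash script itself is not attached to this report, so the exact race is unknown. A common way to keep two concurrent AGI invocations from touching the same file at once is an advisory lock held around the whole read-modify-write. The Python sketch below shows the idea with flock(2) (via fcntl.flock) on a hypothetical counter file; the file, the counter format and the hammer() load test are assumptions for illustration, not part of the original setup. From a shell script, util-linux's flock(1) command takes the same kind of lock.

```python
import fcntl
import os
import tempfile


def locked_increment(path):
    """Read an integer counter from `path`, add one and write it back.

    The whole read-modify-write runs under an exclusive flock() lock, so
    concurrent callers are serialised instead of overwriting each other.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # block until we hold the lock
        raw = os.read(fd, 64)
        value = int(raw) + 1 if raw.strip() else 1
        os.lseek(fd, 0, os.SEEK_SET)
        os.ftruncate(fd, 0)
        os.write(fd, str(value).encode())
        return value
    finally:
        os.close(fd)  # closing the descriptor also releases the lock


def hammer(path, workers=4, rounds=200):
    """Fork `workers` processes that each call locked_increment `rounds` times."""
    pids = []
    for _ in range(workers):
        pid = os.fork()
        if pid == 0:
            status = 1
            try:
                for _ in range(rounds):
                    locked_increment(path)
                status = 0
            finally:
                os._exit(status)  # never return into the parent's loop
        pids.append(pid)
    for pid in pids:
        os.waitpid(pid, 0)
    with open(path) as f:
        return int(f.read())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(hammer(os.path.join(tmp, "counter")))  # prints 800
```

Each process holds the lock only for one short read-modify-write, and the kernel drops it when the descriptor is closed, including when the process exits, so a crashed script cannot leave the file locked. Without the flock() call, concurrent increments could be lost.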
By: Tilghman Lesher (tilghman) 2006-02-15 08:00:39.000-0600

Reporter lost interest.