Summary: | ASTERISK-06123: asterisk multiple dead processes | ||
Reporter: | Christian Benke (christianbee) | Labels: | |
Date Opened: | 2006-01-18 10:49:23.000-0600 | Date Closed: | 2006-02-15 08:00:39.000-0600 |
Priority: | Trivial | Regression? | No |
Status: | Closed/Complete | Components: | Core/General |
Versions: | Frequency of Occurrence | ||
Related Issues: | |||
Environment: | Attachments: | (0) disa_script (1) list | |
Description: | *Note: "zombie process" in this report is not meant as the technical term for a child process; it is my word for the dead asterisk processes that don't seem to be part of the running, live asterisk instance.

I upgraded my production server from 1.2.0 to 1.2.1 (svn checkout of asterisk 1.2.1, branch 1.2) two weeks ago (plus a kernel update from 2.6.14-gentoo-r4 to r5). Since then I have seen strange behaviour when I run top -U asterisk: instead of the single asterisk process there always used to be, there are several zombie* processes whose number increases over time (but not beyond approx. 20 after a few days). When I stop asterisk cleanly ('stop now' or 'stop when convenient'), the main process is killed but the zombie processes are left behind and can only be killed with '-s 9'.

To make sure the problem is not related to the svn checkout, I installed 1.2.2 from the tar archives today, but the issue reappeared after 2 hours:

top - 17:43:33 up 2:41, 3 users, load average: 0.22, 0.11, 0.08
Tasks: 77 total, 1 running, 76 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.3% us, 0.5% sy, 0.0% ni, 99.1% id, 0.0% wa, 0.0% hi, 0.2% si
Mem: 2075856k total, 674788k used, 1401068k free, 29956k buffers
Swap: 3903784k total, 0k used, 3903784k free, 555524k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
 5947 asterisk -11   0 27892  12m 4432 S  3.3  0.6  4:07.84 asterisk
 5998 asterisk -11   0  5608 4380 4196 S  0.0  0.2  0:01.07 mpg123
 5999 asterisk -11   0  4332 1840 1656 S  0.0  0.1  0:00.56 mpg123
 6000 asterisk -11   0 11852 5824 5640 S  0.0  0.3  0:00.56 mpg123
 6001 asterisk -11   0  3716  420  248 S  0.0  0.0  0:00.00 mpg123
 6004 asterisk -11   0  3712  416  248 S  0.0  0.0  0:00.00 mpg123
 6005 asterisk -11   0  3716  428  248 S  0.0  0.0  0:00.00 mpg123
14326 asterisk -11   0 23632 6788  396 S  0.0  0.3  0:00.00 asterisk
14362 asterisk -11   0 23632 6788  396 S  0.0  0.3  0:00.00 asterisk
15651 asterisk -11   0 24408 7292  396 S  0.0  0.4  0:00.01 asterisk

I have a second machine with nearly the same hardware: the RAID firmware version differs and it has a Sangoma ISDN card instead of the wct410p in the first machine; everything else is identical. The software base is not exactly the same (system software versions may differ by 2 weeks), but nothing serious (imho). The asterisk version is the same; I also upgraded it to the 1.2.2 tar version today. I have never had this problem on that machine. I know that there could be many reasons for this problem, but I hope that someone recognizes this phenomenon...

****** STEPS TO REPRODUCE ******
Happens some hours after restarting asterisk | ||
Comments: | By: Tilghman Lesher (tilghman) 2006-01-19 01:34:58.000-0600
What makes you think these are zombie processes? They don't show a zombie state in your process table.

By: Christian Benke (christianbee) 2006-01-19 01:52:38.000-0600
Zombie processes not as the technical term, but in my own wording, as they are dead processes. When I stop asterisk cleanly, these processes are not stopped and can only be killed with '-s 9'.

By: Christian Benke (christianbee) 2006-01-19 03:29:29.000-0600
I turned off the first asterisk server (the one with the problems) some hours ago, so the second server now gets all the load (before, it received only SIP calls from the first server, which was half the load of the first server). Now the second server also shows an additional, dead asterisk process:

top - 11:27:33 up 19:24, 1 user, load average: 0.10, 0.99, 0.72
Tasks: 65 total, 1 running, 64 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.3% us, 0.1% sy, 0.0% ni, 92.9% id, 0.0% wa, 6.4% hi, 0.3% si
Mem: 2075856k total, 2023836k used, 52020k free, 26728k buffers
Swap: 3903784k total, 604k used, 3903180k free, 1896992k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
 6548 asterisk -11   0 27592  13m 4568 S  1.3  0.7 18:36.53 asterisk
 6556 asterisk -11   0  5608 4376 4196 S  0.0  0.2  0:00.45 mpg123
 6557 asterisk -11   0  4332 1840 1656 S  0.0  0.1  0:00.16 mpg123
 6558 asterisk -11   0 11848 5820 5640 S  0.0  0.3  0:00.57 mpg123
 6559 asterisk -11   0  3712  412  244 S  0.0  0.0  0:00.01 mpg123
 6560 asterisk -11   0  3716  844  672 S  0.0  0.0  0:00.00 mpg123
 6561 asterisk -11   0  3712  420  248 S  0.0  0.0  0:00.00 mpg123
19845 asterisk -11   0 26184 8868  660 S  0.0  0.4  0:00.00 asterisk

I didn't notice any telephony problems related to these multiple processes; however, I think it's an indicator that something is wrong...

By: Kevin P. Fleming (kpfleming) 2006-02-14 13:14:08.000-0600
Yes, clearly something is wrong on your server, as there is no reason that a second Asterisk process should be able to be running at the same time. Notice that it also appears not to be using any CPU time at all... How are you running Asterisk: manually, safe_asterisk, some other script?

By: Christian Benke (christianbee) 2006-02-15 02:18:33.000-0600
I've been able to track the issue down to the AGI scripts I call for 99% of the calls. It seems to be a deadlock problem. I set up a test dialplan that calls the AGIs in a loop, so we have several hundred calls per second. When I start the loop with one call, it works without problems; when I start a second loop with a second call, the asterisk processes appear (< 10 in one hour). This seems to be a problem in my (basic) bash script when two processes try to access the same file at the same time (which doesn't happen too often). Since I will move the query from the file to a database soon, I don't really care about the problem anymore. Thanks for your attention!

By: Tilghman Lesher (tilghman) 2006-02-15 08:00:39.000-0600
Reporter lost interest. |
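
Editor's note: the reporter's final comment attributes the stuck processes to two AGI invocations touching the same file at once. The attached script is not reproduced in this ticket, so the following is only a minimal sketch of one common way to serialize such access, using advisory `flock` locks. It is written in Python rather than bash, and the `key=value` file format and the names `locked_file`, `store`, and `lookup` are assumptions made for illustration, not the reporter's code.

```python
import fcntl
from contextlib import contextmanager


@contextmanager
def locked_file(path, mode, lock_type):
    """Open `path` and hold an advisory flock on it inside the with-block."""
    with open(path, mode) as handle:
        # Blocks until the lock is granted; other cooperating processes
        # that also call flock on this file wait here instead of racing.
        fcntl.flock(handle, lock_type)
        try:
            yield handle
        finally:
            # Push buffered writes to the file before giving up the lock.
            handle.flush()
            fcntl.flock(handle, fcntl.LOCK_UN)


def store(path, key, value):
    """Append one 'key=value' record under an exclusive lock (hypothetical format)."""
    with locked_file(path, "a", fcntl.LOCK_EX) as handle:
        handle.write(f"{key}={value}\n")


def lookup(path, key):
    """Return the value recorded for `key` under a shared lock, or None if absent."""
    with locked_file(path, "r", fcntl.LOCK_SH) as handle:
        for line in handle:
            name, _, value = line.rstrip("\n").partition("=")
            if name == key:
                return value
    return None
```

Readers take a shared lock and writers an exclusive one, so concurrent lookups do not block each other while a write is never interleaved with a read. The locks are advisory: they only help if every process that touches the file uses them. Moving the query to a database, as the reporter planned, sidesteps the problem in a different way.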