Summary: | ASTERISK-19867: asterisk fails to lower its priority when astcanary dies | ||||||
Reporter: | Xavier Hienne (xhienne) | Labels: | |||||
Date Opened: | 2012-05-14 09:35:34 | Date Closed: | 2016-10-26 08:59:16 | ||||
Priority: | Major | Regression? | No | ||||
Status: | Closed/Complete | Components: | Utilities/astcanary | ||||
Versions: | 1.8.7.2 1.8.11.1 1.8.12.0 11.23.1 | Frequency of Occurrence | |||||
Related Issues: |
| ||||||
Environment: | Debian GNU/Linux 6 (squeeze) Linux 2.6.32-5-686 and 2.6.32-5-686-bigmem GNU libc 2.11 | Attachments: | |||||
Description: | When astcanary dies, asterisk is supposed to lower its priority and return to the SCHED_OTHER scheduling policy.
This is not what I'm observing on Linux: apparently, the death of astcanary is detected by the watching thread, but it fails to change the priority of all other asterisk threads. I suspect that only the watching thread sees its priority changed (just before dying) and this might be the expected behavior of sched_setscheduler() with pid=0. Here is how to reproduce this issue: $ ps u $(pidof astcanary asterisk) USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND root 31854 6.8 3.3 34732 17120 ? Ssl 15:59 0:00 /usr/sbin/asterisk -p -M 1000 root 31855 0.0 0.1 2892 772 ? S 15:59 0:00 astcanary /var/run/asterisk/alt.asterisk.canary.tweet.tweet.tweet 31854 $ chrt -p $(pidof asterisk) pid 31854's current scheduling policy: SCHED_RR pid 31854's current scheduling priority: 10 $ killall astcanary (after some time => WARNING[31856] asterisk.c: The canary is no more...) $ chrt -p $(pidof asterisk) pid 31854's current scheduling policy: SCHED_RR pid 31854's current scheduling priority: 10 => asterisk is still at its highest priority | ||||||
Comments: | By: Rusty Newton (rnewton) 2016-09-15 17:26:43.068-0500 [~xhienne], I don't think we were ever able to reproduce this. Does this issue still occur for you on supported versions of Asterisk? By: Walter Doekes (wdoekes) 2016-09-16 08:44:13.301-0500 Same happens over here. The canary detection takes quite some time too.. {noformat} # asterisk -V Asterisk 11.23.1-osso1~jessie # chrt -p $(pidof asterisk) pid 16560's current scheduling policy: SCHED_RR pid 16560's current scheduling priority: 10 # date; pkill astcanary Fri Sep 16 15:39:58 CEST 2016 # tail -n1 /var/log/asterisk/messages.log [Sep 16 15:41:38] WARNING[16566] asterisk.c: The canary is no more. He has ceased to be! He's expired and gone to meet his maker! He's a stiff! Bereft of life, he rests in peace. His metabolic processes are now history! He's off the twig! He's kicked the bucket. He's shuffled off his mortal coil, run down the curtain, and joined the bleeding choir invisible!! THIS is an EX-CANARY. (Reducing priority) # date; chrt -p $(pidof asterisk) Fri Sep 16 15:42:20 CEST 2016 pid 16560's current scheduling policy: SCHED_RR pid 16560's current scheduling priority: 10 {noformat} Note that I'm running as user asterisk, so that may affect how easily the process can switch priority.. By: Walter Doekes (wdoekes) 2016-09-16 09:17:49.231-0500 Okay.. the problem was that sched_setscheduler fixed the astcanary-checking thread, and not the entire process. If I change it to change {{getpid()}} instead of {{0}} it appears to behave well, but it still doesn't really: {noformat} # chrt -p `pidof asterisk` pid 7220's current scheduling policy: SCHED_RR pid 7220's current scheduling priority: 10 # date; pkill astcanary Fri Sep 16 16:01:01 CEST 2016 ... [Sep 16 16:02:49] WARNING[7222] asterisk.c: The canary is no more. He has ceased to be! He's expired and gone to meet his maker! He's a stiff! Bereft of life, he rests in peace. His metabolic processes are now history! He's off the twig! He's kicked the bucket. He's shuffled off his mortal coil, run down the curtain, and joined the bleeding choir invisible!! THIS is an EX-CANARY. (Reducing priority) # chrt -p `pidof asterisk` pid 7220's current scheduling policy: SCHED_OTHER pid 7220's current scheduling priority: 0 # ps -eLf | grep sbin/asterisk | awk '{print $4}' | xargs -I{} chrt -p {} pid 7220's current scheduling policy: SCHED_OTHER pid 7220's current scheduling priority: 0 pid 7223's current scheduling policy: SCHED_RR pid 7223's current scheduling priority: 10 pid 7224's current scheduling policy: SCHED_RR pid 7224's current scheduling priority: 10 pid 7225's current scheduling policy: SCHED_RR pid 7225's current scheduling priority: 10 ... {noformat} {quote} Specifying pid as 0 will operate on the attributes of the calling thread, and passing the value returned from a call to getpid(2) will operate on the attributes of the main thread of the thread group. {quote} So, neither 0/gettid() nor getpid() is good enough for our purposes. Perhaps it's easiest to cheat and execve {{chrt -a -p GETPID}}, as I don't see an easy way to find the thread IDs of the running program. Or, if this is portable across linux kernels, check {{ls /proc/GETPID/task}} and call sched_setscheduler for every tid/numeric-dir found there. By: Walter Doekes (wdoekes) 2016-09-16 10:15:17.922-0500 This should probably fix things: https://gerrit.asterisk.org/#/c/3916 By: Walter Doekes (wdoekes) 2016-09-17 11:33:46.042-0500 I'm not sure this has ever worked... By: Walter Doekes (wdoekes) 2016-10-21 01:44:13.332-0500 [~rnewton] (repeated after lost msg on irc): ASTERISK-26352, ASTERISK-19867, ASTERISK-26358, ASTERISK-23989 aren't closed and tagged as fixed in 11.24.0; probably auto-close-by-jira failure and hence a lack of tagging for release. |