|Summary:||ASTERISK-16731: Asterisk freezes when reloading dialplan|
|Reporter:||Michael Gaudette (bluefox)||Labels:|
|Date Opened:||2010-09-25 09:00:36||Date Closed:||2010-12-20 22:30:20.000-0600|
I had this occur twice in the last 3 days. When doing two "dialplan reload" commands, Asterisk just freezes. It doesn't crash (and therefore doesn't restart). It just stays there until I restart it manually, and it doesn't process calls or SIP registrations.
****** STEPS TO REPRODUCE ******
Cannot reproduce it reliably, but a dialplan reload (or many of them in quick succession) might do it.
|Comments:||By: Leif Madsen (lmadsen) 2010-09-27 15:06:44|
This sounds like a deadlock to me. Please follow the instructions below. It might also be useful to provide a backtrace of the running process when this happens.
Please select DEBUG_THREADS and DONT_OPTIMIZE in the Compiler Flags section of menuselect. Recompile and install Asterisk (i.e. make install)
This will then give you the console command:
core show locks
When the symptoms of the deadlock present themselves again, please provide output of the deadlock via:
# asterisk -rx "core show locks" | tee /tmp/core-show-locks.txt
# gdb -se "asterisk" <pid of asterisk> | tee /tmp/backtrace.txt
gdb> bt full
gdb> thread apply all bt
Then attach the core-show-locks.txt and backtrace.txt files to this issue. Thanks!
Thank you for your bug report. In order to move your issue forward, we require a backtrace from the core file produced after the crash. Please see the doc/backtrace.txt file in your Asterisk source directory.
Also, be sure you have DONT_OPTIMIZE enabled in menuselect within the Compiler Flags section, then:
after enabling, reproduce the crash, and then execute the instructions in doc/backtrace.txt.
When complete, attach that file to this issue report. Thanks!
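For an off-hours capture, the lock dump and backtrace steps above could be scripted roughly as follows. This is only a sketch: it swaps the interactive gdb session for gdb's batch mode (`-p`, `-batch`, `-ex`), and the helper names (`locks_command`, `backtrace_command`, `capture`, `collect`) are hypothetical, not part of Asterisk. The output paths match the file names requested above.

```python
import subprocess
from pathlib import Path


def locks_command():
    # Same CLI command as in the instructions above.
    return ["asterisk", "-rx", "core show locks"]


def backtrace_command(pid):
    # Assumption: gdb batch mode replaces the interactive session.
    # gdb attaches to the pid, runs each -ex command, then exits.
    return [
        "gdb", "-p", str(pid), "-batch",
        "-ex", "bt full",
        "-ex", "thread apply all bt",
    ]


def capture(command, out_path, timeout=120, run=subprocess.run):
    """Run command and write its combined output to out_path.

    Returns the exit status, or None if the command did not finish
    within `timeout` seconds (possible if the CLI itself is stuck).
    """
    try:
        result = run(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        Path(out_path).write_text("command timed out\n")
        return None
    Path(out_path).write_bytes(result.stdout)
    return result.returncode


def collect(pid):
    # Hypothetical wrapper: produces the two files to attach to the issue.
    capture(locks_command(), "/tmp/core-show-locks.txt")
    capture(backtrace_command(pid), "/tmp/backtrace.txt")
```

Calling `collect(<pid of asterisk>)` as root while the symptoms are present should leave both files in /tmp. "core show locks" only exists in a build with DEBUG_THREADS enabled, as described above.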
By: Michael Gaudette (bluefox) 2010-09-29 14:51:29
Sorry, trying to do this during off-hours, but can't right now.
By: Michael Gaudette (bluefox) 2010-09-29 14:58:46
Sorry, maybe it's been answered before, but what effect will DEBUG_THREADS and DONT_OPTIMIZE have on my server? (I can only reproduce this on a busy server.)
Are we talking about performance loss of 10-15%? 50%? Nothing?
By: Stefan Schmidt (schmidts) 2010-09-29 15:54:38
If your server is so busy that DEBUG_THREADS and DONT_OPTIMIZE have a noticeable side effect, it's time to buy bigger hardware ;)
I have not measured what percentage you will lose, but it's not a big deal, because these changes mostly cost more memory rather than more CPU time.
By: Michael Gaudette (bluefox) 2010-10-13 09:37:04
Could these attached files potentially show passwords, etc? If so, could you make this issue private so only asterisk dev see the files I will eventually upload?
I will try to reproduce this, it's very random but it did happen once in the last 2 weeks.
In an effort to automate this information gathering, is there a reliable way to programmatically see if Asterisk is frozen?
By: Stefan Schmidt (schmidts) 2010-10-13 09:57:21
If you don't remove the passwords they would be readable by everyone, but we can make this private if you wish.
You can set up a cron job on another host to check whether Asterisk responds to an OPTIONS message sent by sipsak, for example.
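As an illustration of that kind of check, here is a minimal sketch that sends a SIP OPTIONS request over UDP and waits for any SIP response. It uses a raw socket instead of sipsak, so it is a stand-in for the suggested tool, not the same thing. The function names and the `probe` user in the headers are made up for the example, and it assumes the server answers OPTIONS from the probing host on UDP port 5060.

```python
import socket
import uuid


def build_options(host, port, local_ip, local_port):
    """Build a minimal SIP OPTIONS request as bytes."""
    branch = "z9hG4bK" + uuid.uuid4().hex  # RFC 3261 branch magic cookie
    call_id = uuid.uuid4().hex
    tag = uuid.uuid4().hex[:8]
    lines = [
        f"OPTIONS sip:{host}:{port} SIP/2.0",
        f"Via: SIP/2.0/UDP {local_ip}:{local_port};branch={branch};rport",
        "Max-Forwards: 70",
        f"From: <sip:probe@{local_ip}>;tag={tag}",
        f"To: <sip:{host}:{port}>",
        f"Call-ID: {call_id}@{local_ip}",
        "CSeq: 1 OPTIONS",
        f"Contact: <sip:probe@{local_ip}:{local_port}>",
        "Content-Length: 0",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def is_sip_reply(data):
    # Any status line counts: even an error reply shows the SIP
    # stack is still processing packets.
    return data.startswith(b"SIP/2.0 ")


def asterisk_responds(host, port=5060, timeout=5.0):
    """Return True if a SIP response arrives within `timeout` seconds."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)
            # Connecting a UDP socket fixes the peer and lets us read
            # the local address that goes into the Via header.
            sock.connect((host, port))
            local_ip, local_port = sock.getsockname()
            sock.send(build_options(host, port, local_ip, local_port))
            return is_sip_reply(sock.recv(4096))
    except OSError:
        # Covers timeouts, refused ports, and name resolution failures.
        return False
```

A cron job on the monitoring host could call `asterisk_responds("pbx.example.com")` (hostname is a placeholder) and raise an alert, or trigger the information gathering, after a few consecutive failures. A single missed reply is not proof of a freeze, since UDP packets can be lost.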
By: Michael Gaudette (bluefox) 2010-10-13 09:58:58
I usually remove passwords from config files when I attach them, but I'm not sure how to do that with a backtrace without losing relevant information. I guess we'll see when the backtrace is produced.
Thanks, I will set up something with sipsak.
By: Stefan Schmidt (schmidts) 2010-10-13 13:52:58
Sorry, I got you wrong. A backtrace does not show passwords, so you can attach one without any problems.
By: Michael Gaudette (bluefox) 2010-10-13 21:09:17
I'm willing to help, but changing those two compiler flags (DEBUG_THREADS and DONT_OPTIMIZE) turned my system into something unworkable. "SIP qualifies" went from 10-50ms to 1000+ms. I had about 700 peers on that system and no calls.
Major culprit was user CPU usage.
Had to roll-back to a normal build.
What can I do to help now? Anything that for all practical purposes won't shut down the system?
By: Leif Madsen (lmadsen) 2010-10-14 13:34:58
Unfortunately without that information there is very little, if anything, we can do to move this issue forward. The information is required.
By: Michael Gaudette (bluefox) 2010-10-14 13:58:28
I can't put my system down for a week in the hope of it freezing, unfortunately.
I'll try to find a reliable way to reproduce it, and if I can I will do what you asked and create the problem. Until then you can close this; I will re-open it if I can go further.
By: Stefan Schmidt (schmidts) 2010-10-14 15:34:08
I have made this private so you can attach your dialplan and tell me exactly what you do when this happens. Maybe I am able to reproduce this with generated load on the system.
By: Michael Gaudette (bluefox) 2010-10-14 18:32:17
Well, I do nothing. I simply "dialplan reload" through the CLI. Then a complete
I can do this 50 times without it freezing, but then for no reason it freezes.
This never happens just out of the blue, always after a dialplan reload (in the CLI or using the manager interface)
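One way to hunt for the freeze on a test box is to issue the reload in a loop and stop as soon as the CLI stops answering. The sketch below assumes that a hung `asterisk -rx` call (detected through a timeout) is a usable sign of the freeze, which this thread does not confirm. The function names and default values are made up for the example.

```python
import subprocess
import time


def reload_command():
    # The same command the reporter types at the CLI.
    return ["asterisk", "-rx", "dialplan reload"]


def reload_until_hang(max_reloads=50, pause=1.0, timeout=30.0,
                      run=subprocess.run, sleep=time.sleep):
    """Reload the dialplan repeatedly.

    Returns the 1-based attempt number at which the CLI call timed
    out, or None if every reload returned in time. A non-zero exit
    status (for example, Asterisk not running) raises
    CalledProcessError instead of being counted as a hang.
    """
    for attempt in range(1, max_reloads + 1):
        try:
            run(
                reload_command(),
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout,
                check=True,
            )
        except subprocess.TimeoutExpired:
            return attempt
        sleep(pause)
    return None
```

If `reload_until_hang()` returns a number, that is the moment to collect "core show locks" and the gdb backtrace before restarting. Lowering `pause` approximates the "many of them quickly" case from the report.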
By: Michael Gaudette (bluefox) 2010-10-14 18:44:44
BTW, if it matters: this is the size of the dialplan
-= 1623 extensions (5477 priorities) in 361 contexts. =-
By: Stefan Schmidt (schmidts) 2010-10-14 18:56:05
How many hints do you have in your dialplan? Hints can be a bastard when it comes to locking. Even if you do this at runtime with a dialplan reload, I could think of a race condition where hints are reloaded out of the dialplan and meanwhile a call is started which wants to use one of these hints.
By: Michael Gaudette (bluefox) 2010-10-14 18:57:43
656 hints, 291 subscriptions. Might be it.
By: Michael Gaudette (bluefox) 2010-10-14 19:07:29
Extra note: some of them, maybe 15, are parking hints (hints to see if a parked call is at a specific spot)
By: Stefan Schmidt (schmidts) 2010-10-15 02:02:50
I am going to try this on my system with some load testing. If it is the hints, it should be easy to reproduce.
By: Stefan Schmidt (schmidts) 2010-10-15 02:36:59
I have tested this with 2500 hints, 522 subscriptions on these hints, and 150 calls per second, while doing several dialplan reloads.
What I can see is that SIP processing does not work reliably during a dialplan reload. If I do 10 dialplan reloads directly after one another, I see several retransmits and also many lost packets in sipp, but the system does not get stuck at all. But that is my patched version, which has a little more power than plain 220.127.116.11
I will try it again with a fresh 18.104.22.168 without any patches.
By: Stefan Schmidt (schmidts) 2010-10-15 04:34:00
OK, testing this is not so easy, because I have found a change between 22.214.171.124 and 126.96.36.199 which slows down state handling (which blocks hints).
With plain 188.8.131.52 I can do 10 dialplan reloads and lose only 50 calls per second. With 184.108.40.206-rc1 I don't even have to do a dialplan reload to lose this amount of calls :(
But I still think there could be a race condition which causes the lock you ran into.
By: Michael Gaudette (bluefox) 2010-10-15 07:35:57
I'll definitely try 220.127.116.11 when it comes out. Until then, here might be a lead: one of the 4-5 times it happened, I was under some sort of bot attack (you know, trying exten 1001,1002, 1003....) and I reloaded the dialplan.
The other times I wasn't, but that might be an easy way to reproduce (you have any SIP-attack software handy?)
By: Stefan Schmidt (schmidts) 2010-10-15 08:31:19
SIPVicious is the scanner you mean, and it is slow compared to sipp ;)
With sipp you can send several hundred SIP messages (such as INVITE) per second, and I didn't hit any problems with dialplan reload: only some lost messages, but no locking at all.
By: Michael Gaudette (bluefox) 2010-12-20 14:39:33.000-0600
It hasn't happened since I moved to a version with a deadlock fix (18.104.22.168SVN, 22.214.171.124rc-1 is fine too)
You can close this I imagine
By: snuffy (snuffy) 2010-12-20 22:30:19.000-0600
Reporter claims it's fixed in a later revision