Summary: | ASTERISK-16731: Asterisk freezes when reloading dialplan | ||
Reporter: | Michael Gaudette (bluefox) | Labels: | |
Date Opened: | 2010-09-25 09:00:36 | Date Closed: | 2010-12-20 22:30:20.000-0600 |
Priority: | Minor | Regression? | No |
Status: | Closed/Complete | Components: | Applications/General |
Versions: | Frequency of Occurrence | ||
Related Issues: | |||
Environment: | Attachments: | ||
Description: | Hi, this has happened twice in the last 3 days. After two "dialplan reload" commands, Asterisk just freezes. It doesn't crash (and therefore doesn't restart). It just stays there until I restart it manually, and it doesn't process calls or SIP registrations. ****** STEPS TO REPRODUCE ****** I cannot reproduce it reliably, but a dialplan reload (or several in quick succession) might trigger it. | ||
Comments: | By: Leif Madsen (lmadsen) 2010-09-27 15:06:44
This sounds like a deadlock to me. Please follow the instructions below. It might also be useful to provide a backtrace of the running process when this happens.
~~~~~~~~~~~~~~~~~~~
Debugging deadlocks:
Please select DEBUG_THREADS and DONT_OPTIMIZE in the Compiler Flags section of menuselect. Recompile and install Asterisk (i.e. make install). This will then give you the console command:
core show locks
When the symptoms of the deadlock present themselves again, please provide output of the deadlock via:
# asterisk -rx "core show locks" | tee /tmp/core-show-locks.txt
# gdb -se "asterisk" <pid of asterisk> | tee /tmp/backtrace.txt
gdb> bt
gdb> bt full
gdb> thread apply all bt
Then attach the core-show-locks.txt and backtrace.txt files to this issue. Thanks!
~~~~~~~~~~~~~~
Thank you for your bug report. In order to move your issue forward, we require a backtrace from the core file produced after the crash. Please see the doc/backtrace.txt file in your Asterisk source directory. Also, be sure you have DONT_OPTIMIZE enabled in menuselect within the Compiler Flags section, then:
make install
After enabling it, reproduce the crash, and then execute the instructions in doc/backtrace.txt. When complete, attach that file to this issue report. Thanks!
By: Michael Gaudette (bluefox) 2010-09-29 14:51:29
Sorry, I'm trying to do this during off-hours, but I can't right now.
By: Michael Gaudette (bluefox) 2010-09-29 14:58:46
Sorry, maybe this has been answered before, but what effect will DEBUG_THREADS and DONT_OPTIMIZE have on my server? (I can only reproduce this on a busy server.) Are we talking about a performance loss of 10-15%? 50%? Nothing?
By: Stefan Schmidt (schmidts) 2010-09-29 15:54:38
If your server is so busy that DEBUG_THREADS and DONT_OPTIMIZE have a noticeable side effect, it's time to buy bigger hardware ;) I haven't measured how much you would lose, but it's not that big a deal, because these changes mostly cost extra memory rather than CPU time.
By: Michael Gaudette (bluefox) 2010-10-13 09:37:04
Could these attached files potentially show passwords, etc.? If so, could you make this issue private so that only Asterisk developers can see the files I will eventually upload? I will try to reproduce this; it's very random, but it did happen once in the last 2 weeks. In an effort to automate this information gathering, is there a reliable way to check programmatically whether Asterisk is frozen?
By: Stefan Schmidt (schmidts) 2010-10-13 09:57:21
If you don't remove the passwords, they will be readable by everyone, but we can make this issue private if you wish. You could set up a cron job on another host that checks whether Asterisk responds to a SIP OPTIONS message sent by sipsak, for example.
By: Michael Gaudette (bluefox) 2010-10-13 09:58:58
I usually remove passwords from config files when I attach them, but I'm not sure how to do that with a backtrace without losing relevant information. I guess we'll see when the backtrace is produced. Thanks, I will set up something with sipsak.
By: Stefan Schmidt (schmidts) 2010-10-13 13:52:58
Sorry, I misunderstood you. A backtrace would not show passwords, so you can attach one without any problems.
By: Michael Gaudette (bluefox) 2010-10-13 21:09:17
I'm willing to help, but changing those two compiler flags (DEBUG_THREADS and DONT_OPTIMIZE) made my system unworkable. "SIP qualify" times went from 10-50 ms to 1000+ ms. I had about 700 peers on that system and no calls. The major culprit was user CPU usage. I had to roll back to a normal build. What can I do to help now? Is there anything that, for all practical purposes, won't take the system down?
By: Leif Madsen (lmadsen) 2010-10-14 13:34:58
Unfortunately, without that information there is very little, if anything, we can do to move this issue forward. The information is required.
By: Michael Gaudette (bluefox) 2010-10-14 13:58:28
Unfortunately, I can't put my system down for a week in the hope of it freezing. I'll try to find a reliable way to reproduce it, and if I can, I will do what you asked and trigger the problem. Until then you can close this; I will reopen it if I can go further.
By: Stefan Schmidt (schmidts) 2010-10-14 15:34:08
I have made this issue private, so you can attach your dialplan and tell me exactly what you do when this happens. Maybe I can reproduce this by generating load on the system.
By: Michael Gaudette (bluefox) 2010-10-14 18:32:17
Well, I do nothing. I simply run "dialplan reload" from the CLI. Then a complete freeze. I can do this 50 times without it freezing, but then, for no apparent reason, it freezes. This never happens out of the blue; it is always after a dialplan reload (from the CLI or through the manager interface).
By: Michael Gaudette (bluefox) 2010-10-14 18:44:44
BTW, in case it matters, this is the size of the dialplan: 1623 extensions (5477 priorities) in 361 contexts.
By: Stefan Schmidt (schmidts) 2010-10-14 18:56:05
How many hints do you have in your dialplan? Hints can be a real pain when it comes to locking, especially when they change at runtime through a dialplan reload. I could imagine a race condition if hints are reloaded from the dialplan while, in the meantime, a call is started that wants to use one of those hints.
By: Michael Gaudette (bluefox) 2010-10-14 18:57:43
656 hints, 291 subscriptions. Might be it.
By: Michael Gaudette (bluefox) 2010-10-14 19:07:29
Extra note: some of them, maybe 15, are parking hints (hints that show whether a call is parked in a specific slot).
By: Stefan Schmidt (schmidts) 2010-10-15 02:02:50
I am going to try this on my system with some load testing. If it is the hints, it should be easy to reproduce.
By: Stefan Schmidt (schmidts) 2010-10-15 02:36:59
I have tested this with 2500 hints, 522 subscriptions to those hints, 150 calls per second, and several dialplan reloads. What I can see is that SIP processing does not work reliably during a dialplan reload. If I do 10 dialplan reloads back to back, sipp shows several retransmits and many lost packets, but the system does not get stuck at all. However, that was my patched version, which performs a little better than plain 1.6.2.13, so I will try again with a fresh 1.6.2.13 without any patches.
By: Stefan Schmidt (schmidts) 2010-10-15 04:34:00
OK, testing this is not so easy, because I have found a change between 1.6.2.13 and 1.6.2.14 that slows down state handling (which blocks hints). With plain 1.6.2.13 I can do 10 dialplan reloads and lose only 50 calls per second; with 1.6.2.14-rc1 I don't even have to do a dialplan reload to lose that many calls :( But I still think there could be a race condition that causes the lockup you ran into.
By: Michael Gaudette (bluefox) 2010-10-15 07:35:57
I'll definitely try 1.6.2.14 when it comes out. Until then, here might be a lead: one of the 4-5 times it happened, I was under some sort of bot attack (you know, trying extensions 1001, 1002, 1003, ...) and I reloaded the dialplan. The other times I wasn't, but that might be an easy way to reproduce it (do you have any SIP attack software handy?).
By: Stefan Schmidt (schmidts) 2010-10-15 08:31:19
SIPVicious is the scanner you mean, and it is slow compared to sipp ;) With sipp you can send several hundred SIP messages, such as INVITEs, per second, and I didn't hit any problems with dialplan reload: only some lost messages, but no locking at all.
By: Michael Gaudette (bluefox) 2010-12-20 14:39:33.000-0600
It hasn't happened since I moved to a version with a deadlock fix (1.6.2.15 SVN; 1.6.2.16-rc1 is fine too). I imagine you can close this.
By: snuffy (snuffy) 2010-12-20 22:30:19.000-0600
Reporter says it's fixed in a later revision. |
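The cron-based freeze check suggested in the comments can be sketched in Python. This is a minimal, hypothetical watchdog, not part of Asterisk or sipsak: it sends one SIP OPTIONS request over UDP and, if no SIP response arrives before a timeout, runs non-interactive versions of the diagnostics requested above (`asterisk -rx "core show locks"` and a gdb backtrace of all threads). It assumes Python 3, SIP on UDP port 5060, `asterisk` and `gdb` on the PATH, and that the caller supplies Asterisk's PID (for example from `pidof asterisk`). The helper names (`build_options`, `sip_responds`, `diagnostic_commands`, `capture_diagnostics`, `check_once`) are illustrative only. Note that `core show locks` exists only in a DEBUG_THREADS build, and a deadlocked Asterisk may not answer `asterisk -rx` at all, so each command gets a timeout.

```python
import socket
import subprocess
import uuid


def build_options(target_host, target_port, local_host, local_port):
    """Build a minimal SIP OPTIONS request (RFC 3261 header set) as bytes."""
    branch = "z9hG4bK" + uuid.uuid4().hex[:16]  # RFC 3261 magic-cookie branch prefix
    lines = [
        f"OPTIONS sip:{target_host}:{target_port} SIP/2.0",
        f"Via: SIP/2.0/UDP {local_host}:{local_port};branch={branch};rport",
        "Max-Forwards: 70",
        f"From: <sip:watchdog@{local_host}>;tag={uuid.uuid4().hex[:8]}",
        f"To: <sip:{target_host}:{target_port}>",
        f"Call-ID: {uuid.uuid4().hex}@{local_host}",
        "CSeq: 1 OPTIONS",
        f"Contact: <sip:watchdog@{local_host}:{local_port}>",
        "Content-Length: 0",
        "",
        "",  # headers end with an empty line (CRLF CRLF)
    ]
    return "\r\n".join(lines).encode("ascii")


def sip_responds(host, port=5060, timeout=5.0):
    """Send one OPTIONS request; return True if any SIP response arrives in time."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.connect((host, port))  # UDP connect only fixes the peer address
        local_host, local_port = sock.getsockname()
        sock.send(build_options(host, port, local_host, local_port))
        try:
            data = sock.recv(65535)
        except (socket.timeout, ConnectionRefusedError):
            return False  # no answer in time, or ICMP port unreachable
        return data.startswith(b"SIP/2.0 ")


def diagnostic_commands(pid):
    """Non-interactive versions of the commands requested in the issue."""
    return [
        ["asterisk", "-rx", "core show locks"],
        ["gdb", "-batch", "-p", str(pid),
         "-ex", "bt", "-ex", "bt full", "-ex", "thread apply all bt"],
    ]


def capture_diagnostics(pid, outdir="/tmp", timeout=300):
    """Write each command's output to a file; return (path, finished) pairs."""
    paths = [f"{outdir}/core-show-locks.txt", f"{outdir}/backtrace.txt"]
    results = []
    for cmd, path in zip(diagnostic_commands(pid), paths):
        with open(path, "w") as fh:
            try:
                subprocess.run(cmd, stdout=fh, stderr=subprocess.STDOUT,
                               timeout=timeout)
                results.append((path, True))
            except (subprocess.TimeoutExpired, FileNotFoundError):
                results.append((path, False))  # hung, or tool not installed
    return results


def check_once(host, pid, port=5060):
    """Return True if Asterisk answered; otherwise capture diagnostics."""
    if sip_responds(host, port):
        return True
    capture_diagnostics(pid)
    return False
```

A cron job on the Asterisk server itself could call, say, `check_once("127.0.0.1", pid)`. The probe alone could run from another host as suggested in the comments, but gdb has to attach locally. A single lost UDP packet also looks like a timeout, so in practice one might require several consecutive failures before capturing diagnostics.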