Summary: | ASTERISK-13473: [patch] lockout after AEL reload | ||
Reporter: | pj (pj) | Labels: | |
Date Opened: | 2009-01-28 13:01:29.000-0600 | Date Closed: | 2009-02-18 09:59:35.000-0600 |
Priority: | Minor | Regression? | No |
Status: | Closed/Complete | Components: | PBX/General |
Versions: | Frequency of Occurrence | ||
Related Issues: | |||
Environment: | Attachments: | ( 0) 14357.diff ( 1) gdb.txt | |
Description: | My asterisk get locked, after AEL reload, It seems, that it happen, if I use pattern match for hints in dialplan, like: hint(SIP/${EXTEN}) _ZXX! => NoOP; CLI console output with some errors before lock happen is attached, also 'core show locks' output is attached ****** ADDITIONAL INFORMATION ****** [Jan 28 19:52:42] ERROR[5265]: /root/src/asterisk170835/include/asterisk/lock.h:1236 _ast_rwlock_rdlock: pbx.c line 5236 (handle_show_hints): Error obtaining read lock: Resource temporarily unavailable [Jan 28 19:52:42] ERROR[5265]: /root/src/asterisk170835/include/asterisk/lock.h:1105 _ast_rwlock_unlock: pbx.c line 5258 (handle_show_hints): rwlock '&(&hints)->lock' freed more times than we've locked! [Jan 28 19:52:43] ERROR[5288]: /root/src/asterisk170835/include/asterisk/lock.h:1236 _ast_rwlock_rdlock: pbx.c line 9104 (ast_rdlock_contexts): Error obtaining read lock: Resource temporarily unavailable [Jan 28 19:52:43] ERROR[5288]: /root/src/asterisk170835/include/asterisk/lock.h:1105 _ast_rwlock_unlock: pbx.c line 9109 (ast_unlock_contexts): rwlock '&conlock' freed more times than we've locked! [Jan 28 19:52:58] ERROR[5288]: /root/src/asterisk170835/include/asterisk/lock.h:1236 _ast_rwlock_rdlock: pbx.c line 9104 (ast_rdlock_contexts): Error obtaining read lock: Resource temporarily unavailable [Jan 28 19:52:58] ERROR[5288]: /root/src/asterisk170835/include/asterisk/lock.h:1093 _ast_rwlock_unlock: pbx.c line 9109 (ast_unlock_contexts): attempted unlock rwlock '&conlock' without owning it! [Jan 28 19:52:58] ERROR[5288]: /root/src/asterisk170835/include/asterisk/lock.h:1095 _ast_rwlock_unlock: pbx.c line 9104 (ast_rdlock_contexts): '&conlock' was last locked here. [Jan 28 19:52:58] ERROR[5288]: /root/src/asterisk170835/include/asterisk/lock.h:230 __dump_backtrace: asterisk(ast_bt_get_addresses+0x19) [0x8100bb0] [Jan 28 19:52:58] ERROR[5288]: /root/src/asterisk170835/include/asterisk/lock.h:230 __dump_backtrace: asterisk [0x81132ec] [Jan 28 19:52:58] ERROR[5288]: /root/src/asterisk170835/include/asterisk/lock.h:230 __dump_backtrace: asterisk(ast_rdlock_contexts+0x32) [0x812e540] [Jan 28 19:52:58] ERROR[5288]: /root/src/asterisk170835/include/asterisk/lock.h:230 __dump_backtrace: asterisk [0x811c062] [Jan 28 19:52:58] ERROR[5288]: /root/src/asterisk170835/include/asterisk/lock.h:230 __dump_backtrace: asterisk [0x816659b] [Jan 28 19:52:58] ERROR[5288]: /root/src/asterisk170835/include/asterisk/lock.h:230 __dump_backtrace: asterisk [0x8178879] [Jan 28 19:52:58] ERROR[5288]: /root/src/asterisk170835/include/asterisk/lock.h:230 __dump_backtrace: /lib/i686/libpthread.so.0 [0xb7af3315] [Jan 28 19:52:58] ERROR[5288]: /root/src/asterisk170835/include/asterisk/lock.h:230 __dump_backtrace: /lib/i686/libc.so.6(clone+0x5e) [0xb7d6edde] === Thread ID: -1214993520 (tps_processing_function started at [ 451] taskprocessor.c ast_taskprocessor_get()) === ---> Lock #0 (pbx.c): RDLOCK 9104 ast_rdlock_contexts &conlock 0x82023c0 (1) asterisk(ast_bt_get_addresses+0x19) [0x8100bb0] asterisk [0x81132ec] asterisk(ast_rdlock_contexts+0x32) [0x812e540] asterisk [0x811b658] asterisk(ast_canmatch_extension+0x55) [0x811cff1] /usr/lib/asterisk/modules/chan_sip.so [0xb77f9250] /usr/lib/asterisk/modules/chan_sip.so [0xb7816296] /usr/lib/asterisk/modules/chan_sip.so [0xb7821801] /usr/lib/asterisk/modules/chan_sip.so [0xb7822a93] /usr/lib/asterisk/modules/chan_sip.so [0xb782221e] asterisk(ast_io_wait+0x14d) [0x80f67ad] /usr/lib/asterisk/modules/chan_sip.so [0xb7823e77] asterisk [0x8178879] /lib/i686/libpthread.so.0 [0xb7af3315] /lib/i686/libc.so.6(clone+0x5e) [0xb7d6edde] === ------------------------------------------------------------------- === Thread ID: -1215485040 (do_monitor started at [21149] chan_sip.c restart_monitor()) === ---> Lock #0 (astobj2.c): MUTEX 642 __ao2_callback c 0x83d9608 (1) asterisk(ast_bt_get_addresses+0x19) [0x8100bb0] asterisk [0x80830d4] asterisk(_ao2_lock+0x4a) [0x8082f97] asterisk [0x8084b2a] asterisk(_ao2_callback+0x56) [0x8084f1b] /usr/lib/asterisk/modules/chan_sip.so [0xb7823e36] asterisk [0x8178879] /lib/i686/libpthread.so.0 [0xb7af3315] /lib/i686/libc.so.6(clone+0x5e) [0xb7d6edde] === ------------------------------------------------------------------- === | ||
Comments: | By: pj (pj) 2009-02-05 14:10:44.000-0600 If it can help, I can notice, that when nobody has subscribed to hints, reloading ael dialplan does not cause asterisk lockout, so it locks only in case, when somebody had subscribed hints and I do 'ael reload' By: pj (pj) 2009-02-13 12:45:50.000-0600 uploading gdb.txt if it call help with solving this lock issue it's from rpid branch: Asterisk SVN-group-issue8824-r173514 but it's not related to rpid work, it can be easy reproduced from vanilla trunk asterisk By: Steve Murphy (murf) 2009-02-13 15:04:41.000-0600 I apologize for not spotting this sooner. I set up hints and could not reproduce your results. Then I re-read your description and saw the ${EXTEN} var used.... Sorry, but you can't do that! I haven't tested this, but I don't think you could even use global variables in this case either. There are some places you can use global vars, which would be substituted at load time, but I don't think this is one of them. I put in vars myself, and still I cannot reproduce this! Please attach your complete dialplan(s), both extensions.conf and extensions.ael; if you have to clean out any proprietary extensions, etc, then please verify that the 'clean' version still gives you problems. What I'd like to do is throw in a check for the hint contents, such that if they contain any ${..} constructs, you will get an ERROR. This should be helpful, I'd think! I am also interested in what particular set of circumstances lead to the lockup. I'm not seeing this problem here with my dialplan; I need yours to see what's happening; perhaps there is something more sinister going on. By: pj (pj) 2009-02-13 15:47:19.000-0600 murf, thank you for answer, but I think, using pattern match for hints is now completelly legal and supported. It was added into asterisk trunk some time ago and it's working fine, except this lock, that happens when someone had subscribed to hints (that was automatically made using pattern match) and do 'ael reload'. As I said, this lock doesn't occur, when nobody has subscribed to hints. here is sentence from CHANGES about dialplan pattern for hints: It is now possible to specify a pattern match as a hint. Once a phone subscribes to something that matches the pattern a hint will be created using the contents and variables evaluated. By: pj (pj) 2009-02-13 15:52:48.000-0600 this is my complet context for hints: context linestates { hint(SIP/${EXTEN}) _ZXX! => ; } it automatically creates this hints: -= Registered Asterisk Dial Plan Hints =- 888phone@linestates : SIP/888phone State:Idle Watchers 1 726@linestates : SIP/726 State:Idle Watchers 1 408@linestates : SIP/408 State:Idle Watchers 1 324p@linestates : SIP/324p State:Idle Watchers 1 322@linestates : SIP/322 State:Idle Watchers 1 628@linestates : SIP/628 State:Idle Watchers 0 734@linestates : SIP/734 State:Unavailable Watchers 0 743@linestates : SIP/743 State:Unavailable Watchers 1 455@linestates : SIP/455 State:Unavailable Watchers 1 324@linestates : SIP/324 State:Unavailable Watchers 1 _ZXX!@linestates : SIP/${EXTEN} State:Idle Watchers 0 dialplan show linestates [ Context 'linestates' created by 'pbx_ael' ] '322' => hint: SIP/322 [pbx_ael] '324' => hint: SIP/324 [pbx_ael] '324p' => hint: SIP/324p [pbx_ael] '408' => hint: SIP/408 [pbx_ael] '455' => hint: SIP/455 [pbx_ael] '628' => hint: SIP/628 [pbx_ael] '726' => hint: SIP/726 [pbx_ael] '734' => hint: SIP/734 [pbx_ael] '743' => hint: SIP/743 [pbx_ael] '888phone' => hint: SIP/888phone [pbx_ael] '_ZXX!' => hint: SIP/${EXTEN} [pbx_ael] ... when subscribtions are in effect... sip show subscriptions Peer User Call ID Extension Last state Type Mailbox Expiry 193.85.164.154 324p MGU4NTVlZTFiZjk 888phone@linest Idle pidf+xml <none> 000400 193.85.164.154 324p MmY4YjJmN2FhZDc 455@linestates Idle pidf+xml <none> 000400 193.85.164.154 324p MGI1MjgxNTEyZDk 726@linestates Idle pidf+xml <none> 000400 193.85.164.154 324p OTYyNGViNjRkMmY 324@linestates Idle pidf+xml <none> 000400 193.85.164.154 324p ZGE4ZjJiNWQ0YTA 324p@linestates Idle pidf+xml <none> 000400 193.85.164.154 324p Y2Y0NDQ1MDI2NzA 322@linestates Idle pidf+xml <none> 000400 193.85.164.154 324p N2EzN2MxN2ViZTQ 408@linestates Idle pidf+xml <none> 000400 193.85.164.154 324p NTY3ZjliNTIyMDY 743@linestates Idle pidf+xml <none> 000400 8 active SIP subscriptions By: Steve Murphy (murf) 2009-02-13 16:16:11.000-0600 Duh! you were so fast responding, I thought my message disappeared! By: Steve Murphy (murf) 2009-02-13 17:24:27.000-0600 OK, I stand corrected. I found the code addition that was made for this. The variable references have to be global vars. The code will not try to evaluate the ${...} in terms of a channel, where stuff like ${EXTEN} is stored. So, most likely, the ${EXTEN} will evaluate to a null string, so you'll be left with hint(SIP/). I don't know if this will give you the intended results. By: Steve Murphy (murf) 2009-02-13 17:34:47.000-0600 I correct myself (again)-- yes, I see! It will allow a few chosen variables on the dummy chan, context and extension; then do the globals. OK. I'll shuffle my dialplan and see if I can reproduce this. more later. By: Steve Murphy (murf) 2009-02-13 18:19:00.000-0600 No, I'm having trouble reproducing this. I have 4 dahdi ports, dahdi/25, dahdi/26, dahdi/27, and dahdi/28. I put this line into my dialplan, in a subcontext that is included by the context that handles when these phones go off-hook: hint(dahdi/${EXTEN}) _1ZX! => { Set(CDR(myvar)=zingo); Dial(Dahdi/${EXTEN:1},30,eTtWwKkHhXx); } and core show hints says: *CLI> core show hints -= Registered Asterisk Dial Plan Hints =- 958@notmymessage : Meetme:100&park:701& State:Idle Watchers 0 _1ZX!@dialextens : dahdi/${EXTEN} State:Idle Watchers 0 foo4@default : SIP/foo4&IAX2/foo4 State:Unavailable Watchers 0 ---------------- - 3 hints registered (The other 2 are from other parts of the dialplan). Anything I'm missing? By: pj (pj) 2009-02-14 02:56:28.000-0600 I tried again, using minimalistic extensions.ael (I haven't extensions.conf at all), here I have only one context, as I wrote above: context linestates { hint(SIP/${EXTEN}) _ZXX! => ; } I'm using eyebeam softphone to subscribe to 322, 324p, 743 etc.... now, I do 'ael reload' it causes asterisk lock, see below.. currently I'm using Asterisk SVN-group-issue8824-r173514 but same happened in the past, when using vanilla trunk, I will try again vanilla and will report again... === <pending> <lock#> (<file>): <lock type> <line num> <function> <lock name> <lock addr> (times locked) === === Thread ID: -1216672880 (do_monitor started at [21721] chan_sip.c restart_monitor()) === ---> Lock #0 (astobj2.c): MUTEX 642 __ao2_callback c 0x92797e0 (1) asterisk(ast_bt_get_addresses+0x19) [0x8104de8] asterisk [0x8083a04] asterisk(_ao2_lock+0x4a) [0x80838c7] asterisk [0x808545a] asterisk(_ao2_callback+0x56) [0x808584b] /usr/lib/asterisk/modules/chan_sip.so [0xb783e3ca] asterisk [0x817cc8d] /lib/i686/libpthread.so.0 [0xb7b10315] /lib/i686/libc.so.6(clone+0x5e) [0xb7d8f26e] === ------------------------------------------------------------------- === By: pj (pj) 2009-02-14 03:41:01.000-0600 tested right now Asterisk SVN-trunk-r175699 and can confirm, that lock happen even in case when using minimalistic extensions.ael from my example above. for completeness, here is my sip.conf template, that I'm using for my phones [phone](!) type=peer host=dynamic qualify=4000 qualifyfreq=20 nat=yes canreinvite=no disallow=all allow=g729,gsm,ilbc callcounter=yes busylevel=1 allowsubscribe=yes subscribecontext=linestates sendrpid=yes trustrpid=yes context=zamestnanci By: Steve Murphy (murf) 2009-02-17 15:36:25.000-0600 Yay! At last, I was able to reproduce this behavior! I have attached a patch for trunk that should correct the deadlocking. Please test and let me know of success/failure ASAP. Many thanks! By: pj (pj) 2009-02-18 07:37:19.000-0600 Hi, tested your patch with Asterisk SVN-trunk-r175699 and it seems, that locking issue after ael reload is gone. I haven't found any negative side effects of this patch. thanks! By: Digium Subversion (svnbot) 2009-02-18 09:35:28.000-0600 Repository: asterisk Revision: 176943 U trunk/main/pbx.c ------------------------------------------------------------------------ r176943 | murf | 2009-02-18 09:35:27 -0600 (Wed, 18 Feb 2009) | 45 lines This patch fixes merge_contexts_and_delete so it does not deadlock when hints are present. Reason: when I re-engineered the merge_and_delete func to reduce its lock time, I failed to notice that the functions it calls still also do locking as before. This leads to deadlocks on dialplan reloads, when there are actually living, subscribed hints registered in the system. While the reporter come across this problem while using AEL, I might note that these deadlocks should also happen if extensions.conf were used. Here I added these routines to pbx.c: ast_add_extension_nolock add_pri_lockopt ast_add_extension2_lockopt find_context add_hint_nolock All of the above routines are static and restricted to be used only within pbx.c, and more specifically within the merge_contexts_and_delete routine. They are pretty much the same as their counterparts except they don't lock contexts or hints. Most of them now do the real work of their name-alike, with optional locking via extra arguments, and are called by their name-alike. The goal was to have the original functions so they would behave exactly as before. Both PJ and I tested these fixes, and the deadlocking problem is no longer encountered. (closes issue ASTERISK-13473) Reported by: pj Patches: 14357.diff uploaded by murf (license 17) Tested by: pj, murf ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=176943 By: Digium Subversion (svnbot) 2009-02-18 09:49:11.000-0600 Repository: asterisk Revision: 176944 _U branches/1.6.0/ U branches/1.6.0/main/pbx.c ------------------------------------------------------------------------ r176944 | murf | 2009-02-18 09:49:11 -0600 (Wed, 18 Feb 2009) | 53 lines Merged revisions 176943 via svnmerge from https://origsvn.digium.com/svn/asterisk/trunk ........ r176943 | murf | 2009-02-18 08:35:26 -0700 (Wed, 18 Feb 2009) | 45 lines This patch fixes merge_contexts_and_delete so it does not deadlock when hints are present. Reason: when I re-engineered the merge_and_delete func to reduce its lock time, I failed to notice that the functions it calls still also do locking as before. This leads to deadlocks on dialplan reloads, when there are actually living, subscribed hints registered in the system. While the reporter come across this problem while using AEL, I might note that these deadlocks should also happen if extensions.conf were used. Here I added these routines to pbx.c: ast_add_extension_nolock add_pri_lockopt ast_add_extension2_lockopt find_context add_hint_nolock All of the above routines are static and restricted to be used only within pbx.c, and more specifically within the merge_contexts_and_delete routine. They are pretty much the same as their counterparts except they don't lock contexts or hints. Most of them now do the real work of their name-alike, with optional locking via extra arguments, and are called by their name-alike. The goal was to have the original functions so they would behave exactly as before. Both PJ and I tested these fixes, and the deadlocking problem is no longer encountered. (closes issue ASTERISK-13473) Reported by: pj Patches: 14357.diff uploaded by murf (license 17) Tested by: pj, murf ........ ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=176944 By: Digium Subversion (svnbot) 2009-02-18 09:59:34.000-0600 Repository: asterisk Revision: 176946 _U branches/1.6.1/ U branches/1.6.1/main/pbx.c ------------------------------------------------------------------------ r176946 | murf | 2009-02-18 09:59:34 -0600 (Wed, 18 Feb 2009) | 53 lines Merged revisions 176943 via svnmerge from https://origsvn.digium.com/svn/asterisk/trunk ........ r176943 | murf | 2009-02-18 08:35:26 -0700 (Wed, 18 Feb 2009) | 45 lines This patch fixes merge_contexts_and_delete so it does not deadlock when hints are present. Reason: when I re-engineered the merge_and_delete func to reduce its lock time, I failed to notice that the functions it calls still also do locking as before. This leads to deadlocks on dialplan reloads, when there are actually living, subscribed hints registered in the system. While the reporter come across this problem while using AEL, I might note that these deadlocks should also happen if extensions.conf were used. Here I added these routines to pbx.c: ast_add_extension_nolock add_pri_lockopt ast_add_extension2_lockopt find_context add_hint_nolock All of the above routines are static and restricted to be used only within pbx.c, and more specifically within the merge_contexts_and_delete routine. They are pretty much the same as their counterparts except they don't lock contexts or hints. Most of them now do the real work of their name-alike, with optional locking via extra arguments, and are called by their name-alike. The goal was to have the original functions so they would behave exactly as before. Both PJ and I tested these fixes, and the deadlocking problem is no longer encountered. (closes issue ASTERISK-13473) Reported by: pj Patches: 14357.diff uploaded by murf (license 17) Tested by: pj, murf ........ ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=176946 |