[Home]

Summary:ASTERISK-13473: [patch] lockout after AEL reload
Reporter:pj (pj)Labels:
Date Opened:2009-01-28 13:01:29.000-0600Date Closed:2009-02-18 09:59:35.000-0600
Priority:MinorRegression?No
Status:Closed/CompleteComponents:PBX/General
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) 14357.diff
( 1) gdb.txt
Description:My asterisk get locked, after AEL reload,
It seems, that it happen, if I use pattern match for hints in dialplan, like:
       hint(SIP/${EXTEN}) _ZXX! => NoOP;

CLI console output with some errors before lock happen is attached,
also 'core show locks' output is attached




****** ADDITIONAL INFORMATION ******

[Jan 28 19:52:42] ERROR[5265]: /root/src/asterisk170835/include/asterisk/lock.h:1236 _ast_rwlock_rdlock: pbx.c line 5236 (handle_show_hints): Error obtaining read lock: Resource temporarily unavailable
[Jan 28 19:52:42] ERROR[5265]: /root/src/asterisk170835/include/asterisk/lock.h:1105 _ast_rwlock_unlock: pbx.c line 5258 (handle_show_hints): rwlock '&(&hints)->lock' freed more times than we've locked!
[Jan 28 19:52:43] ERROR[5288]: /root/src/asterisk170835/include/asterisk/lock.h:1236 _ast_rwlock_rdlock: pbx.c line 9104 (ast_rdlock_contexts): Error obtaining read lock: Resource temporarily unavailable
[Jan 28 19:52:43] ERROR[5288]: /root/src/asterisk170835/include/asterisk/lock.h:1105 _ast_rwlock_unlock: pbx.c line 9109 (ast_unlock_contexts): rwlock '&conlock' freed more times than we've locked!



[Jan 28 19:52:58] ERROR[5288]: /root/src/asterisk170835/include/asterisk/lock.h:1236 _ast_rwlock_rdlock: pbx.c line 9104 (ast_rdlock_contexts): Error obtaining read lock: Resource temporarily unavailable
[Jan 28 19:52:58] ERROR[5288]: /root/src/asterisk170835/include/asterisk/lock.h:1093 _ast_rwlock_unlock: pbx.c line 9109 (ast_unlock_contexts): attempted unlock rwlock '&conlock' without owning it!
[Jan 28 19:52:58] ERROR[5288]: /root/src/asterisk170835/include/asterisk/lock.h:1095 _ast_rwlock_unlock: pbx.c line 9104 (ast_rdlock_contexts): '&conlock' was last locked here.
[Jan 28 19:52:58] ERROR[5288]: /root/src/asterisk170835/include/asterisk/lock.h:230 __dump_backtrace: asterisk(ast_bt_get_addresses+0x19) [0x8100bb0]
[Jan 28 19:52:58] ERROR[5288]: /root/src/asterisk170835/include/asterisk/lock.h:230 __dump_backtrace: asterisk [0x81132ec]
[Jan 28 19:52:58] ERROR[5288]: /root/src/asterisk170835/include/asterisk/lock.h:230 __dump_backtrace: asterisk(ast_rdlock_contexts+0x32) [0x812e540]
[Jan 28 19:52:58] ERROR[5288]: /root/src/asterisk170835/include/asterisk/lock.h:230 __dump_backtrace: asterisk [0x811c062]
[Jan 28 19:52:58] ERROR[5288]: /root/src/asterisk170835/include/asterisk/lock.h:230 __dump_backtrace: asterisk [0x816659b]
[Jan 28 19:52:58] ERROR[5288]: /root/src/asterisk170835/include/asterisk/lock.h:230 __dump_backtrace: asterisk [0x8178879]
[Jan 28 19:52:58] ERROR[5288]: /root/src/asterisk170835/include/asterisk/lock.h:230 __dump_backtrace: /lib/i686/libpthread.so.0 [0xb7af3315]
[Jan 28 19:52:58] ERROR[5288]: /root/src/asterisk170835/include/asterisk/lock.h:230 __dump_backtrace: /lib/i686/libc.so.6(clone+0x5e) [0xb7d6edde]





=== Thread ID: -1214993520 (tps_processing_function started at [  451] taskprocessor.c ast_taskprocessor_get())
=== ---> Lock #0 (pbx.c): RDLOCK 9104 ast_rdlock_contexts &conlock 0x82023c0 (1)
       asterisk(ast_bt_get_addresses+0x19) [0x8100bb0]
       asterisk [0x81132ec]
       asterisk(ast_rdlock_contexts+0x32) [0x812e540]
       asterisk [0x811b658]
       asterisk(ast_canmatch_extension+0x55) [0x811cff1]
       /usr/lib/asterisk/modules/chan_sip.so [0xb77f9250]
       /usr/lib/asterisk/modules/chan_sip.so [0xb7816296]
       /usr/lib/asterisk/modules/chan_sip.so [0xb7821801]
       /usr/lib/asterisk/modules/chan_sip.so [0xb7822a93]
       /usr/lib/asterisk/modules/chan_sip.so [0xb782221e]
       asterisk(ast_io_wait+0x14d) [0x80f67ad]
       /usr/lib/asterisk/modules/chan_sip.so [0xb7823e77]
       asterisk [0x8178879]
       /lib/i686/libpthread.so.0 [0xb7af3315]
       /lib/i686/libc.so.6(clone+0x5e) [0xb7d6edde]
=== -------------------------------------------------------------------


=== Thread ID: -1215485040 (do_monitor           started at [21149] chan_sip.c restart_monitor())
=== ---> Lock #0 (astobj2.c): MUTEX 642 __ao2_callback c 0x83d9608 (1)
       asterisk(ast_bt_get_addresses+0x19) [0x8100bb0]
       asterisk [0x80830d4]
       asterisk(_ao2_lock+0x4a) [0x8082f97]
       asterisk [0x8084b2a]
       asterisk(_ao2_callback+0x56) [0x8084f1b]
       /usr/lib/asterisk/modules/chan_sip.so [0xb7823e36]
       asterisk [0x8178879]
       /lib/i686/libpthread.so.0 [0xb7af3315]
       /lib/i686/libc.so.6(clone+0x5e) [0xb7d6edde]
=== -------------------------------------------------------------------
===

Comments:By: pj (pj) 2009-02-05 14:10:44.000-0600

If it can help, I can notice, that when nobody has subscribed to hints, reloading ael dialplan does not cause asterisk lockout,
so it locks only in case, when somebody had subscribed hints and I do 'ael reload'



By: pj (pj) 2009-02-13 12:45:50.000-0600

uploading gdb.txt if it call help with solving this lock issue
it's from rpid branch: Asterisk SVN-group-issue8824-r173514
but it's not related to rpid work,
it can be easy reproduced from vanilla trunk asterisk

By: Steve Murphy (murf) 2009-02-13 15:04:41.000-0600

I apologize for not spotting this sooner. I set up hints and could not reproduce your results. Then I re-read your description and saw the ${EXTEN} var used....
Sorry, but you can't do that!

I haven't tested this, but I don't think you could even use global variables in this case either. There are some places you can use global vars, which would be substituted at load time, but I don't think this is one of them.

I put in vars myself, and still I cannot reproduce this!  Please attach your complete dialplan(s), both extensions.conf and extensions.ael; if you have to clean out any proprietary extensions, etc, then please verify that the 'clean' version still gives you problems. What I'd like to do is throw in a check for the hint contents, such that if they contain any ${..} constructs, you will get an ERROR. This should be helpful, I'd think! I am also interested in what particular set of circumstances lead to the lockup. I'm not seeing this problem here with my dialplan; I need yours to see what's happening; perhaps there is something more sinister going on.



By: pj (pj) 2009-02-13 15:47:19.000-0600

murf, thank you for answer, but I think, using pattern match for hints is now completelly legal and supported. It was added into asterisk trunk some time ago and it's working fine, except this lock, that happens when someone had subscribed to hints (that was automatically made using pattern match) and do 'ael reload'. As I said, this lock doesn't occur, when nobody has subscribed to hints.

here is sentence from CHANGES about dialplan pattern for hints:
It is now possible to specify a pattern match as a hint. Once a phone subscribes to something that matches the pattern a hint will be created using the contents and variables evaluated.

By: pj (pj) 2009-02-13 15:52:48.000-0600

this is my complet context for hints:

context linestates {
       hint(SIP/${EXTEN}) _ZXX! => ;
}

it automatically creates this hints:

   -= Registered Asterisk Dial Plan Hints =-
              888phone@linestates          : SIP/888phone          State:Idle            Watchers  1
                   726@linestates          : SIP/726               State:Idle            Watchers  1
                   408@linestates          : SIP/408               State:Idle            Watchers  1
                  324p@linestates          : SIP/324p              State:Idle            Watchers  1
                   322@linestates          : SIP/322               State:Idle            Watchers  1
                   628@linestates          : SIP/628               State:Idle            Watchers  0
                   734@linestates          : SIP/734               State:Unavailable     Watchers  0
                   743@linestates          : SIP/743               State:Unavailable     Watchers  1
                   455@linestates          : SIP/455               State:Unavailable     Watchers  1
                   324@linestates          : SIP/324               State:Unavailable     Watchers  1
                 _ZXX!@linestates          : SIP/${EXTEN}          State:Idle            Watchers  0


dialplan show linestates
[ Context 'linestates' created by 'pbx_ael' ]
 '322' =>          hint: SIP/322                                 [pbx_ael]
 '324' =>          hint: SIP/324                                 [pbx_ael]
 '324p' =>         hint: SIP/324p                                [pbx_ael]
 '408' =>          hint: SIP/408                                 [pbx_ael]
 '455' =>          hint: SIP/455                                 [pbx_ael]
 '628' =>          hint: SIP/628                                 [pbx_ael]
 '726' =>          hint: SIP/726                                 [pbx_ael]
 '734' =>          hint: SIP/734                                 [pbx_ael]
 '743' =>          hint: SIP/743                                 [pbx_ael]
 '888phone' =>     hint: SIP/888phone                            [pbx_ael]
 '_ZXX!' =>        hint: SIP/${EXTEN}                            [pbx_ael]


... when subscribtions are in effect...

sip show subscriptions
Peer             User        Call ID          Extension        Last state     Type            Mailbox    Expiry
193.85.164.154   324p        MGU4NTVlZTFiZjk  888phone@linest  Idle           pidf+xml        <none>     000400
193.85.164.154   324p        MmY4YjJmN2FhZDc  455@linestates   Idle           pidf+xml        <none>     000400
193.85.164.154   324p        MGI1MjgxNTEyZDk  726@linestates   Idle           pidf+xml        <none>     000400
193.85.164.154   324p        OTYyNGViNjRkMmY  324@linestates   Idle           pidf+xml        <none>     000400
193.85.164.154   324p        ZGE4ZjJiNWQ0YTA  324p@linestates  Idle           pidf+xml        <none>     000400
193.85.164.154   324p        Y2Y0NDQ1MDI2NzA  322@linestates   Idle           pidf+xml        <none>     000400
193.85.164.154   324p        N2EzN2MxN2ViZTQ  408@linestates   Idle           pidf+xml        <none>     000400
193.85.164.154   324p        NTY3ZjliNTIyMDY  743@linestates   Idle           pidf+xml        <none>     000400
8 active SIP subscriptions

By: Steve Murphy (murf) 2009-02-13 16:16:11.000-0600

Duh! you were so fast responding, I thought my message disappeared!

By: Steve Murphy (murf) 2009-02-13 17:24:27.000-0600

OK, I stand corrected.

I found the code addition that was made for this. The variable references have to be global vars. The code will not try to evaluate the ${...} in terms of a channel, where stuff like ${EXTEN} is stored. So, most likely, the ${EXTEN} will evaluate to a null string, so you'll be left with hint(SIP/). I don't know if
this will give you the intended results.

By: Steve Murphy (murf) 2009-02-13 17:34:47.000-0600

I correct myself (again)-- yes, I see! It will allow a few chosen variables on the dummy chan, context and extension; then do the globals. OK.

I'll shuffle my dialplan and see if I can reproduce this. more later.

By: Steve Murphy (murf) 2009-02-13 18:19:00.000-0600

No, I'm having trouble reproducing this. I have 4 dahdi ports, dahdi/25, dahdi/26,
dahdi/27, and dahdi/28. I put this line into my dialplan, in a subcontext that is included by the context that handles when these phones go off-hook:

hint(dahdi/${EXTEN}) _1ZX! =>
          {
            Set(CDR(myvar)=zingo);    
            Dial(Dahdi/${EXTEN:1},30,eTtWwKkHhXx);
          }

and core show hints says:

*CLI> core show hints

   -= Registered Asterisk Dial Plan Hints =-
 958@notmymessage        : Meetme:100&park:701&  State:Idle            Watchers  0
_1ZX!@dialextens          : dahdi/${EXTEN}        State:Idle            Watchers  0
foo4@default             : SIP/foo4&IAX2/foo4    State:Unavailable     Watchers  0
----------------
- 3 hints registered


(The other 2 are from other parts of the dialplan).

Anything I'm missing?

By: pj (pj) 2009-02-14 02:56:28.000-0600

I tried again, using minimalistic extensions.ael (I haven't extensions.conf at all), here I have only one context, as I wrote above:
context linestates {
       hint(SIP/${EXTEN}) _ZXX! => ;
}

I'm using eyebeam softphone to subscribe to 322, 324p, 743 etc....
now, I do 'ael reload' it causes asterisk lock, see below..
currently I'm using Asterisk SVN-group-issue8824-r173514
but same happened in the past, when using vanilla trunk, I will try again vanilla and will report again...


=== <pending> <lock#> (<file>): <lock type> <line num> <function> <lock name> <lock addr> (times locked)
===
=== Thread ID: -1216672880 (do_monitor           started at [21721] chan_sip.c restart_monitor())
=== ---> Lock #0 (astobj2.c): MUTEX 642 __ao2_callback c 0x92797e0 (1)
       asterisk(ast_bt_get_addresses+0x19) [0x8104de8]
       asterisk [0x8083a04]
       asterisk(_ao2_lock+0x4a) [0x80838c7]
       asterisk [0x808545a]
       asterisk(_ao2_callback+0x56) [0x808584b]
       /usr/lib/asterisk/modules/chan_sip.so [0xb783e3ca]
       asterisk [0x817cc8d]
       /lib/i686/libpthread.so.0 [0xb7b10315]
       /lib/i686/libc.so.6(clone+0x5e) [0xb7d8f26e]
=== -------------------------------------------------------------------
===

By: pj (pj) 2009-02-14 03:41:01.000-0600

tested right now Asterisk SVN-trunk-r175699 and can confirm, that lock happen even in case when using minimalistic extensions.ael from my example above.
for completeness, here is my sip.conf template, that I'm using for my phones

[phone](!)
type=peer
host=dynamic
qualify=4000
qualifyfreq=20
nat=yes
canreinvite=no
disallow=all
allow=g729,gsm,ilbc
callcounter=yes
busylevel=1
allowsubscribe=yes
subscribecontext=linestates
sendrpid=yes
trustrpid=yes
context=zamestnanci

By: Steve Murphy (murf) 2009-02-17 15:36:25.000-0600

Yay! At last, I was able to reproduce this behavior!

I have attached a patch for trunk that should correct the deadlocking.

Please test and let me know of success/failure ASAP.

Many thanks!

By: pj (pj) 2009-02-18 07:37:19.000-0600

Hi, tested your patch with Asterisk SVN-trunk-r175699 and it seems, that locking issue after ael reload is gone. I haven't found any negative side effects of this patch. thanks!

By: Digium Subversion (svnbot) 2009-02-18 09:35:28.000-0600

Repository: asterisk
Revision: 176943

U   trunk/main/pbx.c

------------------------------------------------------------------------
r176943 | murf | 2009-02-18 09:35:27 -0600 (Wed, 18 Feb 2009) | 45 lines


This patch fixes merge_contexts_and_delete so it does not deadlock when hints are present.

Reason: when I re-engineered the merge_and_delete func to
reduce its lock time, I failed to notice that the
functions it calls still also do locking as before.
This leads to deadlocks on dialplan reloads, when
there are actually living, subscribed hints registered
in the system.

While the reporter come across this problem while using
AEL, I might note that these deadlocks should also happen
if extensions.conf were used.

Here I added these routines to pbx.c:

ast_add_extension_nolock
add_pri_lockopt
ast_add_extension2_lockopt
find_context
add_hint_nolock

All of the above routines are static and restricted
to be used only within pbx.c, and more specifically
within the merge_contexts_and_delete routine.

They are pretty much the same as their counterparts
except they don't lock contexts or hints.

Most of them now do the real work of their
name-alike, with optional locking via extra arguments,
and are called by their name-alike. The goal was to
have the original functions so they would behave
exactly as before.

Both PJ and I tested these fixes, and the deadlocking
problem is no longer encountered.

(closes issue ASTERISK-13473)
Reported by: pj
Patches:
     14357.diff uploaded by murf (license 17)
Tested by: pj, murf


------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=176943

By: Digium Subversion (svnbot) 2009-02-18 09:49:11.000-0600

Repository: asterisk
Revision: 176944

_U  branches/1.6.0/
U   branches/1.6.0/main/pbx.c

------------------------------------------------------------------------
r176944 | murf | 2009-02-18 09:49:11 -0600 (Wed, 18 Feb 2009) | 53 lines

Merged revisions 176943 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

........
r176943 | murf | 2009-02-18 08:35:26 -0700 (Wed, 18 Feb 2009) | 45 lines


This patch fixes merge_contexts_and_delete so it does not deadlock when hints are present.

Reason: when I re-engineered the merge_and_delete func to
reduce its lock time, I failed to notice that the
functions it calls still also do locking as before.
This leads to deadlocks on dialplan reloads, when
there are actually living, subscribed hints registered
in the system.

While the reporter come across this problem while using
AEL, I might note that these deadlocks should also happen
if extensions.conf were used.

Here I added these routines to pbx.c:

ast_add_extension_nolock
add_pri_lockopt
ast_add_extension2_lockopt
find_context
add_hint_nolock

All of the above routines are static and restricted
to be used only within pbx.c, and more specifically
within the merge_contexts_and_delete routine.

They are pretty much the same as their counterparts
except they don't lock contexts or hints.

Most of them now do the real work of their
name-alike, with optional locking via extra arguments,
and are called by their name-alike. The goal was to
have the original functions so they would behave
exactly as before.

Both PJ and I tested these fixes, and the deadlocking
problem is no longer encountered.

(closes issue ASTERISK-13473)
Reported by: pj
Patches:
     14357.diff uploaded by murf (license 17)
Tested by: pj, murf


........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=176944

By: Digium Subversion (svnbot) 2009-02-18 09:59:34.000-0600

Repository: asterisk
Revision: 176946

_U  branches/1.6.1/
U   branches/1.6.1/main/pbx.c

------------------------------------------------------------------------
r176946 | murf | 2009-02-18 09:59:34 -0600 (Wed, 18 Feb 2009) | 53 lines

Merged revisions 176943 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

........
r176943 | murf | 2009-02-18 08:35:26 -0700 (Wed, 18 Feb 2009) | 45 lines


This patch fixes merge_contexts_and_delete so it does not deadlock when hints are present.

Reason: when I re-engineered the merge_and_delete func to
reduce its lock time, I failed to notice that the
functions it calls still also do locking as before.
This leads to deadlocks on dialplan reloads, when
there are actually living, subscribed hints registered
in the system.

While the reporter come across this problem while using
AEL, I might note that these deadlocks should also happen
if extensions.conf were used.

Here I added these routines to pbx.c:

ast_add_extension_nolock
add_pri_lockopt
ast_add_extension2_lockopt
find_context
add_hint_nolock

All of the above routines are static and restricted
to be used only within pbx.c, and more specifically
within the merge_contexts_and_delete routine.

They are pretty much the same as their counterparts
except they don't lock contexts or hints.

Most of them now do the real work of their
name-alike, with optional locking via extra arguments,
and are called by their name-alike. The goal was to
have the original functions so they would behave
exactly as before.

Both PJ and I tested these fixes, and the deadlocking
problem is no longer encountered.

(closes issue ASTERISK-13473)
Reported by: pj
Patches:
     14357.diff uploaded by murf (license 17)
Tested by: pj, murf


........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=176946