[Home]

Summary:ASTERISK-16631: Random Deadlocks maybe in find_call() of chan_sip. or ast_call() of channel.c
Reporter:Tan Tuerel (thsgmbh)Labels:
Date Opened:2010-08-31 02:38:51Date Closed:2011-07-27 13:23:38
Priority:MajorRegression?No
Status:Closed/CompleteComponents:Channels/chan_sip/General
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) asterisk_backtrace_25.01.2011.txt
( 1) Asterisk_deadlock_25.01.2011_2.txt
( 2) asterisk_deadlock_26.01.2011.txt
( 3) Backtrace_08.09.2010.txt
( 4) Backtrace_09.09.2010.txt
( 5) backtrace1.txt
( 6) backtrace2.txt
( 7) Backtrace_21.09.2010.txt
( 8) backtrace3_with_gdb.txt
( 9) backtrace4.txt
(10) debug_channel_locks_example.txt
(11) test.call
Description:I get randomly deadlocks and i'm not sure, what's the reason for that!?!

It is reproducible with the attached test call file, which calls a ring group with 5-10 extensions. The issue does not occur directly after restarting asterisk but when there were at least some calls.

Maybe there are two different issues!?!
(backtrace1.txt after random daily deadlock, backtrace2-4.txt after executing call file)

I have a feeling that the "notifycid=ignore-context" setting is responsible for that!? All the extensions in the ring group have subscriptions and generate dialoginfos for snom phones...

Backtrace debug files and test call file attached...
Comments:By: Stefan Schmidt (schmidts) 2010-08-31 06:51:44

i´ve seen several locks like this on my tests. how many sip user and hints do you have on your system?
also a "sip show sched" would be interesting to see how many events are in the scheduler queue.

By: Tan Tuerel (thsgmbh) 2010-08-31 07:09:22

hi schmidts, here are the values:

- 137 sip peers [Monitored: 70 online, 8 offline Unmonitored: 59 online, 0 offline]
- 332 hints registered
- 277 active SIP subscriptions (mostly BLF on snom phones)

-----------------------------------
Uptime: 16 hours

sv-asterisk*CLI> sip show sched

Highwater = 680
schedcnt = 487
   retrans_pkt : 0
   __sip_autodestruct : 277
   expire_register : 128
   auto_congest : 0
   sip_reg_timeout : 0
   sip_poke_peer_s : 70
   sip_poke_noanswer : 0
   sip_reregister : 0
   sip_reinvite_retry : 0
  <unknown> : 12

By: Stefan Schmidt (schmidts) 2010-08-31 07:34:28

i see you only have 70 online peers but there are 128 register expire events, so maybe your registration timeout on the phones is lower than your timeout in sip.conf this could cause some problems.
also there are many autodestruct message in the scheduler but this shouldnt be the problem.
how many sip channels are open on your system?
which revision are you using?

By: Tan Tuerel (thsgmbh) 2010-08-31 08:08:42

No, that's correct... I have 70 monitored (qualify=yes) and 59 unmonitored peers.

Every snom phone has 2 registrations, one for normal use and one ONLY for paging/intercom with no subscriptions... I had to configure it that way because there was too much traffic/load on the asterisk box while performing a group paging to all extensions (that generates too many dialoginfo-notifys for every BLF subscription).

What generates the autodestruct messages?

What do you mean with "how many open sip channels"? There are normally only 1-5 concurrent calls on our system...

Asterisk SVN-branch-1.6.2-r284002

By: Tan Tuerel (thsgmbh) 2010-09-08 02:31:39

I tried the SIP setting "pedantic=yes" and since that no more random deadlocks!??
(not the deadlocks caused by call file)

-----------

Update: I got a deadlock again today... see attached Backtrace file...



By: Tan Tuerel (thsgmbh) 2010-09-09 04:43:15

And again a deadlock... (uploaded additional backtrace file)

With "core set debug 15" you can see thousands of channel locks and unlocks before the deadlock occurs, it seems like an infinite loop?!

What can cause the many channel locks??

By: Stefan Schmidt (schmidts) 2010-09-09 04:58:19

thats normal asterisk locking behavior. you would see this on a high debug level if you set the debug_locks option in make menuselect but thats not your problem.

it looks like the problem which i has discovered of slow sip handling but i never get a deadlock out of this.

maybe you should try this two patches from reviewboard:
https://reviewboard.asterisk.org/r/901/ (speed up subscribe handling)
https://reviewboard.asterisk.org/r/825/ (speed up registration)

this would speed up sip handling maybe enough so these lockings will not occur any more.

By: Tan Tuerel (thsgmbh) 2010-09-09 06:37:56

I will give it a try and report...

By: Tan Tuerel (thsgmbh) 2010-09-22 09:15:32

The performance tuning with the above mentioned patches did not solve my problem...

I've attached a new backtrace with gdb...

By: Stefan Schmidt (schmidts) 2010-09-22 12:48:34

i see on your latest backtrace that the macro macro-auto-blkvm locks something (THREAD 2 nearly at bottom of file).
maybe this is the problem.

By: Tan Tuerel (thsgmbh) 2010-09-24 01:55:23

It's a simple freePbx macro which only set a channel variable and delete a AstDB Key!?!

---------------------------
[macro-auto-blkvm]
include => macro-auto-blkvm-custom
exten => s,1,Set(__MACRO_RESULT=)
exten => s,n,Noop(Deleting: ${BLKVM_OVERRIDE} ${DB_DELETE(${BLKVM_OVERRIDE})})
---------------------------

I can't believe that this is the cause for the deadlocks?!

By: Stefan Schmidt (schmidts) 2010-09-24 02:23:50

the astdb del could be the cause of the lock!
have a look at https://reviewboard.asterisk.org/r/825/
depending how many entries you have in astdb it tooks really long to wait until its on the disk. maybe you could try this patch from above if this solves your issue.

By: Tan Tuerel (thsgmbh) 2010-09-24 06:25:45

I've already applied this patch (some weeks ago) to limit the db sync to two times a second... and anyway, we only have 2856 entries in our ast db!!

By: Russell Bryant (russell) 2011-07-27 13:23:32.540-0500

Per the Asterisk maintenance timeline page at http://www.asterisk.org/asterisk-versions maintenance (bug) support for the 1.4 and 1.6.x branches has ended. For continued maintenance support please move to the 1.8 branch which is a long term support (LTS) branch. For more information about branch support, please see https://wiki.asterisk.org/wiki/display/AST/Asterisk+Versions

If this is still an issue, please open a new issue so it can be re-triaged appropriately. Thanks!