[Home]

Summary:ASTERISK-17802: SIP stack stops working if a Dial command if forwarded by a SIP physical phone
Reporter:Tiziano Martelli (tiziano)Labels:
Date Opened:2011-05-05 07:49:17Date Closed:2011-09-22 07:27:41
Priority:MajorRegression?No
Status:Closed/CompleteComponents:Applications/app_dial
Versions:1.8.3 Frequency of
Occurrence
Related
Issues:
is the original version of this clone:ASTERISK-19611 SIP stack stops working (deadlock?) if a call to a snom phone is redirected by "302 Moved temporarily" to chan_local
Environment:Attachments:( 0) backtrace-threads.201105300920.txt
( 1) backtrace-threads.201105300933.txt
( 2) backtrace-threads.txt
( 3) core-show-locks.201105300920.txt
( 4) core-show-locks.201105300933.txt
( 5) core-show-locks.txt
Description:If you enable the call forward feature on a registered SIP phone  - I've tried with two different SNOM models a a Cisco/Linksys SPA9xx - when a call reach the phone you'll see (on asterisk CLI):

Got SIP response 302 "Moved Temporarily" back from (PHONE IP:PORT)
Now forwarding SIP/(CHANNEL) to 'Local/....@(CONTEXT)' (thanks to SIP/(FORWARDING PHONE CHANNEL)
Not accepting call completion offers from call-forward recipient Local/...@....

then all SIP activity ceased, both incoming and outgoing calls, while asterisk remains up and running

****** ADDITIONAL INFORMATION ******

While looking to the app_dial code to try to understand the problem, I posted this because of the seriousness of the problem.
I think that the last message is of no matter - it will always be printed - and the the problems raises from a SIP channel deadlock: immediately following that line of code, there is a lot "locking code" and the following comment:

* We must unlock c before calling ast_channel_redirecting_macro, because
* we put c into autoservice there. That is pretty much a guaranteed
* deadlock. This is why the handling of c's lock may seem a bit unusual here.

THe problems comes out on a production machine on which I couldn't recompile asterisk with debug options enabled. I hope that someone could help me while I try to set up a "laboratory machine" to debug code.
Comments:By: Leif Madsen (lmadsen) 2011-05-05 10:54:23

I don't think anything can be done here until more information is provided. Once you can reproduce this on your development/testing machine, please be sure to provide the backtrace, 'core show locks' data, and the SIP trace from the Asterisk console, along withe DEBUG level logging.  Thanks!

By: Tiziano Martelli (tiziano) 2011-05-24 07:32:40

Leif, while setting up a system for generating backtrace and so, I've stumbled upon a very strange (for me) thing:
normally, when I compile some C code without optimization, the result executable(s) is bigger then the one with optimization enabled. So, imagine my surprise when, compiling asterisk 1.8.4 with DONT_OPTIMIZE flag selected i got an executable of 20367186 bytes while the regular compilation gives an executable of 23110097 bytes (this is thue for about all modules, not only for asterisk executable). I understand that this is not the same issue, but I think that the question is important because, if my development environment is "not sane", we have nothing to talk about. So, please, could you or someone confirm that this behavior (DONT_OPTIMZE executable is thinner than regular one) is expected?
Thanks

By: Walter Doekes (wdoekes) 2011-05-24 08:51:11

This is expected: optimization is usually speed optimization, not space optimization. Therefore more code is used to achieve the same thing, but faster.

Maybe you're thinking of common DEBUG modes where extra debugging code is included? Or perhaps you're thinking of the process of stripping symbols (which isn't done).

$ du -sh main/masterisk main/asterisk.dont_optimize
22M main/asterisk
19M main/asterisk.dont_optimize



By: Tiziano Martelli (tiziano) 2011-05-24 17:46:53

Thanks Walter (I guessed your name, excuse me if wrong). Here I am:
- production machine 2 x E5640 (QuadCore HyperThread = 16 core), 16GB RAM
- linux openSUSE 11.3 (kernel 2.6.34.8-0.2-pae)
- asterisk 1.8.4 (in the meantime I've upgraded)
- I call the extension 564 that "dial(SIP/101)" on which I've setup a "Call Forward" to (existing) extension 554.
From asterisk CLI:
---------------------------------------------------------------------------
-- Executing [564@nazionali:1] dial("SIP/eos-00000003", "SIP/101")
== Using SIP RTP CoS mark 5
-- Called 101
-- Got SIP response 302 "Moved Temporarily" back from 192.168.7.20:5060
-- Now forwarding SIP/eos-00000003 to 'Local/554@mobili' (thanks to SIP/101-00000004)
[May 24 23:59:14] NOTICE[17395]: app_dial.c:844 do_forward: Not accepting call completion offers from call-forward recipient Local/554@mobili-c425;1
---------------------------------------------------------------------------
the SIP device connected to extension 554 never rings and all SIP activity "stuck". If, for example, I try a "core show channels" from CLI, I got two lines output (my call) but the "N calls processed" line never comes out; if I try a "core stop now" simply nothing happens. The only way to restart processing is "killproc /usr/sbin/asterisk" then restart asterik himself.
I've attached the "core show locks" output and the backtrace taken immediately after lock.

By: Walter Doekes (wdoekes) 2011-05-25 01:10:45

ASTERISK-17871 and thereby ASTERISK-17414 look related.

Could you try this patch?
http://svnview.digium.com/svn/asterisk/branches/1.8/channels/chan_local.c?r1=315446&r2=315445&pathrev=315446

(commit from ASTERISK-17414 for asterisk 1.8)

By: Tiziano Martelli (tiziano) 2011-05-25 09:30:33

I've compiled the suggested patch and it worked for me. Because it's on a production machine, I've got just one chance to make just one trial, but before the patch every redirect caused the SIP stack to lock, so I think the problem is solved. I've re-enabled the use of the "Call forward" feature to all peers so, if you let the issue open for some days, I could give a finer report

By: Tiziano Martelli (tiziano) 2011-05-30 02:48:07

I can confirm that the Local channel fix solved the SIP Call forward problem.
Now, I've another problem to submit. Because I think it's related to SIP deadlocks, I'll expose it here and, if you think it's a new issue, I could open it. On this particular machine (the one I've already talked about) there are 231 SIP peers registered: 4 are SIP thunks, all the others are SIP peers (phones).
As you can imagine, there are a lot of SIP transfer operations every day, both blind and attended. The first asterisk release I've installed on this machine is 1.8.2.2 (it does not have neither the blind-deadlock patch for the 18403 issue nor the attended-transfer patch for the 18837 issue). The machine deadlocked lot of times every day (more then 20) so, when 1.8.3.3 (patched for issue 18403) comes out, I've upgraded and the numbers of deadlocks lowers to about 10 every day. Now I've upgraded again to 1.8.4 (patched for issue 18837) and, despite all my expectations, there are still three or four SIP deadlocks every day. I've attached both the "core show locks" and the "backtrace threads" outputs of the two deadlock of this morning. Could you help me?

By: Walter Doekes (wdoekes) 2011-05-30 03:17:01

See these for your other issue:
https://issues.asterisk.org/view.php?id=19266
https://issues.asterisk.org/view.php?id=19304

By: Tiziano Martelli (tiziano) 2011-06-07 09:44:22.819-0500

As I already said in my previous comment, the "mail title" (SIP stack deadlock after a Call Forward) as been solved.
Now I can confirm that also the remaining random deadlocks are solved: asterisk is working continuously from 7 days and 4/5 time a days, instead of the deadlock, I found the following messages in logs:
res_timing_timerfd.c: Call to timerfd_timer_ack() with disarmed timer - break now to avoiding deadlock!
without any other consequence. Just one question before closing the issue: the message comes out with and ERROR banner, does this means that something happens to - for example - the current call?