Summary:ASTERISK-15574: [patch] Deadlock between handle_request_do and do_devstate_changes
Reporter:Laurent Steffan (lmsteffan)
Labels:
Date Opened:2010-02-03 20:47:56.000-0600
Date Closed:2010-04-06 09:43:55
Versions:Frequency of
Environment:
Attachments:( 0) bt.output
( 1) bug_lock2.txt
( 2) bug_lock3.txt
( 3) bugMar17.txt
( 4) deadlock_16767
( 5) deadlock_16767v2.diff
( 6) deadlock_16767v3.diff
( 7) lock_error_report.txt
( 8) traceMar17.txt
Description:title says it all :-)


See the attached report, taken while Asterisk was frozen (I had to forcibly kill Asterisk; "core restart now" no longer worked).
Comments:
By: David Vossel (dvossel) 2010-02-19 15:42:06.000-0600

I uploaded a patch.  Please test it and verify everything continues to work as expected with no deadlocks.

By: Laurent Steffan (lmsteffan) 2010-02-21 16:29:22.000-0600

Will test it on our production server as soon as I can stop it.  Thanks.

By: Laurent Steffan (lmsteffan) 2010-02-23 23:06:46.000-0600

After applying your patch, Asterisk worked for a day and a half, then hit another deadlock. I am including the output of "core show locks" (with the irrelevant parts removed). The deadlock between the first and the fourth thread is obvious. One funny thing is that the fourth thread is reported as waiting for lock #0 but has gone on to seize other locks, including those that create the deadlock.

I can't be sure that this deadlock has the same cause as the previous one, but to my untrained eye it sure looks like it could... If that were not the case I can submit another bug report.

By: David Vossel (dvossel) 2010-02-24 14:24:11.000-0600

alright, well that does look odd, but I believe this is the same deadlock you encountered earlier.

I took my best shot at guessing what the problem could be based on your earlier information, and that did not appear to work. Let's try something different. Next time this deadlock occurs, I need you to gather some gdb information for me.

Run 'make menuselect' and make sure you are compiling with DON'T OPTIMIZE enabled under compile flags. Recompile if necessary, then the next time this deadlock occurs, attach gdb to the process and give me the "thread apply all bt" and "thread apply all bt full" output. You can attach gdb to asterisk using gdb -p `pidof asterisk`

Without the gdb output I am just guessing at this point.

By: Laurent Steffan (lmsteffan) 2010-02-25 00:37:55.000-0600

I am afraid this will be difficult, as it's our main (production) server and it's running under fairly heavy load, so I'm not too enthusiastic about installing gdb, setting DON'T OPTIMIZE (but I guess I will anyway), and above all interrupting all traffic while running the gdb commands. Is there some set of options (under the compile flags) that I could use to dump core and *then* perform the gdb backtrace? I think I once saw options like "crash on error" or something, but never ventured to try them...

By: David Vossel (dvossel) 2010-02-25 10:45:28.000-0600

You can dump the core by running asterisk with the -g option. The problem here is that a deadlock probably won't cause a crash and dump the core.  I believe something like gcore will do what you are wanting though.  From what I understand, gcore can be used to dump the core of any pid.
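
A rough sketch of how that could look in practice, assuming gcore and gdb are installed. The binary path /usr/sbin/asterisk, the output directory, and the helper names core_prefix and dump_backtraces are assumptions made for illustration, not anything taken from Asterisk itself. gcore writes its snapshot to "<prefix>.<pid>", and gdb is then run in batch mode against that file, so the live process is only paused while the core is being written.

```shell
#!/bin/sh
# Sketch only: snapshot a hung Asterisk with gcore, then pull backtraces
# from the core file offline.
# ASTERISK_BIN and OUT_DIR are assumed defaults; adjust for your install.
ASTERISK_BIN="${ASTERISK_BIN:-/usr/sbin/asterisk}"
OUT_DIR="${OUT_DIR:-/tmp}"

# Prefix handed to "gcore -o"; gcore appends ".<pid>" to it.
core_prefix() {
    printf '%s/asterisk_core' "$OUT_DIR"
}

# Hypothetical helper: dump a core of the given pid, then write both
# backtrace variants requested in this thread to a text file.
dump_backtraces() {
    pid="$1"
    gcore -o "$(core_prefix)" "$pid" || return 1
    gdb -batch \
        -ex "thread apply all bt" \
        -ex "thread apply all bt full" \
        "$ASTERISK_BIN" "$(core_prefix).$pid" \
        > "$OUT_DIR/backtrace.$pid.txt" 2>&1
}

# Usage, once the deadlock has occurred:
#   dump_backtraces "$(pidof asterisk)"
```

The backtraces are only fully readable if the binary was built with DON'T OPTIMIZE, as described earlier in this thread.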

Let me know if there is anything else I can do to help.

By: Laurent Steffan (lmsteffan) 2010-03-08 14:35:16.000-0600

Here is a backtrace and the associated "show locks" output, obtained following your instructions. gcore does indeed work as you mentioned. Not sure whether it's the same bug, but it looks related.

Thanks for your help.

By: Laurent Steffan (lmsteffan) 2010-03-16 19:59:11

Yet another backtrace, perhaps corresponding more closely to the original bug. Your remark about gcore has indeed been a great help!

By: David Vossel (dvossel) 2010-03-17 10:01:44

awesome, sorry I haven't gotten to look at this yet.  Thank you for your feedback, I'll analyze it soon!

By: David Vossel (dvossel) 2010-04-02 17:57:12

I uploaded a patch that should resolve the issue. Let me know if the same deadlock occurs for you.

By: Laurent Steffan (lmsteffan) 2010-04-05 19:25:01

Okay, I'll push that to our server as soon as I can and give you feedback. Thanks!

By: David Vossel (dvossel) 2010-04-06 09:39:48

This issue is about to be closed.  If you have any more trouble with this please feel free to reopen.

By: Digium Subversion (svnbot) 2010-04-06 09:42:14

Repository: asterisk
Revision: 256319

U   trunk/channels/chan_sip.c

r256319 | dvossel | 2010-04-06 09:42:12 -0500 (Tue, 06 Apr 2010) | 8 lines

fixes deadlock in chan_sip caused by usage of MASTER_CHANNEL dialplan function

(closes issue ASTERISK-15574)
Reported by: lmsteffan
     deadlock_16767v3.diff uploaded by dvossel (license 671)



By: Digium Subversion (svnbot) 2010-04-06 09:43:53

Repository: asterisk
Revision: 256319

U   trunk/channels/chan_sip.c

r256319 | dvossel | 2010-04-06 09:42:10 -0500 (Tue, 06 Apr 2010) | 9 lines

fixes deadlock in chan_sip caused by usage of MASTER_CHANNEL dialplan function

(closes issue ASTERISK-15574)
Reported by: lmsteffan
     deadlock_16767v3.diff uploaded by dvossel (license 671)

Review: https://reviewboard.asterisk.org/r/606/