[Home]

Summary:ASTERISK-16156: [patch] DEADLOCK_AVOIDANCE can actually generate dealocks
Reporter:Peter Fern (pdf)Labels:
Date Opened:2010-05-27 00:23:44Date Closed:2010-07-02 21:43:08
Priority:BlockerRegression?No
Status:Closed/CompleteComponents:Core/General
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) 20100527__issue17407.diff.txt
( 1) fix_deadlockavoidance-1.4.31.patch
Description:We have reported this issue to Digium support, however were asked to submit the patch via this tracker.

In brief: the DEADLOCK_AVOIDANCE macro attempts to release an existing lock, then relock after a usleep(1), to allow other code to proceed around the lock.  However, it doesn't actually check to see if a lock was released by the unlock, before taking a new lock.  This means that whenever there is no lock released, a new lock is taken that will never be released, meaning that DEADLOCK_AVOIDANCE actually leaks locks, which can obviously lead to a deadlock.
Comments:By: Peter Fern (pdf) 2010-05-27 00:24:46

Attached patch resolves.

By: David Woolley (davidw) 2010-05-27 06:22:01

Isn't failing to unlock an indication of a serious coding error?  Rather than simply not locking, shouldn't it be outputting an error message.

Also, a deadlock of the resource may well be the safest result, as it has potentially been corrupted by manipulating it when it was, incorrectly, assumed to have been locked, and deadlocking it limits the area of damage.

By: David Woolley (davidw) 2010-05-27 09:49:47

This is papering over the cracks.  It is a pre-condition of using this macro that you do own the lock.

The overall intent is that you end up owning all the relevant locks.

If you can reproduce this condition without modifying the code, you need to fix the logic that allows a path to this call without successively gaining the lock.

By: Peter Fern (pdf) 2010-05-27 18:40:20

Adding some logging is certainly a good idea, and will help find coding errors, but leaving the existing code in place means that any time someone calls DEADLOCK_AVOIDANCE outside a lock, they'll leak a lock and will get stuck on it later.  Surely it's better to (perhaps log first as you say) continue if called unnecessarily, rather than potentially bring the whole box to a halt by adding a lock that will never be released?

By: Peter Fern (pdf) 2010-05-28 02:32:02

Actually, thinking about it - logging in the macro, whilst still a good idea to tell us that someone is doing something bad, will only be marginally useful in tracking down the problem since this will only tell us the point at which the macro was called without a lock, however the problem code may be much further up the stack.

I've reported one possible instance of this in ASTERISK-1717414, but I certainly can't say that there aren't others...

By: David Woolley (davidw) 2010-05-28 05:41:10

I think there are awkward policy questions: one has to balance the risk of instability, against the loss of service.

However, as an example, you will find that Linux drivers that get themselves into difficulty deliberately leave the offending process in an unkillable state, because the resources owned by the process are in an unknown state, so cannot safely be released.  In this context, the right approach might be to request the lock, then go into an indefinite sleep.

I think similar provisions go back to early mainframe OSes.

Where this causes difficulty for Asterisk is that I think Asterisk can end up temporarily locking resources in processing that is not fundamentally critical to system integrity, typically resulting in the CLI locking out because of some enquiry operation.  That means that a compromised resource may result in a complete system lockup, rather than, say, losing just one device, or call.

If you just log a warning, most people reporting a problem will not associate it with the problem, especially if it is some way back in the logs.

Forcing a restart may actually produce a more reliable system, as, even though it is drastic, you will not get weird, unexplainable, behaviour down stream.



By: Miguel Molina (coolmig) 2010-05-31 11:11:12

I wonder if all asterisk code uses the DEADLOCK_AVOIDANCE macro where it's needed instead of the explicit deadlock cycle.

By: Evandro César Arruda (ecarruda) 2010-06-18 21:51:47

Hey People,

I'm running this patch on two different servers running 1.4.32, i was having 1 or two freezes by day on asterisk, something work like IAX2 logging Maximum Retries, but sip phones unregister, all the calls stop to work, everthing go down but asterisk don't crash or generate core dump.

After the patch i have this uptime: System uptime: 3 days, 14 hours, 51 minutes, 53 seconds

No more freezes or lock, i will continue testing this patch on more server and i will write new report in few days, but, i can tell now, IT WORKED, solved all my lock problems, i'm having this problem more/less 5 months, many thanks PDF.

By: Evandro César Arruda (ecarruda) 2010-06-25 06:42:26

People,

Solved my problem, stopped to crash asterisk


*CLI> core show uptime
System uptime: 1 week, 2 days, 23 hours, 31 minutes, 53 seconds


Without this patch my max uptime was 3 days.

Thank you soo much man.

By: Miguel Molina (coolmig) 2010-06-25 17:27:51

That's very good news.

By: Paul Belanger (pabelanger) 2010-06-25 19:36:08

Promoted to 'Ready for Review'

By: David Woolley (davidw) 2010-06-28 07:13:46

Is there a reviewboard reference for this?

By: Digium Subversion (svnbot) 2010-07-02 16:36:39

Repository: asterisk
Revision: 273793

U   branches/1.4/channels/chan_agent.c
U   branches/1.4/channels/chan_dahdi.c
U   branches/1.4/channels/chan_h323.c
U   branches/1.4/channels/chan_local.c
U   branches/1.4/configure
U   branches/1.4/configure.ac
U   branches/1.4/include/asterisk/autoconfig.h.in
U   branches/1.4/include/asterisk/compiler.h
U   branches/1.4/include/asterisk/lock.h

------------------------------------------------------------------------
r273793 | tilghman | 2010-07-02 16:36:39 -0500 (Fri, 02 Jul 2010) | 9 lines

Have the DEADLOCK_AVOIDANCE macro warn when an unlock fails, to help catch potentially large software bugs.

(closes issue ASTERISK-16156)
Reported by: pdf
Patches:
      20100527__issue17407.diff.txt uploaded by tilghman (license 14)

Review: https://reviewboard.asterisk.org/r/751/

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=273793

By: Digium Subversion (svnbot) 2010-07-02 21:36:31

Repository: asterisk
Revision: 273830

_U  trunk/
U   trunk/channels/chan_agent.c
U   trunk/channels/chan_h323.c
U   trunk/channels/chan_local.c
U   trunk/include/asterisk/lock.h

------------------------------------------------------------------------
r273830 | tilghman | 2010-07-02 21:36:30 -0500 (Fri, 02 Jul 2010) | 16 lines

Merged revisions 273793 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
 r273793 | tilghman | 2010-07-02 16:36:39 -0500 (Fri, 02 Jul 2010) | 9 lines
 
 Have the DEADLOCK_AVOIDANCE macro warn when an unlock fails, to help catch potentially large software bugs.
 
 (closes issue ASTERISK-16156)
  Reported by: pdf
  Patches:
        20100527__issue17407.diff.txt uploaded by tilghman (license 14)
 
 Review: https://reviewboard.asterisk.org/r/751/
........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=273830

By: Digium Subversion (svnbot) 2010-07-02 21:43:06

Repository: asterisk
Revision: 273831

_U  branches/1.6.2/
U   branches/1.6.2/channels/chan_agent.c
U   branches/1.6.2/channels/chan_h323.c
U   branches/1.6.2/channels/chan_local.c
U   branches/1.6.2/include/asterisk/lock.h

------------------------------------------------------------------------
r273831 | tilghman | 2010-07-02 21:43:06 -0500 (Fri, 02 Jul 2010) | 23 lines

Merged revisions 273830 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

................
 r273830 | tilghman | 2010-07-02 21:36:31 -0500 (Fri, 02 Jul 2010) | 16 lines
 
 Merged revisions 273793 via svnmerge from
 https://origsvn.digium.com/svn/asterisk/branches/1.4
 
 ........
   r273793 | tilghman | 2010-07-02 16:36:39 -0500 (Fri, 02 Jul 2010) | 9 lines
   
   Have the DEADLOCK_AVOIDANCE macro warn when an unlock fails, to help catch potentially large software bugs.
   
   (closes issue ASTERISK-16156)
    Reported by: pdf
    Patches:
          20100527__issue17407.diff.txt uploaded by tilghman (license 14)
   
   Review: https://reviewboard.asterisk.org/r/751/
 ........
................

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=273831