Summary: ASTERISK-18882: Asterisk lock during production
Reporter: Andrew Parisio (parisioa)
Labels:
Date Opened: 2011-11-17 16:58:06.000-0600
Date Closed: 2011-12-20 14:18:21.000-0600
Priority: Critical
Regression?
Status: Closed/Complete
Components: Channels/chan_local, Core/PBX
Versions: 1.6.2.20
Frequency of Occurrence:
Related Issues:
Environment:
Attachments: (0) locks.txt, (1) threads.txt
Description: During production, Asterisk hangs and stops processing calls. Existing calls do not drop, but Asterisk will not restart or stop in response to a normal kill; it must be killed with kill -9.

This has happened roughly every other day for the last two weeks, except today, when it happened twice. I had DEBUG_THREADS and DONT_OPTIMIZE enabled so that I could capture the attached core show threads and core show locks output.
Comments:

By: Andrew Parisio (parisioa) 2011-11-17 16:59:28.508-0600

Leif: Please don't body slam me over this one.

By: Matt Jordan (mjordan) 2011-11-17 17:02:19.546-0600

Per the Asterisk maintenance timeline page at http://www.asterisk.org/asterisk-versions, maintenance (bug) support for the 1.4 and 1.6.x branches has ended. For continued maintenance support, please move to the 1.8 branch, which is a long term support (LTS) branch. For more information about branch support, please see https://wiki.asterisk.org/wiki/display/AST/Asterisk+Versions. If, after testing with Asterisk 1.8, you find that this problem has not been resolved, please open a new issue against Asterisk 1.8.

By: Richard Mudgett (rmudgett) 2011-11-17 18:01:14.468-0600

I think you are in a deadlock avoidance loop in chan_local that can never be resolved because of other channel locks held by ast_do_masquerade().

Locking in this area of code is very different in v1.8.
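
As a rough illustration of the pattern described above (not Asterisk's actual code), a deadlock avoidance loop try-locks the lock it wants and, on failure, briefly drops the one lock it knows it holds so that a competing thread can make progress. The sketch below is in Python for brevity. All names in it (acquire_with_avoidance, demo_unresolvable, outer, private, wanted) are hypothetical, and the retry limit is an addition so that the demonstration terminates; a loop without such a limit would spin indefinitely in the second scenario, where the competing thread is waiting on a lock taken higher up the call stack that the loop never releases.

```python
import threading
import time


def acquire_with_avoidance(held, wanted, max_attempts=1000):
    """Try to take `wanted` while already holding `held`.

    On each failed try-lock, `held` is released for a moment so that a
    thread waiting on it can run, then re-acquired. Returns True with both
    locks held, or False (holding only `held`) once the attempts run out.
    """
    for _ in range(max_attempts):
        if wanted.acquire(blocking=False):
            return True
        held.release()
        time.sleep(0)  # yield to other threads
        held.acquire()
    return False


def demo_unresolvable():
    """Show a case where the avoidance loop can never succeed."""
    outer = threading.Lock()    # stands in for a lock taken by a caller
    private = threading.Lock()  # the only lock the loop knows to release
    wanted = threading.Lock()   # the lock the loop is trying to take

    peer_has_wanted = threading.Event()
    stop = threading.Event()

    def peer():
        with wanted:
            peer_has_wanted.set()
            # The peer needs `outer`, which the loop never gives up, so the
            # peer never gets far enough to release `wanted`.
            while not stop.is_set():
                if outer.acquire(timeout=0.01):
                    outer.release()
                    return

    with outer, private:
        worker = threading.Thread(target=peer)
        worker.start()
        peer_has_wanted.wait()
        ok = acquire_with_avoidance(private, wanted, max_attempts=200)
        stop.set()     # let the peer give up so the demo can finish
        worker.join()
    return ok
```

Here demo_unresolvable() returns False: releasing private does not help, because the peer is blocked on outer rather than on private.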

By: Andrew Parisio (parisioa) 2011-11-17 18:18:46.902-0600

Having looked at the logs a little more closely, it appears to happen around the time a reload occurs (sip reload and dialplan reload). It doesn't happen at every reload, though, only once in a while and seemingly at random.

I'll test upgrading to 1.8 and see if it continues to happen.

By: Leif Madsen (lmadsen) 2011-11-18 07:55:14.432-0600

Andrew: I promise nothing!

By: Andrew Parisio (parisioa) 2011-11-22 12:10:14.602-0600

We upgraded to 1.8.7.1 in production today and are testing it out. Because of the short week we have light call volume and may not trigger the problem anyway, so the fix may not be confirmed until next week (we didn't trigger it yesterday in 1.6.2.20).

By: Leif Madsen (lmadsen) 2011-12-20 08:52:25.339-0600

Ping?

By: Andrew Parisio (parisioa) 2011-12-20 12:20:45.174-0600

We haven't had a lock since, so it appears the issue was resolved somewhere in 1.8.

Thanks!

By: Matt Jordan (mjordan) 2011-12-20 14:18:21.350-0600

Per Andrew, this appears to be resolved in the 1.8 branch