|Summary:||ASTERISK-18882: Asterisk lock during production|
|Reporter:||Andrew Parisio (parisioa)||Labels:|
|Date Opened:||2011-11-17 16:58:06.000-0600||Date Closed:||2011-12-20 14:18:21.000-0600|
|Environment:||Attachments:||( 0) locks.txt|
( 1) threads.txt
|Description:||During production Asterisk hangs and stops processing calls. Existing calls do not drop, but Asterisk will not restart or stop with a plain kill (it must be killed with kill -9).
This has happened roughly every other day for the last two weeks, except today, when it happened twice. I had DEBUG_THREADS and DONT_OPTIMIZE enabled so that I could capture the attached core show threads and core show locks output.
|Comments:||By: Andrew Parisio (parisioa) 2011-11-17 16:59:28.508-0600|
Leif: Please don't body slam me over this one.
By: Matt Jordan (mjordan) 2011-11-17 17:02:19.546-0600
Per the Asterisk maintenance timeline page at http://www.asterisk.org/asterisk-versions, maintenance (bug) support for the 1.4 and 1.6.x branches has ended. For continued maintenance support, please move to the 1.8 branch, which is a long term support (LTS) branch. For more information about branch support, please see https://wiki.asterisk.org/wiki/display/AST/Asterisk+Versions. After testing with Asterisk 1.8, if you find this problem has not been resolved, please open a new issue against Asterisk 1.8.
By: Richard Mudgett (rmudgett) 2011-11-17 18:01:14.468-0600
I think you are in a deadlock avoidance loop in chan_local that can never be resolved because of other channel locks held by ast_do_masquerade().
Locking in this area of code is very different in v1.8.
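For readers unfamiliar with the pattern, here is a generic sketch of a try-lock-and-back-off loop and of why it cannot finish while the wanted lock is never released. It is not Asterisk code: `acquire_with_backoff`, `pvt_lock`, `chan_lock` and `max_attempts` are hypothetical names, and the attempt cap exists only so the demonstration terminates. A loop without such a cap would spin forever in the first case, which matches the hang described in this issue.

```python
import threading


def acquire_with_backoff(held, wanted, max_attempts):
    """Try to take `wanted` while already holding `held`.

    On contention, briefly release `held` so another thread could make
    progress, then retake it and try again. Returns the attempt number that
    succeeded, or None if every attempt failed (only `held` is then held).
    """
    for attempt in range(1, max_attempts + 1):
        if wanted.acquire(blocking=False):  # non-blocking "trylock"
            return attempt
        held.release()   # back off: give up the lock we hold ...
        held.acquire()   # ... then take it again before retrying
    return None


# Hypothetical locks: a driver-private lock and a channel lock.
pvt_lock = threading.Lock()
chan_lock = threading.Lock()

pvt_lock.acquire()    # the looping code path holds the private lock
chan_lock.acquire()   # stands in for a lock another code path never releases

# Backing off on pvt_lock does not help, because chan_lock is never freed.
stuck = acquire_with_backoff(pvt_lock, chan_lock, max_attempts=50)

chan_lock.release()   # once the other holder lets go, the first try succeeds
freed = acquire_with_backoff(pvt_lock, chan_lock, max_attempts=50)

print(stuck, freed)   # prints: None 1
```

Backing off only helps if the lock being released is the one the other party is waiting for. If the other party holds the wanted lock for some unrelated reason, releasing and retaking the private lock changes nothing, however many times the loop runs.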
By: Andrew Parisio (parisioa) 2011-11-17 18:18:46.902-0600
Having looked at the logs more closely, it appears to happen around the time of a reload (sip reload and dialplan reload). It doesn't happen at every reload, though, only once in a while and seemingly at random.
I'll test upgrading to 1.8 and see if it continues to happen.
By: Leif Madsen (lmadsen) 2011-11-18 07:55:14.432-0600
Andrew: I promise nothing!
By: Andrew Parisio (parisioa) 2011-11-22 12:10:14.602-0600
We upgraded to 126.96.36.199 in production today and are testing it out. Given the short week, we have light call volume and may not trigger it anyway, so the fix may not be confirmed until next week (we didn't trigger it yesterday in 188.8.131.52).
By: Leif Madsen (lmadsen) 2011-12-20 08:52:25.339-0600
By: Andrew Parisio (parisioa) 2011-12-20 12:20:45.174-0600
We haven't had a lockup since, so it appears the issue was resolved somewhere in 1.8.
By: Matt Jordan (mjordan) 2011-12-20 14:18:21.350-0600
Per Andrew, this appears to be resolved in the 1.8 branch.