Summary: ASTERISK-10436: Asterisk 1.4.12 crashes in channel.c
Reporter: callguy (callguy)
Labels:
Date Opened: 2007-10-03 12:34:52
Date Closed: 2007-10-16 17:01:01
Priority: Critical
Regression?: No
Status: Closed/Complete
Components: Channels/General
Versions:
Frequency of Occurrence:
Related Issues:
Environment:
Attachments:
( 0) bt.txt
( 1) btfull-1.4.12.crash.txt
( 2) bt-full-crash-10042007.txt
( 3) bt-ivan.txt
( 4) core171711102007.log
( 5) crash-1.4.12.channel-2.txt
Description: We upgraded to Asterisk 1.4.12 and hit what looks like a race condition that caused a crash within a few hours under load. The crash appears to be in channel.c. Backtraces (bt and bt full) are attached.
Comments:

By: callguy (callguy) 2007-10-03 13:12:53

Experienced the crash again. The backtrace looks slightly different but similar. Additional bt and bt full output attached.

By: Volnikov Ivan (ivan) 2007-10-04 04:54:52

callguy -
Can you post the output of the following gdb commands:
(gdb) frame 0
(gdb) p *t
and
(gdb) frame 1
(gdb) p *chan
Thanks.
I suspect this is connected with more than one issue
(http://bugs.digium.com/view.php?id=10571)

By: Volnikov Ivan (ivan) 2007-10-04 05:00:43

The crash is not in channel.c itself.
It is in the critical-section wrapper implemented in lock.h.
callguy -
Is your Asterisk running on a multi-core (or multi-CPU) system?

By: callguy (callguy) 2007-10-04 06:18:28

Ivan - the gdb output you requested has been uploaded. Yes, we are on a dual-CPU, multi-core machine (Dell 1950), so the OS sees it as 4 CPUs.

By: callguy (callguy) 2007-10-04 10:51:26

Crash occurred again - we rolled back to 1.4.11 last night, which doesn't appear to have changed anything. bt looks similar.

Is there anything else we can provide to help get this resolved?

By: callguy (callguy) 2007-10-04 11:05:12

One additional note. This behavior is always triggered by ring groups with a large number of phones in them. When one of the users goes to answer it causes the condition to occur.

By: Volnikov Ivan (ivan) 2007-10-05 01:23:49

This clearly has the same cause as the issue at http://bugs.digium.com/view.php?id=10571.
I have already written about the reasons for it (see the notes on 10571).
Unfortunately, the developers have been slow to respond to my questions.
I will try emailing someone directly.

By: David Brillert (aragon) 2007-10-12 15:39:11

I hope this is related to 10571 because it looks like there is a working patch to fix 10571.

My backtraces look identical to those posted in this report (10875)

Will ivan_ast_1_4_12_rel_patch_lock.h.diff apply to 1.4.13?
I have been experiencing core dumps on multiple sites related to this bug.
I am using version 1.4.13.
I'm curious whether this patch will supersede r85158 | tilghman | 2007-10-09 16:55:06
r85158 | tilghman | 2007-10-09 16:34:34 -0500 (Tue, 09 Oct 2007) | 5 lines

This commit fixes the following issues:
- Deadlock in ast_write (issue 0010406)
- Deadlock in ast_read (issue 0010406)
- Possible mutex initialization error in lock.h (issue 0010571)

r85158 did not work for me.

By: David Brillert (aragon) 2007-10-12 15:41:33

I uploaded my bt core171711102007.log

By: callguy (callguy) 2007-10-12 15:45:54

Aragon: This is the same underlying issue as in 10571. You can use the most recent patch there against 1.4.12.1 and it should resolve this issue.

By: Jason Parker (jparker) 2007-10-15 17:40:46

Were you able to test the patch in ASTERISK-10177?

By: David Brillert (aragon) 2007-10-16 09:01:15

qwell, I have been following up on this using ticket 10571.
We are testing Ivan's patch on Asterisk 1.4.13 with debug threads and don't optimize enabled.
The systems were recompiled 16/10/2007 approx 18:00 hours
I am waiting and checking logs periodically to see if there are any new coredumps.

By: David Brillert (aragon) 2007-10-16 11:13:45

My site segfaulted already this morning.
I no longer think these problems are related to 10571.
I have attached bt.txt if anyone wants to look at the backtrace.
I have opened a new bug report
http://bugs.digium.com/view.php?id=10997

By: Digium Subversion (svnbot) 2007-10-16 16:53:56

Repository: asterisk
Revision: 85994

U   branches/1.4/include/asterisk/lock.h

------------------------------------------------------------------------
r85994 | russell | 2007-10-16 16:53:52 -0500 (Tue, 16 Oct 2007) | 16 lines

Some locking errors exposed the fact that the lock debugging code itself was
not thread safe.  How ironic!  Anyway, these changes ensure that the code that
is accessing the lock debugging data is thread-safe.  

Many thanks to Ivan for finding and fixing the core issue here, and also
thanks to those that tested the patch and provided test results.

(closes issue ASTERISK-10177)
(closes issue ASTERISK-10446)
(closes issue ASTERISK-10436)
(might close some others, as well ...)

Patches: (from issue ASTERISK-10177)
     ivan_ast_1_4_12_rel_patch_lock.h.diff uploaded by Ivan (license 229)
      - a few small changes by me

------------------------------------------------------------------------
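
For readers who want a concrete picture of the kind of problem the commit message describes, here is a minimal sketch in C. It is not the actual lock.h change: the struct, the field names (`track_lock`, `reentrancy`) and the `dbg_mutex_*` functions are all hypothetical. It assumes the debugging wrapper records who acquired a lock, and that this bookkeeping can be read by a thread that does not hold the lock (for example, to report the current holder). In that situation the bookkeeping needs its own small mutex.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* for PTHREAD_MUTEX_RECURSIVE under strict modes */
#endif
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_REENTRANCY 10   /* hypothetical cap on tracked nested acquisitions */

/* Hypothetical debug-tracking mutex; names are illustrative only. */
struct dbg_mutex {
    pthread_mutex_t mutex;       /* the lock callers actually want (recursive) */
    pthread_mutex_t track_lock;  /* guards only the bookkeeping fields below */
    int reentrancy;              /* current nesting depth */
    const char *file[MAX_REENTRANCY];
    int lineno[MAX_REENTRANCY];
};

struct dbg_holder {
    const char *file;
    int lineno;
};

static int dbg_mutex_init(struct dbg_mutex *m)
{
    pthread_mutexattr_t attr;
    int res;

    assert(m != NULL);
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    res = pthread_mutex_init(&m->mutex, &attr);
    pthread_mutexattr_destroy(&attr);
    if (res != 0)
        return res;
    m->reentrancy = 0;
    return pthread_mutex_init(&m->track_lock, NULL);
}

static int dbg_mutex_lock(struct dbg_mutex *m, const char *file, int lineno)
{
    int res = pthread_mutex_lock(&m->mutex);

    if (res == 0) {
        /* Update the bookkeeping only while holding track_lock. */
        pthread_mutex_lock(&m->track_lock);
        if (m->reentrancy < MAX_REENTRANCY) {
            m->file[m->reentrancy] = file;
            m->lineno[m->reentrancy] = lineno;
        }
        m->reentrancy++;
        pthread_mutex_unlock(&m->track_lock);
    }
    return res;
}

static int dbg_mutex_unlock(struct dbg_mutex *m)
{
    /* Drop the bookkeeping entry before releasing the real lock. */
    pthread_mutex_lock(&m->track_lock);
    if (m->reentrancy > 0)
        m->reentrancy--;
    pthread_mutex_unlock(&m->track_lock);
    return pthread_mutex_unlock(&m->mutex);
}

/* Safe to call from a thread that does NOT hold m->mutex.
 * Returns 1 and fills *out if a holder is recorded, otherwise 0. */
static int dbg_mutex_last_holder(struct dbg_mutex *m, struct dbg_holder *out)
{
    int held = 0;

    pthread_mutex_lock(&m->track_lock);
    if (m->reentrancy > 0 && m->reentrancy <= MAX_REENTRANCY) {
        out->file = m->file[m->reentrancy - 1];
        out->lineno = m->lineno[m->reentrancy - 1];
        held = 1;
    }
    pthread_mutex_unlock(&m->track_lock);
    return held;
}
```

Without `track_lock`, a reporting thread could read `reentrancy` and then index the arrays while the holder is changing the depth, which on a multi-CPU machine can yield a stale or out-of-range index. The extra mutex is held only for a few assignments, so it adds little contention.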

By: Digium Subversion (svnbot) 2007-10-16 17:01:01

Repository: asterisk
Revision: 85995

_U  trunk/
U   trunk/include/asterisk/lock.h

------------------------------------------------------------------------
r85995 | russell | 2007-10-16 17:01:01 -0500 (Tue, 16 Oct 2007) | 24 lines

Merged revisions 85994 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r85994 | russell | 2007-10-16 17:14:36 -0500 (Tue, 16 Oct 2007) | 16 lines

Some locking errors exposed the fact that the lock debugging code itself was
not thread safe.  How ironic!  Anyway, these changes ensure that the code that
is accessing the lock debugging data is thread-safe.  

Many thanks to Ivan for finding and fixing the core issue here, and also
thanks to those that tested the patch and provided test results.

(closes issue ASTERISK-10177)
(closes issue ASTERISK-10446)
(closes issue ASTERISK-10436)
(might close some others, as well ...)

Patches: (from issue ASTERISK-10177)
     ivan_ast_1_4_12_rel_patch_lock.h.diff uploaded by Ivan (license 229)
      - a few small changes by me

........

------------------------------------------------------------------------