Summary: ASTERISK-12077: [patch] chan_iax2 becomes unresponsive
Reporter: gewfie (gewfie)
Labels:
Date Opened: 2008-05-23 19:23:54
Date Closed: 2008-06-03 11:17:53
Priority: Major
Regression?: No
Status: Closed/Complete
Components: Channels/chan_iax2
Versions:
Frequency of Occurrence:
Related Issues:
Environment:
Attachments: (0) 20080525__bug12717.diff.txt
             (1) locks.txt
Description: Over a period of time, chan_iax2 dies. We have tried reloading the module, but this does not bring it back; we have to restart Asterisk to bring it back to life.

No new channels can be created, no registrations succeed, etc.

We are not able to reproduce this on demand, as it occurs at random.

Attached is locks.txt, which shows the locks held when it has died; they all point to the poke peer part of chan_iax2.c.

Asterisk does not crash so there is no core dump or gdb output.  If there is anything else that you might need please let me know.

The system is running 1.4.20-rc3.
Comments:

By: Tilghman Lesher (tilghman) 2008-05-25 11:37:12

Patch uploaded.  Please test.

By: gewfie (gewfie) 2008-05-25 21:09:01

We are currently testing 1.4.20-rc3 with the patch you have provided and will report back as soon as we can.

By: Tilghman Lesher (tilghman) 2008-06-02 10:59:29

gewfie:  given that it's been a week now with no further reports, may we assume that this patch fixes the issue for you?

By: gewfie (gewfie) 2008-06-03 00:08:47

Hi Corydon,

The server has been up for 1 week, 8 hours, 57 minutes, 28 seconds, which is the longest uptime we have had since moving to the 1.4 branch; I believe the earlier restarts were due to this bug.

Not sure if this is related, but during this week we have seen the following hung threads in "core show locks".

=======================================================================
=== Currently Held Locks ==============================================
=======================================================================
===
=== <file> <line num> <function> <lock name> <lock addr> (times locked)
===
=== Thread ID: 79195040 (pbx_thread           started at [ 2660] pbx.c ast_pbx_start())
=== ---> Lock #0 (channel.c): MUTEX 1950 __ast_read &chan->lock 0x89ac048 (1)
=== -------------------------------------------------------------------
===
=== Thread ID: 22924192 (pbx_thread           started at [ 2660] pbx.c ast_pbx_start())
=== ---> Lock #0 (channel.c): MUTEX 1950 __ast_read &chan->lock 0x8a6d570 (1)
=== -------------------------------------------------------------------
===
=======================================================================

We run "core show channels" through the AMI, and once a thread gets locked the command no longer works, as it cannot get the full channel list.

Apart from that, the patch you provided seems to have resolved the issue with chan_iax2 dying.

Thanks,

gewfie

By: Digium Subversion (svnbot) 2008-06-03 11:04:16

Repository: asterisk
Revision: 120001

U   branches/1.4/channels/chan_iax2.c

------------------------------------------------------------------------
r120001 | tilghman | 2008-06-03 11:04:16 -0500 (Tue, 03 Jun 2008) | 9 lines

Save the callno when we're poking, because our peer structure could change
during deadlock avoidance (and thus we unlock the wrong callno, causing a
cascade failure).
(closes issue ASTERISK-12077)
Reported by: gewfie
Patches:
      20080525__bug12717.diff.txt uploaded by Corydon76 (license 14)
Tested by: gewfie

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=120001
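
The commit message for r120001 describes a general pattern: a lock is chosen by reading a call number out of the peer structure, the peer may change while the thread backs off to avoid a deadlock, and the unlock then re-reads the (now different) call number. The sketch below illustrates that pattern only; it is not the contents of 20080525__bug12717.diff.txt. All names here (`struct peer`, `held`, `lock_call`, `poke_unsafe`, `poke_safe`, and so on) are hypothetical, and the locks are modeled as plain flags rather than real mutexes so the effect is deterministic.

```c
#include <assert.h>

#define MAX_CALLS 4

/* Hypothetical model: held[n] is 1 while the lock for call number n is held. */
int held[MAX_CALLS];

/* Hypothetical stand-in for a peer; only the call number matters here. */
struct peer {
    int callno;
};

void lock_call(int callno)
{
    assert(callno >= 0 && callno < MAX_CALLS);
    held[callno] = 1;
}

void unlock_call(int callno)
{
    assert(callno >= 0 && callno < MAX_CALLS);
    held[callno] = 0;
}

/* Stands in for the window in which another thread may reassign peer->callno. */
void deadlock_avoidance(struct peer *p, int new_callno)
{
    p->callno = new_callno;
}

/* Problem pattern: p->callno is re-read after it may have changed. */
void poke_unsafe(struct peer *p, int new_callno)
{
    lock_call(p->callno);
    deadlock_avoidance(p, new_callno);
    unlock_call(p->callno); /* may release a different call's lock */
}

/* Pattern from the commit message: save callno once and use the saved value. */
void poke_safe(struct peer *p, int new_callno)
{
    int callno = p->callno;

    lock_call(callno);
    deadlock_avoidance(p, new_callno);
    unlock_call(callno);
}

/* Counts the locks still marked as held, then clears the model. */
int leaked_locks(void)
{
    int i, n = 0;

    for (i = 0; i < MAX_CALLS; i++) {
        n += held[i];
        held[i] = 0;
    }
    return n;
}
```

In this model, calling `poke_unsafe` on a peer whose call number changes from 1 to 2 leaves the lock for call 1 held, while `poke_safe` leaves nothing held. A lock that is never released would be consistent with the reported symptom of every later operation on that call blocking.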

By: Digium Subversion (svnbot) 2008-06-03 11:13:15

Repository: asterisk
Revision: 120012

_U  trunk/
U   trunk/channels/chan_iax2.c

------------------------------------------------------------------------
r120012 | tilghman | 2008-06-03 11:13:15 -0500 (Tue, 03 Jun 2008) | 17 lines

Merged revisions 120001 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r120001 | tilghman | 2008-06-03 11:10:53 -0500 (Tue, 03 Jun 2008) | 9 lines

Save the callno when we're poking, because our peer structure could change
during deadlock avoidance (and thus we unlock the wrong callno, causing a
cascade failure).
(closes issue ASTERISK-12077)
Reported by: gewfie
Patches:
      20080525__bug12717.diff.txt uploaded by Corydon76 (license 14)
Tested by: gewfie

........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=120012

By: Digium Subversion (svnbot) 2008-06-03 11:17:53

Repository: asterisk
Revision: 120034

_U  branches/1.6.0/
U   branches/1.6.0/channels/chan_iax2.c

------------------------------------------------------------------------
r120034 | tilghman | 2008-06-03 11:17:52 -0500 (Tue, 03 Jun 2008) | 25 lines

Merged revisions 120012 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

................
r120012 | tilghman | 2008-06-03 11:19:35 -0500 (Tue, 03 Jun 2008) | 17 lines

Merged revisions 120001 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r120001 | tilghman | 2008-06-03 11:10:53 -0500 (Tue, 03 Jun 2008) | 9 lines

Save the callno when we're poking, because our peer structure could change
during deadlock avoidance (and thus we unlock the wrong callno, causing a
cascade failure).
(closes issue ASTERISK-12077)
Reported by: gewfie
Patches:
      20080525__bug12717.diff.txt uploaded by Corydon76 (license 14)
Tested by: gewfie

........

................

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=120034