Summary: ASTERISK-10012: app_dial segfaults asterisk while trying to bridge channels
Reporter: Frank Waller (explidous)
Labels:
Date Opened: 2007-08-02 09:45:18
Date Closed: 2007-11-05 14:12:57.000-0600
Priority: Critical
Regression?: No
Status: Closed/Complete
Components: Applications/app_dial
Versions:
Frequency of Occurrence:
Related Issues:
Environment:
Attachments:
( 0) all_thread_bt
( 1) bt.txt
( 2) cli_ouput
( 3) single_thread_full_bt
Description: app_dial causes a segfault while trying to bridge channels on a system placing many calls.

I was running Vicidial (a predictive dialer) on this server with twenty agents, dialing at a ratio of four to one. This means there were twenty channels waiting in twenty meetmes while the server dialed 80 numbers via IAX to another Xen server on the same box. When a number connects, the call is placed into one of the meetmes.

This crash happened amongst some other crashes that I am still debugging. I have not been able to narrow down exactly what caused this one. Most likely a threading issue.

****** STEPS TO REPRODUCE ******

I am able to reproduce this by simply dialing many calls simultaneously via IAX to the same server on a low-latency connection.
Comments:

By: Frank Waller (explidous) 2007-08-02 12:37:09

Looking at the single-thread backtrace, there is something really funky going on here that makes me think the stack is getting screwed up.

ast_channel_bridge in channel.c calls ast_generic_bridge in frame.h, but there is no function by that name in frame.h. There is one in channel.c, however, and ast_generic_bridge in channel.c does not call ast_frame_free in frame.c.

By: Mark Michelson (mmichelson) 2007-08-03 16:55:30

to explidous:

ast_generic_bridge calls ast_frfree on line 3959 of channel.c. This function is inlined, so that is why the function name shows up in the backtrace as ast_generic_bridge with the filename as frame.h. ast_frfree then calls ast_frame_free.
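
To illustrate the inlining effect described above, here is a minimal sketch. All names (demo_frame, demo_frame_free, demo_frfree, demo_bridge) are hypothetical stand-ins and not the actual Asterisk source. The assumption is that the wrapper sits in a header as a static inline function: its body is then compiled into each caller, so depending on the compiler and debugger, a backtrace may report the caller's name together with the header's file and line.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a frame; not the real struct ast_frame. */
struct demo_frame {
    int mallocd;                /* nonzero if the frame was heap-allocated */
};

static int demo_frees = 0;      /* counts frees so the effect is observable */

/* Plays the role of the real free routine that lives in a .c file. */
static void demo_frame_free(struct demo_frame *fr)
{
    assert(fr != NULL);
    if (fr->mallocd) {
        free(fr);
        demo_frees++;
    }
}

/* Plays the role of the wrapper defined in a header. Because it is
 * static inline, the compiler may copy its body into every caller,
 * leaving no separate stack frame for it. */
static inline void demo_frfree(struct demo_frame *fr)
{
    demo_frame_free(fr);
}

/* Plays the role of the caller in another .c file. If demo_frfree is
 * inlined, a crash below it can show up as "demo_bridge" with the
 * header's file name and line number. */
int demo_bridge(void)
{
    struct demo_frame *fr = malloc(sizeof(*fr));

    if (!fr)
        return -1;
    fr->mallocd = 1;
    demo_frfree(fr);
    return demo_frees;          /* number of frames freed so far */
}
```

Calling demo_bridge() repeatedly frees one frame per call. The function names seen in a debugger for the free path depend on whether the compiler chose to inline demo_frfree.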

By: Frank Waller (explidous) 2007-08-06 10:16:57

Ahh, that is the source of the confusion. However, I had DONT_OPTIMIZE enabled. Shouldn't that have passed -fno-inline to gcc so that inlining is disabled?

By: Frank Waller (explidous) 2007-08-07 15:28:03

OK, here is a second backtrace of what seems to be the same issue, this time on 1.4 SVN 78445.

By: Michiel van Baak (mvanbaak) 2007-09-08 07:16:06

Is this one still relevant?

By: P. Christeas (xrg) 2007-09-10 14:53:26

What version/build of glibc do you have?
I did notice an entry about a "fix pthread_mutex_timedlock() on x86_64" on my system and am currently checking that out.

By: Digium Subversion (svnbot) 2007-11-01 14:29:03

Repository: asterisk
Revision: 88153

U   team/russell/readq-1.4/main/channel.c

------------------------------------------------------------------------
r88153 | russell | 2007-11-01 14:29:02 -0500 (Thu, 01 Nov 2007) | 15 lines

The readq handling in ast_do_masquerade() got broken when the code was converted
to use the AST_LIST macros.  Furthermore, the actual operation performed was
extremely bizarre.  I have re-written the readq handling in ast_do_masquerade()
to make it safe so that the readq list does not get corrupted, as well as
simplified and documented the code. There is also another fix for list handling
for channel datastores.

(related to issues ASTERISK-10489, ASTERISK-10193, ASTERISK-10012, and the 2nd backtrace of ASTERISK-10616)
(potentially related to issues ASTERISK-9737 and ASTERISK-10404)

For users involved with any of the bug reports I have listed, please give this
code a try:

$ svn co http://svn.digium.com/svn/asterisk/team/russell/readq-1.4

------------------------------------------------------------------------

By: Digium Subversion (svnbot) 2007-11-05 14:10:20.000-0600

Repository: asterisk
Revision: 88709

U   branches/1.4/main/channel.c

------------------------------------------------------------------------
r88709 | russell | 2007-11-05 14:10:17 -0600 (Mon, 05 Nov 2007) | 20 lines

Merge the last bit of changes from asterisk/team/russell/readq-1.4

The issue here is that the channel frame readq handling got broken when the
code was converted to use the linked list macros.  It caused corruption of the
list head and tail pointers.  So, I fixed up the usage of the linked list
macros and in passing, simplified the code.  I also documented what the code
is doing, as it was a bit difficult to figure out at first.

This bug showed itself with crashes showing messed up head/tail pointers for
the readq.  However, there are a couple of crashes that aren't quite as obvious,
but I think may be related.  So, if your bug gets closed by this commit, but
you still have a problem, please reopen or create a new bug report.

(closes issue ASTERISK-10489)
(closes issue ASTERISK-10193)
(closes issue ASTERISK-10012)
(closes issue ASTERISK-10616)
(closes issue ASTERISK-9737)
(closes issue ASTERISK-10404)

------------------------------------------------------------------------
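
The commit message above attributes the crashes to corrupted head and tail pointers in the readq list. The following is a minimal sketch of the invariant involved, using a hand-rolled queue with hypothetical names (demo_queue, demo_queue_append, and so on). It is not the AST_LIST macro code and not the actual ast_do_masquerade() logic. It only shows that every operation changing the list must keep both pointers in step.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical frame node; not the real struct ast_frame. */
struct demo_frame {
    int id;
    struct demo_frame *next;
};

/* Queue with explicit head (first) and tail (last) pointers. */
struct demo_queue {
    struct demo_frame *first;
    struct demo_frame *last;
};

void demo_queue_init(struct demo_queue *q)
{
    q->first = q->last = NULL;
}

/* Append one frame at the tail. */
void demo_queue_append(struct demo_queue *q, struct demo_frame *fr)
{
    assert(fr != NULL);
    fr->next = NULL;
    if (q->last)
        q->last->next = fr;
    else
        q->first = fr;          /* queue was empty: head changes too */
    q->last = fr;
}

/* Remove and return the head frame, or NULL if the queue is empty. */
struct demo_frame *demo_queue_pop(struct demo_queue *q)
{
    struct demo_frame *fr = q->first;

    if (!fr)
        return NULL;
    q->first = fr->next;
    if (!q->first)
        q->last = NULL;         /* skipping this leaves a stale tail pointer */
    fr->next = NULL;
    return fr;
}

/* Move every frame from src to the end of dst and leave src empty. */
void demo_queue_append_all(struct demo_queue *dst, struct demo_queue *src)
{
    if (!src->first)
        return;
    if (dst->last)
        dst->last->next = src->first;
    else
        dst->first = src->first;
    dst->last = src->last;
    src->first = src->last = NULL;  /* reset both pointers, not just one */
}

/* Return 1 if the tail pointer matches the last node reachable from head. */
int demo_queue_is_consistent(const struct demo_queue *q)
{
    const struct demo_frame *cur = q->first;
    const struct demo_frame *prev = NULL;

    while (cur) {
        prev = cur;
        cur = cur->next;
    }
    return prev == q->last;
}
```

A check like demo_queue_is_consistent() is the kind of assertion that would flag the "messed up head/tail pointers" symptom early, before a later read walks off a stale pointer.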

By: Digium Subversion (svnbot) 2007-11-05 14:12:57.000-0600

Repository: asterisk
Revision: 88710

_U  trunk/
U   trunk/main/channel.c

------------------------------------------------------------------------
r88710 | russell | 2007-11-05 14:12:56 -0600 (Mon, 05 Nov 2007) | 28 lines

Merged revisions 88709 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r88709 | russell | 2007-11-05 14:11:04 -0600 (Mon, 05 Nov 2007) | 20 lines

Merge the last bit of changes from asterisk/team/russell/readq-1.4

The issue here is that the channel frame readq handling got broken when the
code was converted to use the linked list macros.  It caused corruption of the
list head and tail pointers.  So, I fixed up the usage of the linked list
macros and in passing, simplified the code.  I also documented what the code
is doing, as it was a bit difficult to figure out at first.

This bug showed itself with crashes showing messed up head/tail pointers for
the readq.  However, there are a couple of crashes that aren't quite as obvious,
but I think may be related.  So, if your bug gets closed by this commit, but
you still have a problem, please reopen or create a new bug report.

(closes issue ASTERISK-10489)
(closes issue ASTERISK-10193)
(closes issue ASTERISK-10012)
(closes issue ASTERISK-10616)
(closes issue ASTERISK-9737)
(closes issue ASTERISK-10404)

........

------------------------------------------------------------------------