Summary: | ASTERISK-10012: app_dial segfaults asterisk while trying to bridge channels | ||
Reporter: | Frank Waller (explidous) | Labels: | |
Date Opened: | 2007-08-02 09:45:18 | Date Closed: | 2007-11-05 14:12:57.000-0600 |
Priority: | Critical | Regression? | No |
Status: | Closed/Complete | Components: | Applications/app_dial |
Versions: | Frequency of Occurrence | ||
Related Issues: | |||
Environment: | Attachments: | ( 0) all_thread_bt ( 1) bt.txt ( 2) cli_ouput ( 3) single_thread_full_bt | |
Description: | app_dial causes a segfault while trying to bridge channels on a system placing many calls. I was running Vicidial (a predictive dialer) on this server with twenty agents and dialing at a ratio of four to one. This means that there are twenty channels waiting in twenty meetmes and the server is dialing 80 numbers via IAX to another XEN server on the same box. When a number connects, the call is placed into one of the meetmes. This crash happened amongst some other crashes that I am still debugging. I have not been able to narrow down exactly what caused this one. Most likely a threading issue. ****** STEPS TO REPRODUCE ****** I am able to reproduce this by simply dialing many calls simultaneously via IAX to the same server on a low-latency connection. | ||
Comments: |
By: Frank Waller (explidous) 2007-08-02 12:37:09
Looking at the single-thread backtrace, there is something really funky going on here that makes me think the stack is getting screwed up. ast_channel_bridge in channel.c calls ast_generic_bridge in frame.h, but there is no function by that name in frame.h. There is, however, one in channel.c, and ast_generic_bridge in channel.c does not call ast_frame_free in frame.c.

By: Mark Michelson (mmichelson) 2007-08-03 16:55:30
to explidous: ast_generic_bridge calls ast_frfree on line 3959 of channel.c. This function is inlined, so that is why the function name shows up in the backtrace as ast_generic_bridge with the filename as frame.h. ast_frfree then calls ast_frame_free.

By: Frank Waller (explidous) 2007-08-06 10:16:57
Ahh, that is the source of the confusion... However, I had DONT_OPTIMIZE enabled. Shouldn't that have passed -fno-inline to gcc so that inlining is disabled?

By: Frank Waller (explidous) 2007-08-07 15:28:03
OK, here is a second backtrace of what seems to be the same issue, this time on 1.4 SVN 78445.

By: Michiel van Baak (mvanbaak) 2007-09-08 07:16:06
Is this one still relevant?

By: P. Christeas (xrg) 2007-09-10 14:53:26
What version/build of glibc do you have? I did notice an entry about a "fix pthread_mutex_timedlock() on x86_64" on my system and am currently checking that out.

By: Digium Subversion (svnbot) 2007-11-01 14:29:03
Repository: asterisk
Revision: 88153
U team/russell/readq-1.4/main/channel.c
------------------------------------------------------------------------
r88153 | russell | 2007-11-01 14:29:02 -0500 (Thu, 01 Nov 2007) | 15 lines
The readq handling in ast_do_masquerade() got broken when the code was converted to use the AST_LIST macros. Furthermore, the actual operation performed was extremely bizarre. I have re-written the readq handling in ast_do_masquerade() to make it safe so that the readq list does not get corrupted, as well as simplified and documented the code.
There is also another fix for list handling for channel datastores.
(related to issues ASTERISK-10489, ASTERISK-10193, ASTERISK-10012, and the 2nd backtrace of ASTERISK-10616)
(potentially related to issues ASTERISK-9737 and ASTERISK-10404)
For users involved with any of the bug reports I have listed, please give this code a try:
$ svn co http://svn.digium.com/svn/asterisk/team/russell/readq-1.4
------------------------------------------------------------------------

By: Digium Subversion (svnbot) 2007-11-05 14:10:20.000-0600
Repository: asterisk
Revision: 88709
U branches/1.4/main/channel.c
------------------------------------------------------------------------
r88709 | russell | 2007-11-05 14:10:17 -0600 (Mon, 05 Nov 2007) | 20 lines
Merge the last bit of changes from asterisk/team/russell/readq-1.4
The issue here is that the channel frame readq handling got broken when the code was converted to use the linked list macros. It caused corruption of the list head and tail pointers. So, I fixed up the usage of the linked list macros and in passing, simplified the code. I also documented what the code is doing, as it was a bit difficult to figure out at first.
This bug showed itself with crashes showing messed up head/tail pointers for the readq. However, there are a couple of crashes that aren't quite as obvious, but I think may be related. So, if your bug gets closed by this commit, but you still have a problem, please reopen or create a new bug report.
(closes issue ASTERISK-10489)
(closes issue ASTERISK-10193)
(closes issue ASTERISK-10012)
(closes issue ASTERISK-10616)
(closes issue ASTERISK-9737)
(closes issue ASTERISK-10404)
------------------------------------------------------------------------

By: Digium Subversion (svnbot) 2007-11-05 14:12:57.000-0600
Repository: asterisk
Revision: 88710
_U trunk/
U trunk/main/channel.c
------------------------------------------------------------------------
r88710 | russell | 2007-11-05 14:12:56 -0600 (Mon, 05 Nov 2007) | 28 lines
Merged revisions 88709 via svnmerge from https://origsvn.digium.com/svn/asterisk/branches/1.4
........
r88709 | russell | 2007-11-05 14:11:04 -0600 (Mon, 05 Nov 2007) | 20 lines
Merge the last bit of changes from asterisk/team/russell/readq-1.4
The issue here is that the channel frame readq handling got broken when the code was converted to use the linked list macros. It caused corruption of the list head and tail pointers. So, I fixed up the usage of the linked list macros and in passing, simplified the code. I also documented what the code is doing, as it was a bit difficult to figure out at first.
This bug showed itself with crashes showing messed up head/tail pointers for the readq. However, there are a couple of crashes that aren't quite as obvious, but I think may be related. So, if your bug gets closed by this commit, but you still have a problem, please reopen or create a new bug report.
(closes issue ASTERISK-10489)
(closes issue ASTERISK-10193)
(closes issue ASTERISK-10012)
(closes issue ASTERISK-10616)
(closes issue ASTERISK-9737)
(closes issue ASTERISK-10404)
........
------------------------------------------------------------------------ |
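
Note: the commit messages above attribute the crashes to corrupted head and tail pointers on the channel readq. The following is only a minimal sketch of the invariant involved, assuming a plain singly linked queue with separate head and tail pointers. The names `struct frame`, `struct frame_queue`, `queue_append`, `queue_remove`, and `queue_is_consistent` are hypothetical and chosen for illustration; this is not Asterisk's AST_LIST macro code and not the actual change made in r88709.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; these are NOT
 * Asterisk's struct ast_frame or its AST_LIST macros. */
struct frame {
    int id;
    struct frame *next;
};

struct frame_queue {
    struct frame *head; /* first queued frame, NULL when empty */
    struct frame *tail; /* last queued frame, NULL when empty */
};

/* Append a frame, keeping head and tail in sync. */
static void queue_append(struct frame_queue *q, struct frame *f)
{
    assert(q != NULL && f != NULL);
    f->next = NULL;
    if (q->tail)
        q->tail->next = f;
    else
        q->head = f; /* queue was empty */
    q->tail = f;
}

/* Unlink one frame. Returns 1 if it was found and removed, 0 otherwise. */
static int queue_remove(struct frame_queue *q, struct frame *f)
{
    struct frame *prev = NULL;
    struct frame *cur;

    for (cur = q->head; cur; prev = cur, cur = cur->next) {
        if (cur != f)
            continue;
        if (prev)
            prev->next = cur->next;
        else
            q->head = cur->next;
        /* Removing the last entry must also move the tail back.
         * Skipping this leaves tail pointing at a frame that is no
         * longer in the queue, so a later append goes through a
         * stale pointer. */
        if (q->tail == cur)
            q->tail = prev;
        cur->next = NULL;
        return 1;
    }
    return 0;
}

/* Invariant check: tail is the last node reachable from head
 * (both are NULL for an empty queue). */
static int queue_is_consistent(const struct frame_queue *q)
{
    const struct frame *cur = q->head;
    const struct frame *last = NULL;

    while (cur) {
        last = cur;
        cur = cur->next;
    }
    return last == q->tail;
}
```

A caller would append frames with `queue_append`, unlink selected ones with `queue_remove`, and could call `queue_is_consistent` in debug builds to catch a stale tail pointer before it turns into a crash somewhere far from the code that corrupted the list.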