Summary: | ASTERISK-09737: random crashes in channel.c | ||
Reporter: | paradise (paradise) | Labels: | |
Date Opened: | 2007-06-23 00:43:06 | Date Closed: | 2007-11-05 14:12:57.000-0600 |
Priority: | Critical | Regression? | No |
Status: | Closed/Complete | Components: | Core/General |
Versions: | Frequency of Occurrence | ||
Related Issues: | |||
Environment: | Attachments: | ( 0) crash-bt.txt ( 1) crash-bt2.txt ( 2) crash-bt3.txt ( 3) crash-bt4.txt ( 4) extensions.conf | |
Description: | I don't really know how these crashes happen, but they occur 3-6 times a day and are very annoying. I have had this problem for a long time, and reverting to older 1.2 versions didn't solve it.
****** ADDITIONAL INFORMATION ******
- My box is not patched.
- bt and bt full are attached. | ||
Comments: | By: Russell Bryant (russell) 2007-06-23 23:40:37 Please rebuild with "make clean && make dont-optimize". The backtrace is completely bogus, so it's not very helpful ...

By: paradise (paradise) 2007-06-25 00:46:30 I have uploaded a new backtrace from the "make dont-optimize" binaries.

By: paradise (paradise) 2007-06-27 15:13:03 Please help! I still have the problem, and now there are more than 8 crashes a day.

By: Tilghman Lesher (tilghman) 2007-06-27 15:42:17 Given that this is probably a memory corruption error, we're going to need to be able to reproduce the exact circumstances in which this corruption happens. Therefore, please upload your extensions.conf and your AGI script to the file upload area of this bug.

By: Russell Bryant (russell) 2007-06-27 15:54:36 Which channel types are you using?

By: paradise (paradise) 2007-06-27 16:04:35 I'm using SIP (mostly AudioCodes FXS gateways) and ZAP (Digium quad T1/E1).

By: paradise (paradise) 2007-06-27 16:06:46 extensions.conf uploaded.

By: paradise (paradise) 2007-06-27 16:26:24 I'm using FastAGI to launch my AGIs. Should I upload all of my AGI files here? FYI, only these AGI commands are used in my scripts:
    $Self->agi->exec('Answer');
    $Self->agi->get_variable("GROUPCOUNT");
    $Self->agi->exec('Dial',"SIP/$mynum\|$myrings\|Crg");
    $Self->agi->exec('Dial',"SIP/$mynum\|$myrings\|CrgL($mytimeout:45000:15000)");
    $Self->agi->exec('SetCDRUserField',"\"$mycdr\"");
    $Self->agi->exec('Set',"GROUP()=$mygrp");
    $Self->agi->exec('GetGroupCount',"$mygrp");
    $Self->agi->exec('Queue',"$myQ\|r");

By: paradise (paradise) 2007-06-27 16:31:16 Oops! I'm also using Local channels, along with this AGI command:
    $Self->agi->exec('Dial',"Local/$myfwd\@myphones/n");

By: Tilghman Lesher (tilghman) 2007-06-27 16:37:18 No, that's sufficient information, I think.

By: paradise (paradise) 2007-06-28 23:33:18 Do you need me to upload more BTs?
I found that some crashes occur in app_dial when it frees a bogus frame.

By: paradise (paradise) 2007-07-01 00:23:27 I have uploaded another BT.

By: Tilghman Lesher (tilghman) 2007-07-01 21:17:39 In the backtrace, I see that glibc is writing an error to stderr. Could you capture that output and send me the error message? It will tell me which memory corruption error glibc is seeing (there are several possible).

By: paradise (paradise) 2007-07-02 06:04:42 How can I do that? I'm sending my Asterisk output messages to dump.log:
    asterisk -g >>/var/log/dump.log 2>>/var/log/dump.log
but there's nothing there.

By: paradise (paradise) 2007-07-05 00:07:48 What should I do now?

By: Jason Parker (jparker) 2007-07-31 11:21:37 Can this issue be reproduced on 1.4? Once 1.2 goes into maintenance mode (scheduled for tomorrow, August 1st), all issues that only affect 1.2 may be closed.

By: paradise (paradise) 2007-07-31 14:25:47 I don't use 1.4, but I will try it.

By: Steve Murphy (murf) 2007-08-02 13:53:26 OK. This bug is unassigned and filed against 1.2, and support for 1.2 has now expired, so I'm closing it with a "won't fix" resolution. Now, this need not be the end of the world. If you can, move this setup to a 1.4 system on a side machine and see whether you can reproduce the problem on 1.4. If the problem persists, yay! You can either reopen this bug or file a new one against 1.4, and we'll continue the effort. If you can't reproduce it and the system is stable, you may have an excuse to move up to 1.4. Sorry to do this to you, but we need to concentrate on 1.4 and the next release if we are to keep Asterisk moving effectively. As a heads-up, please read UPGRADE and CHANGES carefully. Also compare your current config files against those in the 1.4/configs dir to see whether any config options have changed. This is a common pitfall when upgrading.
By: paradise (paradise) 2007-08-06 23:12:37 I've just upgraded to SVN-branch-1.4-r78103M, but I still have these crashes. A new BT is attached.

By: Digium Subversion (svnbot) 2007-11-01 14:29:03 Repository: asterisk Revision: 88153
U team/russell/readq-1.4/main/channel.c
------------------------------------------------------------------------
r88153 | russell | 2007-11-01 14:29:02 -0500 (Thu, 01 Nov 2007) | 15 lines
The readq handling in ast_do_masquerade() got broken when the code was converted to use the AST_LIST macros. Furthermore, the actual operation performed was extremely bizarre. I have re-written the readq handling in ast_do_masquerade() to make it safe so that the readq list does not get corrupted, as well as simplified and documented the code. There is also another fix for list handling for channel datastores.
(related to issues ASTERISK-10489, ASTERISK-10193, ASTERISK-10012, and the 2nd backtrace of ASTERISK-10616)
(potentially related to issues ASTERISK-9737 and ASTERISK-10404)
For users involved with any of the bug reports I have listed, please give this code a try:
$ svn co http://svn.digium.com/svn/asterisk/team/russell/readq-1.4
------------------------------------------------------------------------

By: Digium Subversion (svnbot) 2007-11-05 14:10:21.000-0600 Repository: asterisk Revision: 88709
U branches/1.4/main/channel.c
------------------------------------------------------------------------
r88709 | russell | 2007-11-05 14:10:17 -0600 (Mon, 05 Nov 2007) | 20 lines
Merge the last bit of changes from asterisk/team/russell/readq-1.4
The issue here is that the channel frame readq handling got broken when the code was converted to use the linked list macros. It caused corruption of the list head and tail pointers. So, I fixed up the usage of the linked list macros and in passing, simplified the code. I also documented what the code is doing, as it was a bit difficult to figure out at first.
This bug showed itself with crashes showing messed up head/tail pointers for the readq. However, there are a couple of crashes that aren't quite as obvious, but I think may be related. So, if your bug gets closed by this commit, but you still have a problem, please reopen or create a new bug report.
(closes issue ASTERISK-10489)
(closes issue ASTERISK-10193)
(closes issue ASTERISK-10012)
(closes issue ASTERISK-10616)
(closes issue ASTERISK-9737)
(closes issue ASTERISK-10404)
------------------------------------------------------------------------

By: Digium Subversion (svnbot) 2007-11-05 14:12:57.000-0600 Repository: asterisk Revision: 88710
_U trunk/
U trunk/main/channel.c
------------------------------------------------------------------------
r88710 | russell | 2007-11-05 14:12:56 -0600 (Mon, 05 Nov 2007) | 28 lines
Merged revisions 88709 via svnmerge from https://origsvn.digium.com/svn/asterisk/branches/1.4
........
r88709 | russell | 2007-11-05 14:11:04 -0600 (Mon, 05 Nov 2007) | 20 lines
Merge the last bit of changes from asterisk/team/russell/readq-1.4
The issue here is that the channel frame readq handling got broken when the code was converted to use the linked list macros. It caused corruption of the list head and tail pointers. So, I fixed up the usage of the linked list macros and in passing, simplified the code. I also documented what the code is doing, as it was a bit difficult to figure out at first.
This bug showed itself with crashes showing messed up head/tail pointers for the readq. However, there are a couple of crashes that aren't quite as obvious, but I think may be related. So, if your bug gets closed by this commit, but you still have a problem, please reopen or create a new bug report.
(closes issue ASTERISK-10489)
(closes issue ASTERISK-10193)
(closes issue ASTERISK-10012)
(closes issue ASTERISK-10616)
(closes issue ASTERISK-9737)
(closes issue ASTERISK-10404)
........
------------------------------------------------------------------------ |