Summary: ASTERISK-09737: random crashes in channel.c
Reporter: paradise (paradise)
Labels:
Date Opened: 2007-06-23 00:43:06
Date Closed: 2007-11-05 14:12:57.000-0600
Priority: Critical
Regression?: No
Status: Closed/Complete
Components: Core/General
Versions:
Frequency of Occurrence:
Related Issues:
Environment:
Attachments:
( 0) crash-bt.txt
( 1) crash-bt2.txt
( 2) crash-bt3.txt
( 3) crash-bt4.txt
( 4) extensions.conf
Description: I don't really know how these crashes happen,
but they occur 3-6 times a day and are very annoying.
I have had this problem for a long time, and reverting to older 1.2 versions didn't solve it.



****** ADDITIONAL INFORMATION ******

- my box is not patched.
- bt and bt full are attached

Comments:

By: Russell Bryant (russell) 2007-06-23 23:40:37

Please rebuild with "make clean && make dont-optimize".  The backtrace is completely bogus so it's not very helpful ...

By: paradise (paradise) 2007-06-25 00:46:30

New backtrace from the "make dont-optimize" binaries uploaded.



By: paradise (paradise) 2007-06-27 15:13:03

Please help!
I still have the problem,
now with more than 8 crashes a day.

By: Tilghman Lesher (tilghman) 2007-06-27 15:42:17

Given that this is probably a memory corruption error, we're going to need to be able to reproduce the exact circumstance in which this corruption happens.  Therefore, please upload your extensions.conf and your agi script to the file upload area of this bug.

By: Russell Bryant (russell) 2007-06-27 15:54:36

Which channel types are you using?

By: paradise (paradise) 2007-06-27 16:04:35

I'm using SIP (mostly AudioCodes FXS gateways) and ZAP (Digium quad T1/E1).



By: paradise (paradise) 2007-06-27 16:06:46

extensions.conf uploaded

By: paradise (paradise) 2007-06-27 16:26:24

I'm using FastAGI to launch my AGIs.
Should I upload all of my AGI files here?
BTW, FYI, only these AGI commands are used in my scripts:

$Self->agi->exec('Answer');
$Self->agi->get_variable("GROUPCOUNT");
$Self->agi->exec('Dial',"SIP/$mynum\|$myrings\|Crg");
$Self->agi->exec('Dial',"SIP/$mynum\|$myrings\|CrgL($mytimeout:45000:15000)");
$Self->agi->exec('SetCDRUserField',"\"$mycdr\"");
$Self->agi->exec('Set',"GROUP()=$mygrp");
$Self->agi->exec('GetGroupCount',"$mygrp");
$Self->agi->exec('Queue',"$myQ\|r");



By: paradise (paradise) 2007-06-27 16:31:16

Oops!
I'm also using Local channels, and this AGI command too:

$Self->agi->exec('Dial',"Local/$myfwd\@myphones/n");



By: Tilghman Lesher (tilghman) 2007-06-27 16:37:18

No, that's sufficient information, I think.

By: paradise (paradise) 2007-06-28 23:33:18

Do I need to upload more BTs?
I found that some crashes occur in app_dial when freeing a bogus frame.

By: paradise (paradise) 2007-07-01 00:23:27

Another BT uploaded.

By: Tilghman Lesher (tilghman) 2007-07-01 21:17:39

In the backtrace, I see that glibc is outputting an error to stderr.  Could you capture the output and send me that error message?  That will tell me which memory corruption error it is seeing (there are several possible).

By: paradise (paradise) 2007-07-02 06:04:42

How can I do that?
I'm sending my Asterisk output messages to dump.log:

asterisk -g >>/var/log/dump.log 2>>/var/log/dump.log

but there's nothing there.

By: paradise (paradise) 2007-07-05 00:07:48

What should I do now?

By: Jason Parker (jparker) 2007-07-31 11:21:37

Can this issue be reproduced on 1.4?

Once 1.2 goes into maintenance mode (scheduled for tomorrow - August 1st), all issues that only affect 1.2 may be closed.

By: paradise (paradise) 2007-07-31 14:25:47

I don't use 1.4,
but I will try it.

By: Steve Murphy (murf) 2007-08-02 13:53:26

Ok, since this bug is unassigned, and against 1.2, I'm closing this with
"won't fix" resolution, because the time for 1.2 support is now expired.

Now, this need not be the end of the world. If you can, on a side system,
please move this to a 1.4 system, and see if you can reproduce the problem
on 1.4. If the problem persists, Yay! you can either re-open this bug,
or file a new one against the 1.4, and we'll continue the effort.

If you can't reproduce it, and the system is stable, you might, maybe,
have an excuse to move up to 1.4.

Sorry to do this to you; but we need to concentrate on 1.4 and the next release if we are to be effective in keeping Asterisk moving.

Just for a heads-up, please read with care the UPGRADE and CHANGES, and compare your current config files against the stuff in the 1.4/configs dir, to see if config options have changed. This is a common downfall of those updating.

By: paradise (paradise) 2007-08-06 23:12:37

I've just ported to SVN-branch-1.4-r78103M
but still have the crashes.

A new BT is attached.



By: Digium Subversion (svnbot) 2007-11-01 14:29:03

Repository: asterisk
Revision: 88153

U   team/russell/readq-1.4/main/channel.c

------------------------------------------------------------------------
r88153 | russell | 2007-11-01 14:29:02 -0500 (Thu, 01 Nov 2007) | 15 lines

The readq handling in ast_do_masquerade() got broken when the code was converted
to use the AST_LIST macros.  Furthermore, the actual operation performed was
extremely bizarre.  I have re-written the readq handling in ast_do_masquerade()
to make it safe so that the readq list does not get corrupted, as well as
simplified and documented the code. There is also another fix for list handling
for channel datastores.

(related to issues ASTERISK-10489, ASTERISK-10193, ASTERISK-10012, and the 2nd backtrace of ASTERISK-10616)
(potentially related to issues ASTERISK-9737 and ASTERISK-10404)

For users involved with any of the bug reports I have listed, please give this
code a try:

$ svn co http://svn.digium.com/svn/asterisk/team/russell/readq-1.4

------------------------------------------------------------------------

By: Digium Subversion (svnbot) 2007-11-05 14:10:21.000-0600

Repository: asterisk
Revision: 88709

U   branches/1.4/main/channel.c

------------------------------------------------------------------------
r88709 | russell | 2007-11-05 14:10:17 -0600 (Mon, 05 Nov 2007) | 20 lines

Merge the last bit of changes from asterisk/team/russell/readq-1.4

The issue here is that the channel frame readq handling got broken when the
code was converted to use the linked list macros.  It caused corruption of the
list head and tail pointers.  So, I fixed up the usage of the linked list
macros and in passing, simplified the code.  I also documented what the code
is doing, as it was a bit difficult to figure out at first.

This bug showed itself with crashes showing messed up head/tail pointers for
the readq.  However, there are a couple of crashes that aren't quite as obvious,
but I think may be related.  So, if your bug gets closed by this commit, but
you still have a problem, please reopen or create a new bug report.

(closes issue ASTERISK-10489)
(closes issue ASTERISK-10193)
(closes issue ASTERISK-10012)
(closes issue ASTERISK-10616)
(closes issue ASTERISK-9737)
(closes issue ASTERISK-10404)

------------------------------------------------------------------------
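
The commit message above describes corruption of a queue's head and tail pointers when frames are moved between lists. As a rough illustration of that class of bug and how to avoid it, here is a minimal sketch in plain C. It is not the Asterisk code and does not use the AST_LIST macros; every name in it (`struct frame`, `struct frame_queue`, `queue_move_all`, `readq_demo`, and so on) is hypothetical. The idea is simply that every insert and remove updates both end pointers together, so moving frames from one queue to another cannot leave a stale tail behind.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a queued frame. */
struct frame {
    int id;
    struct frame *next;
};

/* Hypothetical queue with explicit head (first) and tail (last) pointers. */
struct frame_queue {
    struct frame *first;
    struct frame *last;
};

static void queue_init(struct frame_queue *q)
{
    q->first = NULL;
    q->last = NULL;
}

/* Append a frame, keeping first and last in step. */
static void queue_insert_tail(struct frame_queue *q, struct frame *f)
{
    f->next = NULL;
    if (q->last) {
        q->last->next = f;
    } else {
        q->first = f;   /* queue was empty */
    }
    q->last = f;
}

/* Detach and return the first frame, or NULL if the queue is empty. */
static struct frame *queue_remove_head(struct frame_queue *q)
{
    struct frame *f = q->first;

    if (!f) {
        return NULL;
    }
    q->first = f->next;
    if (!q->first) {
        q->last = NULL; /* removed the only element: reset the tail too */
    }
    f->next = NULL;     /* never leave a dangling link on a detached frame */
    return f;
}

/* Move every frame from src to the end of dst, preserving order. */
static void queue_move_all(struct frame_queue *dst, struct frame_queue *src)
{
    struct frame *f;

    while ((f = queue_remove_head(src))) {
        queue_insert_tail(dst, f);
    }
}

/* Walk the list and check that last really points at the final node. */
static int queue_is_consistent(const struct frame_queue *q)
{
    const struct frame *f, *prev = NULL;

    for (f = q->first; f; f = f->next) {
        prev = f;
    }
    return prev == q->last;
}

/* Small self-check: returns 0 if both queues end up consistent. */
static int readq_demo(void)
{
    struct frame a = { 1, NULL }, b = { 2, NULL }, c = { 3, NULL };
    struct frame_queue dst, src;

    queue_init(&dst);
    queue_init(&src);
    queue_insert_tail(&dst, &a);
    queue_insert_tail(&src, &b);
    queue_insert_tail(&src, &c);
    queue_move_all(&dst, &src);

    return (queue_is_consistent(&dst) && queue_is_consistent(&src)
            && dst.last == &c && src.first == NULL) ? 0 : 1;
}
```

Going through `queue_remove_head` and `queue_insert_tail` for every frame costs a little more than splicing pointers by hand, but it keeps the head/tail invariant in one place, which is the kind of simplification the commit message refers to.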

By: Digium Subversion (svnbot) 2007-11-05 14:12:57.000-0600

Repository: asterisk
Revision: 88710

_U  trunk/
U   trunk/main/channel.c

------------------------------------------------------------------------
r88710 | russell | 2007-11-05 14:12:56 -0600 (Mon, 05 Nov 2007) | 28 lines

Merged revisions 88709 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r88709 | russell | 2007-11-05 14:11:04 -0600 (Mon, 05 Nov 2007) | 20 lines

Merge the last bit of changes from asterisk/team/russell/readq-1.4

The issue here is that the channel frame readq handling got broken when the
code was converted to use the linked list macros.  It caused corruption of the
list head and tail pointers.  So, I fixed up the usage of the linked list
macros and in passing, simplified the code.  I also documented what the code
is doing, as it was a bit difficult to figure out at first.

This bug showed itself with crashes showing messed up head/tail pointers for
the readq.  However, there are a couple of crashes that aren't quite as obvious,
but I think may be related.  So, if your bug gets closed by this commit, but
you still have a problem, please reopen or create a new bug report.

(closes issue ASTERISK-10489)
(closes issue ASTERISK-10193)
(closes issue ASTERISK-10012)
(closes issue ASTERISK-10616)
(closes issue ASTERISK-9737)
(closes issue ASTERISK-10404)

........

------------------------------------------------------------------------