Summary: ASTERISK-05044: Random Crashs on CVS HEAD
Reporter: paradise (paradise)
Labels:
Date Opened: 2005-09-10 09:58:22
Date Closed: 2005-09-12 16:19:02
Priority: Critical
Regression?: No
Status: Closed/Complete
Components: Core/General
Versions:
Frequency of Occurrence:
Related Issues:
Environment:
Attachments: (0) crash.txt
(1) crash2.txt
(2) extensions.conf
Description: I've updated my working two-month-old Asterisk (*) code to CVS HEAD.
Now I have 10-15 crashes per day, but I couldn't find the cause.
A core dump is attached.
Comments:

By: Olle Johansson (oej) 2005-09-10 10:03:08

Tell us more, much more. Platform, extra modules loaded, any special config options...

By: paradise (paradise) 2005-09-10 10:10:43

No extra modules or patches are applied.
I've also checked with the 2005-09-05 CVS code; same result.
No configuration changes were made between my old running code and CVS HEAD, other than applying the new changes in musiconhold.conf.
From the core dump it seems that it should be something in res_agi or app_dial! (I guess)

I'm using 6 FXO modules from Digium cards, and all of my clients are SIP (snom, Grandstream, eyeBeam).

The box is a P-III and is running FC3 with a 2.6.11 kernel.



By: Olle Johansson (oej) 2005-09-10 13:37:00

Using format_mp3 ?

By: paradise (paradise) 2005-09-10 15:48:02

No, I'm using rawplayer and converting my MP3s to raw.

By: Mark Spencer (markster) 2005-09-10 17:24:31

How frequently does the crash occur?  It appears to happen within a malloc call, so it's going to take some serious diagnostics to find.

By: Mark Spencer (markster) 2005-09-10 17:27:26

Never mind the comment about how often it occurs.

Can you post your dialplan so I can look for something suspicious?  Also, it might be helpful to have SIP debug running so we can see what happens just before the crash.  We will need to narrow down the problem.

By: paradise (paradise) 2005-09-11 00:34:28

extensions.conf and another core dump are attached.
Capturing SIP debug output is too hard while the crash is not reproducible.
Is there any way to log SIP debug output?

Note that when I revert to the code from two months ago, everything works fine.



By: Mark Spencer (markster) 2005-09-12 00:09:28

Okay, in the absence of being able to make a real SIP debug, I suggest doing a binary search on the date that the problem was introduced.  Start with CVS from one month ago, then go to either 2 or 6 weeks ago depending on whether the one-month checkout worked or didn't work, and try to isolate the day that the bug was introduced.  Then we can work from those patch sets to identify the problem.
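
The date-based binary search suggested above can be sketched roughly as follows. This is only an illustration, assuming that one checkout date is known to work, a later one is known to crash, and that once the bug appears it stays in every later checkout. The names first_bad_date and is_bad are hypothetical; in practice is_bad would mean checking out CVS as of that date, rebuilding, and running long enough to see whether the crash shows up.

```python
from datetime import date, timedelta
from typing import Callable


def first_bad_date(good: date, bad: date, is_bad: Callable[[date], bool]) -> date:
    """Return the earliest date whose checkout shows the problem.

    Assumes `good` is known to work, `bad` is known to fail, and that the
    bug, once introduced, is present in every later checkout.
    """
    if good >= bad:
        raise ValueError("good must be earlier than bad")
    # Keep halving the window until the two dates are adjacent days.
    while (bad - good).days > 1:
        mid = good + timedelta(days=(bad - good).days // 2)
        if is_bad(mid):
            bad = mid    # problem already present: look earlier
        else:
            good = mid   # still fine: look later
    return bad


# Illustration only: pretend the regression landed on 2005-08-17.
introduced = date(2005, 8, 17)
tested = []


def crashes(day: date) -> bool:
    # Stand-in for "check out that day's code, build it, and watch for crashes".
    tested.append(day)
    return day >= introduced


result = first_bad_date(date(2005, 7, 10), date(2005, 9, 10), crashes)
print(result, "found after", len(tested), "test builds")
```

With a two-month window, this needs at most six test builds rather than one per day, which matters when each test means running for hours to see whether a crash occurs.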

By: C F (shmaltz) 2005-09-12 15:16:04

I also have 2 machines running FC3. Whenever I try upgrading those 2 machines, I run into tons of trouble, so I do the following before upgrading on the FC3 machines:
mv /usr/src/asterisk /usr/src/asterisk.last
mv /usr/src/zaptel /usr/src/zaptel.last
mv /usr/src/libpri /usr/src/libpri.last

mv /usr/lib/asterisk/modules /usr/lib/asterisk/modules.last
mv /usr/include/asterisk /usr/include/asterisk.last

then I do a fresh cvs co in /usr/src

If I have trouble I just go back to the .last folders.

By: Tilghman Lesher (tilghman) 2005-09-12 16:16:04

Both of these core dumps seem to be generated with abort(3).  This suggests that the problem is in dial_exec_full(), with free() being called on a pointer that was not malloc'ed.

There seems to be a problem with calling Dial from an AGI, which makes this problem report a duplicate of ASTERISK-5037.

By: Tilghman Lesher (tilghman) 2005-09-12 16:19:02

Let's consolidate discussion with ASTERISK-5037