
Summary: ASTERISK-11368: The asterisk service crashes twice a day
Reporter: moty (moty)
Labels:
Date Opened: 2008-02-04 04:04:39.000-0600
Date Closed: 2011-06-07 14:00:40
Priority: Critical
Regression?: No
Status: Closed/Complete
Components: Core/General
Versions:
Frequency of Occurrence:
Related Issues:
Environment:
Attachments:
(0) backtrace.txt
(1) bt.txt
(2) frame.txt
(3) gdb.txt
Description: Our production Asterisk service (using safe_asterisk) is crashing twice a day on average. I didn't manage to attach a core dump... Someone?

****** ADDITIONAL INFORMATION ******

This is crucial for us, since we have to start the Asterisk service manually because of another issue (Asterisk is out of IAX threads on startup...).
Comments:

By: Joshua C. Colp (jcolp) 2008-02-04 08:26:48.000-0600

Was this built with DONT_OPTIMIZE selected under Compiler Flags in menuselect? Could you please do the following and attach the output:

frame 6
print *f

Thanks!
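
For readers who have a core file and want to capture the requested output without an interactive session, the two commands above can be passed to gdb in batch mode. This is only a sketch: the helper name print_frame_var, the binary path, and the core file name are assumptions for illustration, and the GDB variable exists only so the command can be previewed (for example with GDB=echo) before running it for real.

```shell
#!/bin/sh
# Sketch only: run "frame N" and "print EXPR" against a core file in batch mode.
# GDB may be overridden (e.g. GDB=echo) to preview the command line.
GDB="${GDB:-gdb}"

print_frame_var() {
    binary="$1"   # path to the asterisk binary that produced the core
    core="$2"     # path to the core file (hypothetical name below)
    frame="$3"    # frame number to select, e.g. 6
    expr="$4"     # expression to print, e.g. '*f'
    # -batch makes gdb exit after the -ex commands have run.
    "$GDB" -batch -ex "frame $frame" -ex "print $expr" "$binary" "$core"
}

# Example (paths are assumptions):
#   print_frame_var /usr/sbin/asterisk core 6 '*f' > frame.txt
```

As the following comments point out, the output is only trustworthy if Asterisk was built with DONT_OPTIMIZE.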

By: moty (moty) 2008-02-04 08:39:48.000-0600

Hi,
Attached is the output.

I am not sure about the DONT_OPTIMIZE, I think not. Anyhow, since this is a production environment, it will be very hard for me to re-compile it.

By: Joshua C. Colp (jcolp) 2008-02-04 08:44:31.000-0600

This doesn't look right at all... which might have been caused by lack of DONT_OPTIMIZE

By: moty (moty) 2008-02-04 08:51:25.000-0600

First, thanks for the rapid answers.
Second, what do you suggest?

By: Dmitry Andrianov (dimas) 2008-02-04 10:11:57.000-0600

He suggests that you recompile with DONT_OPTIMIZE :)

this option does not really slow asterisk down a lot...

By: Tilghman Lesher (tilghman) 2008-02-06 12:39:34.000-0600

Also, you should be trying SVN 1.4, as we've fixed a fairly major memory corruption issue after the release of 1.4.17 (fix will be in 1.4.18), and from the looks of your backtrace, you've got memory corruption.

By: moty (moty) 2008-02-07 04:13:15.000-0600

Hi,

Thanks all for your replies.
I will hopefully recompile Asterisk with the DONT_OPTIMIZE flag to get a better backtrace.

Anyhow, I will wait for the final .18 release, since again, it's a production environment.

Any other suggestions would be much appreciated.

By: Norman Franke (norman) 2008-02-08 19:48:20.000-0600

This looks identical [Thread 325 (process 6235)] to a crash I keep getting, except I'm running 1.4.18-rc4 which doesn't seem to be different from 1.4.18 release. In my case, I can reverse engineer that the thread that was corrupted was dialing an extension from a client, but failed



By: Chris Miller (scratchspace) 2008-02-09 13:02:40.000-0600

We're experiencing similar behavior with 1.4.17 on RHEL 5, kernel 2.6.18-53.1.4.el5. The system has Sangoma A101d and A200 cards installed, with wanpipe 3.2.1 and Zaptel 1.4.7. The problems we've witnessed seem consistent with or related to 11712, 11818, 11862, and 11915. Originally the system would just hang and we would have to kill -9 the process. We did see failed locks building up to this event. Upon analyzing a couple of core dumps, it appeared this was most likely the memory corruption issue. Nonetheless, we applied the patches from 11818 and 11862 to 1.4.18 final. Within 12 hours the system crashed while one local SIP extension was calling another. Attached is a backtrace.

By: Norman Franke (norman) 2008-02-12 18:11:25.000-0600

You may want to try the patch in ASTERISK-1189960, which I'm testing to see whether it helps with my similar crashes.

By: moty (moty) 2008-02-14 02:54:59.000-0600

Hi,

I've attached another backtrace, taken after compiling Asterisk with the DONT_OPTIMIZE flag. Please take a look and let me know if it helps to resolve this issue.

Moty

By: Norman Franke (norman) 2008-02-14 10:53:41.000-0600

Can you also upload the console log? Did it crash or just hang? Thread 364 seems very similar to the other issues, I think.

To track these down, I often run Asterisk under gdb. For example, "gdb /usr/sbin/asterisk", then "run -c -vvvg". When something happens, I can press Control-C to enter the debugger, run "generate-core-file", and then re-run. I can then analyze the core offline with minimal downtime. Unfortunately, when I did that for a similar freeze-up, I couldn't really tell anything. In my case, I suspected a new SIP call was being initiated from a client workstation, and while trying to add the channel to the channel list, it froze (since it wasn't actually in the list yet).
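
A variation on the workflow described above, for an Asterisk that is already running (for example under safe_asterisk) rather than started from inside gdb: attach to the process, write a core, and detach. This is a sketch under assumptions, not the method used in the comment. The helper name snapshot_core and the output path are hypothetical, and the GDB variable exists only so the command line can be previewed with GDB=echo. It relies on gdb's -p, -batch, and -ex options and the same generate-core-file command.

```shell
#!/bin/sh
# Sketch only: snapshot a core from a running process, then let it continue.
# GDB may be overridden (e.g. GDB=echo) to preview the command line.
GDB="${GDB:-gdb}"

snapshot_core() {
    pid="$1"        # PID of the running asterisk process
    corefile="$2"   # where gdb should write the core (hypothetical path below)
    # -p attaches to the PID; -batch exits once the -ex commands have run.
    "$GDB" -p "$pid" -batch \
        -ex "generate-core-file $corefile" \
        -ex "detach"
}

# Example (path is an assumption):
#   snapshot_core "$(pidof asterisk)" /tmp/asterisk-core
# The core can then be examined offline, e.g. with "thread apply all bt".
```

The process is paused while the core is written, so on a busy production system this still causes a short interruption.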

By: moty (moty) 2008-03-02 02:42:38.000-0600

Hi Guys,

Any news?

Thanks,
Moty

By: Jason Parker (jparker) 2008-04-02 13:00:54

moty: norman asked several questions and gave some very good advice on 02/14.  Please upgrade to Asterisk 1.4.19, answer his questions, follow his instructions, and report back here on whether this is still an issue.

By: Russell Bryant (russell) 2008-04-22 13:43:18

Suspended.  Feel free to reopen after an upgrade with up to date information about what is happening.