Summary: ASTERISK-20474: Segfault in Asterisk 1.8.15 on VMware
Reporter: Alex Cremonezi (alexcremonezi)
Labels:
Date Opened: 2012-09-25 00:36:10
Date Closed: 2013-01-04 10:22:49.000-0600
Versions: Frequency of
Environment: IBM System X 3500, four-core Intel® Xeon E5620 2.40 GHz with 12 MB of cache per processor socket, 12 GB RAM, 450 HD SATA (RAID1)
Host OS: VMware ESXi 4.1 Server (official IBM build), running a single VM
VM: CentOS 6.3, Storage: 50 GB, Mem: 12 GB RAM
Linux centralip-dev 2.6.32-279.5.2.el6.x86_64 #1 SMP Fri Aug 24 01:07:11 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
Asterisk: ---
Attachments: ( 0) backtrace-2012-09-10.txt
( 1) backtrace-2012-09-18.txt
( 2) backtrace-2012-09-24.txt
( 3) backtrace-2012-10-04.txt
Description: Asterisk segfaults about every two weeks. I cannot reproduce the segfault because I do not know its cause, but it always occurs in the middle of the afternoon.

The system has about 500 extensions and about 40 simultaneous connections.

The segfault occurred three times this month (September 2012):
September 10, 2012 at 15h16
September 18, 2012 at 15h58
September 24, 2012 at 15h12

The three backtraces are attached.

Running the command "grep segfault /var/log/messages*" returns:

../messages:Sep 24 15:12:10 localhost kernel: asterisk[7285]: segfault at 7f950000001a ip 00007f9625e80599 sp 00007f95d8106af0 error 4 in libc-2.12.so[7f9625e08000+186000]
../messages-20120902:Aug 31 15:37:55 localhost kernel: asterisk[32114]: segfault at 7f6600000010 ip 00007f67258cc4de sp 00007f668b7bc560 error 6 in libc-2.12.so[7f6725854000+186000]
../messages-20120916:Sep 10 15:16:53 localhost kernel: asterisk[6384]: segfault at 7f5d0000001a ip 00007f5e57eb4599 sp 00007f5e09682bc0 error 4 in libc-2.12.so[7f5e57e3c000+186000]
../messages-20120923:Sep 18 15:58:11 localhost kernel: asterisk[8093]: segfault at 12 ip 00007f4a84176489 sp 00007f4a31587b70 error 6 in libc-2.12.so[7f4a840fe000+186000]
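As a quick cross-check of the kernel lines above, the following Python sketch (an illustration only, not part of Asterisk or of any tracker tooling) parses each "segfault at" line, subtracts the libc load address from the instruction pointer, and decodes the x86 page-fault error code. The regular expression assumes exactly the syslog format shown above. For these four lines it yields offsets 0x784de (Aug 31), 0x78599 (Sep 10 and Sep 24) and 0x78489 (Sep 18) into the libc-2.12.so mapping, i.e. all four crashes land in the same small region of libc; "error 4" decodes as a user-mode read and "error 6" as a user-mode write of an unmapped address. Resolving the offsets to function names would require the exact libc-2.12.so build (for example with gdb), which this sketch does not attempt.

```python
import re

# Kernel lines copied from the grep output above (the "../messages*:" prefix dropped).
LOG_LINES = [
    "Sep 24 15:12:10 localhost kernel: asterisk[7285]: segfault at 7f950000001a ip 00007f9625e80599 sp 00007f95d8106af0 error 4 in libc-2.12.so[7f9625e08000+186000]",
    "Aug 31 15:37:55 localhost kernel: asterisk[32114]: segfault at 7f6600000010 ip 00007f67258cc4de sp 00007f668b7bc560 error 6 in libc-2.12.so[7f6725854000+186000]",
    "Sep 10 15:16:53 localhost kernel: asterisk[6384]: segfault at 7f5d0000001a ip 00007f5e57eb4599 sp 00007f5e09682bc0 error 4 in libc-2.12.so[7f5e57e3c000+186000]",
    "Sep 18 15:58:11 localhost kernel: asterisk[8093]: segfault at 12 ip 00007f4a84176489 sp 00007f4a31587b70 error 6 in libc-2.12.so[7f4a840fe000+186000]",
]

# Assumed format: "<Mon> <day> <time> ... segfault at <addr> ip <ip> sp <sp> error <n> in <lib>[<base>+<size>]"
SEGFAULT_RE = re.compile(
    r"^(?P<when>\w{3} +\d+ [\d:]+) .*?segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp [0-9a-f]+ error (?P<err>\d+) "
    r"in (?P<lib>[^\s\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_error(code):
    """Decode the low three bits of the x86 page-fault error code."""
    cause = "protection violation" if code & 1 else "page not present"
    access = "write" if code & 2 else "read"
    mode = "user" if code & 4 else "kernel"
    return f"{mode}-mode {access}, {cause}"


def parse(line):
    """Return the interesting fields of one segfault line, or None if it does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    return {
        "when": m["when"],
        "fault_addr": int(m["addr"], 16),
        "lib": m["lib"],
        "offset": ip - base,  # ip relative to the start of the library mapping
        "error": decode_error(int(m["err"])),
    }


if __name__ == "__main__":
    for line in LOG_LINES:
        r = parse(line)
        print(f'{r["when"]}  {r["lib"]}+{r["offset"]:#x}  {r["error"]}')
```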

Comments:

By: Alex Cremonezi (alexcremonezi) 2012-09-25 00:37:41.812-0500

Segfault occurred in September 2012.

By: Alex Cremonezi (alexcremonezi) 2012-09-25 00:48:02.067-0500

Segfault occurred in September 2012.

By: Alex Cremonezi (alexcremonezi) 2012-09-25 01:22:53.862-0500

The three backtraces from September 2012 are attached.

By: Matt Jordan (mjordan) 2012-09-25 08:23:26.539-0500

Your backtrace appears to contain memory corruption and we require valgrind output in order to move this issue forward. Please see https://wiki.asterisk.org/wiki/display/AST/Valgrind for more information about how to produce debugging information. Thanks!

By: Alex Cremonezi (alexcremonezi) 2012-10-04 22:03:44.921-0500

This is happening in a production environment. The customer is unlikely to agree to running Asterisk under valgrind unless we are very certain it will not cause any problems noticeable by their own customers. We could upgrade a test system to this version of Asterisk, but since we do not know how to reproduce the problem, the crash may never occur there.
Can you advise whether a valgrind trace is absolutely required, and whether there is any risk associated with running it?

By: Alex Cremonezi (alexcremonezi) 2012-10-04 22:31:27.040-0500

I have also seen a deadlock in the system about once every three days, at random times, even during periods of low demand such as the beginning of the workday.

We have about 30 AudioCodes MP124 gateways configured as SIP peers. The situation is critical.

I have attached a new backtrace from a crash on October 4, 2012 (yesterday).

By: Alex Cremonezi (alexcremonezi) 2012-10-04 22:32:08.370-0500

New backtrace.

By: Matt Jordan (mjordan) 2012-10-05 08:17:31.880-0500

While I appreciate the numerous backtraces, it's clear that this is caused by memory corruption (segfaults while allocating memory). Additional backtraces will not provide any more information, since the segfaults occur in multiple locations.

If you are unable to run this under valgrind, you may want to at least try compiling Asterisk with MALLOC_DEBUG and checking whether it reports any memory fence violations. That would be the best chance, outside of valgrind, of actually finding this problem.

Without that information, I don't think there's going to be anything anyone can do to help you.
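The idea behind "memory fence violations" can be illustrated with a toy model. The Python sketch below is hypothetical and does not reproduce Asterisk's MALLOC_DEBUG code, its guard values or its log messages: each allocation is padded with a known guard pattern on both sides, and the pattern is verified when the block is freed, so a write past either end of the block is reported at free time instead of surfacing later as an unrelated crash inside the allocator.

```python
import struct

# Hypothetical guard pattern placed before and after every user block.
FENCE = struct.pack("<I", 0xDEADBEEF)


class FencedHeap:
    """Toy heap: each block is stored as [low fence][user bytes][high fence]."""

    def __init__(self):
        self.blocks = {}  # handle -> (padded bytearray, user size)
        self.next_handle = 1
        self.violations = []

    def malloc(self, size):
        buf = bytearray(FENCE + bytes(size) + FENCE)
        handle = self.next_handle
        self.next_handle += 1
        self.blocks[handle] = (buf, size)
        return handle

    def write(self, handle, offset, data):
        """Unchecked write relative to the user area, like C pointer arithmetic.

        Writes that stay inside the padded block silently clobber a fence; in real
        memory, writes beyond the padding would corrupt a neighbouring block, which
        this model approximates by raising IndexError.
        """
        buf, _ = self.blocks[handle]
        for i, b in enumerate(data):
            pos = len(FENCE) + offset + i
            if not 0 <= pos < len(buf):
                raise IndexError("write outside padded block")
            buf[pos] = b

    def free(self, handle):
        """Release the block and record any damaged fence."""
        buf, size = self.blocks.pop(handle)
        if bytes(buf[:len(FENCE)]) != FENCE:
            self.violations.append(f"low fence violation in block {handle}")
        if bytes(buf[len(FENCE) + size:]) != FENCE:
            self.violations.append(f"high fence violation in block {handle}")


if __name__ == "__main__":
    heap = FencedHeap()
    ok = heap.malloc(8)
    heap.write(ok, 0, b"12345678")  # fits exactly, fences intact
    heap.free(ok)
    bad = heap.malloc(8)
    heap.write(bad, 0, b"123456789")  # one byte too many: clobbers the high fence
    heap.free(bad)
    print(heap.violations)
```

The check only fires when the damaged block is freed, so the report points at the block that was overrun rather than at the code that later trips over the corrupted heap; that trade-off is why a fence check is far cheaper than valgrind but also less precise about where the bad write happened.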

By: Rusty Newton (rnewton) 2012-11-27 17:04:44.476-0600

Alex, I am resetting the "Waiting for feedback" status on this issue to give you more time to try Matt's suggestion above or to obtain valgrind output. If there is no further activity after another 14 days, we will close this issue. It can always be reopened later if you get the requested debug output.