Summary:ASTERISK-20792: Segfault during calloc, core dump shows logging string at requested pointer address
Reporter: Emiel Suilen (esuilen)
Labels:
Date Opened: 2012-12-13 09:00:44.000-0600
Date Closed: 2013-03-07 10:16:59.000-0600
Versions:
Frequency of Occurrence:
Environment: CentOS 6.3 (Final), kernel 2.6.32-279.9.1.el6.x86_64, 4 GB memory, single Intel Xeon E6520, Asterisk
Attachments:( 0) bt
( 1) bt_full
( 2) edited_full
( 3) edited_full_short
( 4) p_addr
Description: In an environment with a high call volume (>4k calls per 24 hours) and full logging turned on, our customer experiences occasional crashes. A backtrace of the core dump shows that the crash happens during channel creation, and that the memory at the pointer allocated for the channel already contains a string used by the logger.

Attached are the backtrace, the full backtrace, and an examination of the relevant frame in GDB, which shows that the memory at the allocated pointer already holds data that starts several blocks earlier.

A full core dump cannot be provided due to its size. The core dump originated from, but the same crashes were also found in core dumps from higher versions. Unfortunately, those were compiled without debug info. We are unable to reproduce this for other customers or on single-user machines.
Comments:
By: Emiel Suilen (esuilen) 2012-12-13 09:01:48.504-0600


By: Emiel Suilen (esuilen) 2012-12-13 09:02:11.798-0600

backtrace full

By: Emiel Suilen (esuilen) 2012-12-13 09:05:21.322-0600

GDB commands in []

By: Rusty Newton (rnewton) 2012-12-14 17:05:28.898-0600

What was the highest version of Asterisk you reproduced this crash in?

Can you post an excerpt of the Asterisk full log, with DEBUG and VERBOSE enabled at level 5, from right before and up to the time of the crash?

By: Emiel Suilen (esuilen) 2012-12-17 05:09:22.928-0600

2 minutes up to the crash. Anonymized users and numbers.

By: Emiel Suilen (esuilen) 2012-12-17 05:11:03.088-0600

Short version, up to 200 lines before the crash.

By: Emiel Suilen (esuilen) 2012-12-17 05:14:01.321-0600

I attached the full log (verbose level 10) for the 2 minutes leading up to the crash, and a shortened version of it. Both were anonymized. The lines we see more frequently around crashes are the following two:
[Nov 28 11:56:57] VERBOSE[2463] res_musiconhold.c: [Nov 28 11:56:57]     -- Stopped music on hold on SIP/
[Nov 28 11:56:57] VERBOSE[399] pbx.c: [Nov 28 11:56:57]   == Spawn extension (default, 1001, 90) exited non-zero on 'SIP/991operator1-00002e75<ZOMBIE>'

We also saw this for version

By: Matt Jordan (mjordan) 2012-12-17 09:21:22.384-0600

Your backtrace appears to contain memory corruption and we require valgrind output in order to move this issue forward. Please see https://wiki.asterisk.org/wiki/display/AST/Valgrind for more information about how to produce debugging information. Thanks!

The other option would be to reproduce it using with the MALLOC_DEBUG build option enabled. Some major enhancements were put into Asterisk (starting in that release) that help to hunt down these kinds of issues. Note that we would need the mmlog file created when the MALLOC_DEBUG option is enabled.

By: Emiel Suilen (esuilen) 2012-12-17 10:04:10.546-0600

Matt, you want the valgrind output of version or Or another version?

Note that we will do this with in-house testing on the same physical system, as we cannot ask the customer to operate while Asterisk is running under valgrind.

[edit] Started Asterisk 1.8.17.0 under valgrind.

By: Rusty Newton (rnewton) 2013-01-31 13:27:39.320-0600

We would always prefer the problem to be reproduced in the most recent version possible. Setting this back to Waiting on Feedback until valgrind output is available. Thanks!

By: Matt Jordan (mjordan) 2013-03-07 10:16:53.645-0600

Suspended due to lack of activity. Please request a bug marshal in #asterisk-bugs on the IRC network irc.freenode.net to reopen the issue should you have the additional information requested.  Further information can be found at http://www.asterisk.org/developers/bug-guidelines