|Summary:||ASTERISK-20792: Segfault during calloc, core dump shows logging string at requested pointer address|
|Reporter:||Emiel Suilen (esuilen)||Labels:|
|Date Opened:||2012-12-13 09:00:44.000-0600||Date Closed:||2013-03-07 10:16:59.000-0600|
|Environment:||CentOS 6.3 (Final) Kernel 2.6.32-279.9.1.el6.x86_64 4 GB memory, single Intel Xeon E6520 Asterisk 188.8.131.52||Attachments:||( 0) bt|
( 1) bt_full
( 2) edited_full
( 3) edited_full_short
( 4) p_addr
|Description:||In an environment with many calls (more than 4,000 calls per 24 hours) and full logging turned on, our customer experiences occasional crashes. A backtrace of the core dump shows that the crash happens during channel creation, and that the pointer returned for the new channel points at memory holding a string used by the logger.
Attached are the backtrace, the full backtrace, and an examination of the relevant frame in GDB. The GDB output shows that the memory at the allocated pointer already holds data, namely a string that starts several blocks before the pointer.
A full core dump cannot be provided because of its size. The core dump originated from version 184.108.40.206, but identical core dumps were also found in later versions. Unfortunately, those builds were compiled without debug info. We are unable to reproduce this for other customers or on single-user machines.
|Comments:||By: Emiel Suilen (esuilen) 2012-12-13 09:01:48.504-0600|
By: Emiel Suilen (esuilen) 2012-12-13 09:02:11.798-0600
By: Emiel Suilen (esuilen) 2012-12-13 09:05:21.322-0600
GDB commands in 
By: Rusty Newton (rnewton) 2012-12-14 17:05:28.898-0600
What was the highest version of Asterisk you reproduced this crash in?
Can you post an excerpt of the Asterisk full log, with DEBUG and VERBOSE enabled at level 5, covering the period right before and up to the time of the crash?
By: Emiel Suilen (esuilen) 2012-12-17 05:09:22.928-0600
2 minutes up to the crash. Anonymized users and numbers.
By: Emiel Suilen (esuilen) 2012-12-17 05:11:03.088-0600
Short version, up to 200 lines before the crash.
By: Emiel Suilen (esuilen) 2012-12-17 05:14:01.321-0600
I attached the full log (verbose level 10) for the 2 minutes up to the crash, and a shortened version of it. Both were anonymized. The lines we see most frequently around crashes are the following two:
[Nov 28 11:56:57] VERBOSE res_musiconhold.c: [Nov 28 11:56:57] -- Stopped music on hold on SIP/192.168.15.11-00002e2f
[Nov 28 11:56:57] VERBOSE pbx.c: [Nov 28 11:56:57] == Spawn extension (default, 1001, 90) exited non-zero on 'SIP/991operator1-00002e75<ZOMBIE>'
We also saw this in version 220.127.116.11.
By: Matt Jordan (mjordan) 2012-12-17 09:21:22.384-0600
Your backtrace appears to contain memory corruption and we require valgrind output in order to move this issue forward. Please see https://wiki.asterisk.org/wiki/display/AST/Valgrind for more information about how to produce debugging information. Thanks!
The other option would be to reproduce it using 18.104.22.168-rc1 with the MALLOC_DEBUG build option enabled. Some major enhancements were put into Asterisk (starting in that release) that help to hunt down these kinds of issues. Note that we would need the mmlog file created when the MALLOC_DEBUG option is enabled.
By: Emiel Suilen (esuilen) 2012-12-17 10:04:10.546-0600
Matt, do you want the valgrind output from version 22.214.171.124 or 126.96.36.199, or from another version?
Note that we will do this as in-house testing on the same physical system, as we cannot ask the customer to run their production traffic while Asterisk is running under valgrind.
Started Asterisk 188.8.131.52 under valgrind.
By: Rusty Newton (rnewton) 2013-01-31 13:27:39.320-0600
We would always prefer the problem to be reproduced in the most recent version possible. Setting this again in Waiting on Feedback until valgrind output is available. Thanks!
By: Matt Jordan (mjordan) 2013-03-07 10:16:53.645-0600
Suspended due to lack of activity. Please request a bug marshal in #asterisk-bugs on the IRC network irc.freenode.net to reopen the issue should you have the additional information requested. Further information can be found at http://www.asterisk.org/developers/bug-guidelines