Summary: ASTERISK-11083: crashes with "*** glibc detected *** corrupted double-linked list" with connections to asterisk manager
Reporter: Roel van Meer (rolek)
Labels:
Date Opened: 2007-12-19 10:34:38.000-0600
Date Closed: 2007-12-26 14:46:11.000-0600
Priority: Critical
Regression? No
Status: Closed/Complete
Components: Core/ManagerInterface
Versions:
Frequency of Occurrence:
Related Issues:
Environment:
Attachments:
( 0) 20071221__bug11601.diff.txt
( 1) gdb-output-1.txt
( 2) gdb-output-2.txt
( 3) valgrind5_11039.txt
Description: I'm running asterisk 1.4.x with an IAX uplink and some SIP phones. Essentially, this works fine. However, when making connections to the asterisk manager interface [from a php application, in a certain timed fashion], asterisk crashes while showing a notice like:
*** glibc detected *** corrupted double-linked list: 0x081cb7e0 ***
This happens reproducibly on three different boxes, all with different hardware.

****** ADDITIONAL INFORMATION ******

OS: Slackware 10.1
Kernel: 2.6.21.5
Affected asterisk versions: 1.4.13, 1.4.15, 1.4.16
glibc version: 2.3.4
Comments:

By: Russell Bryant (russell) 2007-12-19 10:38:48.000-0600

That's interesting that you have correlated the crash to the use of the manager interface.  The crash is actually occurring in chan_iax2.  Have you verified that this is not happening to you when you do not use the manager interface?

If it only happens when using the manager, can you give some details about what your manager connection is executing?

By: Roel van Meer (rolek) 2007-12-19 10:46:46.000-0600

The php code that is making the connection does two relatively simple things (example in pseudo code):

open socket
write to socket "Action: login"
read output
write to socket: "show channels"
read output
write to socket: "Action: Logoff"
read output
close socket

The second command sequence has a similar structure but runs the 'database show' command.
The thing that really bothers me is that if we run this every second or faster, the problem does not appear. However, if we run it every 5 seconds, the problems occur and asterisk crashes, often within 10 minutes from starting.

By: Russell Bryant (russell) 2007-12-19 10:47:32.000-0600

Also, what version is this backtrace from, and do you have any patches applied?  The line numbers don't exactly match up, but it may be because I'm looking at a different version.  Also, make sure you compiled with DONT_OPTIMIZE turned on.

By: Roel van Meer (rolek) 2007-12-19 10:50:54.000-0600

I haven't seen any crashes before we started using the manager interface, in over four weeks of development on three machines. The thing that is really bothering me is that actually stressing the server prevents crashes as well. In other words, when I run this script in a 'while [ 1 ]' loop from bash, then there are no crashes:

#!/bin/bash

function text2() {
cat <<- EOF
Action: Login
UserName: webtool
Secret: demo

EOF
}
function text3() {
cat <<- EOF
Action: Command
Command: show channels

EOF
}
function text4() {
cat <<- EOF
Action: Logoff

EOF
}
(text2 ; text3 ; text4 ; sleep 0.1) | telnet 127.0.0.1 5038
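
The interval dependence described above (no crashes when polling every second or faster, crashes when polling every 5 seconds) could be tested with a small harness along these lines. This is only a sketch: the function names ami_session, send_session and poll_manager are made up for illustration, the credentials are the ones from the script above, and telnet on 127.0.0.1 port 5038 is assumed as the transport, as in that script.

```shell
#!/bin/bash
# Hypothetical harness: replay one manager session at a fixed interval.
# All function names here are illustrative and not part of Asterisk.

# Print one manager session: login, one CLI command, logoff.
# Each action is a block of "Header: value" lines ended by a blank line.
ami_session() {
    printf 'Action: Login\r\nUsername: %s\r\nSecret: %s\r\n\r\n' "$1" "$2"
    printf 'Action: Command\r\nCommand: %s\r\n\r\n' "$3"
    printf 'Action: Logoff\r\n\r\n'
}

# Transport for a session read from stdin (assumed: telnet to the manager port).
send_session() {
    telnet 127.0.0.1 5038
}

# poll_manager INTERVAL COUNT: run COUNT sessions, pausing INTERVAL seconds
# after each one.
poll_manager() {
    i=0
    while [ "$i" -lt "$2" ]; do
        # The short sleep keeps the pipe open long enough for replies to arrive.
        { ami_session webtool demo 'show channels'; sleep 0.1; } | send_session
        sleep "$1"
        i=$((i + 1))
    done
}
```

Calling `poll_manager 5 120` would then poll every 5 seconds for roughly ten minutes, and `poll_manager 1 600` would cover the same period at the faster rate that did not trigger the crash.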

By: Roel van Meer (rolek) 2007-12-19 10:55:24.000-0600

This is 1.4.16, but, while I thought I compiled this particular version with DONT_OPTIMIZE, it seems it fell off somewhere during my tests. Apologies for that. I'll post a new backtrace soon.

By: Tilghman Lesher (tilghman) 2007-12-19 11:12:56.000-0600

Please follow the instructions in doc/valgrind.txt

By: Roel van Meer (rolek) 2007-12-19 11:52:53.000-0600

gdb-output-2.txt has a backtrace of 1.4.16 with DONT_OPTIMIZE enabled. I'll try to tackle valgrind next.

By: Russell Bryant (russell) 2007-12-19 12:04:53.000-0600

Yeah, hopefully valgrind can provide some insight.  This backtrace is different than the last, and points to the type of problem that valgrind is very helpful in tracking down ...

By: Roel van Meer (rolek) 2007-12-20 10:15:44.000-0600

It seems that compiling with both DONT_OPTIMIZE and MALLOC_DEBUG somehow causes the problem to disappear. With that version, I've seen no crashes in 10 hours. When reverting to the same version compiled with DONT_OPTIMIZE but without MALLOC_DEBUG I get a crash within 10 minutes again.

Is it useful to do a valgrind trace of the version without MALLOC_DEBUG?

By: Tilghman Lesher (tilghman) 2007-12-20 12:36:21.000-0600

Yes, it would be useful.

By: Roel van Meer (rolek) 2007-12-21 03:36:46.000-0600

I'm very sorry, but I can't reproduce the crashes when running under valgrind. [Note to self: apparently this can happen, given the fact that it's a valgrind faq entry (4.4)]

I'm experiencing crashes when asterisk is started as:
/bin/su asterisk -c "/usr/sbin/asterisk -vvvfg"
/bin/su - asterisk ; /usr/sbin/asterisk -vvvfg
/usr/sbin/asterisk -vvvfg
or when started from safe_asterisk as root or as user asterisk.

I'm seeing no crashes when started as:
valgrind /usr/sbin/asterisk -vvvfg
/bin/su - asterisk ; valgrind /usr/sbin/asterisk -vvvfg
/bin/su asterisk -c "valgrind --log-file=/var/lib/asterisk/valgrind.log /usr/sbin/asterisk -vvvfg 2>/var/lib/asterisk/malloc_debug.log"

I'm not an experienced valgrind user. Is it useful to have a memory leak summary or valgrind log file output after running valgrind for some time but without the crash? Is there any other output that is helpful for you?

Sorry for the trouble and thanks for all the help so far.



By: Tilghman Lesher (tilghman) 2007-12-21 07:53:19.000-0600

Yes, the valgrind output is highly useful, even if valgrind prevents a crash, as the output tells us where a crash might have occurred.
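
To get a quick overview of such a log before attaching it, the error headlines can be counted with standard tools. The sketch below assumes the usual memcheck log layout, where every line starts with "==PID== " and errors begin with headlines such as "Invalid read of size N"; the function name summarize_valgrind_log is made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: count memcheck error headlines in a valgrind log.
# Assumes the default log format, where each line is prefixed "==PID== ".

summarize_valgrind_log() {
    # 1. Strip the "==PID== " prefix from each line.
    # 2. Keep only error headlines (stack frames start with spaces and are dropped).
    # 3. Count identical headlines, most frequent first.
    sed -n 's/^==[0-9]*== //p' "$1" |
        grep -E '^(Invalid (read|write) of size [0-9]+|Invalid free\(\)|Conditional jump or move depends on uninitialised value|Use of uninitialised value)' |
        sort | uniq -c | sort -rn
}
```

For example, `summarize_valgrind_log /var/lib/asterisk/valgrind5_11039.txt` would list each kind of error with its count. The stack traces that follow each headline in the full log are still what identifies the faulting code, so the summary is only a starting point.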

By: Roel van Meer (rolek) 2007-12-21 08:00:29.000-0600

I've just added the valgrind output of the following command:

valgrind --leak-check=full --log-file=/var/lib/asterisk/valgrind5_%p.txt /usr/sbin/asterisk -vvvfg

This is asterisk 1.4.16 with DONT_OPTIMIZE but without MALLOC_DEBUG. If you need any other output or other compile options, please let me know.

By: Tilghman Lesher (tilghman) 2007-12-21 08:31:00.000-0600

Okay, the only real problem in here looks like a glibc bug, but please try this patch anyway.

By: Roel van Meer (rolek) 2007-12-21 09:07:55.000-0600

Preliminary results: this seems to fix it.
I'll keep you posted.

By: Roel van Meer (rolek) 2007-12-24 04:19:53.000-0600

After running almost three days on my test servers, without crashing, I'm now going to try it on the production server. I think you fixed it. Thanks a lot!
Happy Christmas, if you happen to celebrate that.

By: Tilghman Lesher (tilghman) 2007-12-24 09:24:36.000-0600

Please report this glibc issue upstream to the Slackware project.  This should be fixed by them (or further upstream from them, but that is the correct route).

Source: http://www.gnu.org/software/libc/bugs.html



By: Digium Subversion (svnbot) 2007-12-26 14:40:19.000-0600

Repository: asterisk
Revision: 94808

U   branches/1.4/main/manager.c

------------------------------------------------------------------------
r94808 | tilghman | 2007-12-26 14:40:18 -0600 (Wed, 26 Dec 2007) | 6 lines

Workaround for what is probably a glibc bug (but we'll see this crop up again
and again, if we don't add the workaround).
Reported by: rolek
Patch by: tilghman
(Closes issue ASTERISK-11083, closes issue ASTERISK-10937)

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=94808

By: Digium Subversion (svnbot) 2007-12-26 14:46:11.000-0600

Repository: asterisk
Revision: 94809

_U  trunk/
U   trunk/main/manager.c

------------------------------------------------------------------------
r94809 | tilghman | 2007-12-26 14:46:10 -0600 (Wed, 26 Dec 2007) | 14 lines

Merged revisions 94808 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r94808 | tilghman | 2007-12-26 14:43:38 -0600 (Wed, 26 Dec 2007) | 6 lines

Workaround for what is probably a glibc bug (but we'll see this crop up again
and again, if we don't add the workaround).
Reported by: rolek
Patch by: tilghman
(Closes issue ASTERISK-11083, closes issue ASTERISK-10937)

........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=94809