Summary: | ASTERISK-11083: crashes with "*** glibc detected *** corrupted double-linked list" with connections to asterisk manager | ||
Reporter: | Roel van Meer (rolek) | Labels: | |
Date Opened: | 2007-12-19 10:34:38.000-0600 | Date Closed: | 2007-12-26 14:46:11.000-0600 |
Priority: | Critical | Regression? | No |
Status: | Closed/Complete | Components: | Core/ManagerInterface |
Versions: | Frequency of Occurrence | ||
Related Issues: | |||
Environment: | Attachments: | ( 0) 20071221__bug11601.diff.txt ( 1) gdb-output-1.txt ( 2) gdb-output-2.txt ( 3) valgrind5_11039.txt | |
Description: | I'm running asterisk 1.4.x with an IAX uplink and some SIP phones. Essentially, this works fine. However, when making connections to the asterisk manager interface [from a php application, in a certain timed fashion], asterisk crashes while showing a notice like:
*** glibc detected *** corrupted double-linked list: 0x081cb7e0 ***
This happens reproducibly on three different boxes, all with different hardware.
****** ADDITIONAL INFORMATION ******
OS: Slackware 10.1
Kernel: 2.6.21.5
Affected asterisk versions: 1.4.13, 1.4.15, 1.4.16
glibc version: 2.3.4 | ||
Comments: | By: Russell Bryant (russell) 2007-12-19 10:38:48.000-0600
That's interesting that you have correlated the crash to the use of the manager interface. The crash is actually occurring in chan_iax2. Have you verified that this is not happening to you when you do not use the manager interface? If it only happens when using the manager, can you give some details about what your manager connection is executing?

By: Roel van Meer (rolek) 2007-12-19 10:46:46.000-0600
The php code that is making the connection does two relatively simple things (example in pseudo code):
open socket
write to socket "Action: login"
read output
write to socket: "show channels"
read output
write to socket: "Action: Logoff"
read output
close socket
The second command sequence has a similar structure but runs the 'database show' command. The thing that really bothers me is that if we run this every second or faster, the problem does not appear. However, if we run it every 5 seconds, the problems occur and asterisk crashes, often within 10 minutes from starting.

By: Russell Bryant (russell) 2007-12-19 10:47:32.000-0600
Also, what version is this backtrace from, and do you have any patches applied? The line numbers don't exactly match up, but it may be because I'm looking at a different version. Also, make sure you compiled with DONT_OPTIMIZE turned on.

By: Roel van Meer (rolek) 2007-12-19 10:50:54.000-0600
I haven't seen any crashes before we started using the manager interface, in over four weeks of development on three machines. The thing that is really bothering me is that actually stressing the server prevents crashes as well.
In other words, when I run this script in a 'while [ 1 ]' loop from bash, then there are no crashes:
#!/bin/bash
function text2() {
cat <<- EOF
Action: Login
UserName: webtool
Secret: demo

EOF
}
function text3() {
cat <<- EOF
Action: Command
Command: show channels

EOF
}
function text4() {
cat <<- EOF
Action: Logoff

EOF
}
(text2 ; text3 ; text4 ; sleep 0.1) | telnet 127.0.0.1 5038

By: Roel van Meer (rolek) 2007-12-19 10:55:24.000-0600
This is 1.4.16, but, while I thought I compiled this particular version with DONT_OPTIMIZE, it seems it fell off somewhere during my tests. Apologies for that. I'll post a new backtrace soon.

By: Tilghman Lesher (tilghman) 2007-12-19 11:12:56.000-0600
Please follow the instructions in doc/valgrind.txt

By: Roel van Meer (rolek) 2007-12-19 11:52:53.000-0600
gdb-output-2.txt has a backtrace of 1.4.16 with DONT_OPTIMIZE enabled. I'll try to tackle valgrind next.

By: Russell Bryant (russell) 2007-12-19 12:04:53.000-0600
Yeah, hopefully valgrind can provide some insight. This backtrace is different than the last, and points to the type of problem that valgrind is very helpful in tracking down ...

By: Roel van Meer (rolek) 2007-12-20 10:15:44.000-0600
It seems that compiling with both DONT_OPTIMIZE and MALLOC_DEBUG somehow causes the problem to disappear. With that version, I've seen no crashes in 10 hours. When reverting to the same version compiled with DONT_OPTIMIZE but without MALLOC_DEBUG I get a crash within 10 minutes again. Is it useful to do a valgrind trace of the version without MALLOC_DEBUG?

By: Tilghman Lesher (tilghman) 2007-12-20 12:36:21.000-0600
Yes, it would be useful.

By: Roel van Meer (rolek) 2007-12-21 03:36:46.000-0600
I'm very sorry, but I can't reproduce the crashes when running under valgrind.
[Note to self: apparently this can happen, given the fact that it's a valgrind faq entry (4.4)]
I'm experiencing crashes when asterisk is started as:
/bin/su asterisk -c "/usr/sbin/asterisk -vvvfg"
/bin/su - asterisk ; /usr/sbin/asterisk -vvvfg
/usr/sbin/asterisk -vvvfg
or when started from safe_asterisk as root or as user asterisk.
I'm seeing no crashes when started as:
valgrind /usr/sbin/asterisk -vvvfg
/bin/su - asterisk ; valgrind /usr/sbin/asterisk -vvvfg
/bin/su asterisk -c "valgrind --log-file=/var/lib/asterisk/valgrind.log /usr/sbin/asterisk -vvvfg 2>/var/lib/asterisk/malloc_debug.log"
I'm not an experienced valgrind user. Is it useful to have a memory leak summary or valgrind log file output after running valgrind for some time but without the crash? Is there any other output that is helpful for you? Sorry for the trouble and thanks for all the help so far.

By: Tilghman Lesher (tilghman) 2007-12-21 07:53:19.000-0600
Yes, the valgrind output is highly useful, even if valgrind prevents a crash, as the output tells us where a crash might have occurred.

By: Roel van Meer (rolek) 2007-12-21 08:00:29.000-0600
I've just added the valgrind output of the following command:
valgrind --leak-check=full --log-file=/var/lib/asterisk/valgrind5_%p.txt /usr/sbin/asterisk -vvvfg
This is asterisk 1.4.16 with DONT_OPTIMIZE but without MALLOC_DEBUG. If you need any other output or other compile options, please let me know.

By: Tilghman Lesher (tilghman) 2007-12-21 08:31:00.000-0600
Okay, the only real problem in here looks like a glibc bug, but please try this patch anyway.

By: Roel van Meer (rolek) 2007-12-21 09:07:55.000-0600
Preliminary results: this seems to fix it. I'll keep you posted.

By: Roel van Meer (rolek) 2007-12-24 04:19:53.000-0600
After running almost three days on my test servers, without crashing, I'm now going to try it on the production server. I think you fixed it. Thanks a lot! Happy Christmas, if you happen to celebrate that.
By: Tilghman Lesher (tilghman) 2007-12-24 09:24:36.000-0600
Please report this glibc issue upstream to the Slackware project. This should be fixed by them (or further upstream from them, but that is the correct route).
Source: http://www.gnu.org/software/libc/bugs.html

By: Digium Subversion (svnbot) 2007-12-26 14:40:19.000-0600
Repository: asterisk
Revision: 94808

U branches/1.4/main/manager.c

------------------------------------------------------------------------
r94808 | tilghman | 2007-12-26 14:40:18 -0600 (Wed, 26 Dec 2007) | 6 lines

Workaround for what is probably a glibc bug (but we'll see this crop up again and again, if we don't add the workaround).
Reported by: rolek
Patch by: tilghman
(Closes issue ASTERISK-11083, closes issue ASTERISK-10937)
------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=94808

By: Digium Subversion (svnbot) 2007-12-26 14:46:11.000-0600
Repository: asterisk
Revision: 94809

_U trunk/
U trunk/main/manager.c

------------------------------------------------------------------------
r94809 | tilghman | 2007-12-26 14:46:10 -0600 (Wed, 26 Dec 2007) | 14 lines

Merged revisions 94808 via svnmerge from https://origsvn.digium.com/svn/asterisk/branches/1.4
........
r94808 | tilghman | 2007-12-26 14:43:38 -0600 (Wed, 26 Dec 2007) | 6 lines

Workaround for what is probably a glibc bug (but we'll see this crop up again and again, if we don't add the workaround).
Reported by: rolek
Patch by: tilghman
(Closes issue ASTERISK-11083, closes issue ASTERISK-10937)
........
------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=94809 |
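Editor's sketch (not part of the original thread): the login / command / logoff sequence the reporter describes might be generated as below. This is a minimal illustration, assuming the credentials and the 127.0.0.1:5038 endpoint from the reporter's own script; the helper names ami_message and ami_session are hypothetical. Each manager message is a set of "Key: Value" lines ending in CRLF, closed by an empty line.

```shell
#!/bin/bash
# Hypothetical helpers illustrating the manager session from this issue.
# Credentials and commands are the ones quoted in the reporter's script.

# Print one manager message: every argument is a "Key: Value" header
# terminated by CRLF, followed by the empty line that ends the message.
ami_message() {
    local header
    for header in "$@"; do
        printf '%s\r\n' "$header"
    done
    printf '\r\n'
}

# Emit the full login / command / logoff sequence for one CLI command.
ami_session() {
    ami_message "Action: Login" "UserName: webtool" "Secret: demo"
    ami_message "Action: Command" "Command: $1"
    ami_message "Action: Logoff"
}
```

Against a live manager this could be piped the same way as in the reporter's script, for example `(ami_session "show channels"; sleep 0.1) | telnet 127.0.0.1 5038`, or run with "database show" for the second sequence mentioned in the thread.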