Summary: | ASTERISK-10699: Crash related to ast_module_user_remove | ||
Reporter: | laszlovl (lvl) | Labels: | |
Date Opened: | 2007-11-06 15:04:47.000-0600 | Date Closed: | 2007-11-09 10:34:34.000-0600 |
Priority: | Critical | Regression? | No |
Status: | Closed/Complete | Components: | Core/General |
Versions: | Frequency of Occurrence | ||
Related Issues: | |||
Environment: | Attachments: | ( 0) 20071107__bug11174.diff.txt ( 1) bt.txt ( 2) bt-full.txt ( 3) bt-threads.txt ( 4) console-output.txt ( 5) new_bt.txt ( 6) valgrind.txt | |
Description: | Asterisk randomly crashes after this sequence of events: == Manager 'call' logged on from 127.0.0.1 -- Executing [s@callee:3] Hangup("Local/s@callee-c93d,2", "") in new stack == Spawn extension (callee, s, 3) exited non-zero on 'Local/s@callee-c93d,2' *** glibc detected *** asterisk: free(): invalid pointer: 0x081b8f79 *** ======= Backtrace: ========= /lib/i686/cmov/libc.so.6[0xb7de9cf5] /lib/i686/cmov/libc.so.6(cfree+0x90)[0b7ded790] asterisk(__ast_module_user_remove+0x11c)[0x80b7cf0] /usr/lib/asterisk/modules/chan_local.so[0xb6f247df] asterisk(ast_hangup+0x372)[0x8085854] asterisk[0x80cf76f] asterisk[0x80cf9bf] asterisk[0x8110eee] /lib/i686/cmov/libpthread.so.0[0xb7f3646b] /lib/i686/cmov/libc.so.6(clone+0x5e)[0xb7e534de] ****** ADDITIONAL INFORMATION ****** Our asterisk is experiencing semi-random crashes which seem to be related to ast_module_user_remove. They are easily reproducable during our stresstests (which consist out of a new call being thrown into a queue every few seconds, aswell as agents logging in or out of the queue every few seconds, through the asterisk manager interface), although we've seen them occur during regular usage aswell. The crashing always seems to occur after the aforementioned sequence of events, though the same sequence can occur many times in a row without any problems aswell. Stresstesting with a 2-second interval makes asterisk crash after around a minute. The higher the interval, the longer it takes to crash. Attached are the full console output from the crash, and backtraces from the coredump. A few people on IRC have suggested to enable DONT_OPTIMIZE first, though I've doublechecked that asterisk is indeed compiled with that flag. The problem appears identically on the 1.4 tarball, SVN trunk and SVN branch. Running Debian, 2.6.18. | ||
Comments: | By: Tilghman Lesher (tilghman) 2007-11-06 18:49:42.000-0600 Actually, this would appear to be memory corruption. Please follow the instructions in doc/valgrind.txt to get us the necessary debugging output. By: laszlovl (lvl) 2007-11-06 20:44:25.000-0600 I haven't been able to reproduce the crash yet while asterisk is running in valgrind - probably because it appears to slow everything down quite a bit (our testing box isn't very fast to begin with). Recompiling with MALLOC_DEBUG did yield some very interesting information though, the key points being: == Spawn extension (callee, s, 3) exited non-zero on 'Local/s@callee-4c57,2' WARNING: Freeing unused memory at 0x824bde9, in __ast_module_user_remove of loader.c, line 204 == Spawn extension (test2, s, 2) exited non-zero on 'Local/s@callee-4c57,1' Segmentation fault (core dumped) and Program terminated with signal 11, Segmentation fault. #0 0x080bb646 in __ast_module_user_remove (mod=0x82236b8, u=0x824a9e0) at loader.c:201 201 AST_LIST_REMOVE(&mod->users, u, entry); I'm attaching the complete backtrace as new_bt.txt. By: Tilghman Lesher (tilghman) 2007-11-07 15:04:16.000-0600 Again, for these types of problems, the backtrace is unlikely to show us the problem. It is the messages that valgrind and the MALLOC_DEBUG options show us that will be helpful. Specifically, the problem is that by the time Asterisk crashes with a memory corruption error, the memory corruption happened well before the crash. The crash is just the result of finding corrupt memory. By: laszlovl (lvl) 2007-11-07 16:31:03.000-0600 Ok. I haven't been able to reproduce the crash while asterisk is running within valgrind - but here's a logfile from a couple of minutes of heavy stresstesting. The malloc_debug.txt contained the same line repeated around 900 times: "XXX ERROR XXX A thread holds more locks than '16'. Increase AST_MAX_LOCKS!" By: Digium Subversion (svnbot) 2007-11-07 17:37:40.000-0600 Repository: asterisk Revision: 89093 U branches/1.4/apps/app_queue.c ------------------------------------------------------------------------ r89093 | tilghman | 2007-11-07 17:37:39 -0600 (Wed, 07 Nov 2007) | 7 lines The member refcount must be incremented, to avoid using it after deallocation. A huge thanks go to lvl- for patiently providing the necessary valgrind output that was necessary to finding this problem of memory corruption. Reported by: lvl- Patch by: tilghman Closes issue ASTERISK-10699 ------------------------------------------------------------------------ By: Digium Subversion (svnbot) 2007-11-07 17:45:49.000-0600 Repository: asterisk Revision: 89094 _U trunk/ U trunk/apps/app_queue.c ------------------------------------------------------------------------ r89094 | tilghman | 2007-11-07 17:45:49 -0600 (Wed, 07 Nov 2007) | 15 lines Merged revisions 89093 via svnmerge from https://origsvn.digium.com/svn/asterisk/branches/1.4 ........ r89093 | tilghman | 2007-11-07 17:39:37 -0600 (Wed, 07 Nov 2007) | 7 lines The member refcount must be incremented, to avoid using it after deallocation. A huge thanks go to lvl- for patiently providing the necessary valgrind output that was necessary to finding this problem of memory corruption. Reported by: lvl- Patch by: tilghman Closes issue ASTERISK-10699 ........ ------------------------------------------------------------------------ By: Digium Subversion (svnbot) 2007-11-09 10:34:34.000-0600 Repository: asterisk Revision: 89131 _U team/file/t38fun/ U team/file/t38fun/apps/app_queue.c U team/file/t38fun/apps/app_voicemail.c U team/file/t38fun/cdr/cdr_tds.c U team/file/t38fun/channels/chan_sip.c U team/file/t38fun/codecs/codec_zap.c U team/file/t38fun/configs/extensions.ael.sample U team/file/t38fun/configs/res_odbc.conf.sample U team/file/t38fun/doc/valgrind.txt U team/file/t38fun/include/asterisk/lock.h U team/file/t38fun/main/config.c U team/file/t38fun/main/manager.c U team/file/t38fun/main/say.c U team/file/t38fun/main/srv.c U team/file/t38fun/main/tdd.c U team/file/t38fun/pbx/pbx_ael.c U team/file/t38fun/res/res_jabber.c U team/file/t38fun/res/res_musiconhold.c ------------------------------------------------------------------------ r89131 | file | 2007-11-09 10:34:31 -0600 (Fri, 09 Nov 2007) | 168 lines Merged revisions 89032,89036-89037,89042,89045-89046,89053,89079,89085,89088,89090,89093,89095,89097,89099,89101,89103,89105,89111,89115,89119,89125 via svnmerge from https://origsvn.digium.com/svn/asterisk/branches/1.4 ........ r89032 | file | 2007-11-06 13:08:05 -0400 (Tue, 06 Nov 2007) | 4 lines Make it so that if a peer is determined to be unreachable using qualify their devicestate will report back unavailable. (closes issue ASTERISK-10551) Reported by: pj ........ r89036 | murf | 2007-11-06 13:52:50 -0400 (Tue, 06 Nov 2007) | 1 line closes issue ASTERISK-8547 - where the [catname](!) and [catname](othercat1,othercat2,...) notation gets dropped across a ConfigUpdate (or any other thing that would cause a config file to be written). While I was at it, I also cleaned up some of the destroy routines to free up comments, which was not being done. Made sure the new struct I introduced is also cleaned up properly at destruction time. My code handles multiple template inclusions. Many thanks to ssokol for his patch, which, while not literally used in the final merge, served as a foundation for the fix. ........ r89037 | russell | 2007-11-06 14:20:07 -0400 (Tue, 06 Nov 2007) | 11 lines If someone were to delete the files used by an existing MOH class, and then issue a reload, further use of that class could result in a crash due to dividing by zero. This set of changes fixes up some places to prevent this from happening. (closes issue ASTERISK-10500) Reported by: jcomellas Patches: res_musiconhold_division_by_zero.patch uploaded by jcomellas (license 282) Additional changes added by me. ........ r89042 | oej | 2007-11-06 14:53:37 -0400 (Tue, 06 Nov 2007) | 2 lines Bug fixes to tdd support in zaptel. ........ r89045 | tilghman | 2007-11-06 15:09:06 -0400 (Tue, 06 Nov 2007) | 2 lines We went to the trouble of creating a method of tracking failed trylocks, then never turned it on (oops). ........ r89046 | qwell | 2007-11-06 15:09:30 -0400 (Tue, 06 Nov 2007) | 4 lines Correctly set the total number of channels from a zaptel transcoder board. SPD-49, patch by Matthew Nicholson. ........ r89053 | russell | 2007-11-06 16:18:49 -0400 (Tue, 06 Nov 2007) | 3 lines Fix init_classes() so that classes that actually do have files loaded aren't treated as empty, and immediately destroyed ... ........ r89079 | tilghman | 2007-11-07 00:07:49 -0400 (Wed, 07 Nov 2007) | 5 lines Suppress AEL warnings on load. Reported by: eliel Patch by: eliel Closes issue ASTERISK-10703 ........ r89085 | mmichelson | 2007-11-07 11:56:49 -0400 (Wed, 07 Nov 2007) | 5 lines Fixing a segfault in the manager "core show channels concise" command. (closes issue ASTERISK-10708, reported by arnd and patched by ys) ........ r89088 | murf | 2007-11-07 17:40:28 -0400 (Wed, 07 Nov 2007) | 1 line In response to 10578, I just ran 1.4 thru valgrind; some of the config leakage I've already fixed, but it doesn't hurt to double check. I found and fixed leaks in res_jabber, cdr_tds, pbx_ael. Nothing major, tho. ........ r89090 | mmichelson | 2007-11-07 18:40:35 -0400 (Wed, 07 Nov 2007) | 6 lines This patch makes it possible for SIP phones to dial extensions defined with '#' characters in extensions.conf AND maintain their escaped characters when forming URI's (closes issue ASTERISK-10265, reported by cahen, patched by me, code review by file) ........ r89093 | tilghman | 2007-11-07 19:39:37 -0400 (Wed, 07 Nov 2007) | 7 lines The member refcount must be incremented, to avoid using it after deallocation. A huge thanks go to lvl- for patiently providing the necessary valgrind output that was necessary to finding this problem of memory corruption. Reported by: lvl- Patch by: tilghman Closes issue ASTERISK-10699 ........ r89095 | file | 2007-11-07 19:53:25 -0400 (Wed, 07 Nov 2007) | 4 lines If callerid is configured in sip.conf use that for checking the presence of an extension in the dialplan. (closes issue ASTERISK-10710) Reported by: spditner ........ r89097 | file | 2007-11-07 21:11:25 -0400 (Wed, 07 Nov 2007) | 8 lines Add support for allowing one outgoing transaction. This means if a response comes back out of order chan_sip will still handle it. I dream of a chan_sip with real transaction support. (closes issue ASTERISK-10498) Reported by: flefoll (closes issue ASTERISK-10472) Reported by: ramonpeek (closes issue ASTERISK-9288) Reported by: atca_pres ........ r89099 | file | 2007-11-07 21:28:56 -0400 (Wed, 07 Nov 2007) | 6 lines Improve the devicestate logic for multiple devices. If any are available then the extension is considered available. (closes issue ASTERISK-9843) Reported by: nic_bellamy Patches: sip-hinting-svn-branch-1.4.patch uploaded by nic (license 299) ........ r89101 | file | 2007-11-07 22:26:48 -0400 (Wed, 07 Nov 2007) | 4 lines Do not add a sip: to the beginning of the To URI unless needed. (closes issue ASTERISK-10331) Reported by: goestelecom ........ r89103 | tilghman | 2007-11-08 00:55:19 -0400 (Thu, 08 Nov 2007) | 2 lines Typo ........ r89105 | kpfleming | 2007-11-08 01:26:47 -0400 (Thu, 08 Nov 2007) | 2 lines fix a glaring bug in the new SRV record handling that would cause incorrect weight sorting ........ r89111 | mmichelson | 2007-11-08 12:47:23 -0400 (Thu, 08 Nov 2007) | 5 lines I made this same adjustment in trunk to fix a bug, and it makes sense to do it in 1.4 as well. If an imapfolder is specified in voicemail.conf, don't ever explicitly connect to INBOX since it may not exist. ........ r89115 | qwell | 2007-11-08 14:45:15 -0400 (Thu, 08 Nov 2007) | 4 lines Avoid warnings on load when using sample configuration files. Issue 11195, patch by eliel. ........ r89119 | mmichelson | 2007-11-08 17:00:08 -0400 (Thu, 08 Nov 2007) | 7 lines Rework of the commit I made yesterday to use the already built-in ast_uri_decode function as opposed to my home-rolled one. Also added comments. Thanks to oej for pointing me in the right direction ........ r89125 | qwell | 2007-11-08 19:52:35 -0400 (Thu, 08 Nov 2007) | 4 lines Properly say the seconds here.. Issue 11203, fix described by vma. ........ ------------------------------------------------------------------------ |