[Home]

Summary:ASTERISK-10699: Crash related to ast_module_user_remove
Reporter:laszlovl (lvl)Labels:
Date Opened:2007-11-06 15:04:47.000-0600Date Closed:2007-11-09 10:34:34.000-0600
Priority:CriticalRegression?No
Status:Closed/CompleteComponents:Core/General
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) 20071107__bug11174.diff.txt
( 1) bt.txt
( 2) bt-full.txt
( 3) bt-threads.txt
( 4) console-output.txt
( 5) new_bt.txt
( 6) valgrind.txt
Description:Asterisk randomly crashes after this sequence of events:

== Manager 'call' logged on from 127.0.0.1
-- Executing [s@callee:3] Hangup("Local/s@callee-c93d,2", "") in new stack
== Spawn extension (callee, s, 3) exited non-zero on 'Local/s@callee-c93d,2'
*** glibc detected *** asterisk: free(): invalid pointer: 0x081b8f79 ***

======= Backtrace: =========
/lib/i686/cmov/libc.so.6[0xb7de9cf5]
/lib/i686/cmov/libc.so.6(cfree+0x90)[0b7ded790]
asterisk(__ast_module_user_remove+0x11c)[0x80b7cf0]
/usr/lib/asterisk/modules/chan_local.so[0xb6f247df]
asterisk(ast_hangup+0x372)[0x8085854]
asterisk[0x80cf76f]
asterisk[0x80cf9bf]
asterisk[0x8110eee]
/lib/i686/cmov/libpthread.so.0[0xb7f3646b]
/lib/i686/cmov/libc.so.6(clone+0x5e)[0xb7e534de]

****** ADDITIONAL INFORMATION ******

Our asterisk is experiencing semi-random crashes which seem to be related to ast_module_user_remove. They are easily reproducable during our stresstests (which consist out of a new call being thrown into a queue every few seconds, aswell as agents logging in or out of the queue every few seconds, through the asterisk manager interface), although we've seen them occur during regular usage aswell.

The crashing always seems to occur after the aforementioned sequence of events, though the same sequence can occur many times in a row without any problems aswell. Stresstesting with a 2-second interval makes asterisk crash after around a minute. The higher the interval, the longer it takes to crash.

Attached are the full console output from the crash, and backtraces from the coredump. A few people on IRC have suggested to enable DONT_OPTIMIZE first, though I've doublechecked that asterisk is indeed compiled with that flag.

The problem appears identically on the 1.4 tarball, SVN trunk and SVN branch. Running Debian, 2.6.18.
Comments:By: Tilghman Lesher (tilghman) 2007-11-06 18:49:42.000-0600

Actually, this would appear to be memory corruption.  Please follow the instructions in doc/valgrind.txt to get us the necessary debugging output.

By: laszlovl (lvl) 2007-11-06 20:44:25.000-0600

I haven't been able to reproduce the crash yet while asterisk is running in valgrind - probably because it appears to slow everything down quite a bit (our testing box isn't very fast to begin with).

Recompiling with MALLOC_DEBUG did yield some very interesting information though, the key points being:

== Spawn extension (callee, s, 3) exited non-zero on 'Local/s@callee-4c57,2'
WARNING: Freeing unused memory at 0x824bde9, in __ast_module_user_remove of loader.c, line 204
== Spawn extension (test2, s, 2) exited non-zero on 'Local/s@callee-4c57,1'
Segmentation fault (core dumped)

and

Program terminated with signal 11, Segmentation fault.
#0  0x080bb646 in __ast_module_user_remove (mod=0x82236b8, u=0x824a9e0) at loader.c:201
201             AST_LIST_REMOVE(&mod->users, u, entry);

I'm attaching the complete backtrace as new_bt.txt.

By: Tilghman Lesher (tilghman) 2007-11-07 15:04:16.000-0600

Again, for these types of problems, the backtrace is unlikely to show us the problem.  It is the messages that valgrind and the MALLOC_DEBUG options show us that will be helpful.

Specifically, the problem is that by the time Asterisk crashes with a memory corruption error, the memory corruption happened well before the crash.  The crash is just the result of finding corrupt memory.

By: laszlovl (lvl) 2007-11-07 16:31:03.000-0600

Ok. I haven't been able to reproduce the crash while asterisk is running within valgrind - but here's a logfile from a couple of minutes of heavy stresstesting.

The malloc_debug.txt contained the same line repeated around 900 times: "XXX ERROR XXX A thread holds more locks than '16'.  Increase AST_MAX_LOCKS!"

By: Digium Subversion (svnbot) 2007-11-07 17:37:40.000-0600

Repository: asterisk
Revision: 89093

U   branches/1.4/apps/app_queue.c

------------------------------------------------------------------------
r89093 | tilghman | 2007-11-07 17:37:39 -0600 (Wed, 07 Nov 2007) | 7 lines

The member refcount must be incremented, to avoid using it after deallocation.
A huge thanks go to lvl- for patiently providing the necessary valgrind output
that was necessary to finding this problem of memory corruption.
Reported by: lvl-
Patch by: tilghman
Closes issue ASTERISK-10699

------------------------------------------------------------------------

By: Digium Subversion (svnbot) 2007-11-07 17:45:49.000-0600

Repository: asterisk
Revision: 89094

_U  trunk/
U   trunk/apps/app_queue.c

------------------------------------------------------------------------
r89094 | tilghman | 2007-11-07 17:45:49 -0600 (Wed, 07 Nov 2007) | 15 lines

Merged revisions 89093 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r89093 | tilghman | 2007-11-07 17:39:37 -0600 (Wed, 07 Nov 2007) | 7 lines

The member refcount must be incremented, to avoid using it after deallocation.
A huge thanks go to lvl- for patiently providing the necessary valgrind output
that was necessary to finding this problem of memory corruption.
Reported by: lvl-
Patch by: tilghman
Closes issue ASTERISK-10699

........

------------------------------------------------------------------------

By: Digium Subversion (svnbot) 2007-11-09 10:34:34.000-0600

Repository: asterisk
Revision: 89131

_U  team/file/t38fun/
U   team/file/t38fun/apps/app_queue.c
U   team/file/t38fun/apps/app_voicemail.c
U   team/file/t38fun/cdr/cdr_tds.c
U   team/file/t38fun/channels/chan_sip.c
U   team/file/t38fun/codecs/codec_zap.c
U   team/file/t38fun/configs/extensions.ael.sample
U   team/file/t38fun/configs/res_odbc.conf.sample
U   team/file/t38fun/doc/valgrind.txt
U   team/file/t38fun/include/asterisk/lock.h
U   team/file/t38fun/main/config.c
U   team/file/t38fun/main/manager.c
U   team/file/t38fun/main/say.c
U   team/file/t38fun/main/srv.c
U   team/file/t38fun/main/tdd.c
U   team/file/t38fun/pbx/pbx_ael.c
U   team/file/t38fun/res/res_jabber.c
U   team/file/t38fun/res/res_musiconhold.c

------------------------------------------------------------------------
r89131 | file | 2007-11-09 10:34:31 -0600 (Fri, 09 Nov 2007) | 168 lines

Merged revisions 89032,89036-89037,89042,89045-89046,89053,89079,89085,89088,89090,89093,89095,89097,89099,89101,89103,89105,89111,89115,89119,89125 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r89032 | file | 2007-11-06 13:08:05 -0400 (Tue, 06 Nov 2007) | 4 lines

Make it so that if a peer is determined to be unreachable using qualify their devicestate will report back unavailable.
(closes issue ASTERISK-10551)
Reported by: pj

........
r89036 | murf | 2007-11-06 13:52:50 -0400 (Tue, 06 Nov 2007) | 1 line

closes issue ASTERISK-8547 - where the [catname](!) and [catname](othercat1,othercat2,...) notation gets dropped across a ConfigUpdate (or any other thing that would cause a config file to be written). While I was at it, I also cleaned up some of the destroy routines to free up comments, which was not being done. Made sure the new struct I introduced is also cleaned up properly at destruction time. My code handles multiple template inclusions. Many thanks to ssokol for his patch, which, while not literally used in the final merge, served as a foundation for the fix.
........
r89037 | russell | 2007-11-06 14:20:07 -0400 (Tue, 06 Nov 2007) | 11 lines

If someone were to delete the files used by an existing MOH class, and then
issue a reload, further use of that class could result in a crash due to
dividing by zero.  This set of changes fixes up some places to prevent this
from happening.

(closes issue ASTERISK-10500)
Reported by: jcomellas
Patches:
     res_musiconhold_division_by_zero.patch uploaded by jcomellas (license 282)
 Additional changes added by me.

........
r89042 | oej | 2007-11-06 14:53:37 -0400 (Tue, 06 Nov 2007) | 2 lines

Bug fixes to tdd support in zaptel.

........
r89045 | tilghman | 2007-11-06 15:09:06 -0400 (Tue, 06 Nov 2007) | 2 lines

We went to the trouble of creating a method of tracking failed trylocks, then never turned it on (oops).

........
r89046 | qwell | 2007-11-06 15:09:30 -0400 (Tue, 06 Nov 2007) | 4 lines

Correctly set the total number of channels from a zaptel transcoder board.

SPD-49, patch by Matthew Nicholson.

........
r89053 | russell | 2007-11-06 16:18:49 -0400 (Tue, 06 Nov 2007) | 3 lines

Fix init_classes() so that classes that actually do have files loaded aren't
treated as empty, and immediately destroyed ...

........
r89079 | tilghman | 2007-11-07 00:07:49 -0400 (Wed, 07 Nov 2007) | 5 lines

Suppress AEL warnings on load.
Reported by: eliel
Patch by: eliel
Closes issue ASTERISK-10703

........
r89085 | mmichelson | 2007-11-07 11:56:49 -0400 (Wed, 07 Nov 2007) | 5 lines

Fixing a segfault in the manager "core show channels concise" command.

(closes issue ASTERISK-10708, reported by arnd and patched by ys)


........
r89088 | murf | 2007-11-07 17:40:28 -0400 (Wed, 07 Nov 2007) | 1 line

In response to 10578, I just ran 1.4 thru valgrind; some of the config leakage I've already fixed, but it doesn't hurt to double check. I found and fixed leaks in res_jabber, cdr_tds, pbx_ael. Nothing major, tho.
........
r89090 | mmichelson | 2007-11-07 18:40:35 -0400 (Wed, 07 Nov 2007) | 6 lines

This patch makes it possible for SIP phones to dial extensions defined with '#' characters
in extensions.conf AND maintain their escaped characters when forming URI's

(closes issue ASTERISK-10265, reported by cahen, patched by me, code review by file)


........
r89093 | tilghman | 2007-11-07 19:39:37 -0400 (Wed, 07 Nov 2007) | 7 lines

The member refcount must be incremented, to avoid using it after deallocation.
A huge thanks go to lvl- for patiently providing the necessary valgrind output
that was necessary to finding this problem of memory corruption.
Reported by: lvl-
Patch by: tilghman
Closes issue ASTERISK-10699

........
r89095 | file | 2007-11-07 19:53:25 -0400 (Wed, 07 Nov 2007) | 4 lines

If callerid is configured in sip.conf use that for checking the presence of an extension in the dialplan.
(closes issue ASTERISK-10710)
Reported by: spditner

........
r89097 | file | 2007-11-07 21:11:25 -0400 (Wed, 07 Nov 2007) | 8 lines

Add support for allowing one outgoing transaction. This means if a response comes back out of order chan_sip will still handle it. I dream of a chan_sip with real transaction support.
(closes issue ASTERISK-10498)
Reported by: flefoll
(closes issue ASTERISK-10472)
Reported by: ramonpeek
(closes issue ASTERISK-9288)
Reported by: atca_pres

........
r89099 | file | 2007-11-07 21:28:56 -0400 (Wed, 07 Nov 2007) | 6 lines

Improve the devicestate logic for multiple devices. If any are available then the extension is considered available.
(closes issue ASTERISK-9843)
Reported by: nic_bellamy
Patches:
     sip-hinting-svn-branch-1.4.patch uploaded by nic (license 299)

........
r89101 | file | 2007-11-07 22:26:48 -0400 (Wed, 07 Nov 2007) | 4 lines

Do not add a sip: to the beginning of the To URI unless needed.
(closes issue ASTERISK-10331)
Reported by: goestelecom

........
r89103 | tilghman | 2007-11-08 00:55:19 -0400 (Thu, 08 Nov 2007) | 2 lines

Typo

........
r89105 | kpfleming | 2007-11-08 01:26:47 -0400 (Thu, 08 Nov 2007) | 2 lines

fix a glaring bug in the new SRV record handling that would cause incorrect weight sorting

........
r89111 | mmichelson | 2007-11-08 12:47:23 -0400 (Thu, 08 Nov 2007) | 5 lines

I made this same adjustment in trunk to fix a bug, and it makes sense to do it in 1.4 as
well. If an imapfolder is specified in voicemail.conf, don't ever explicitly connect to
INBOX since it may not exist.


........
r89115 | qwell | 2007-11-08 14:45:15 -0400 (Thu, 08 Nov 2007) | 4 lines

Avoid warnings on load when using sample configuration files.

Issue 11195, patch by eliel.

........
r89119 | mmichelson | 2007-11-08 17:00:08 -0400 (Thu, 08 Nov 2007) | 7 lines

Rework of the commit I made yesterday to use the already built-in
ast_uri_decode function as opposed to my home-rolled one. Also added
comments.

Thanks to oej for pointing me in the right direction


........
r89125 | qwell | 2007-11-08 19:52:35 -0400 (Thu, 08 Nov 2007) | 4 lines

Properly say the seconds here..

Issue 11203, fix described by vma.

........

------------------------------------------------------------------------