[Home]

Summary:ASTERISK-13541: Asterisk crashes in ast_channels_free with "free(): invalid pointer"
Reporter:Max Molchanov (censo)Labels:
Date Opened:2009-02-09 09:05:49.000-0600Date Closed:2011-06-07 14:00:41
Priority:CriticalRegression?No
Status:Closed/CompleteComponents:Core/Channels
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) asterisk-1605-gdb.txt
( 1) asterisk-log.txt.gz
( 2) feb19.txt
Description:Asterisk has crashed under load after 20 hours of work, glibc reports freeing of invalid pointer.
Asterisk was compiled without optimizations and with malloc_debug on.
Comments:By: Joshua C. Colp (jcolp) 2009-02-09 09:11:35.000-0600

We are going to need much more information to diagnose this. What is the machine doing? Can you attach the console output before it crashed?

By: Max Molchanov (censo) 2009-02-09 09:24:32.000-0600

Attached console output, Asterisk was hanging up a call on DAHDI.
PS: FreePBX is used on top.



By: Max Molchanov (censo) 2009-02-11 06:26:36.000-0600

I investigated source of some problems, preceding crash and found some inconsistency in mutex handling in Asterisk.
  These logs appeared the same second when Asterisk crashed:

1. [Feb  9 15:29:59] ERROR[7222] /root/dev/pbx/asterisk-1.6.0.5/include/asterisk/lock.h: pbx.c line 6424 (ast_explicit_goto): Error obtaining mutex: Invalid argument
2. [Feb  9 15:29:59] ERROR[7222] /root/dev/pbx/asterisk-1.6.0.5/include/asterisk/lock.h: pbx.c line 6437 (ast_explicit_goto): mutex '&chan->lock_dont_use' freed more times than we've locked!
3. [Feb  9 15:29:59] ERROR[7222] /root/dev/pbx/asterisk-1.6.0.5/include/asterisk/lock.h: pbx.c line 6437 (ast_explicit_goto): Error releasing mutex: Invalid argument

  It is not clear why mutex get corrupted, but:

1. When trying to lock it in '__ast_pthread_mutex_lock', pthread_mutex_lock is called (lock.h:427) and if result is 'wrong', t->reentrancy (lock.h:437) is _not_ incremented.
2. __ast_pthread_mutex_unlock is called and it decrements usage of the mutex every time! It does not care whether mutex was right or wrong. Since the mutex was not taken at step one, it reports wrong reentrancy.
3. And only at last '__ast_pthread_mutex_unlock' tries to unlock the mutex, reporting one more error.

  I think, since return value for both lock/unlock functions is not checked, it is needed to keep track of last "lock" operation to prevent false reentrancy decrements.

By: Joshua C. Colp (jcolp) 2009-02-12 11:31:24.000-0600

Please reproduce this was debug enabled in logger.conf (going to console) and core set debug 2. Attach the output. Please also use the 1.6.0 branch, it is much newer then 1.6.0.5.

By: Max Molchanov (censo) 2009-02-12 13:32:39.000-0600

I've enabled debug. Will wait and see if anything happens.
Unfortunately I can not update Asterisk to branch, system is under load now.

By the way, in Asterisk logs attached I see GotoIf(null) call, which appears only once in 188Mb log, just before Asterisk reported mutex troubles. It looks like GotoIf with 'null' parameter passes some junk instead of correct channel pointers. What do you think?


[Feb  9 15:29:59] VERBOSE[7222] logger.c:     -- Executing [s@macro-hangupcall:9] GotoIf("(null)", "1?theend") in new stack
[Feb  9 15:29:59] ERROR[7222] /root/dev/pbx/asterisk-1.6.0.5/include/asterisk/lock.h: pbx.c line 6424 (ast_explicit_goto): Error obtaining mutex: Invalid argumenttack



By: Joshua C. Colp (jcolp) 2009-02-13 08:50:06.000-0600

The channel is getting freed prematurely so the channel pointer is no longer valid. Despite this the dialplan is continuing. We have to determine why the channel is getting freed in the first place.

By: Joshua C. Colp (jcolp) 2009-02-25 10:52:30.000-0600

Censo: Any update?

By: Max Molchanov (censo) 2009-02-25 15:20:11.000-0600

Yes, I see there was similar crash when debug logs where turned on, I attached it as feb19.txt. Looks somewhat similar to bug 0014157.
I'll upgrade to 1.6.0.6 and see if anything similar happens. If it does, I can try to start Asterisk under valgrind.

By: Max Molchanov (censo) 2009-03-02 05:39:29.000-0600

I think this particular issue may be closed, I see no occurrences.
However, crashes still happen on 1.6.0.6 during call transfer, I'll investigate and open a new ticket.



By: Joshua C. Colp (jcolp) 2009-03-02 07:33:36.000-0600

Closed per reporter.