Summary: ASTERISK-14200: [patch] segfault in local_devicestate() in chan_local.c
Reporter: caspy (caspy)
Labels:
Date Opened: 2009-05-26 05:53:01
Date Closed: 2011-06-07 14:01:05
Priority: Critical
Regression?: No
Status: Closed/Complete
Components: Channels/chan_local
Versions:
Frequency of Occurrence:
Related Issues:
Environment:
Attachments:
( 0) bt_20090526_1210.txt
( 1) bt_20090526_1214.txt
( 2) bt_20090530_1515.txt
( 3) bt_20090601_1129.txt
( 4) bt_20090602_1249.txt
( 5) r190288.diff
Description: Asterisk crashes randomly:
(gdb) bt full
#0  0xb7c95e1a in strcmp () from /lib/i686/cmov/libc.so.6
No symbol table info available.
#1  0xb7291599 in local_devicestate (data=0xb7a52286) at chan_local.c:150
exten = 0xb7a52200 "1181"
context = 0xb7a52205 "toagent"
opts = 0x0
res = 1
lp = (struct local_pvt *) 0x558
__PRETTY_FUNCTION__ = "local_devicestate"


Complete backtraces are in the attachments.

Comments:

By: caspy (caspy) 2009-05-26 06:13:40

Addendum:

This morning, while DEBUG_THREADS was turned on, these errors appeared in the log just before the crash:

[May 26 11:23:24] ERROR[24975] /usr/home/caspy/compile/asterisk-1.6.0.6/include/asterisk/lock.h: chan_local.c line 595 (local_hangup): Error releasing mutex: Invalid argument
[May 26 11:30:46] ERROR[30551] /usr/home/caspy/compile/asterisk-1.6.0.6/include/asterisk/lock.h: chan_local.c line 595 (local_hangup): Error releasing mutex: Invalid argument
[May 26 11:34:03] ERROR[31088] /usr/home/caspy/compile/asterisk-1.6.0.6/include/asterisk/lock.h: chan_local.c line 165 (local_pvt_destroy): Error: attempt to destroy locked mutex '&pvt->lock'.


DEBUG_THREADS is now turned off, because Asterisk seems to be slightly more stable without it.

By: Sean Bright (seanbright) 2009-05-27 07:48:47

There have been a few crash fixes in 1.6.0 since 1.6.0.6 was released.  Are you able to test with 1.6.0.9 to see if this is still a problem?

By: caspy (caspy) 2009-05-27 10:44:29

I'll upgrade tomorrow evening.

By: caspy (caspy) 2009-05-28 15:17:22

Vanilla 1.6.0.9 with DONT_OPTIMIZE, DEBUG_THREADS, and MALLOC_DEBUG generated these messages after just a few minutes:

[May 29 00:03:51] ERROR[27817]: /usr/home/caspy/compile/asterisk-1.6.0.9/include/asterisk/lock.h:549 __ast_pthread_mutex_unlock: chan_local.c line 596 (local_hangup): Error releasing mutex: Invalid argument
[May 29 00:05:11] ERROR[27864]: /usr/home/caspy/compile/asterisk-1.6.0.9/include/asterisk/lock.h:549 __ast_pthread_mutex_unlock: chan_local.c line 596 (local_hangup): Error releasing mutex: Invalid argument


I'm more than sure it will die shortly.
I also forgot to mention that when DEBUG_THREADS is compiled in, the segfault happens in the DEBUG_THREADS code.

By: Sean Bright (seanbright) 2009-05-28 16:31:40

OK.  Can you please follow the instructions in doc/backtrace.txt (don't enable DEBUG_THREADS or MALLOC_DEBUG) and get another backtrace from a crash?  The new errors appear to be happening in a different area of the code.

By: caspy (caspy) 2009-06-01 02:56:44

hi!

Uploaded: bt_20090530_1515.txt, bt_20090601_1129.txt
This is 1.6.0.9 with DONT_OPTIMIZE only.

By: Sean Bright (seanbright) 2009-06-01 09:30:05

OK.  It looks like a memory corruption issue.  If you could run the latest version of 1.6.0 from svn:

   svn co http://svn.digium.com/svn/asterisk/branches/1.6.0

And follow the instructions in doc/valgrind.txt that would be great.

If that is not possible, I would like more information on exactly what you are doing in your environment so I can set up a test here.

By: caspy (caspy) 2009-06-01 10:01:22

Valgrind is not an option. I've tried it, and the slowdown is unacceptable.


I'll try the 1.6.0 branch, but it may take some time because this is a heavily loaded production system.


As for the dialplan:
- I have a number of SIP phones, like SIP/1234.
- I have a Queue with members like Local/1234@toagent
 (members are added with the device state of the SIP channel:
  AddQueueMember(callcenterq,Local/1234@toagent,,,,SIP/1234) )
- The 'toagent' context is:
context toagent {
 _XXXX => {
   Set(CDR(amaflags)=omit);
   ChanIsAvail(SIP/${EXTEN},s);
   if ("${AVAILCHAN}" != "") {
     Dial(SIP/${EXTEN});
   };
   Busy();
 };
};
- People call this queue, where a number of agents take the calls.


As for you trying to reproduce it: unfortunately, that is most likely impossible. This error is very unpredictable. For example:
- Compiling with DONT_OPTIMIZE makes the crash rarer.
- Turning on 'core set debug atleast 4' makes the crash rarer still.
- It is very load-dependent: the higher the load, the more stable it is. Yet with only occasional individual calls at night, everything is fine too. The most likely time for a crash is the morning, when the number of callers is just beginning to grow.
- Compiling with DEBUG_THREADS moves the crash into the thread-tracking code. It _moves_ the crash; it does not create a second one. :|


Also, please look at bug ASTERISK-1462783. It seems to be closely related to this issue, in spite of involving totally different parts of the code.
It looks like in both cases the crash is caused by one thread destroying a variable that is still in use by another thread.
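
The suspected failure mode above (one thread freeing a structure while another still reads it) can be illustrated with a small, hedged sketch. This is not Asterisk code and not what r190288.diff does; `demo_pvt`, `demo_publish`, `demo_hangup`, `demo_devicestate`, and `run_demo` are hypothetical names invented for illustration. The sketch assumes a single shared slot instead of a list of pvts, and shows one common remedy: the reader takes a reference under the slot lock before it calls strcmp(), so a concurrent hangup can only drop its own reference rather than free memory out from under the reader.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a channel private structure (not struct local_pvt). */
struct demo_pvt {
    pthread_mutex_t lock;   /* protects refcount */
    int refcount;
    char exten[16];
    char context[16];
};

static pthread_mutex_t slot_lock = PTHREAD_MUTEX_INITIALIZER;
static struct demo_pvt *slot;     /* shared pointer, protected by slot_lock */
static atomic_int live_objects;   /* allocations minus frees */

/* Drop one reference; whoever drops the last one frees the object. */
static void demo_unref(struct demo_pvt *p)
{
    pthread_mutex_lock(&p->lock);
    int remaining = --p->refcount;
    pthread_mutex_unlock(&p->lock);
    assert(remaining >= 0);
    if (remaining == 0) {
        pthread_mutex_destroy(&p->lock);
        free(p);
        atomic_fetch_sub(&live_objects, 1);
    }
}

/* Create an object and publish it in the slot (the slot owns one reference). */
static void demo_publish(const char *exten, const char *context)
{
    struct demo_pvt *p = calloc(1, sizeof(*p));
    if (!p)
        return;
    pthread_mutex_init(&p->lock, NULL);
    p->refcount = 1;
    snprintf(p->exten, sizeof(p->exten), "%s", exten);
    snprintf(p->context, sizeof(p->context), "%s", context);
    atomic_fetch_add(&live_objects, 1);

    pthread_mutex_lock(&slot_lock);
    struct demo_pvt *old = slot;
    slot = p;
    pthread_mutex_unlock(&slot_lock);
    if (old)
        demo_unref(old);
}

/* "Hangup": unpublish the object and drop the slot's reference. */
static void demo_hangup(void)
{
    pthread_mutex_lock(&slot_lock);
    struct demo_pvt *p = slot;
    slot = NULL;
    pthread_mutex_unlock(&slot_lock);
    if (p)
        demo_unref(p);
}

/* "Device state": returns 1 if the published object matches, else 0. */
static int demo_devicestate(const char *exten, const char *context)
{
    pthread_mutex_lock(&slot_lock);
    struct demo_pvt *p = slot;
    if (p) {                      /* take our own reference before unlocking */
        pthread_mutex_lock(&p->lock);
        p->refcount++;
        pthread_mutex_unlock(&p->lock);
    }
    pthread_mutex_unlock(&slot_lock);
    if (!p)
        return 0;
    int match = !strcmp(p->exten, exten) && !strcmp(p->context, context);
    demo_unref(p);
    return match;
}

static void *reader(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++)
        demo_devicestate("1181", "toagent");
    return NULL;
}

static void *writer(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        demo_publish("1181", "toagent");
        demo_hangup();
    }
    return NULL;
}

/* Runs both threads; returns the number of leaked objects, or -1 on error. */
int run_demo(void)
{
    pthread_t r, w;
    if (pthread_create(&r, NULL, reader, NULL))
        return -1;
    if (pthread_create(&w, NULL, writer, NULL)) {
        pthread_join(r, NULL);
        return -1;
    }
    pthread_join(r, NULL);
    pthread_join(w, NULL);
    return atomic_load(&live_objects);
}
```

To try it, call `run_demo()` from a `main()` and build with `-pthread`. If the reference taken in `demo_devicestate()` were removed, the strcmp() calls could run on memory already freed by `demo_hangup()`, which is the kind of timing-dependent crash described in this report.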

By: caspy (caspy) 2009-06-02 09:47:54

One more backtrace uploaded.

By: Sean Bright (seanbright) 2009-06-03 16:43:44

Was the latest crash with the latest from the 1.6.0 branch?

By: caspy (caspy) 2009-06-03 16:52:38

No, it's still 1.6.0.9.

By: Sean Bright (seanbright) 2009-06-03 17:09:33

Ah.  Could you try the attached patch (r190288.diff)?  This is a change that went into the 1.6.0 branch that relates to locking in chan_local.  It's very possible that it resolves your crash.

By: caspy (caspy) 2009-06-03 17:18:12

seanbright,
The patch is applied on top of 1.6.0.9. Let's see how stable it is now.
Thanks.

By: caspy (caspy) 2009-06-03 17:21:12

(It may take longer to test, because I have turned on debugging for ASTERISK-1462783.
Please don't close the issue too soon.)

By: Sean Bright (seanbright) 2009-06-09 12:50:43

caspy,

Have you had any luck replicating this crash after applying the patch?

By: caspy (caspy) 2009-06-09 13:07:54

seanbright,

No. Since the patch was installed, Asterisk has been stable.
But please leave the issue open for another 1-1.5 weeks. The results will be more accurate, and I also want to re-enable DEBUG_THREADS and check whether the error messages about thread trouble have gone away.

I'll report back with any results.

By: caspy (caspy) 2009-06-15 02:13:51

seanbright,

With this patch, the messages described in comment 0105384 have gone too.
I think we can consider this issue resolved by the patch.

Thanks!

By: Sean Bright (seanbright) 2009-06-15 12:00:38

This has already been fixed by r190286 and friends.