Summary: | ASTERISK-14200: [patch] segfault in local_devicestate() in chan_local.c | ||
Reporter: | caspy (caspy) | Labels: | |
Date Opened: | 2009-05-26 05:53:01 | Date Closed: | 2011-06-07 14:01:05 |
Priority: | Critical | Regression? | No |
Status: | Closed/Complete | Components: | Channels/chan_local |
Versions: | Frequency of Occurrence | ||
Related Issues: | |||
Environment: | Attachments: | ( 0) bt_20090526_1210.txt ( 1) bt_20090526_1214.txt ( 2) bt_20090530_1515.txt ( 3) bt_20090601_1129.txt ( 4) bt_20090602_1249.txt ( 5) r190288.diff | |
Description: | Asterisk crashes randomly:
(gdb) bt full
#0  0xb7c95e1a in strcmp () from /lib/i686/cmov/libc.so.6
No symbol table info available.
#1  0xb7291599 in local_devicestate (data=0xb7a52286) at chan_local.c:150
        exten = 0xb7a52200 "1181"
        context = 0xb7a52205 "toagent"
        opts = 0x0
        res = 1
        lp = (struct local_pvt *) 0x558
        __PRETTY_FUNCTION__ = "local_devicestate"
Complete backtraces are in the attachments. | ||
Comments: |
By: caspy (caspy) 2009-05-26 06:13:40
Addendum: this morning, with THREAD_DEBUG turned on, these errors appeared in the log just before the crash:
[May 26 11:23:24] ERROR[24975] /usr/home/caspy/compile/asterisk-1.6.0.6/include/asterisk/lock.h: chan_local.c line 595 (local_hangup): Error releasing mutex: Invalid argument
[May 26 11:30:46] ERROR[30551] /usr/home/caspy/compile/asterisk-1.6.0.6/include/asterisk/lock.h: chan_local.c line 595 (local_hangup): Error releasing mutex: Invalid argument
[May 26 11:34:03] ERROR[31088] /usr/home/caspy/compile/asterisk-1.6.0.6/include/asterisk/lock.h: chan_local.c line 165 (local_pvt_destroy): Error: attempt to destroy locked mutex '&pvt->lock'.
THREAD_DEBUG is now turned off, because Asterisk seems slightly more stable that way.

By: Sean Bright (seanbright) 2009-05-27 07:48:47
There have been a few crash fixes in 1.6.0 since 1.6.0.6 was released. Are you able to test with 1.6.0.9 to see if this is still a problem?

By: caspy (caspy) 2009-05-27 10:44:29
I'll upgrade tomorrow evening.

By: caspy (caspy) 2009-05-28 15:17:22
Vanilla 1.6.0.9 with DONT_OPTIMIZE, DEBUG_THREADS and MALLOC_DEBUG generated these messages after just a few minutes:
[May 29 00:03:51] ERROR[27817]: /usr/home/caspy/compile/asterisk-1.6.0.9/include/asterisk/lock.h:549 __ast_pthread_mutex_unlock: chan_local.c line 596 (local_hangup): Error releasing mutex: Invalid argument
[May 29 00:05:11] ERROR[27864]: /usr/home/caspy/compile/asterisk-1.6.0.9/include/asterisk/lock.h:549 __ast_pthread_mutex_unlock: chan_local.c line 596 (local_hangup): Error releasing mutex: Invalid argument
I'm more than sure it will die shortly. I also forgot to mention that when DEBUG_THREADS is compiled in, the segfault happens in the DEBUG_THREADS code.

By: Sean Bright (seanbright) 2009-05-28 16:31:40
OK. Can you please follow the instructions in doc/backtrace.txt (don't enable DEBUG_THREADS or MALLOC_DEBUG) and get another backtrace from a crash? The new errors appear to be happening in a different area of the code.

By: caspy (caspy) 2009-06-01 02:56:44
Hi! Uploaded bt_20090530_1515.txt and bt_20090601_1129.txt. This is 1.6.0.9 with DONT_OPTIMIZE only.

By: Sean Bright (seanbright) 2009-06-01 09:30:05
OK. It looks like a memory corruption issue. If you could run the latest version of 1.6.0 from svn:
svn co http://svn.digium.com/svn/asterisk/branches/1.6.0
and follow the instructions in doc/valgrind.txt, that would be great. If that is not possible, I would like more information on exactly what you are doing in your environment so I can set up a test here.

By: caspy (caspy) 2009-06-01 10:01:22
Valgrind is not an option. I've tried it, and the slowdown is unacceptable. I'll try 1.6.0, but it may take some time because of heavy production load.
About the dialplan:
- I have a number of SIP phones, like SIP/1234.
- I have a Queue with members like Local/1234@toagent (members are added with the device state of the SIP channel: AddQueueMember(callcenterq,Local/1234@toagent,,,,SIP/1234)).
- The 'toagent' context is:
  context toagent {
      _XXXX => {
          Set(CDR(amaflags)=omit);
          ChanIsAvail(SIP/${EXTEN},s);
          if ("${AVAILCHAN}" != "") {
              Dial(SIP/${EXTEN});
          };
          Busy();
      };
  };
- People call this queue, where a number of agents take the calls.
As for reproducing it: unfortunately, that is most likely impossible. This error is very unpredictable. For example:
- Compiling with DONT_OPTIMIZE makes the crash rarer.
- Turning on 'core set debug atleast 4' makes the error rarer still.
- It is very load-dependent: the higher the load, the more stable it is. But with individual rare calls at night, everything is fine too. The most likely time for a crash is the morning, when the caller count is just beginning to grow.
- Compiling with DEBUG_THREADS moves the crash into the thread-tracking code. It _moves_ it; it does not create another one. :|
Also, please look at bug ASTERISK-1462783. It seems very close to this issue, in spite of involving totally different parts of the code. It looks like in both cases the crash is caused by one thread destroying a variable that is still being used by another thread.

By: caspy (caspy) 2009-06-02 09:47:54
One more backtrace.

By: Sean Bright (seanbright) 2009-06-03 16:43:44
Was the latest crash with the latest from the 1.6.0 branch?

By: caspy (caspy) 2009-06-03 16:52:38
No, it's still 1.6.0.9.

By: Sean Bright (seanbright) 2009-06-03 17:09:33
Ah. Could you try the attached patch (r190288.diff)? This is a change that went into the 1.6.0 branch that relates to locking in chan_local. It's very possible that it resolves your crash.

By: caspy (caspy) 2009-06-03 17:18:12
seanbright, the patch is installed on top of 1.6.0.9. Let's watch for stability now. Thanks.

By: caspy (caspy) 2009-06-03 17:21:12
(It may take longer to test, because I have turned on debugging for ASTERISK-1462783. Please don't close the issue too soon.)

By: Sean Bright (seanbright) 2009-06-09 12:50:43
caspy, Have you had any luck replicating this crash after applying the patch?

By: caspy (caspy) 2009-06-09 13:07:54
seanbright, no. Since the patch was installed, Asterisk has been stable. But please leave the issue open for another 1-1.5 weeks. The results will be more accurate, and I also want to re-enable DEBUG_THREADS and check whether the error messages about thread trouble have gone away. I'll report any results.

By: caspy (caspy) 2009-06-15 02:13:51
seanbright, with this patch the messages described in comment 0105384 are gone too. I think we can consider this issue resolved by the patch. Thanks!

By: Sean Bright (seanbright) 2009-06-15 12:00:38
This has already been fixed by r190286 and friends. |
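Editorial illustration: the thread attributes the crash to one thread destroying data that another thread is still reading, and the fix is described only as a change "that relates to locking in chan_local". The sketch below shows that general pattern in isolation. It is not the chan_local code and not the contents of r190288.diff; struct demo_pvt and the demo_* functions are hypothetical names invented for this example. The assumption is simply that a state query walks a shared list while a hangup path frees entries, and that both must hold the same mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a channel private structure (not struct local_pvt). */
struct demo_pvt {
    char exten[16];
    char context[16];
    struct demo_pvt *next;
};

static struct demo_pvt *pvt_list = NULL;
static pthread_mutex_t pvt_list_lock = PTHREAD_MUTEX_INITIALIZER;

/* Link a new entry into the list under the list lock. Returns 0 or -1. */
int demo_add(const char *exten, const char *context)
{
    struct demo_pvt *p = calloc(1, sizeof(*p));
    if (!p)
        return -1;
    strncpy(p->exten, exten, sizeof(p->exten) - 1);
    strncpy(p->context, context, sizeof(p->context) - 1);
    pthread_mutex_lock(&pvt_list_lock);
    p->next = pvt_list;
    pvt_list = p;
    pthread_mutex_unlock(&pvt_list_lock);
    return 0;
}

/* Unlink and free under the same lock, so no reader can be left holding
 * a pointer into freed memory. Returns 1 if an entry was removed. */
int demo_remove(const char *exten, const char *context)
{
    int removed = 0;
    pthread_mutex_lock(&pvt_list_lock);
    for (struct demo_pvt **pp = &pvt_list; *pp; pp = &(*pp)->next) {
        if (!strcmp((*pp)->exten, exten) && !strcmp((*pp)->context, context)) {
            struct demo_pvt *dead = *pp;
            *pp = dead->next;
            free(dead);
            removed = 1;
            break;
        }
    }
    pthread_mutex_unlock(&pvt_list_lock);
    return removed;
}

/* State query: the strcmp() calls only ever see entries that are still
 * linked, because the list lock is held for the whole traversal. */
int demo_in_use(const char *exten, const char *context)
{
    int found = 0;
    pthread_mutex_lock(&pvt_list_lock);
    for (struct demo_pvt *p = pvt_list; p; p = p->next) {
        if (!strcmp(p->exten, exten) && !strcmp(p->context, context)) {
            found = 1;
            break;
        }
    }
    pthread_mutex_unlock(&pvt_list_lock);
    return found;
}

static void *churn_thread(void *arg)
{
    (void)arg;
    for (int i = 0; i < 10000; i++) {
        demo_add("1181", "toagent");
        demo_remove("1181", "toagent");
    }
    return NULL;
}

static void *query_thread(void *arg)
{
    long hits = 0;
    for (int i = 0; i < 10000; i++)
        hits += demo_in_use("1181", "toagent");
    *(long *)arg = hits;
    return NULL;
}

/* Run one creating/destroying thread against one querying thread and
 * return the number of entries left in the list afterwards. */
int demo_run(void)
{
    pthread_t churn, query;
    long hits = 0;
    int rc, left = 0;

    rc = pthread_create(&churn, NULL, churn_thread, NULL);
    assert(rc == 0);
    rc = pthread_create(&query, NULL, query_thread, &hits);
    assert(rc == 0);
    (void)rc;
    pthread_join(churn, NULL);
    pthread_join(query, NULL);

    pthread_mutex_lock(&pvt_list_lock);
    for (struct demo_pvt *p = pvt_list; p; p = p->next)
        left++;
    pthread_mutex_unlock(&pvt_list_lock);
    return left;
}
```

If the lock were dropped from demo_in_use(), the reader could follow a pointer that demo_remove() has already freed, which would be consistent with the kind of invalid lp value and strcmp() crash shown in the backtrace, and with how timing-dependent the reporter found it. Whether the real fix works this way is not stated in the thread.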