Summary: | ASTERISK-17655: pri lockup. no futher calls able to be taken | ||
Reporter: | Alec Davis (alecdavis) | Labels: | |
Date Opened: | 2011-04-06 22:00:33 | Date Closed: | 2011-05-06 17:30:21 |
Priority: | Minor | Regression? | No |
Status: | Closed/Complete | Components: | Channels/chan_dahdi |
Versions: | Frequency of Occurrence | ||
Related Issues: | |||
Environment: | Attachments: | ( 0) backtrace-apr08.txt ( 1) backtrace-apr7.txt ( 2) backtrace-apr7-2.txt ( 3) core-show-locks.txt ( 4) core-show-locks2.txt ( 5) core-show-locks-apr08.txt | |
Description: | attached backtrace and core show locks ****** ADDITIONAL INFORMATION ****** [2011-04-07 10:58:43.199] VERBOSE[29802] pbx.c: -- Executing [8320@incoming:9] Dial("DAHDI/i1/5604947-c3b", "DAHDI/g0/033417052:8320,,r") in new stack [2011-04-07 10:58:43.199] DEBUG[5255] chan_dahdi.c: Enabled echo cancellation on channel 31 [2011-04-07 10:58:43.199] DEBUG[29802] chan_dahdi.c: Using channel 3 [2011-04-07 10:58:43.199] DEBUG[29802] sig_pri.c: sig_pri_request 3 [2011-04-07 10:58:43.199] DEBUG[29802] dsp.c: Setup tone 1100 Hz, 500 ms, block_size=160, hits_required=21 [2011-04-07 10:58:43.199] DEBUG[29802] dsp.c: Setup tone 2100 Hz, 2600 ms, block_size=160, hits_required=116 [2011-04-07 10:58:43.199] DEBUG[29802] devicestate.c: device 'DAHDI/i1/033417052:8320-c3c' state '2' [2011-04-07 10:58:43.199] ERROR[29802] lock.c: chan_dahdi.c line 2078 (my_unlock_private): mutex '&p->lock' freed more times than we've locked! [2011-04-07 10:58:43.199] ERROR[29802] lock.c: chan_dahdi.c line 2078 (my_unlock_private): Error releasing mutex: Operation not permitted | ||
Comments: | By: Alec Davis (alecdavis) 2011-04-06 22:49:31 message below for 8 minutes!! [2011-04-07 11:06:07.435] DEBUG[29802] chan_dahdi.c: Dropping frame since I'm still dialing on DAHDI/i1/033417052:8320-c3c... [2011-04-07 11:06:07.435] DEBUG[29802] chan_dahdi.c: Dropping frame since I'm still dialing on DAHDI/i1/033417052:8320-c3c... By: Richard Mudgett (rmudgett) 2011-04-07 10:59:46 This may no longer be a problem. I have recently committed a major patch on the 1.8 branch to fix many B channel shifting issues. Part of that patch also fixed a deadlock issue I ran into. By: Alec Davis (alecdavis) 2011-04-07 17:03:23 richard: being a production box, not exactly keen. It's been stable for weeks, then yesterday 2 pri lockups about an hour apart. I only filed the first. Are you able to pinpoint the deadlock issue, so that I can apply a focused patch. Was it in chan_dahdi, or sig_pri? By: Richard Mudgett (rmudgett) 2011-04-07 17:45:44 The particular deadlock in core-show-locks.txt doesn't make sense to me. I think the holder of the private lock should not still have it. It looks like someone also tried a pri_grab() without already having the private lock based upon your ERROR messages above. The deadlock issue I ran into was because the private lock was held by pri_dchannel() while it was trying to create the ast_channel for an incoming call. Part of the B channel shifting patch added channel allocation protection between incoming and outgoing calls picking the same B channel. See sig_pri_available() and sig_pri_is_chan_available(). B channel shifting patch is -r312575 and -r312949 for v1.8. What is the call pattern for this particular box? Is it in/out calls? Is it very busy? Collision mitigation may be possible by using a different channel allocation strategy to avoid both sides being likely to pick the same B channel. By: Alec Davis (alecdavis) 2011-04-07 18:51:31 core-show-locks2.txt and backtrace-apr7-2.txt debug log related to the 2nd lockup. [2011-04-07 13:07:22.125] VERBOSE[10963] pbx.c: -- Executing [s@macro-dialout:16] Dial("SIP/GXP0018-000000a1", "DAHDI/g0/0800268743#") in new stack [2011-04-07 13:07:22.125] DEBUG[10963] chan_dahdi.c: Using channel 3 [2011-04-07 13:07:22.125] DEBUG[10963] sig_pri.c: sig_pri_request 3 [2011-04-07 13:07:22.125] DEBUG[10963] dsp.c: Setup tone 1100 Hz, 500 ms, block_size=160, hits_required=21 [2011-04-07 13:07:22.125] DEBUG[10963] dsp.c: Setup tone 2100 Hz, 2600 ms, block_size=160, hits_required=116 [2011-04-07 13:07:22.125] DEBUG[10963] devicestate.c: device 'DAHDI/i1/0800268743#-bc' state '2' [2011-04-07 13:07:22.124] DEBUG[29965] devicestate.c: device 'DAHDI/i1/5604817-bb' state '2' [2011-04-07 13:07:22.126] VERBOSE[10964] sig_pri.c: -- Starting simple switch on 'DAHDI/i1/5604817-bb' [2011-04-07 13:07:22.126] ERROR[10963] lock.c: chan_dahdi.c line 2078 (my_unlock_private): mutex '&p->lock' freed more times than we've locked! [2011-04-07 13:07:22.126] ERROR[10963] lock.c: chan_dahdi.c line 2078 (my_unlock_private): Error releasing mutex: Operation not permitted [2011-04-07 13:07:22.126] VERBOSE[29965] sig_pri.c: -- Accepting overlap call from '5604817' to '039833199' on channel 0/1, span 1 By: Alec Davis (alecdavis) 2011-04-07 23:00:18 there seems to be a race between hangup and request. this can be seen in the backtrace-apr7-2 between threads 15 and 18 both trying to lock 0xb637bb40 rmudgett: looking at your chan_dahdi commits, I noticed this <pre> /* * We already have the B channel reserved for this call. We * just need to make sure that dahdi_hangup() has completed * cleaning up before continuing. */ ast_mutex_lock(&p->lock); ast_mutex_unlock(&p->lock);</pre> seems relevant? By: Alec Davis (alecdavis) 2011-04-08 00:15:34 backtrace-apr08.txt and core-show-locks-apr08.txt Another lockup today, seems like the 1st lockup. 1 thread with a 1 lock 'LOCK8' waiting on something. LOCK1, wait for LOCK7 LOCK7, wait for LOCK3 LOCK3, wait for LOCK8 LOCK8, but not sure what is it's problem. ======================================================================= === Currently Held Locks ============================================== ======================================================================= === === <pending> <lock#> (<file>): <lock type> <line num> <function> <lock name> <lock addr> (times locked) === === Thread ID: 0xb6dffb90 (do_devstate_changes started at [ 724] devicestate.c ast_device_state_engine_init()) === ---> Lock #0 (astobj2.c): MUTEX 657 internal_ao2_callback c 0x99586f0 (1) LOCK1 === ---> Waiting for Lock #1 (channel.c): MUTEX 1641 ast_channel_cmp_cb chan 0xaf87cc10 (1) LOCK7 === --- ---> Locked Here: channel.c line 2796 (ast_raw_answer) === ------------------------------------------------------------------- === === Thread ID: 0xb2cfcb90 (do_monitor started at [24470] chan_sip.c restart_monitor()) === ---> Lock #0 (chan_sip.c): MUTEX 23964 handle_request_do &netlock 0xb6b0f6c0 (1) LOCK2 === ---> Waiting for Lock #1 (astobj2.c): MUTEX 493 internal_ao2_link c 0x99586f0 (1) LOCK1 === --- ---> Locked Here: astobj2.c line 657 (internal_ao2_callback) === ------------------------------------------------------------------- === === Thread ID: 0xaffb7b90 (pri_dchannel started at [ 7117] sig_pri.c sig_pri_start_pri()) === ---> Lock #0 (sig_pri.c): MUTEX 4351 pri_dchannel &pri->lock 0xb638a824 (1) LOCK3 === ---> Waiting for Lock #1 (chan_dahdi.c): MUTEX 2072 my_lock_private &p->lock 0xb04f72f0 (1) LOCK8 === --- ---> Locked Here: chan_dahdi.c line 2072 (my_lock_private) === ------------------------------------------------------------------- === === Thread ID: 0xaff3fb90 (do_monitor started at [11577] chan_dahdi.c restart_monitor()) === ---> Lock #0 (chan_dahdi.c): MUTEX 11304 do_monitor &iflock 0xb6386b40 (1) LOCK4 === ---> Waiting for Lock #1 (chan_dahdi.c): MUTEX 11322 do_monitor &i->lock 0xb04f72f0 (1) LOCK8 === --- ---> Locked Here: chan_dahdi.c line 2072 (my_lock_private) === ------------------------------------------------------------------- === === Thread ID: 0xaff7bb90 (pri_dchannel started at [ 7117] sig_pri.c sig_pri_start_pri()) === ---> Lock #0 (chan_dahdi.c): MUTEX 2072 my_lock_private &p->lock 0xafe2ac78 (1) LOCK5 === ---> Waiting for Lock #1 (astobj2.c): MUTEX 493 internal_ao2_link c 0x99586f0 (1) LOCK1 === --- ---> Locked Here: astobj2.c line 657 (internal_ao2_callback) === ------------------------------------------------------------------- === === Thread ID: 0xafa1eb90 (pbx_thread started at [ 5038] pbx.c ast_pbx_start()) === ---> Lock #0 (cdr.c): RDLOCK 1142 post_cdr &(&be_list)->lock 0x8203868 (1) LOCK6 === ---> Waiting for Lock #1 (astobj2.c): MUTEX 657 internal_ao2_callback c 0x99586f0 (1) LOCK1 === --- ---> Locked Here: astobj2.c line 657 (internal_ao2_callback) === ------------------------------------------------------------------- === === Thread ID: 0xaecb3b90 (pbx_thread started at [ 5038] pbx.c ast_pbx_start()) === ---> Lock #0 (channel.c): MUTEX 2796 ast_raw_answer chan 0xaf87cc10 (1) LOCK7 === ---> Tried and failed to get Lock #1 (sig_pri.c): MUTEX 288 pri_grab &pri->lock 0xb638a824 (0) LOCK3 === ------------------------------------------------------------------- === === Thread ID: 0xaf74ab90 (pbx_thread started at [ 5038] pbx.c ast_pbx_start()) === ---> Lock #0 (chan_dahdi.c): MUTEX 2072 my_lock_private &p->lock 0xb04f72f0 (1) LOCK8 === ------------------------------------------------------------------- === ======================================================================= By: Alec Davis (alecdavis) 2011-04-08 00:34:35 [2011-04-08 16:06:08.738] VERBOSE[23914] sig_pri.c: -- Accepting call from '5604819' to '8322' on channel 0/29, span 1 [2011-04-08 16:06:08.738] DEBUG[23929] app_queue.c: Device 'DAHDI/i1/5604819-246' changed to state '2' (In use) but we don't care because they're not a member of any queue. [2011-04-08 16:06:08.739] DEBUG[2343] pbx.c: Function result is '5604819' [2011-04-08 16:06:08.740] DEBUG[2343] pbx.c: Launching 'MYSQL' [2011-04-08 16:06:08.740] VERBOSE[2343] pbx.c: -- Executing [8322@incoming:2] MYSQL("DAHDI/i1/5604819-246", "Query ...) in new stack ... MYSQL ... [2011-04-08 16:06:08.741] DEBUG[2343] pbx.c: Launching 'Dial' [2011-04-08 16:06:08.741] VERBOSE[2343] pbx.c: -- Executing [8322@incoming:9] Dial("DAHDI/i1/5604819-246", "DAHDI/g0/033417052:8322,,r") in new stack [2011-04-08 16:06:08.741] DEBUG[2343] chan_dahdi.c: Using channel 1 [2011-04-08 16:06:08.741] DEBUG[2343] sig_pri.c: sig_pri_request 1 [2011-04-08 16:06:08.739] DEBUG[23914] chan_dahdi.c: Enabled echo cancellation on channel 29 [2011-04-08 16:06:08.741] DEBUG[2343] dsp.c: Setup tone 1100 Hz, 500 ms, block_size=160, hits_required=21 [2011-04-08 16:06:08.741] DEBUG[2343] dsp.c: Setup tone 2100 Hz, 2600 ms, block_size=160, hits_required=116 [2011-04-08 16:06:08.741] DEBUG[2343] devicestate.c: device 'DAHDI/i1/033417052:8322-247' state '2' [2011-04-08 16:06:08.741] ERROR[2343] lock.c: chan_dahdi.c line 2078 (my_unlock_private): mutex '&p->lock' freed more times than we've locked! [2011-04-08 16:06:08.741] ERROR[2343] lock.c: chan_dahdi.c line 2078 (my_unlock_private): Error releasing mutex: Operation not permitted By: Richard Mudgett (rmudgett) 2011-04-08 10:16:58 Look at -r301946. You are suffering from that bug at least. By: Richard Mudgett (rmudgett) 2011-05-06 14:21:18 alec: Did applying the patch of -r301946 take care of your problem? By: Alec Davis (alecdavis) 2011-05-06 16:09:22 rmudgett: not sure exactly, went to the current 1.8 branch SVN-branch-1.8-r315673M and havn't seen it since. By: Richard Mudgett (rmudgett) 2011-05-06 17:30:21 From the symptoms the -r301946 bug was the main culprit here. |