Summary: | ASTERISK-12685: One-touch parking results in stuck/deadlocked channel if parked channel hangs up while announcement still being played | ||
Reporter: | mdu113 (mdu113) | Labels: | |
Date Opened: | 2008-09-04 16:08:21 | Date Closed: | 2008-12-09 10:56:19.000-0600 |
Priority: | Minor | Regression? | No |
Status: | Closed/Complete | Components: | Resources/res_features |
Versions: | Frequency of Occurrence | ||
Related Issues: | |||
Environment: | Attachments: | ( 0) 142474-console.txt ( 1) bug13425.diff ( 2) bug13425v2.patch ( 3) parking_crash1.txt ( 4) parking_crash2.txt ( 5) skipcdr.patch | |
Description: | This is a continuation of issue 11979. I discovered it just after I reported everything's fine and that issue got closed. I don't see a way to re-open it, probably because I'm not the original reporter, so I'm opening a new issue. The setup is exactly as I described in 11979: - features.conf: [featuremap] parkcall => *30 ; Park call (one step parking) [applicationmap] park_caller => *33,peer/callee,AGI,park.agi|park_caller park_callee => *33,peer/caller,AGI,park.agi|park_callee The scenario: 2 SIP endpoints: xyz011101(polycom 501) calls xyz110001(cisco 7960), call is connected (through FastAGI application), then I dial *30 on xyz011101 to park the peer and hangup the peer (xyz110001) while announcement being played to xyz011101. The result is xyz011101 is stucked. It appears in channel list after I hang it up on the phone and sometimes (intermittently) asterisk crashes or deadlocks after that. Here's the console output: -- Executing [1100@xyz:1] Set("SIP/xyz011101-081e15f8", "TIMEOUT(absolute)=10800") in new stack -- Channel will hangup at 2008-09-04 23:48:14 UTC. -- Executing [1100@xyz:2] AGI("SIP/xyz011101-081e15f8", "agi://devel.acecape.com") in new stack -- AGI Script Executing Application: (SetCallerPres) Options: (allowed_passed_screen) -- AGI Script Executing Application: (Set) Options: (CALLERID(name)=Unknown) -- AGI Script Executing Application: (Set) Options: (CALLERID(num)=Unknown) -- AGI Script Executing Application: (Dial) Options: (SIP/xyz110001|180|Kk) -- Called xyz110001 -- SIP/xyz110001-081e2b88 is ringing -- SIP/xyz110001-081e2b88 answered SIP/xyz011101-081e15f8 -- Started music on hold, class 'default', on channel 'SIP/xyz110001-081e2b88' == Parked SIP/xyz110001-081e2b88 on 701@parkedcalls. Will timeout back to extension [xyz] , 1 in 600 seconds -- <SIP/xyz011101-081e15f8> Playing 'digits/7' (language 'en') -- Stopped music on hold on SIP/xyz110001-081e2b88 == SIP/xyz110001-081e2b88 got tired of being parked == CDR on channel 'SIP/xyz110001-081e2b88' lacks end == CDR on channel 'SIP/xyz110001-081e2b88' lacks end [Sep 4 16:48:21] WARNING[24101]: res_features.c:1846 do_parking_thread: Whoa, failed to remove the extension! -- <SIP/xyz011101-081e15f8> Playing 'digits/0' (language 'en') -- <SIP/xyz011101-081e15f8> Playing 'digits/1' (language 'en') -- Added extension 'isk/var/lib/asterisk/sounds/digits/en/1.al' priority 1 to parkedcalls At this point I hangup xyz011101 and do "show channels": webdevel*CLI> show channels Channel Location State Application(Data) SIP/xyz011101-081e15 s@xyz:1 Up Dial(SIP/xyz110001|180|Kk) 1 active channel 1 active call A few minutes later asterisk crashed ****** ADDITIONAL INFORMATION ****** In the same scenario if I initiate one-touch parking on the callee (xyz110001) and hangup xyz011101 during parking announcement the result is about the same: -- Executing [1100@xyz:1] Set("SIP/xyz011101-081e0d00", "TIMEOUT(absolute)=10800") in new stack -- Channel will hangup at 2008-09-05 00:12:28 UTC. -- Executing [1100@xyz:2] AGI("SIP/xyz011101-081e0d00", "agi://devel.acecape.com") in new stack -- AGI Script Executing Application: (SetCallerPres) Options: (allowed_passed_screen) -- AGI Script Executing Application: (Set) Options: (CALLERID(name)=Unknown) -- AGI Script Executing Application: (Set) Options: (CALLERID(num)=Unknown) -- AGI Script Executing Application: (Dial) Options: (SIP/xyz110001|180|Kk) -- Called xyz110001 -- SIP/xyz110001-081e5d98 is ringing -- SIP/xyz110001-081e5d98 answered SIP/xyz011101-081e0d00 -- Started music on hold, class 'default', on channel 'SIP/xyz011101-081e0d00' == Parked SIP/xyz011101-081e0d00 on 701@parkedcalls. Will timeout back to extension [xyz] s, 1 in 600 seconds -- <SIP/xyz110001-081e5d98> Playing 'digits/7' (language 'en') -- Stopped music on hold on SIP/xyz011101-081e0d00 == SIP/xyz011101-081e0d00 got tired of being parked [Sep 4 17:12:34] WARNING[24281]: channel.c:1258 ast_channel_free: PBX may not have been terminated properly on 'SIP/xyz011101-081e0d00' == CDR on channel 'SIP/xyz011101-081e0d00' lacks end == CDR on channel 'SIP/xyz011101-081e0d00' lacks end [Sep 4 17:12:34] WARNING[24281]: res_features.c:1846 do_parking_thread: Whoa, failed to remove the extension! -- <SIP/xyz110001-081e5d98> Playing 'digits/0' (language 'en') -- <SIP/xyz110001-081e5d98> Playing 'digits/1' (language 'en') -- Added extension ' ' priority 1 to parkedcalls At this point I hangup xyz110001 and do "show channels": webdevel*CLI> show channels Channel Location State Application(Data) SIP/xyz110001-081e5d (None) Up AppDial((Outgoing Line)) 1 active channel 1 active call | ||
Comments: | By: Jeff Peeler (jpeeler) 2008-09-12 13:18:52 This patch fixes the problem for me. Test and make sure it works for you as well. By: mdu113 (mdu113) 2008-09-15 13:55:43 I've updated code to r142927 and applied your patch. Still a lot of problems. parking_crash1.txt - console output and backtrace of the following scenario: A calls B, B answers, A dials feature code to park call. In other words one-touch parking initiated by caller leads to the crash now. parking_crash2.txt - console output and backtrace of the following scenario: A calls B, B answers and diales feature code to park A. At this point everything fine and A is parked at extension 701, i.e. one-touch parking initiated by callee seems to work fine now. Now I wanted to test how it would behave if parking fails, so I made my AGI application to set PARKINGEXTEN variable to 701 for the next call and to the following: C calls B (PARKINGEXTEN=701 is set), B answers and dials feature code to park C. At this point asterisk crashes. gdb suggests that it crashed in ast_cdr_start(), so it may be actually cdr problem, I don't know. Also I wanted to say a few words about your patch. If I understand it correctly you now chose to masquerade parked channel if feature is running on peer. I believe there's a problem with it in current implementation. Imagine the following situation: A calls B, B answers and attempts to park A. Suppose the parking is failed for any reason. Again if my understanding of the code is correct I believe the call will be dropped in this case, because by the moment the failure is determined (in park_call_full()) the channel already masqueraded away. I hope you'd agree with me that in case of one-touch parking, parking failure shouldn't cause a call to be dropped as it was the case previously. I think you need either perform all the checks that may fail before masquerading or somehow bridge the channel back after masquerading in case of failure. By: mdu113 (mdu113) 2008-09-15 14:30:58 Correction. Crash on parking initiated by caller (parking_crash1.txt) doesn't seem to be related to your patch. I removed your patch completely and still see the same crash. Looks like some other change made between r141028 and r142927 causes it, because r141028 didn't have this problem. By: mdu113 (mdu113) 2008-09-15 14:50:43 Ok, the problem is caused by r142675 by murf. I cannot reopen the issue he was trying solve by that commit (13251) since I'm not the original reporter. jpeeler, may be you want to notify him? By: Jeff Peeler (jpeeler) 2008-09-15 15:02:41 mdu113: Thanks for the updates. I'll look into the problem when parking fails after the masquerade has been done. I did try solving the problem without masquerading first, it ended up not being the right approach. Also, I'll reopen 13251 for you so that you can comment there about this new crash. By: Jeff Peeler (jpeeler) 2008-09-15 16:13:59 Were both crashes caused by the CDR changes? If so, I should go ahead and commit this as the old code also hangs up the call if parking fails. Bridging the channels back together in the event of a parking failure should be tracked in a new bug report. You can assign it to me if you like. By: mdu113 (mdu113) 2008-09-15 16:47:30 Yes, the 2nd crash (parking_crash2.txt) also seems to be related to cdr change in r142575 (issue 13364 reported by me). If I downgrade to the previous revision 142474, it's no longer crashes. But I see another problem that is actually seems to be parking related. The detailed information in attached 142474-console.txt. Scenario: A calls B (PARINGEXTEN=701 set), B answers and parks A (it happens regardless of who initiated parking). A is parked at extension 701@parkedcalls. A hangs up. "dialplan show" reveals that extension 701@parkedcalls isn't released and all subsequent parking attempts fail, because park_call_full() uses ast_exists_extension() to check parking space availability. If I don't use PARKINGEXTEN, then it all works, as that code path just iterates through parking lot. I think parking extension should be removed after parked channel hangs up. By: Jeff Peeler (jpeeler) 2008-09-15 18:07:00 Oops, that was an oversight. Should be fixed now. By: Digium Subversion (svnbot) 2008-09-16 11:04:50 Repository: asterisk Revision: 143204 U branches/1.4/res/res_features.c ------------------------------------------------------------------------ r143204 | jpeeler | 2008-09-16 11:04:46 -0500 (Tue, 16 Sep 2008) | 10 lines (closes issue ASTERISK-12715) (closes issue ASTERISK-12685) Reported by: mdu113 Patches: bug13425.diff uploaded by jpeeler (license 325) Tested by: mdu113 The main change here was to masquerade the channel if the channel that was to be parked was running a PBX on it. The PBX thread can then maintain full control of the channel (the zombie) as it expects to while allowing the parking thread full control of the real (parked) channel. ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=143204 By: mdu113 (mdu113) 2008-09-17 16:40:48 jpeeler, sorry, it's still not completely fixed. The deadlock still happens if parked channel hangs up while announcement is being played. It's just harder to catch now, because it depends on the moment when hangup occurs. The scenario exactly like in initial report. For the caller-initiated parking I can make it deadlock quite reliably if I hangup parked channel during announcement playing first digit (7). -- Executing [1100@xyz:1] Set("SIP/xyz011101-081edf10", "TIMEOUT(absolute)=10800") in new stack -- Channel will hangup at 2008-09-18 00:38:29 UTC. -- Executing [1100@xyz:2] AGI("SIP/xyz011101-081edf10", "agi://devel.acecape.com") in new stack -- AGI Script Executing Application: (SetCallerPres) Options: (allowed_passed_screen) -- AGI Script Executing Application: (Set) Options: (CALLERID(name)=Unknown) -- AGI Script Executing Application: (Set) Options: (CALLERID(num)=Unknown) -- AGI Script Executing Application: (Dial) Options: (SIP/xyz110001|180|kK) -- Called xyz110001 -- SIP/xyz110001-081f8a40 is ringing -- SIP/xyz110001-081f8a40 answered SIP/xyz011101-081edf10 -- Started music on hold, class 'default', on channel 'SIP/xyz110001-081f8a40' == Parked SIP/xyz110001-081f8a40 on 701@parkedcalls. Will timeout back to extension [xyz] , 1 in 600 seconds -- Added extension '701' priority 1 to parkedcalls -- <SIP/xyz011101-081edf10> Playing 'digits/7' (language 'en') -- Stopped music on hold on SIP/xyz110001-081f8a40 == SIP/xyz110001-081f8a40 got tired of being parked -- <SIP/xyz011101-081edf10> Playing 'digits/0' (language 'en') -- <SIP/xyz011101-081edf10> Playing 'digits/1' (language 'en') webdevel*CLI> show channels Channel Location State Application(Data) SIP/xyz011101-081edf s@xyz:1 Up Dial(SIP/xyz110001|180|kK) 1 active channel 1 active call Timing is important now. If I do it a little earlier or later it works. The callee-initiated parking produces the following stucked channels: Channel Location State Application(Data) SIP/xyz110001-081f17 (None) Up AppDial((Outgoing Line)) Parked/SIP/xyz011101 s@xyz:1 Down Dial(SIP/xyz110001|180|kK) 2 active channels 1 active call FYI: I was testing it by applying your patches to the r142474 to avoid interference with problems introduced by later commits that I described in issues 13506 and 13507 Please let me know if you need any additional info By: Jeff Peeler (jpeeler) 2008-09-18 15:12:35 Can you try this with the latest code? That's what I am testing with currently and I can not reproduce this problem. I do see some locking problems, but it seems to be related to the new CDR changes. If not, which patches are you applying? You would need to be applying the one from this bug as well as 13493 to have all the parking changes. By: mdu113 (mdu113) 2008-09-18 15:59:17 Yes, I'm applying both patches. If it's hard for you to reproduce it I think it's because my testing machine is really an old and slow one (p3-800) and since it's timing dependent it might be hard for you to catch the right moment on the faster machine. I'm off today and can't test it at home. I'll try my best to test it tomorrow, but can't promise it as tomorrow is going to be a crazy day for me. I promise to test it Monday the latest. By: Jeff Peeler (jpeeler) 2008-09-18 18:28:27 Well no rush, I'm going to be out of town almost until the end of the month. By then I hope all the CDR problems will be ironed out and we can finish this. By: mdu113 (mdu113) 2008-09-22 13:22:18 Ok, here's my findings. First of all I couldn't test it with latest code because it crashes (ASTERISK-12747), so I did testing with r142474 again. I applied this patch and then patch from ASTERISK-12739 in that order (otherwise it doesn't apply and if I don't apply patch from ASTERISK-12739 then the problem with stale parking extension reappear). Deadlock still happens and I see no changes. Moreover I found another problem. if Park application called from AGI script asterisk crashes after playing announcement. I think this is related to issues 13465, 13434, 11979. In this case masquerading isn't used and AGI still returns wrong return code. May be using masquerading in all cases is safer? By: Jeff Peeler (jpeeler) 2008-09-30 15:50:44 This is getting too confusing... you're going to have to use the latest code. All code changes don't necessarily get posted as a patch. I made a change fixing a potential parking crash that was done after the revision you are testing with. Update to the latest code and apply the skipcdr.patch if necessary to avoid the CDR problems. I'd like to verify that you can then still reproduce the hung channels. I can't reproduce it here. By: mdu113 (mdu113) 2008-09-30 17:19:31 Unfortunately bug13425.diff doesn't apply to the latest code r145262. root@webdevel:~/src/voip/asterisk-svn/branches/1.4# patch -p0 < bug13425.diff patching file res/res_features.c Hunk #2 succeeded at 415 with fuzz 2. Hunk #3 FAILED at 450. Hunk #4 succeeded at 529 with fuzz 2 (offset 46 lines). Hunk ASTERISK-1 FAILED at 551. Hunk ASTERISK-2 FAILED at 569. Hunk ASTERISK-3 FAILED at 625. Hunk ASTERISK-4 FAILED at 1991. 5 out of 8 hunks FAILED -- saving rejects to file res/res_features.c.rej By: mdu113 (mdu113) 2008-09-30 17:21:29 BTW, do you want me to open another issue on crash after Park application was called by AGI? By: Jeff Peeler (jpeeler) 2008-09-30 17:27:26 You don't need to apply any patches besides the skipcdr.patch when using the latest code as they are already present. First I want you to test with the latest code and check if the hung channel is present. Then, we can also check about the AGI problem. By: mdu113 (mdu113) 2008-10-01 10:29:17 You're right. Latest code + skipcdr.patch and I cannot reproduce the problem. Seems to work fine. Thanks a lot and sorry for the confusion. By: Jeff Peeler (jpeeler) 2008-10-01 13:13:46 Are all the AGI problems fixed as well? By: mdu113 (mdu113) 2008-10-01 13:33:48 No, AGI problem is still there. As far as I understand in case of direct parking by Park application the masquerading is not used and agi_exec_full() still returns incorrect code. By: Jeff Peeler (jpeeler) 2008-10-01 18:25:32 agi_exec_full must return 0 upon success as described in 13434, so I do believe masquerading is the only solution. New patch added. By: mdu113 (mdu113) 2008-10-02 11:56:33 Yes, it fixes the problem. Just out of curiousity, why not using masquerading all the time instead of adding ifs here and there? By: David Brillert (aragon) 2008-10-03 15:06:33 jpeeler just out of curiosity does this commit also close this bug or this a separate issue? Revision: 146026 U branches/1.4/res/res_features.c By: Jeff Peeler (jpeeler) 2008-10-03 15:26:06 mdu113: Has to do with which thread has control of the channel. Asterisk is written in such a way as each thread expects full control of the channel. In this case (but similar to the other parking fix), the control of the channel is being passed to the parking thread. If a masquerade were not done (you witnessed this) the two threads could potentially fight over the same channel. If one were to create a channel for no reason and then masquerade it, odd things could happen such as a channel not getting hung up. aragon: Pretty sure that is a separate issue. The original reported issue has already been solved, but another issue was also mentioned and will be committed shortly. Should have probably been in a separate bug report so this one could be closed already. By: Digium Subversion (svnbot) 2008-10-03 15:34:16 Repository: asterisk Revision: 146129 U branches/1.4/include/asterisk/features.h U branches/1.4/res/res_agi.c U branches/1.4/res/res_features.c ------------------------------------------------------------------------ r146129 | jpeeler | 2008-10-03 15:34:16 -0500 (Fri, 03 Oct 2008) | 7 lines (closes issue ASTERISK-12685) Reported by: mdu113 Tested by: mdu113 Similar to r143204, masquerade the channel in the case of Park being called from AGI. ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=146129 |