Summary: | ASTERISK-12842: [patch] Crash in cdr code in specific one-touch parking scenario | ||
Reporter: | mdu113 (mdu113) | Labels: | |
Date Opened: | 2008-10-07 12:34:16 | Date Closed: | 2008-10-29 08:11:09 |
Priority: | Critical | Regression? | No |
Status: | Closed/Complete | Components: | CDR/General |
Versions: | Frequency of Occurrence | ||
Related Issues: | |||
Environment: | Attachments: | ( 0) 13640.14.planc4.diffs ( 1) 13640.trunk.planc3.diffs ( 2) bt.txt | |
Description: | Asterisk crashes in cdr code in ast_bridge_call() in the following one-touch parking scenario: A calls B, B answers, A dials feature code to park B. Asterisk plays MOH to B and parking announcement to A. Now if B hangs up during parking announcement is being played to the A then asterisk crashes immediately after it ends playing parking announcement to A. I'm attaching backtrace in bt.txt | ||
Comments: | By: mdu113 (mdu113) 2008-10-09 10:49:56 Guys, I'm sorry. I don't know what I've been thinking when opening this bug report, but the issue description is incorrect. Probably trying to do too many things at a time. :( The crash and backtrace is from the following scenario: A calls B, B answers, A dials feature code to park B. Asterisk plays MOH to B and parking announcement to A. Now if B hangs up during parking announcement is being played to the A then asterisk crashes immediately after it ends playing parking announcement. Sorry for the confusion. By: Steve Murphy (murf) 2008-10-22 14:44:00 OK, Russell did not like the fact that my change required a call to the channel_find func at the end of every bridge, so I changed it back, and spent the last two days theorizing, plotting, formulating, experimenting, and came to these conclusions: 1. The park stuff should NOT be calling ast_hangup(), which will immediately terminate a channel that still has refs in the bridge floating around. 2. Most channel drivers will mark the channel hung up when you hang it up. At least, the dahdi chans appear to, even if the channel struct itself is still alive and in the channel list. 3. The Dial app has provisions to NOT hangup the peer channel when it sees NO_HANGUP_PEER as a return value from the ast_bridge_call, which is pretty much what's going to happen if the builtin_xfer or park_call_full funcs are activated via the feature stuff. The diff is that if peer hung up in the process, the ast_check_hangup() call return TRUE. I've attached patches for 1.4 and trunk, that exploit the above facts to free up the peer channel after the bridge ends, rather than immediately on hangup of the peer phone, and do it ONLY if the peer channel has been hung up on the far end. Also, I've got some debug thrown in that I'll remove before committing (I promise), and I had to apply some fixes to the trunk version of dial, who was 'poisoning' the result of the ast_bridge_call with peer hangup/gosub execution. (I added that stuff, how sloppy of me). Left to do: look at the other cases of calling ast_hangup() in the parking code; and make sure any similar code in app_queue is also fixed in a similar manner. By: mdu113 (mdu113) 2008-10-22 15:10:37 murf, thanks for you work. We're now in the process of moving our offices to the new location and I won't be able to test your patch until things are a little settle down here. I'll do my testing as soon as I can, but probably no earlier than some time next week. By: Steve Murphy (murf) 2008-10-23 08:53:24 mdu113- It is just as well... It is hoped the new patches I just posted here, labeled "plan C", will be acceptable to reviewers. I would especially like to thank Josh Colp for the time he took to explain to me not only the rules behind parking and what goes on, but why. I would like to think that his time with me was not in vain! The previous attempts to fix this problem were done with big misunderstandings about what was going on with parking, or just big gaps in my understanding. As it turns out, the last attempt was like trying to grab the animal at the wrong end, and getting bitten in the process. The trick is using the results of ast_bridge_call to determine whether or not the peer pointer is still valid, and separating park results from other transfers. To do this, I invented a new result code, AST_PBX_NO_HANGUP_PEER_PARKED, and having the parking code return these, as opposed to the builtins for blind and attended transfers. All peer referencing code, after the call to ast_bridge_call, should be done inside tests for these result codes. I have documented some of the reasoning in the features.c/ res_features.c code itself. By: Steve Murphy (murf) 2008-10-23 12:34:28 Hold the presses. Plan C still survives, but crashes when the dialing party gets parked, and hangs up while the answering party is getting the announcement. It's a matter now of AST_PBX_KEEPALIVE being used to keep you away from accessing CHAN. Another patch will be forthcoming... By: Steve Murphy (murf) 2008-10-23 16:07:49 OK, I've tested and reviewed the code once more. The KEEPALIVES after the ast_bridge_call in both app_dial and app_queue are in place. Major weaknesses in the code are that when a park is performed, you most likely will not get the hangup extension, and some channel variables will not be updated. the planc3 stuff has been tested on a symmetric test set: 1. A calls B; B answers; A parks B; B hangs up while A is getting the parking slot announcement, immediately after being put on hold. 2. A calls B; B answers; A parks B; B hangs up after A has been hung up, but before the park times out. 3. A calls B; B answers; B parks A; A hangs up while B is getting the parking slot announcement, immediately after being put on hold. 4. A calls B; B answers; B parks A; A hangs up after B has been hung up, but before the park times out. If you can survive all 4 of these scenarios, then parking is solid. Then, I ran all 4 scenarios under valgrind, and had a problem with the pbx thread not dying fast enough, and got some memory violations when the pbx thread kept running on a freed channel... By: mdu113 (mdu113) 2008-10-27 17:07:42 I can't apply 13640.14.planc3.diffs to current svn in 1.4 branch: root@devel:~/src/voip/asterisk-1.4/svn/src# svn info Path: . URL: http://svn.digium.com/svn/asterisk/branches/1.4 Repository Root: http://svn.digium.com/svn/asterisk Repository UUID: 4a481080-769f-4faf-86ef-b2ff1d3f910b Revision: 152216 Node Kind: directory Schedule: normal Last Changed Author: tilghman Last Changed Rev: 152215 Last Changed Date: 2008-10-27 17:32:00 -0400 (Mon, 27 Oct 2008) root@devel:~/src/voip/asterisk-1.4/svn/src# patch -p0 < 13640.14.planc3.diffs patching file apps/app_dial.c Hunk #2 FAILED at 111. Hunk #3 FAILED at 1787. Hunk #4 FAILED at 1815. Hunk ASTERISK-1 FAILED at 1947. Hunk ASTERISK-2 FAILED at 1987. 5 out of 6 hunks FAILED -- saving rejects to file apps/app_dial.c.rej patching file apps/app_queue.c Hunk #1 FAILED at 3757. Hunk #2 succeeded at 3196 (offset -590 lines). 1 out of 2 hunks FAILED -- saving rejects to file apps/app_queue.c.rej patching file include/asterisk/features.h Hunk #1 FAILED at 61. 1 out of 1 hunk FAILED -- saving rejects to file include/asterisk/features.h.rej patching file include/asterisk/pbx.h Hunk #1 FAILED at 37. 1 out of 1 hunk FAILED -- saving rejects to file include/asterisk/pbx.h.rej can't find file to patch at input line 294 Perhaps you used the wrong -p or --strip option? The text leading up to this was: -------------------------- |Index: main/features.c |=================================================================== |--- main/features.c (revision 151682) |+++ main/features.c (working copy) -------------------------- File to patch: By: Steve Murphy (murf) 2008-10-27 18:18:30 Hmmm. I updated 1.4; and reran the diff; I uploaded the diffs. See 13640.14.planc4.diffs) Before applying the patches, verify with "svn status" that no other patches are present. By: mdu113 (mdu113) 2008-10-28 17:12:06 Seems to work fine now. Can't reproduce the crash. Thanks a lot. By: Steve Murphy (murf) 2008-10-29 08:11:07 Hmmm. I thought I did the template thing in the commit message, but I must have forgot. See 1.4 v. 152535, trunk v. 152536, and 1.6.1 v. 152537 |