[Home]

Summary:ASTERISK-12842: [patch] Crash in cdr code in specific one-touch parking scenario
Reporter:mdu113 (mdu113)Labels:
Date Opened:2008-10-07 12:34:16Date Closed:2008-10-29 08:11:09
Priority:CriticalRegression?No
Status:Closed/CompleteComponents:CDR/General
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) 13640.14.planc4.diffs
( 1) 13640.trunk.planc3.diffs
( 2) bt.txt
Description:Asterisk crashes in cdr code in ast_bridge_call() in the following one-touch parking scenario: A calls B, B answers, A dials feature code to park B. Asterisk plays MOH to B and parking announcement to A. Now if B hangs up during parking announcement is being played to the A then asterisk crashes immediately after it ends playing parking announcement to A.
I'm attaching backtrace in bt.txt
Comments:By: mdu113 (mdu113) 2008-10-09 10:49:56

Guys, I'm sorry. I don't know what I've been thinking when opening this bug report, but the issue description is incorrect. Probably trying to do too many things at a time. :(
The crash and backtrace is from the following scenario: A calls B, B answers, A dials feature code to park B. Asterisk plays MOH to B and parking announcement to A. Now if B hangs up during parking announcement is being played to the A then asterisk crashes immediately after it ends playing parking announcement.
Sorry for the confusion.

By: Steve Murphy (murf) 2008-10-22 14:44:00

OK, Russell did not like the fact that my change required
a call to the channel_find func at the end of every bridge,
so I changed it back, and spent the last two days theorizing,
plotting, formulating, experimenting, and came to these
conclusions:

1. The park stuff should NOT be calling ast_hangup(),
which will immediately terminate a channel that still
has refs in the bridge floating around.

2. Most channel drivers will mark the channel hung up
when you hang it up. At least, the dahdi chans appear
to, even if the channel struct itself is still alive
and in the channel list.

3. The Dial app has provisions to NOT hangup
the peer channel when it sees NO_HANGUP_PEER
as a return value from the ast_bridge_call,
which is pretty much what's going to happen
if the builtin_xfer or park_call_full funcs
are activated via the feature stuff. The diff
is that if peer hung up in the process, the
ast_check_hangup() call return TRUE.

I've attached patches for 1.4 and trunk,
that exploit the above facts to free up
the peer channel after the bridge ends,
rather than immediately on hangup of the
peer phone, and do it ONLY if the peer
channel has been hung up on the far end.

Also, I've got some debug thrown in that
I'll remove before committing (I promise),
and I had to apply some fixes to the trunk
version of dial, who was 'poisoning' the
result of the ast_bridge_call with peer
hangup/gosub execution. (I added that stuff,
how sloppy of me).

Left to do: look at the other cases
of calling ast_hangup() in the parking
code; and make sure any similar code
in app_queue is also fixed in a similar
manner.

By: mdu113 (mdu113) 2008-10-22 15:10:37

murf, thanks for you work.
We're now in the process of moving our offices to the new location and I won't be able to test your patch until things are a little settle down here. I'll do my testing as soon as I can, but probably no earlier than some time next week.

By: Steve Murphy (murf) 2008-10-23 08:53:24

mdu113-

It is just as well...

It is hoped the new patches I just posted here,
labeled "plan C", will be acceptable to reviewers.

I would especially like to thank Josh
Colp for the time he took to explain
to me not only the rules behind parking
and what goes on, but why. I would like
to think that his time with me was not
in vain!

The previous attempts to fix this problem
were done with big misunderstandings about
what was going on with parking, or just
big gaps in my understanding. As it turns
out, the last attempt was like trying
to grab the animal at the wrong end, and
getting bitten in the process.

The trick is using the results of ast_bridge_call
to determine whether or not the peer pointer
is still valid, and separating park results
from other transfers. To do this, I invented
a new result code, AST_PBX_NO_HANGUP_PEER_PARKED,
and having the parking code return these, as opposed
to the builtins for blind and attended transfers.

All peer referencing code, after the call
to ast_bridge_call, should be done inside
tests for these result codes.

I have documented some of the reasoning
in the features.c/ res_features.c code
itself.

By: Steve Murphy (murf) 2008-10-23 12:34:28

Hold the presses. Plan C still survives, but crashes when the dialing party gets parked, and hangs up while the answering party is getting the announcement. It's a matter now of AST_PBX_KEEPALIVE being used to keep you away from accessing CHAN.
Another patch will be forthcoming...

By: Steve Murphy (murf) 2008-10-23 16:07:49

OK, I've tested and reviewed the code once more. The KEEPALIVES after the ast_bridge_call in both app_dial and app_queue are in place.

Major weaknesses in the code are that when a park is performed, you most
likely will not get the hangup extension, and some channel variables will not
be updated.

the planc3 stuff has been tested on a symmetric test set:

1. A calls B; B answers; A parks B; B hangs up while A is getting the parking
slot announcement, immediately after being put on hold.

2. A calls B; B answers; A parks B; B hangs up after A has been hung up, but
before the park times out.

3. A calls B; B answers; B parks A; A hangs up while B is getting the parking slot announcement, immediately after being put on hold.

4. A calls B; B answers; B parks A; A hangs up after B has been hung up, but before the park times out.

If you can survive all 4 of these scenarios, then parking is solid.

Then, I ran all 4 scenarios under valgrind, and had a problem with
the pbx thread not dying fast enough, and got some memory violations
when the pbx thread kept running on a freed channel...

By: mdu113 (mdu113) 2008-10-27 17:07:42

I can't apply 13640.14.planc3.diffs to current svn in 1.4 branch:
root@devel:~/src/voip/asterisk-1.4/svn/src# svn info
Path: .
URL: http://svn.digium.com/svn/asterisk/branches/1.4
Repository Root: http://svn.digium.com/svn/asterisk
Repository UUID: 4a481080-769f-4faf-86ef-b2ff1d3f910b
Revision: 152216
Node Kind: directory
Schedule: normal
Last Changed Author: tilghman
Last Changed Rev: 152215
Last Changed Date: 2008-10-27 17:32:00 -0400 (Mon, 27 Oct 2008)

root@devel:~/src/voip/asterisk-1.4/svn/src# patch -p0 < 13640.14.planc3.diffs
patching file apps/app_dial.c
Hunk #2 FAILED at 111.
Hunk #3 FAILED at 1787.
Hunk #4 FAILED at 1815.
Hunk ASTERISK-1 FAILED at 1947.
Hunk ASTERISK-2 FAILED at 1987.
5 out of 6 hunks FAILED -- saving rejects to file apps/app_dial.c.rej
patching file apps/app_queue.c
Hunk #1 FAILED at 3757.
Hunk #2 succeeded at 3196 (offset -590 lines).
1 out of 2 hunks FAILED -- saving rejects to file apps/app_queue.c.rej
patching file include/asterisk/features.h
Hunk #1 FAILED at 61.
1 out of 1 hunk FAILED -- saving rejects to file include/asterisk/features.h.rej
patching file include/asterisk/pbx.h
Hunk #1 FAILED at 37.
1 out of 1 hunk FAILED -- saving rejects to file include/asterisk/pbx.h.rej
can't find file to patch at input line 294
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
--------------------------
|Index: main/features.c
|===================================================================
|--- main/features.c    (revision 151682)
|+++ main/features.c    (working copy)
--------------------------
File to patch:

By: Steve Murphy (murf) 2008-10-27 18:18:30

Hmmm. I updated 1.4; and reran the diff; I uploaded the diffs. See  13640.14.planc4.diffs)

Before applying the patches, verify with "svn status" that no other
patches are present.

By: mdu113 (mdu113) 2008-10-28 17:12:06

Seems to work fine now. Can't reproduce the crash. Thanks a lot.

By: Steve Murphy (murf) 2008-10-29 08:11:07

Hmmm. I thought I did the template thing in the commit message, but I must
have forgot.

See 1.4 v. 152535, trunk v. 152536, and 1.6.1 v. 152537