Description: SIP blind transfer does not work: the transferee does not enter the transfer context.

****** STEPS TO REPRODUCE ******

1. Set the global variable TRANSFER_CONTEXT=from-xfer.
2. Define the from-xfer context so that it logs any extension it receives:
 context from-xfer {
   _[!-z]. => Verbose(1,Transfer to ${EXTEN});
 };
3. Establish a call between device A and device B, then make B send a REFER to transfer A to C.

Verbose() is never reached.


This is likely not related to the deadlock in issue 18403.

After some debugging, I now understand exactly why the error happens, but I have only a vague idea how to fix it. Help appreciated.

1. The REFER is handled in handle_request_refer() on B's channel. At line 22223, it calls ast_async_goto() on A's channel to send A to the transfer context. Because chan->pbx exists, ast_async_goto() calls ast_explicit_goto() and sets the AST_SOFTHANGUP_ASYNCGOTO bit in A's chan->_softhangup.

2. On another thread, generic_bridge() calls __ast_read(). At line 3622, a hangup control frame is queued:
  if (ast_check_hangup(chan)) {
      ast_queue_control(chan, AST_CONTROL_HANGUP);

Later in the same function, at line 3748, the frame is dequeued and turned into a hangup indication by setting f to NULL:

if (f->frametype == AST_FRAME_CONTROL && f->subclass.integer == AST_CONTROL_HANGUP) {
 . . .
 f = NULL;

and later, at line 4077, the DEV bit is set:

 chan->_softhangup |= AST_SOFTHANGUP_DEV;

3. Next, on A's PBX service thread, in __ast_pbx_run() at line 4724, the following condition is supposed to break out of the loop and begin processing the extension previously set by ast_explicit_goto():

 } else if (c->_softhangup == AST_SOFTHANGUP_ASYNCGOTO) {
   c->_softhangup = 0;

but this does not happen, because _softhangup is now AST_SOFTHANGUP_DEV|AST_SOFTHANGUP_ASYNCGOTO, so the equality test fails. The while loop is therefore not broken, and the PBX service ends on the next iteration because ast_check_hangup() sees a non-zero _softhangup.
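The three steps above can be replayed single-threaded in a small model, which makes the ordering problem easy to see. This is not Asterisk code. `struct model_chan`, the `model_*` functions, and the flag values are hypothetical stand-ins; the values are assumed to mirror the low bits of the AST_SOFTHANGUP_* enum in channel.h. Only the branch where A already runs a PBX is modeled, and each function collapses the quoted excerpts into a few lines.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical flag values, assumed to mirror AST_SOFTHANGUP_DEV and
 * AST_SOFTHANGUP_ASYNCGOTO in channel.h. */
enum {
    MODEL_SOFTHANGUP_DEV       = 1 << 0,
    MODEL_SOFTHANGUP_ASYNCGOTO = 1 << 1,
};

enum model_pbx_result { MODEL_PBX_CONTINUE, MODEL_PBX_RESTART, MODEL_PBX_END };

/* Hypothetical, heavily reduced stand-in for struct ast_channel. */
struct model_chan {
    char context[80];   /* where the PBX should continue */
    int softhangup;     /* bitmask, like chan->_softhangup */
    int hangup_queued;  /* stands in for a queued AST_CONTROL_HANGUP frame */
};

/* Step 1: async goto on a channel that already runs a PBX records the new
 * location (as ast_explicit_goto() does) and ORs in the ASYNCGOTO bit. */
static void model_async_goto(struct model_chan *c, const char *context)
{
    assert(c != NULL && context != NULL);
    strncpy(c->context, context, sizeof(c->context) - 1);
    c->context[sizeof(c->context) - 1] = '\0';
    c->softhangup |= MODEL_SOFTHANGUP_ASYNCGOTO;
}

/* Step 2: the bridge thread's read sees a non-zero mask, queues a hangup
 * frame, dequeues it, and ORs in the DEV bit. Returns 0 for "hangup"
 * (the real code returns a NULL frame), 1 for an ordinary frame. */
static int model_bridge_read(struct model_chan *c)
{
    assert(c != NULL);
    if (c->softhangup != 0) {           /* ast_check_hangup() */
        c->hangup_queued = 1;           /* ast_queue_control(..., HANGUP) */
    }
    if (c->hangup_queued) {
        c->hangup_queued = 0;           /* frame dequeued, f = NULL */
        c->softhangup |= MODEL_SOFTHANGUP_DEV;
        return 0;
    }
    return 1;
}

/* Step 3: the PBX loop restarts at the new location only when ASYNCGOTO
 * is the sole bit set; any other non-zero mask ends the PBX (on the next
 * iteration in the real loop). */
static enum model_pbx_result model_pbx_check(struct model_chan *c)
{
    assert(c != NULL);
    if (c->softhangup == MODEL_SOFTHANGUP_ASYNCGOTO) {
        c->softhangup = 0;              /* the quoted branch */
        return MODEL_PBX_RESTART;
    }
    return c->softhangup ? MODEL_PBX_END : MODEL_PBX_CONTINUE;
}
```

Calling model_async_goto() and then model_pbx_check() yields MODEL_PBX_RESTART. If model_bridge_read() runs between them, the mask becomes DEV|ASYNCGOTO and model_pbx_check() yields MODEL_PBX_END, which matches the sequence traced above.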

It is a little surprising that this is supposed to work at all. I cannot be the only one out there hitting this race condition.

As for fixing it, I need advice on the right approach:
1. Most radical: ignore the AST_SOFTHANGUP_ASYNCGOTO bit when testing _softhangup in ast_check_hangup(). Perhaps too radical?
2. Ignore AST_SOFTHANGUP_ASYNCGOTO only in __ast_read(), where it calls ast_check_hangup().
3. Other?
By: Kirill Katsnelson (kkm) 2010-12-21 23:56:22.000-0600

And that changeset indeed fixes the reported issue.

By: Kirill Katsnelson (kkm) 2010-12-22 01:15:44.000-0600

And it is definitely the same issue as ASTERISK-16847. D-oh!

By: John Hass (john8675309) 2010-12-23 15:31:33.000-0600

Even after this patch, some call transfers work and others do not. I can do the redirect 10 times and it works perfectly, but sometimes it hangs up on the 11th, and sometimes on the first.

By: Kirill Katsnelson (kkm) 2010-12-23 22:45:49.000-0600

john8675309: could you please check if the attached patch 18516-kkm-maybefix-1.patch fixes your problem?

By: John Hass (john8675309) 2010-12-24 10:30:41.000-0600

kkm: yes, the kkm patch stops it from hanging up. However, now when doing a redirect with ExtraChannel:, the ExtraChannel is hung up. I did a clean install with just the kkm patch.

By: Kirill Katsnelson (kkm) 2010-12-24 14:59:28.000-0600

john8675309: It looks like there are more problems here, and I am just an Asterisk user like you, trying to come up with immediate fixes. The patch I attached is how I fixed the problem in this ticket before deciding to go with the "official" changeset from the 1.8.2 branch.

I suggest you open a new ticket with a reproduction of your problem, stating which fixes you tried and what they changed. Your use case is clearly different from mine.

Also try 1.8.2: it is already at rc1 and might be stable enough to use.

By: Leif Madsen (lmadsen) 2011-01-04 14:16:55.000-0600

So this issue can be closed as resolved then?

By: Kirill Katsnelson (kkm) 2011-01-04 14:20:26.000-0600

Yes please. The change from the 1.8.2 branch fixed it completely for me.

Also, the attached "patch" may be confusing. It was a temporary, experimental fix, and it is incorrect. Maybe it is better to delete it so that people searching the tracker won't be confused? Up to you, anyhow.

By: gb_delti (gb_delti) 2011-01-06 07:32:24.000-0600

I have the same issue here on an Asterisk system and could provide log files. Do I have to create a new issue for it, or will the patch fix it in the next 1.6.2.x release?

By: Malcolm Davenport (mdavenport) 2011-01-19 08:17:16.000-0600

The above-referenced patch didn't go into 1.6.2.x until