Summary: ASTERISK-11939: SIP channel protocol illegally reverses direction when ringing channel AMI redirected (to parked channel)
Reporter: David Woolley (davidw)
Labels:
Date Opened: 2008-04-29 09:47:17
Date Closed: 2011-06-07 14:03:18
Priority: Major
Regression?: No
Status: Closed/Complete
Components: Channels/chan_sip/General
Versions:
Frequency of Occurrence:
Related Issues:
Environment:
Attachments: ( 0) sipreverse.txt
( 1) calltoIAXnoringback.txt
( 2) SIPchannelstates.txt
Description: An outgoing call to a SIP channel was initiated using AMI Originate; the first channel was a Local/ channel.  Whilst the call was still ringing, the Local channel was AMI Redirected to a parked channel in the parking lot (and dropped out of the configuration).  A short time later, the call failed within Asterisk, with the following diagnostics:

[Apr 27 11:09:03] WARNING[10117]: chan_sip.c:1948 retrans_pkt: Maximum retries exceeded on transmission 4dd8e816320e6fac293336971fb18544@192.168.130.116 for seqno 102 (Critical Response)
[Apr 27 11:09:03] WARNING[10117]: chan_sip.c:1972 retrans_pkt: Hanging up call 4dd8e816320e6fac293336971fb18544@192.168.130.116 - no reply to our critical packet.

The destination phone continues to ring for some time after this.

If the call is answered before the Redirect, there is no problem.

The SIP traces indicate that Asterisk starts sending OKs (for NOTIFY) in the same direction as it sent the NOTIFYs for the call setup.  The called end ignores these.  If it answers after Asterisk has abandoned it, Asterisk ignores its OKs, which are in the correct direction, resulting in the phone timing out before it considers the call dead.


****** ADDITIONAL INFORMATION ******

I've put this down against the SIP channel, but it is possible that it is something else that is giving it bad instructions.  Although this is a transfer situation within Asterisk, it is not a transfer for SIP.

I'll upload SIP traces as soon as I've checked them for personal and commercially sensitive information.

The same happens for an X-Lite (the trace), using a named channel, and for an anonymous trunk.

The same happens whether the parked channel is SIP or IAX.

I think, when we tried to answer the call after the redirect, but before the timeout, we lost the call, but there may have been a race condition with the timeout.

I am not convinced that the Local channel or the use of AMI Originate are factors here; they are just difficult to eliminate from the test configuration.
Comments:

By: David Woolley (davidw) 2008-04-29 12:43:14

Although it is not a particularly important configuration for us, we tried it with IAX as the destination.  It appeared to work, with the exception that there was no ringback tone after the transfer (the IAX phone continued to ring and the phone that started parked did not clear).  The speech path was established when the IAX phone was eventually answered.

The IAX channel showed an up state, whilst it was still ringing:

         State: Up (6)
         Rings: 0

I'll attach the full version of that channel state.

By: David Woolley (davidw) 2008-04-29 12:51:40

I've also uploaded the SIP channel states for the outgoing channel:

1) before the redirect
2) after the redirect
3) after the protocol failure is detected.

They look to be the same, and don't reflect the fact that Asterisk has started sending OKs, rather than waiting for them.

By: David Woolley (davidw) 2008-05-01 12:52:41

I'm beginning to think the real problem here is the answer associated with the transition to up, and so not really in the SIP channel.

By: David Woolley (davidw) 2008-05-02 09:29:53

Could this possibly be re-categorised as Core/Channels?

Where I think things start to break is that ast_do_masquerade doesn't propagate the AST_FLAG_OUTGOING flag, so, when the dial to the parking lot tries to do ast_bridge_call, that calls ast_answer, which should drop out because it is an outgoing call, but instead calls the technology answer routine, which starts sending the OKs.
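
The failure mode described above can be sketched as a small model. This is only an illustration under stated assumptions: `model_answer`, `tech_answer`, `struct chan` and `FLAG_OUTGOING` are hypothetical stand-ins, not the real ast_answer, channel structure or AST_FLAG_OUTGOING definitions.

```c
/* Hypothetical stand-ins for illustration; not the real Asterisk definitions. */
enum { FLAG_OUTGOING = 1 << 0 };
enum chan_state { STATE_RINGING, STATE_UP };

struct chan {
    unsigned int flags;
    enum chan_state state;
    int tech_answer_calls;   /* how many times the driver's answer routine ran */
};

/* Stand-in for the technology answer routine (for SIP, the step that
 * would start sending OKs towards the far end). */
static void tech_answer(struct chan *c)
{
    c->tech_answer_calls++;
}

/* Model of the guard described above: an outgoing channel must not be
 * answered by us, so the function drops out before reaching the driver. */
static int model_answer(struct chan *c)
{
    if (c->flags & FLAG_OUTGOING)
        return 0;                 /* outgoing call: nothing to answer */
    if (c->state != STATE_UP) {
        tech_answer(c);           /* reached only when the flag is absent */
        c->state = STATE_UP;
    }
    return 0;
}
```

With the flag set, a ringing outgoing channel is left alone. If the flag has been lost on the way, the same call reaches the driver and the channel is marked up while the far end is still ringing, which matches the symptoms reported here.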

I don't know if this is enough, or whether there are other problems in bridging a channel that is not yet up.

By: David Woolley (davidw) 2008-05-02 12:52:40

I've tried adding AST_FLAG_OUTGOING into the ast_copy_flags call in ast_do_masquerade, and it certainly gets rid of the premature terminations of the calls.  It doesn't give ringback tone, though, and I can't be sure that there isn't collateral damage: whilst the change I've made is very small, it is possibly taking the code into unexplored territory, and I don't think I understand it well enough yet to be confident I can predict problems.
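
As a rough illustration of why the mask matters, here is a minimal sketch of a masked flag copy. The names `demo_copy_flags`, `struct demo_flags`, `DEMO_FLAG_OUTGOING` and `DEMO_FLAG_ALREADY_COPIED` are hypothetical, and the set of flags the real masquerade copies is not reproduced here.

```c
/* Hypothetical flag bits for illustration only; the real values and the
 * real list of copied flags live in Asterisk's own sources. */
enum {
    DEMO_FLAG_OUTGOING       = 1 << 0,
    DEMO_FLAG_ALREADY_COPIED = 1 << 1   /* stands in for flags already in the mask */
};

struct demo_flags {
    unsigned int flags;
};

/* Copy only the bits selected by mask from src to dest; bits of dest
 * outside the mask are left untouched. */
static void demo_copy_flags(struct demo_flags *dest,
                            const struct demo_flags *src,
                            unsigned int mask)
{
    dest->flags &= ~mask;
    dest->flags |= src->flags & mask;
}
```

If the outgoing bit is not in the mask, the destination never receives it, however the source is set. Adding the bit to the mask is the whole of the change described above; whether that is safe elsewhere is exactly the open question.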

In the end, we may choose to block this case in our user interface, if the alternative is to use local patches.  It's a public holiday on Monday, so, whilst I can answer from home, I can't try anything until Tuesday.

(Incidentally, the fault case for when the call was answered before it failed is that it worked for the same time as it did for the ringing out case, and then failed for the same reason.)

By: Joshua C. Colp (jcolp) 2008-05-07 09:22:00

oej: This should be solved with your NOTIFY work.

By: David Woolley (davidw) 2008-05-15 11:28:37

Can I stress that although this originally manifested as a SIP protocol error, it is really a couple of problems with masquerading:

1) the outgoing status of a channel gets lost;
2) indications are lost/not re-instated properly on the new destination.

As an example of point (2), we worked round point (1), without a code change, by interposing a Local channel that would optimise out when the call completed.  When the incoming side of this Local channel is redirected to the parked call, the music on hold for the parked call is replaced by silence until the other party actually answers.  The SIP channel is no longer directly involved in the masquerade, so the protocol does not get confused.

Just to clarify this, the problem with indications isn't to do with indications directed towards the device directly on the channel, but indications directed towards the channel that it is bridged with.  There is code (20rc2 main/channel.c:3585) that deals with indications on the directly attached device (which shouldn't change), but none to deal with indications being output on the bridged device, which may well change.



By: David Woolley (davidw) 2008-05-15 13:08:46

Actually, it seems that the main masquerade logic doesn't deal with bridging at all, at the moment.  One of the funnies in this case is that it would normally be AppDial that dealt with Ring indications, and it would be running on the previously bridged peer, whereas bridging is actually running, and it assumes that both channels are already up, even though it mostly works when one isn't.  Whilst UnPark could omit setting up the bridge, without the bridge there needs to be something else to supervise the connection.

I have a feeling that acknowledging that bridging can run on channels where one is not up may be the better way forward.

By: David Woolley (davidw) 2008-05-16 04:57:40

On further thought, the indications shouldn't change until the masquerade is committed so there needs to be involvement of the masquerade code or the code above both masquerade and bridging (i.e. the unparking code, in this case).
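
The ordering suggested above can be sketched as follows. Everything in this block is a hypothetical model (`struct party`, `struct pending_masq`, `masq_request`, `masq_commit` and the tone names are invented for illustration); it is not how Asterisk implements masquerading, only one way the "don't change indications until commit" idea might look.

```c
/* All names here are hypothetical; this models the idea, not Asterisk's code. */
enum tone { TONE_MOH, TONE_RINGBACK, TONE_NONE };

struct party {
    enum tone hearing;   /* what this party currently hears */
    int up;              /* non-zero once this leg has been answered */
};

struct pending_masq {
    struct party *listener;   /* party whose far end is about to change */
    struct party *new_peer;   /* leg it will be connected to after the commit */
    int committed;
};

/* Requesting the masquerade records the intent but must not touch
 * what the listener hears. */
static void masq_request(struct pending_masq *m,
                         struct party *listener, struct party *new_peer)
{
    m->listener = listener;
    m->new_peer = new_peer;
    m->committed = 0;
}

/* Only the commit step re-evaluates the indication for the listener:
 * ringback if the new peer is not yet up, otherwise no tone at all. */
static void masq_commit(struct pending_masq *m)
{
    m->committed = 1;
    m->listener->hearing = m->new_peer->up ? TONE_NONE : TONE_RINGBACK;
}
```

In this model the parked party keeps hearing music on hold between request and commit, and switches to ringback only once the commit has happened and the new peer is known to be still ringing.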

By: James Golovich (jamesgolovich) 2009-01-19 14:14:55.000-0600

There have been some masquerade changes lately (within the past 5 days).  I wonder if this resolves the issue.  Any chance you could test it out?  The 1.4 branch in SVN has the changes, and they are also in asterisk-1.4.23-rc4.

By: David Woolley (davidw) 2009-01-20 09:47:41.000-0600

Unfortunately we've already worked round this and I don't think we could justify restoring the original configuration to test it.

We worked round this by interposing a local channel so we didn't have to touch the SIP channel.  Although that had its problems, we also worked round those (although some now have official fixes that we will look at when we go into the next active development phase).

By: David Woolley (davidw) 2009-03-13 13:29:46

Looking at the description for ASTERISK-12930, I think there is a good chance that the fix for that covered this issue.  As noted, though, we are committed to our workaround now, and it probably isn't going to be practicable to recreate the exact scenario.  (I have a feeling we have since taken advantage of the local channel for other reasons.)

The test described in the review does sound as though it covers our original scenario, though.

In the next week or two we will try a version that includes the ASTERISK-12930 fix, so should soon know if that fixes the problem for our workaround version, although I'm fairly confident that it will.

By: Joshua C. Colp (jcolp) 2009-04-13 09:56:56

After looking over this issue's notes and the referenced issues I am confident this issue has been resolved. I am closing it out, but if this is untrue feel free to reopen.