Summary: ASTERISK-13782: sip call setup bug
Reporter: genie (genie)
Labels:
Date Opened: 2009-03-19 08:04:36
Date Closed: 2009-04-02 09:29:52
Versions:
Frequency of
Environment:
Attachments: ( 0) asterisk.pcap
( 1) clientA.pcap
( 2) patchpending
( 3) Visio-call_setup_no_reinvite.pdf
Description: I'm testing the SIP protocol on Asterisk. While running performance tests with packet loss, I came across a bug in Asterisk during call initiation. In my Asterisk SIP configuration I have 'canreinvite=no' set, so a normal call setup looks like this:

However, if the first ACK packet of the flow is lost and ClientA, unaware of this, sends the INVITE with authorization, the call will never be set up. It ends up in an infinite loop in which the client sends ACK messages and Asterisk sends '491 Request Pending' messages. The Wireshark captures taken on both ClientA and Asterisk are attached to this bug.


Comments:

By: genie (genie) 2009-03-19 08:18:51

I wish I could remove the links to the files mentioned in the bug report, as I have attached them to the report, but I don't know how to edit the post.

By: Ravindra Devadiga (ravindrad) 2009-03-30 11:12:30

Retransmission handling in Asterisk for incoming requests is cancelled only if the CSeq of the pending INVITE and the CSeq of the received ACK are the same.

If the CSeq of the received request is higher than the CSeq of the currently pending INVITE, the ACK matching fails and the retransmission continues even after the ACK is received.

So the following patches fix the retransmission bug.
But I don't think a 491 Request Pending should be generated for this scenario. This is a valid scenario, and when packet loss occurs in the network, calls will be dropped with this way of handling in Asterisk. I have already worked on a fix for it, but it needs some more testing. I should be able to update the patch for this by tomorrow.
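
To make the CSeq matching described in this comment concrete, here is a minimal toy model. It is not the patch and not Asterisk's code; the `Dialog` class and both function names are hypothetical and only illustrate the rule "an ACK cancels retransmission only when its CSeq equals the pending INVITE's CSeq".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Dialog:
    """Hypothetical per-dialog state; not Asterisk's real structure."""
    pending_invite_cseq: Optional[int] = None  # CSeq of the INVITE awaiting an ACK
    retransmitting: bool = False               # True while the final response is resent


def on_final_response_sent(dialog: Dialog, invite_cseq: int) -> None:
    """Start retransmitting the final response until a matching ACK arrives."""
    dialog.pending_invite_cseq = invite_cseq
    dialog.retransmitting = True


def on_ack(dialog: Dialog, ack_cseq: int) -> bool:
    """Cancel retransmission only if the ACK's CSeq equals the pending INVITE's.

    Returns True when the ACK matched and retransmission was cancelled.
    """
    if dialog.retransmitting and ack_cseq == dialog.pending_invite_cseq:
        dialog.retransmitting = False
        dialog.pending_invite_cseq = None
        return True
    return False
```

In this model, an ACK with CSeq 2 arriving while the response to the INVITE with CSeq 1 is still pending does not match, so retransmission continues, which is the behaviour reported above.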

By: genie (genie) 2009-04-01 21:20:59

I'm not sure if I got you right, ravindrad, but I did some extra tests and here is what I've found:
UserA starts one session (CSeq = 1) and, without finishing it, starts another one (CSeq = 2). Therefore Asterisk sends '491 Request Pending'. However, the 491 message carries CSeq = 2. When I was using Ekiga, it responded to such a 491 message with an ACK for CSeq = 2, while it was the ACK for CSeq = 1 that was missing. So to me the bug is that the 491 messages carry the wrong CSeq.
I hope this is the same thing you found.

By: snuffy (snuffy) 2009-04-02 07:33:12

I've assigned this to you, dvossel, as you worked on 491 recently.
Is it likely that you have already fixed this in your latest patch?

By: Ravindra Devadiga (ravindrad) 2009-04-02 09:03:57

I am totally in sync with you, Genie. I was commenting on the 491 response sent for the second INVITE with CSeq = 2; I think it is not handled according to RFC 3261. The 491 response should be sent when you have sent an outgoing INVITE that has not yet completed and, at the same time, you receive an incoming INVITE for the same session. This happens only in the case of a re-INVITE. But here in Asterisk it is handled a bit differently.

RFC 3261, section 14.2, says: "A UAS that receives a second INVITE before it sends the final response to a first INVITE with a lower CSeq sequence number on the same dialog MUST return a 500 (Server Internal Error) response to the second INVITE and MUST include a Retry-After header field with a randomly chosen value of between 0 and 10 seconds."

"A UAS that receives an INVITE on a dialog while an INVITE it had sent on that dialog is in progress MUST return a 491 (Request Pending) response to the received INVITE."
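
The two RFC rules quoted above can be sketched as a small decision function. This is only an illustration of my reading of those two sentences, not Asterisk code; the function name, its parameters and the `None` return value meaning "neither rule applies, process the INVITE normally" are assumptions made for the example.

```python
import random
from typing import Dict, Optional, Tuple


def respond_to_incoming_invite(
    new_cseq: int,
    unanswered_incoming_cseq: Optional[int],
    outgoing_invite_in_progress: bool,
    rng: Optional[random.Random] = None,
) -> Tuple[Optional[int], Dict[str, str]]:
    """Return (status code, extra headers) for an INVITE received on a dialog.

    unanswered_incoming_cseq: CSeq of an earlier incoming INVITE that has not
        yet been given a final response, or None if there is no such INVITE.
    outgoing_invite_in_progress: True if we sent an INVITE on this dialog
        that is still in progress.
    """
    rng = rng or random.Random()
    # Rule 1: an earlier incoming INVITE with a lower CSeq has no final response yet.
    if unanswered_incoming_cseq is not None and unanswered_incoming_cseq < new_cseq:
        return 500, {"Retry-After": str(rng.randint(0, 10))}
    # Rule 2: our own INVITE on this dialog is still in progress (glare).
    if outgoing_invite_in_progress:
        return 491, {}
    # Neither rule applies; the caller processes the INVITE normally.
    return None, {}
```

In the scenario from this report, the 401 is already a final response to the first INVITE and Asterisk has no outgoing INVITE in progress, so under this reading neither rule fires for the second INVITE.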

For an incoming INVITE, even if the first INVITE has not completed, we can allow the user to retry up to "max retry" times for a call, or until the transaction times out.

If we take the above scenario, the ACK for the first INVITE (CSeq = 1) is dropped. Unaware of this, the UA sends the INVITE with authentication (CSeq = 2). In this case Asterisk should continue to retransmit the 401 until it receives the ACK for it, and should process this second INVITE. If the maximum number of retries is reached and the transaction is still not completed, the call should be disconnected.
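
The behaviour proposed above can be sketched as a small simulation. This is only an illustration of the proposal, not the fix itself and not the patch that was committed; the function name, the string format of the arrivals and the default retry limit are all assumptions.

```python
from typing import List, Optional, Sequence


def handle_lost_ack(
    arrivals: Sequence[Optional[str]],
    pending_cseq: int = 1,
    max_retries: int = 3,
) -> List[str]:
    """Simulate one retransmission timer tick per entry in `arrivals`.

    Each entry is what arrived during that tick: "ACK <cseq>",
    "INVITE <cseq>", or None if nothing arrived.
    """
    actions: List[str] = []
    retries = 0
    for arrival in arrivals:
        if arrival is not None:
            method, cseq_text = arrival.split()
            cseq = int(cseq_text)
            if method == "ACK" and cseq == pending_cseq:
                # The missing ACK finally arrived: the transaction is complete.
                actions.append(f"stop retransmitting 401 (CSeq {pending_cseq})")
                return actions
            if method == "INVITE" and cseq > pending_cseq:
                # Process the newer INVITE instead of rejecting it with 491.
                actions.append(f"process INVITE (CSeq {cseq})")
        if retries == max_retries:
            # Retry budget exhausted and still no matching ACK.
            actions.append("disconnect call")
            return actions
        retries += 1
        actions.append(f"retransmit 401 (CSeq {pending_cseq})")
    return actions
```

For example, `handle_lost_ack(["INVITE 2", "ACK 1"])` processes the second INVITE, retransmits the 401 once, and stops when the ACK for CSeq 1 arrives; with no arrivals at all it retransmits `max_retries` times and then disconnects.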

I have already worked on the fix for it. It is working fine but needs some more testing; I could not test it because I got busy with other things. I think I can update the patch for this on Monday.

By: David Vossel (dvossel) 2009-04-02 09:29:51

Committed a patch on April 1st that should have taken care of this. Updating to the latest branch code should fix it. If not, feel free to re-open this issue and I'll take a closer look at it.