[Home]

Summary:ASTERISK-10979: Asterisk hangup calls if two identical invites are received
Reporter:Jorge Merlino (drencrom)Labels:
Date Opened:2007-12-05 08:06:16.000-0600Date Closed:2011-06-07 14:03:10
Priority:MajorRegression?No
Status:Closed/CompleteComponents:Channels/chan_sip/Interoperability
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) asterisk-1.2.24-fix_reinvite_retransmit.patch
( 1) error.log
( 2) error_keyyo.log
( 3) FULL_LOG_with_DEBUG_and_VERBOSE_set_to_5.txt
( 4) ok.log
Description:Context:
Asterisk server receiving incoming SIP calls from a PortaSIP softswitch

Problem:
Sometimes the softswitch sends two identical INVITES in a row as if the first OK from Asterisk was lost. When this happens Asterisk keeps resending the 200 OK message although it in fact receives the PortaSIP ACKs and the debug shows that it supposedly stopped retransmitting the 200 OK packet. Then, after 6 retransmissions of the OK packet (all of them ACK'd by the remote party) Asterisk hangs up a perfectly valid and working call.
The call works fine if there is no retransmission of the first INVITE packet.

Debug:
I'll attach two log files with verbose and debug enabled. Also SIP debug was active. One of them shows a valid call and the other shows the problem I described above.
Comments:By: Olle Johansson (oej) 2007-12-05 08:48:31.000-0600

1.2 is not supported any more. Please test with 1.4. Thank you!

By: Thomas Blanc (tomba) 2007-12-06 09:15:04.000-0600

Hello!
I have the same error with call from Keyyo.net (phonesystem in france) with Asterisk 1.4.13.
cf error keyyo.log

By: Olle Johansson (oej) 2007-12-06 13:36:41.000-0600

Tomba: I need more logging... set verbose to 5 and debug to 5 and try to recapture. Thanks.

By: Olle Johansson (oej) 2007-12-06 13:46:38.000-0600

2007-12-05 01:23:25 VERBOSE[18665] logger.c: Ignoring this INVITE request
2007-12-05 01:23:25 VERBOSE[18665] logger.c: We're at 200.40.166.178 port 19000
2007-12-05 01:23:25 VERBOSE[18665] logger.c: Video is at 200.40.166.178 port 16018
2007-12-05 01:23:25 VERBOSE[18665] logger.c: Adding codec 0x100 (g729) to SDP
2007-12-05 01:23:25 VERBOSE[18665] logger.c: Adding non-codec 0x1 (telephone-event) to SDP
2007-12-05 01:23:25 VERBOSE[18665] logger.c: Reliably Transmitting (no NAT) to 66.114.203.144:5060:

If we're ignoring the request, the response should not be re-sent reliably.

By: Jorge Merlino (drencrom) 2007-12-06 13:50:21.000-0600

FYI: I needed this working in asterisk 1.2 so I did a quick patch to solve this. What I saw was that the second Invite starts a new thread to send OK messages and the ACK only stops one of them.
I changed the behaviour so that the second invite only sends one OK (transmits not reliably) so that there is only one thread to stop.
I don't know if this also happens in Asterisk 1.4 but I don't have time to test it now.

By: Thomas Blanc (tomba) 2007-12-07 06:00:26.000-0600

Ok here's the full log, debug to 5 and verbose to 5.
I agree, "If we're ignoring the request, the response should not be re-sent reliably."
But it does re-send it.
And even more, if the reply comes from the trunk (keyyo), * doesn't seem to "hear" it.



Asterisk                     TRUNK

 <----------------INVITE 102


OK 102---------------->


                  INVITE 102 (retransmited)


  <---------------ACK 102


OK (for second INVITE 102, even if saying it ignored it)------>


  <---------------ACK 102 (for second INVITE response but * doesn't "hear" it and continue retransmiting ok for INVITE 102)


Retransmiting OK 102--------------->      |
                                         |   6 times  
                                         |   the same dialog of the deaf
   <---------------ACK 102               |   then hangup with warning: maximum
                                         |   retransmition of seqno 102.


I think it's the case in the log I uploaded.
Thanks for your answer.



By: Thomas Blanc (tomba) 2007-12-07 06:11:14.000-0600

By the way, I replaced phone number  password and stuff by xxxxxx in the log.

By: lvazquez (lvazquez) 2007-12-07 12:02:06.000-0600

As drencrom noted above, the problem in 1.2 is because when the second INVITE is received, asterisk is transmiting the second OK using sip_reliable_xmit() ; after having transmited the first one the same way. This is done in the handle_request_invite function:
<pre>
handle_request_invite() {
//ignore = 1
...
if (!p->lastinvite && !ignore && !p->owner) {
  ..
} else {
  reinvite = 1;
  c = p->owner;
}
...
if (c) {
  switch(c->_state) {
  ...
  case AST_STATE_UP:
    // 1 MEANS RETRANSMIT NOT CRITICAL
    transmit_response_with_sdp(p, "200 OK", req, (ignore || reinvite)? 1:2);
    break;
  ...
  }
}
}
</pre>
The second reliable response is scheduled as "not critical".
transmit_response_with_sdp() will call send_response() with the "retrans" parameter set to 1 and it in turn will call __sip_reliable_xmit() that will re-schedule a new retransmition timer (a NON CRITICAL one this time) but WITHOUT CANCEALING the previous (CRITICAL) one and overwriting the  pkt->retransid field in the struct sip_pkt:
<pre>
__sip_reliable_xmit(...) {
...
pkt->retransid = ast_sched_add_variable(sched, siptimer_a, retrans_pkt, pkt, 1);
...
}
</pre>
So when the next ACK would arrive, the last (NON CRITICAL) retransmission will be cancelled, but not the first (CRITICAL) one.

The system where the problem is happening is in productin state, so we are not able to test it with 1.4, but the corresponding code (the  case AST_STATE_UP in the handle_request_invite function) is very diferently so I thing it must be fixed in asterisk 1.4 and above.



By: Thomas Blanc (tomba) 2007-12-17 04:20:40.000-0600

Hi!
This bug sounds like this one http://bugs.digium.com/view.php?id=10332 ... could you confirm.
I think I saw two other bug reports witch sounds really close or caused by the same behavior of the chan_sip.

I tried  what neutrino88 submitted but it doesn't seem to correct it.
Should I post another log?

I'm still trying to figure out why and where the response to the 2nd invite is considered as critical.

By: lvazquez (lvazquez) 2007-12-18 08:02:16.000-0600

Hi Tomba, I'm not sure if it's the same bug, but at least they seem to be very similar and/or relatead to each other.
In respect to the reason for the problem I thought it was clear in my previous message, but what is not clear is my very limited english skills :( (sorry!)
The 2nd response IS NOT considered critical, BUT the problem is:
if it's sent as reliable it will not cancel AND instead will overwrite the reference (in the sip_pkt struct for the channel) to the retransmition thread of the first CRITICAL response.
It's done here:
...
pkt->retransid = ast_sched_add_variable(sched, siptimer_a, retrans_pkt, pkt, 1);
...
That's the problem, and so is the 1st response's critical retransmission that finally tears the call down.
It seems it's an implementation problem and not a logical one.
My attached patch seems to fix the problem (by not scheduling retransmission on the second response), but may be the more correct solution would be to cancel the first retransmission before scheduling the second one.

I'm in a very busy time now, but as soon as possible I will arrange to test if this situation arises in asterisk 1.4

By: Thomas Blanc (tomba) 2007-12-18 08:40:05.000-0600

Thanks for your answer.
I see.
I think we could find a workaround by sending the second response "ok"  not reliably (with XMIT_UNRELIABLE : without retransmission), but it would be a workaround only.
Even if overwriting the scheduled retransmission of the first response is normal (as it's a response of a retransmitted messages), asterisk still ignores the ACK sent back.

By: Joshua C. Colp (jcolp) 2008-03-04 13:18:46.000-0600

I've just spent the past two hours messing with a sipp scenario trying to get this to happen and have failed miserably. Is this still an issue or am I chasing something that has been fixed?

By: Thomas Blanc (tomba) 2008-03-04 19:09:21.000-0600

It Seems it has been fixed with the next update.
It happened on two differents systems, both work fine now.
I still don't know if it was a asterisk issue or my configuration.
Sorry for your time.

By: Joshua C. Colp (jcolp) 2008-03-05 11:04:02.000-0600

Closed per comment and my own testing.