[Home]

Summary:ASTERISK-13679: [patch] Asterisk does not stop retransmission
Reporter:klaus3000 (klaus3000)Labels:
Date Opened:2009-03-02 09:22:45.000-0600Date Closed:2009-06-24 13:57:13
Priority:MinorRegression?No
Status:Closed/CompleteComponents:Channels/chan_sip/Interoperability
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) 14584_v2.patch
( 1) 14584.patch
( 2) debugging.patch.1.4.23.txt
( 3) full-trace.txt
Description:Scenario:

<pre>
client                Asterisk
 ------INV,407,ACK-------->
 ------INV---------------->
                           -----INV---->
                           <---183------
 <-------183---------------
                           <---180------
 <-------180---------------
 --------CANCEL----------->
   X----487, 200-(loss)----

 ---CANCEL retransmission->
 <----487, 200-------------
 ---ACK------------------->

 <----487------------------
 <----487------------------
 <----487------------------
 <----487------------------
 <----487------------------
</pre>

****** ADDITIONAL INFORMATION ******

I reproduced the scenario using sipp. Further I found out that the problem must be related to the retransmitted CANCEL, as the following scenario is fine:

<pre>
client                Asterisk
 ------INV,407,ACK-------->
 ------INV---------------->
                           -----INV---->
                           <---183------
 <-------183---------------
                           <---180------
 <-------180---------------
 --------CANCEL----------->
    X----487, 200-(loss)---
 --CANCEL retr. (loss)-X

 <----487------------------
 ---ACK------------------->
</pre>
Comments:By: klaus3000 (klaus3000) 2009-03-02 10:25:52.000-0600

I think I found the problem: The retransmitted CANCEL triggers again 487. But this is not an immediate replacement of the scheduled 487, but a new one. Thus, after the second CANCEL there are two 487 replies scheduled for retransmissions, and the ACK stops only one of these two. (take a look at the retransmission ID)

btw: attached patch helped me find the problem

By: Leif Madsen (lmadsen) 2009-06-10 12:11:02

Hey Mark!

I know I'm not supposed to assign any issues to you right now, but I thought this one was important enough, and potentially quick enough, to assign to you to review when you have a chance. Thanks!

By: Mark Michelson (mmichelson) 2009-06-11 10:47:12

I updated the diagrams inside the description and additional information fields to be wrapped in "pre" tags so that the original spacing is preserved. This makes them more readable. I'm about to take a look at the patch.

edit: need to escape angle brackets so that this note makes more sense.
edit2: ok, forget the brackets, I'll just use "pre" and hopefully that makes sense.



By: Mark Michelson (mmichelson) 2009-06-11 13:51:42

Give 14584.patch a try. Thanks for doing the legwork to find the source of the issue.

By: klaus3000 (klaus3000) 2009-06-18 09:33:29

Hi!

With this patch Asterisk 1.4.24 crashes on reception of the retransmitted CANCEL:

<--- Transmitting (no NAT) to 83.136.32.132:5061 --->
SIP/2.0 180 Ringing
Via: SIP/2.0/UDP 83.136.32.132:5061;branch=z9hG4bK-10506-1-4;received=83.136.32.132
From: <sip:klaus@83.136.32.165>;tag=1
To: <sip:01505641636@83.136.32.165:5060>;tag=as5434e245
Call-ID: 1-10506@83.136.32.132
CSeq: 2 INVITE
User-Agent: Asterisk PBX
Allow: INVITE, ACK, CANCEL, OPTIONS, BYE, REFER, SUBSCRIBE, NOTIFY
Supported: replaces
Contact: <sip:01505641636@83.136.32.165>
Content-Length: 0


<------------>

<--- SIP read from 83.136.32.132:5061 --->
CANCEL sip:01505641636@83.136.32.165:5060 SIP/2.0
Via: SIP/2.0/UDP 83.136.32.132:5061;branch=z9hG4bK-10506-1-8
From: <sip:klaus@83.136.32.165>;tag=1
To: <sip:01505641636@83.136.32.165:5060>
Call-ID: 1-10506@83.136.32.132
CSeq: 2 CANCEL
Max-Forwards: 70

<------------->
--- (7 headers 0 lines) ---
Sending to 83.136.32.132 : 5061 (no NAT)

<--- Reliably Transmitting (no NAT) to 83.136.32.132:5061 --->
SIP/2.0 487 Request Terminated
Via: SIP/2.0/UDP 83.136.32.132:5061;branch=z9hG4bK-10506-1-4;received=83.136.32.132
From: <sip:klaus@83.136.32.165>;tag=1
To: <sip:01505641636@83.136.32.165:5060>;tag=as5434e245
Call-ID: 1-10506@83.136.32.132
CSeq: 2 INVITE
User-Agent: Asterisk PBX
Allow: INVITE, ACK, CANCEL, OPTIONS, BYE, REFER, SUBSCRIBE, NOTIFY
Supported: replaces
Content-Length: 0


<------------>
[Jun 18 16:32:23] NOTICE[23306]: chan_sip.c:2067 __sip_reliable_xmit: Hey, I just set the response code for this packet to 487

<--- Transmitting (no NAT) to 83.136.32.132:5061 --->
SIP/2.0 200 OK
Via: SIP/2.0/UDP 83.136.32.132:5061;branch=z9hG4bK-10506-1-8;received=83.136.32.132
From: <sip:klaus@83.136.32.165>;tag=1
To: <sip:01505641636@83.136.32.165:5060>;tag=as5434e245
Call-ID: 1-10506@83.136.32.132
CSeq: 2 CANCEL
User-Agent: Asterisk PBX
Allow: INVITE, ACK, CANCEL, OPTIONS, BYE, REFER, SUBSCRIBE, NOTIFY
Supported: replaces
Content-Length: 0


<------------>
   -- Hungup 'DAHDI/1-1'
 == Spawn extension (direct, 01505641636, 9) exited non-zero on 'SIP/klaus-08fb17f8'

<--- SIP read from 83.136.32.132:5061 --->
CANCEL sip:01505641636@83.136.32.165:5060 SIP/2.0
Via: SIP/2.0/UDP 83.136.32.132:5061;branch=z9hG4bK-10506-1-12
From: <sip:klaus@83.136.32.165>;tag=1
To: <sip:01505641636@83.136.32.165:5060>
Call-ID: 1-10506@83.136.32.132
CSeq: 2 CANCEL
Max-Forwards: 70

<------------->
--- (7 headers 0 lines) ---
Sending to 83.136.32.132 : 5061 (no NAT)
Scheduling destruction of SIP dialog '1-10506@83.136.32.132' in 32000 ms (Method: CANCEL)
Segmentation fault (core dumped)



#0  0xb61b7cde in handle_request_cancel (p=0x8fb17f8, req=0xb6170050) at chan_sip.c:15366
15366                                   prev_pkt->next = pkt->next;
(gdb) bt
#0  0xb61b7cde in handle_request_cancel (p=0x8fb17f8, req=0xb6170050) at chan_sip.c:15366
#1  0xb61bb2e2 in handle_request (p=0x8fb17f8, req=0xb6170050, sin=0xb6170040, recount=0xb6170034, nounlock=0xb6170038) at chan_sip.c:16050
#2  0xb61bc10f in sipsock_read (id=0x8f7d7a0, fd=47, events=1, ignore=0x0) at chan_sip.c:16293
#3  0x080b0b54 in ast_io_wait (ioc=0x8f70d00, howlong=999) at io.c:279
#4  0xb61bcadc in do_monitor (data=0x0) at chan_sip.c:16512
ASTERISK-1  0x0810588d in dummy_start (data=0x8f7d690) at utils.c:856
ASTERISK-2  0xb7f1e4c0 in start_thread () from /lib/i686/cmov/libpthread.so.0
ASTERISK-3  0xb7e306de in clone () from /lib/i686/cmov/libc.so.6

By: Mark Michelson (mmichelson) 2009-06-18 09:37:40

Ah, thanks. I'll work out a new version really soon. It looks like I made a simple mistake here.

By: Mark Michelson (mmichelson) 2009-06-18 09:41:41

New patch uploaded (14584_v2.patch). This should correct the problem you had with the previous patch.

By: klaus3000 (klaus3000) 2009-06-19 06:54:21

Hi Michael! I tested the new patch in several scenarios and it seems to work fine now. Thanks.

By: Digium Subversion (svnbot) 2009-06-22 09:42:56

Repository: asterisk
Revision: 202341

U   branches/1.4/channels/chan_sip.c

------------------------------------------------------------------------
r202341 | mmichelson | 2009-06-22 09:42:55 -0500 (Mon, 22 Jun 2009) | 26 lines

Fix a situation in which Asterisk would not stop retransmitting 487s.

If a CANCEL were received by Asterisk, we would send a 487 in response
to the original INVITE and a 200 OK for the CANCEL. If there were a network
hiccup which caused the 200 OK and the 487 to be lost, then the UA communicating
with Asterisk may try to retransmit its CANCEL. Asterisk's response to this used
to be to try sending another 487 to the canceled INVITE and another 200 OK to the
CANCEL.

The problem here is that the originally-sent 487 was sent "reliably" meaning that
it will be retransmitted until it is received properly. So when we receive the second
CANCEL it is likely that the first batch of 487s we sent is still going strong and
reaches the UA. The result was that the second set of 487s would be retransmitted
constantly until the maximum number of retries had been reached.

The fix for this is that if we receive a second CANCEL for an INVITE, then we cancel
the retransmission of the first set of 487s and start a second set. This causes the
dialog to be terminated reasonably.

(closes issue ASTERISK-13679)
Reported by: klaus3000
Patches:
     14584_v2.patch uploaded by mmichelson (license 60)
Tested by: klaus3000


------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=202341

By: Digium Subversion (svnbot) 2009-06-22 09:58:25

Repository: asterisk
Revision: 202343

_U  trunk/
U   trunk/channels/chan_sip.c

------------------------------------------------------------------------
r202343 | mmichelson | 2009-06-22 09:58:24 -0500 (Mon, 22 Jun 2009) | 36 lines

Merged revisions 202341-202342 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
 r202341 | mmichelson | 2009-06-22 09:42:55 -0500 (Mon, 22 Jun 2009) | 26 lines
 
 Fix a situation in which Asterisk would not stop retransmitting 487s.
 
 If a CANCEL were received by Asterisk, we would send a 487 in response
 to the original INVITE and a 200 OK for the CANCEL. If there were a network
 hiccup which caused the 200 OK and the 487 to be lost, then the UA communicating
 with Asterisk may try to retransmit its CANCEL. Asterisk's response to this used
 to be to try sending another 487 to the canceled INVITE and another 200 OK to the
 CANCEL.
 
 The problem here is that the originally-sent 487 was sent "reliably" meaning that
 it will be retransmitted until it is received properly. So when we receive the second
 CANCEL it is likely that the first batch of 487s we sent is still going strong and
 reaches the UA. The result was that the second set of 487s would be retransmitted
 constantly until the maximum number of retries had been reached.
 
 The fix for this is that if we receive a second CANCEL for an INVITE, then we cancel
 the retransmission of the first set of 487s and start a second set. This causes the
 dialog to be terminated reasonably.
 
 (closes issue ASTERISK-13679)
 Reported by: klaus3000
 Patches:
       14584_v2.patch uploaded by mmichelson (license 60)
 Tested by: klaus3000
........
 r202342 | mmichelson | 2009-06-22 09:44:58 -0500 (Mon, 22 Jun 2009) | 3 lines
 
 Remove an extra debug line left from previous commit.
........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=202343

By: Digium Subversion (svnbot) 2009-06-22 10:05:01

Repository: asterisk
Revision: 202344

_U  branches/1.6.0/
U   branches/1.6.0/channels/chan_sip.c

------------------------------------------------------------------------
r202344 | mmichelson | 2009-06-22 10:05:01 -0500 (Mon, 22 Jun 2009) | 43 lines

Merged revisions 202343 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

................
 r202343 | mmichelson | 2009-06-22 09:58:24 -0500 (Mon, 22 Jun 2009) | 36 lines
 
 Merged revisions 202341-202342 via svnmerge from
 https://origsvn.digium.com/svn/asterisk/branches/1.4
 
 ........
   r202341 | mmichelson | 2009-06-22 09:42:55 -0500 (Mon, 22 Jun 2009) | 26 lines
   
   Fix a situation in which Asterisk would not stop retransmitting 487s.
   
   If a CANCEL were received by Asterisk, we would send a 487 in response
   to the original INVITE and a 200 OK for the CANCEL. If there were a network
   hiccup which caused the 200 OK and the 487 to be lost, then the UA communicating
   with Asterisk may try to retransmit its CANCEL. Asterisk's response to this used
   to be to try sending another 487 to the canceled INVITE and another 200 OK to the
   CANCEL.
   
   The problem here is that the originally-sent 487 was sent "reliably" meaning that
   it will be retransmitted until it is received properly. So when we receive the second
   CANCEL it is likely that the first batch of 487s we sent is still going strong and
   reaches the UA. The result was that the second set of 487s would be retransmitted
   constantly until the maximum number of retries had been reached.
   
   The fix for this is that if we receive a second CANCEL for an INVITE, then we cancel
   the retransmission of the first set of 487s and start a second set. This causes the
   dialog to be terminated reasonably.
   
   (closes issue ASTERISK-13679)
   Reported by: klaus3000
   Patches:
         14584_v2.patch uploaded by mmichelson (license 60)
   Tested by: klaus3000
 ........
   r202342 | mmichelson | 2009-06-22 09:44:58 -0500 (Mon, 22 Jun 2009) | 3 lines
   
   Remove an extra debug line left from previous commit.
 ........
................

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=202344

By: Digium Subversion (svnbot) 2009-06-22 10:10:53

Repository: asterisk
Revision: 202345

_U  branches/1.6.1/
U   branches/1.6.1/channels/chan_sip.c

------------------------------------------------------------------------
r202345 | mmichelson | 2009-06-22 10:10:53 -0500 (Mon, 22 Jun 2009) | 43 lines

Merged revisions 202343 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

................
 r202343 | mmichelson | 2009-06-22 09:58:24 -0500 (Mon, 22 Jun 2009) | 36 lines
 
 Merged revisions 202341-202342 via svnmerge from
 https://origsvn.digium.com/svn/asterisk/branches/1.4
 
 ........
   r202341 | mmichelson | 2009-06-22 09:42:55 -0500 (Mon, 22 Jun 2009) | 26 lines
   
   Fix a situation in which Asterisk would not stop retransmitting 487s.
   
   If a CANCEL were received by Asterisk, we would send a 487 in response
   to the original INVITE and a 200 OK for the CANCEL. If there were a network
   hiccup which caused the 200 OK and the 487 to be lost, then the UA communicating
   with Asterisk may try to retransmit its CANCEL. Asterisk's response to this used
   to be to try sending another 487 to the canceled INVITE and another 200 OK to the
   CANCEL.
   
   The problem here is that the originally-sent 487 was sent "reliably" meaning that
   it will be retransmitted until it is received properly. So when we receive the second
   CANCEL it is likely that the first batch of 487s we sent is still going strong and
   reaches the UA. The result was that the second set of 487s would be retransmitted
   constantly until the maximum number of retries had been reached.
   
   The fix for this is that if we receive a second CANCEL for an INVITE, then we cancel
   the retransmission of the first set of 487s and start a second set. This causes the
   dialog to be terminated reasonably.
   
   (closes issue ASTERISK-13679)
   Reported by: klaus3000
   Patches:
         14584_v2.patch uploaded by mmichelson (license 60)
   Tested by: klaus3000
 ........
   r202342 | mmichelson | 2009-06-22 09:44:58 -0500 (Mon, 22 Jun 2009) | 3 lines
   
   Remove an extra debug line left from previous commit.
 ........
................

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=202345

By: Digium Subversion (svnbot) 2009-06-22 10:17:02

Repository: asterisk
Revision: 202346

_U  branches/1.6.2/
U   branches/1.6.2/channels/chan_sip.c

------------------------------------------------------------------------
r202346 | mmichelson | 2009-06-22 10:17:02 -0500 (Mon, 22 Jun 2009) | 43 lines

Merged revisions 202343 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

................
 r202343 | mmichelson | 2009-06-22 09:58:24 -0500 (Mon, 22 Jun 2009) | 36 lines
 
 Merged revisions 202341-202342 via svnmerge from
 https://origsvn.digium.com/svn/asterisk/branches/1.4
 
 ........
   r202341 | mmichelson | 2009-06-22 09:42:55 -0500 (Mon, 22 Jun 2009) | 26 lines
   
   Fix a situation in which Asterisk would not stop retransmitting 487s.
   
   If a CANCEL were received by Asterisk, we would send a 487 in response
   to the original INVITE and a 200 OK for the CANCEL. If there were a network
   hiccup which caused the 200 OK and the 487 to be lost, then the UA communicating
   with Asterisk may try to retransmit its CANCEL. Asterisk's response to this used
   to be to try sending another 487 to the canceled INVITE and another 200 OK to the
   CANCEL.
   
   The problem here is that the originally-sent 487 was sent "reliably" meaning that
   it will be retransmitted until it is received properly. So when we receive the second
   CANCEL it is likely that the first batch of 487s we sent is still going strong and
   reaches the UA. The result was that the second set of 487s would be retransmitted
   constantly until the maximum number of retries had been reached.
   
   The fix for this is that if we receive a second CANCEL for an INVITE, then we cancel
   the retransmission of the first set of 487s and start a second set. This causes the
   dialog to be terminated reasonably.
   
   (closes issue ASTERISK-13679)
   Reported by: klaus3000
   Patches:
         14584_v2.patch uploaded by mmichelson (license 60)
   Tested by: klaus3000
 ........
   r202342 | mmichelson | 2009-06-22 09:44:58 -0500 (Mon, 22 Jun 2009) | 3 lines
   
   Remove an extra debug line left from previous commit.
 ........
................

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=202346