[Home]

Summary:ASTERISK-09912: chan_iax2 VNAK storm seen
Reporter:Steve Davies . (stevedavies)Labels:
Date Opened:2007-07-20 04:47:24Date Closed:2007-08-01 17:06:59
Priority:MinorRegression?No
Status:Closed/CompleteComponents:Channels/chan_iax2
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:
Description:Saw this happening on a customer between two boxes:

Tx-Frame Retry[-01] -- OSeqno: 064 ISeqno: 060 Type: IAX     Subclass: LAGRP  
  Timestamp: 200069ms  SCall: 00061  DCall: 00059 [196.40.106.94:4569]
Rx-Frame Retry[ No] -- OSeqno: 011 ISeqno: 011 Type: IAX     Subclass: VNAK  
  Timestamp: 29274ms  SCall: 00047  DCall: 00061 [196.40.106.94:4569]
Tx-Frame Retry[-01] -- OSeqno: 064 ISeqno: 060 Type: IAX     Subclass: LAGRP  
  Timestamp: 200069ms  SCall: 00061  DCall: 00059 [196.40.106.94:4569]
Rx-Frame Retry[ No] -- OSeqno: 011 ISeqno: 011 Type: IAX     Subclass: VNAK  
  Timestamp: 29285ms  SCall: 00047  DCall: 00061 [196.40.106.94:4569]
Tx-Frame Retry[-01] -- OSeqno: 064 ISeqno: 060 Type: IAX     Subclass: LAGRP  
  Timestamp: 200069ms  SCall: 00061  DCall: 00059 [196.40.106.94:4569]
Rx-Frame Retry[ No] -- OSeqno: 011 ISeqno: 011 Type: IAX     Subclass: VNAK  
  Timestamp: 29294ms  SCall: 00047  DCall: 00061 [196.40.106.94:4569]
Tx-Frame Retry[-01] -- OSeqno: 064 ISeqno: 060 Type: IAX     Subclass: LAGRP  
  Timestamp: 200069ms  SCall: 00061  DCall: 00059 [196.40.106.94:4569]
Rx-Frame Retry[ No] -- OSeqno: 011 ISeqno: 011 Type: IAX     Subclass: VNAK  
  Timestamp: 29305ms  SCall: 00047  DCall: 00061 [196.40.106.94:4569]
Tx-Frame Retry[-01] -- OSeqno: 064 ISeqno: 060 Type: IAX     Subclass: LAGRP  
  Timestamp: 200069ms  SCall: 00061  DCall: 00059 [196.40.106.94:4569]
Rx-Frame Retry[ No] -- OSeqno: 011 ISeqno: 011 Type: IAX     Subclass: VNAK  
  Timestamp: 29314ms  SCall: 00047  DCall: 00061 [196.40.106.94:4569]
Tx-Frame Retry[-01] -- OSeqno: 064 ISeqno: 060 Type: IAX     Subclass: LAGRP  
  Timestamp: 200069ms  SCall: 00061  DCall: 00059 [196.40.106.94:4569]
Rx-Frame Retry[ No] -- OSeqno: 011 ISeqno: 011 Type: IAX     Subclass: VNAK  
  Timestamp: 29341ms  SCall: 00047  DCall: 00061 [196.40.106.94:4569]
Tx-Frame Retry[-01] -- OSeqno: 064 ISeqno: 060 Type: IAX     Subclass: LAGRP  
  Timestamp: 200069ms  SCall: 00061  DCall: 00059 [196.40.106.94:4569]
Rx-Frame Retry[ No] -- OSeqno: 011 ISeqno: 011 Type: IAX     Subclass: VNAK  
  Timestamp: 29351ms  SCall: 00047  DCall: 00061 [196.40.106.94:4569]
Tx-Frame Retry[-01] -- OSeqno: 064 ISeqno: 060 Type: IAX     Subclass: LAGRP  
  Timestamp: 200069ms  SCall: 00061  DCall: 00059 [196.40.106.94:4569]
Rx-Frame Retry[ No] -- OSeqno: 011 ISeqno: 011 Type: IAX     Subclass: VNAK  
  Timestamp: 29360ms  SCall: 00047  DCall: 00061 [196.40.106.94:4569]
Tx-Frame Retry[-01] -- OSeqno: 064 ISeqno: 060 Type: IAX     Subclass: LAGRP  
  Timestamp: 200069ms  SCall: 00061  DCall: 00059 [196.40.106.94:4569]
Rx-Frame Retry[ No] -- OSeqno: 011 ISeqno: 011 Type: IAX     Subclass: VNAK  
  Timestamp: 29369ms  SCall: 00047  DCall: 00061 [196.40.106.94:4569]
Tx-Frame Retry[-01] -- OSeqno: 064 ISeqno: 060 Type: IAX     Subclass: LAGRP  
  Timestamp: 200069ms  SCall: 00061  DCall: 00059 [196.40.106.94:4569]
Rx-Frame Retry[ No] -- OSeqno: 011 ISeqno: 011 Type: IAX     Subclass: VNAK  
  Timestamp: 29378ms  SCall: 00047  DCall: 00061 [196.40.106.94:4569]
Tx-Frame Retry[-01] -- OSeqno: 064 ISeqno: 060 Type: IAX     Subclass: LAGRP  
  Timestamp: 200069ms  SCall: 00061  DCall: 00059 [196.40.106.94:4569]
Rx-Frame Retry[ No] -- OSeqno: 011 ISeqno: 011 Type: IAX     Subclass: VNAK  
  Timestamp: 29387ms  SCall: 00047  DCall: 00061 [196.40.106.94:4569]
etc etc etc etc

It did stop eventually when:

Rx-Frame Retry[ No] -- OSeqno: 010 ISeqno: 014 Type: IAX     Subclass: HANGUP
  Timestamp: 30238ms  SCall: 00047  DCall: 00061 [196.40.106.94:4569]
  CAUSE CODE      : 0

Tx-Frame Retry[-01] -- OSeqno: 014 ISeqno: 011 Type: IAX     Subclass: ACK    
  Timestamp: 30238ms  SCall: 00061  DCall: 00047 [196.40.106.94:4569]
Rx-Frame Retry[ No] -- OSeqno: 014 ISeqno: 014 Type: IAX     Subclass: VNAK  
  Timestamp: 30245ms  SCall: 00047  DCall: 00061 [196.40.106.94:4569]
Tx-Frame Retry[ No] -- OSeqno: 000 ISeqno: 000 Type: IAX     Subclass: INVAL  
  Timestamp: 00000ms  SCall: 00061  DCall: 00047 [196.40.106.94:4569]
Comments:By: Russell Bryant (russell) 2007-07-20 17:20:58

There have been some fixes in the past week related to this.  Please try with the latest code in the 1.4 branch.

By: Digium Subversion (svnbot) 2007-08-01 16:58:55

Repository: asterisk
Revision: 77887

------------------------------------------------------------------------
r77887 | russell | 2007-08-01 16:58:53 -0500 (Wed, 01 Aug 2007) | 23 lines

Fix some race conditions which have been causing weird problems in chan_iax2.
The most notable problem is that people have been seeing storms of VNAK frames
being sent due to really old frames mysteriously being in the retransmission
queue and never getting removed.

It was possible that a dynamic thread got created, but did not acquire its lock
before the thread that created it signals it to perform an action.  When this
happens, the thread will sleep until it hits a timeout, and then get destroyed.
So, the action never gets performed and in some cases, means a frame doesn't
get transmitted and never gets freed since the scheduler never gets a chance
to reschedule transmission.

Another less severe race condition is in the handling of a timeout for a dynamic
thread.  It was possible for it to be acquired to perform at action at the same
time that it hit a timeout.  When this occurs, whatever action it was acquired
for would never get performed.

(patch contributed by Mihai and SteveK)
(closes issue ASTERISK-9946)
(closes issue ASTERISK-9912)
(closes issue ASTERISK-9902)
(possibly related to issue ASTERISK-10004)

------------------------------------------------------------------------

By: Digium Subversion (svnbot) 2007-08-01 17:06:59

Repository: asterisk
Revision: 77889

------------------------------------------------------------------------
r77889 | russell | 2007-08-01 17:06:58 -0500 (Wed, 01 Aug 2007) | 31 lines

Merged revisions 77887 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r77887 | russell | 2007-08-01 17:16:17 -0500 (Wed, 01 Aug 2007) | 23 lines

Fix some race conditions which have been causing weird problems in chan_iax2.
The most notable problem is that people have been seeing storms of VNAK frames
being sent due to really old frames mysteriously being in the retransmission
queue and never getting removed.

It was possible that a dynamic thread got created, but did not acquire its lock
before the thread that created it signals it to perform an action.  When this
happens, the thread will sleep until it hits a timeout, and then get destroyed.
So, the action never gets performed and in some cases, means a frame doesn't
get transmitted and never gets freed since the scheduler never gets a chance
to reschedule transmission.

Another less severe race condition is in the handling of a timeout for a dynamic
thread.  It was possible for it to be acquired to perform at action at the same
time that it hit a timeout.  When this occurs, whatever action it was acquired
for would never get performed.

(patch contributed by Mihai and SteveK)
(closes issue ASTERISK-9946)
(closes issue ASTERISK-9912)
(closes issue ASTERISK-9902)
(possibly related to issue ASTERISK-10004)

........

------------------------------------------------------------------------