[Home]

Summary:ASTERISK-09902: chan_iax2 qualify marks peers unreachable when they are reachable
Reporter:Simon Kirby (sim)Labels:
Date Opened:2007-07-18 16:51:09Date Closed:2007-08-01 17:06:59
Priority:MajorRegression?No
Status:Closed/CompleteComponents:Channels/chan_iax2
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) 10.50.255.251.log
Description:while running the 1.4 branch (and the 1.0 branch), we have noticed that the IAX2 qualify testing has often been producing false unreachable messages.  Calls active at the time continue, pings continue without a single drop, etc.

Attached is a trace of an otherwise-idle session between two Asterisk servers approximately 78 ms apart, running 1.4.7.1 locally and 1.4.8 remotely.

See the log entries vs the trace timestamps.  The trace was run on the 10.100.0.60 host.  It seems the received PONG is either ignored or not ACKed properly.  The remote end retransmits an identical PONG, to which INVAL is sent.  Two seconds after that, UNREACHABLE is logged.  The next POKE succeeds and the qualification is satisfied.

****** ADDITIONAL INFORMATION ******

[Jul 18 14:05:06] NOTICE[5035] chan_iax2.c: Peer 'austin' is now UNREACHABLE! Time: 77
[Jul 18 14:05:16] NOTICE[5028] chan_iax2.c: Peer 'austin' is now REACHABLE! Time: 78

 1 14:05:02.006271  10.100.0.60 -> 10.50.255.251 IAX2 IAX, source call# 7, timestamp 6ms POKE
 2 14:05:02.084248 10.50.255.251 -> 10.100.0.60  IAX2 IAX, source call# 6, timestamp 6ms PONG
 3 14:05:04.083658 10.50.255.251 -> 10.100.0.60  IAX2 IAX, source call# 6, timestamp 6ms PONG
 4 14:05:04.084658  10.100.0.60 -> 10.50.255.251 IAX2 IAX, source call# 7, timestamp 0ms INVAL
 5 14:05:16.007138  10.100.0.60 -> 10.50.255.251 IAX2 IAX, source call# 14, timestamp 7ms POKE
 6 14:05:16.085115 10.50.255.251 -> 10.100.0.60  IAX2 IAX, source call# 3, timestamp 7ms PONG
 7 14:05:16.086115  10.100.0.60 -> 10.50.255.251 IAX2 IAX, source call# 14, timestamp 7ms ACK
 8 14:05:38.880391 10.50.255.251 -> 10.100.0.60  IAX2 IAX, source call# 11, timestamp 11ms POKE
 9 14:05:38.881391  10.100.0.60 -> 10.50.255.251 IAX2 IAX, source call# 12, timestamp 11ms PONG
10 14:05:38.958369 10.50.255.251 -> 10.100.0.60  IAX2 IAX, source call# 11, timestamp 11ms ACK
Comments:By: Russell Bryant (russell) 2007-07-18 17:22:55

Can you post a wireshark dump so I can look at frame details?

By: Simon Kirby (sim) 2007-07-18 17:49:19

hey russell,

The above log is tshark output, and the attached 10.50.255.251.log file is a pcap file which was the result of a tcpdump running on the 10.100.0.60 host with "host 10.50.255.251" as the BPF arguments.

By: Russell Bryant (russell) 2007-07-18 20:36:15

Ah, ok.  I didn't know that file was a pcap file.  Thanks ..

By: pj (pj) 2007-07-18 22:12:23

I have similar issue, as I reported before (0009677). as workaround my issue can be solved, when I completelly restart asterisk (reloading chan_iax has no efect to issue).
do you use some non default values in 'qualifyfreq(not)ok'? yesterdays I removed my non default settings and wait, if something changes...

By: Simon Kirby (sim) 2007-07-20 18:04:05

Hi PJ... Just using the standard "qualify=yes", and nothing else.  The problem does not seem to relate to uptime.

By: Digium Subversion (svnbot) 2007-08-01 16:59:11

Repository: asterisk
Revision: 77887

------------------------------------------------------------------------
r77887 | russell | 2007-08-01 16:58:53 -0500 (Wed, 01 Aug 2007) | 23 lines

Fix some race conditions which have been causing weird problems in chan_iax2.
The most notable problem is that people have been seeing storms of VNAK frames
being sent due to really old frames mysteriously being in the retransmission
queue and never getting removed.

It was possible that a dynamic thread got created, but did not acquire its lock
before the thread that created it signals it to perform an action.  When this
happens, the thread will sleep until it hits a timeout, and then get destroyed.
So, the action never gets performed and in some cases, means a frame doesn't
get transmitted and never gets freed since the scheduler never gets a chance
to reschedule transmission.

Another less severe race condition is in the handling of a timeout for a dynamic
thread.  It was possible for it to be acquired to perform at action at the same
time that it hit a timeout.  When this occurs, whatever action it was acquired
for would never get performed.

(patch contributed by Mihai and SteveK)
(closes issue ASTERISK-9946)
(closes issue ASTERISK-9912)
(closes issue ASTERISK-9902)
(possibly related to issue ASTERISK-10004)

------------------------------------------------------------------------

By: Digium Subversion (svnbot) 2007-08-01 17:06:59

Repository: asterisk
Revision: 77889

------------------------------------------------------------------------
r77889 | russell | 2007-08-01 17:06:58 -0500 (Wed, 01 Aug 2007) | 31 lines

Merged revisions 77887 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r77887 | russell | 2007-08-01 17:16:17 -0500 (Wed, 01 Aug 2007) | 23 lines

Fix some race conditions which have been causing weird problems in chan_iax2.
The most notable problem is that people have been seeing storms of VNAK frames
being sent due to really old frames mysteriously being in the retransmission
queue and never getting removed.

It was possible that a dynamic thread got created, but did not acquire its lock
before the thread that created it signals it to perform an action.  When this
happens, the thread will sleep until it hits a timeout, and then get destroyed.
So, the action never gets performed and in some cases, means a frame doesn't
get transmitted and never gets freed since the scheduler never gets a chance
to reschedule transmission.

Another less severe race condition is in the handling of a timeout for a dynamic
thread.  It was possible for it to be acquired to perform at action at the same
time that it hit a timeout.  When this occurs, whatever action it was acquired
for would never get performed.

(patch contributed by Mihai and SteveK)
(closes issue ASTERISK-9946)
(closes issue ASTERISK-9912)
(closes issue ASTERISK-9902)
(possibly related to issue ASTERISK-10004)

........

------------------------------------------------------------------------