Summary: | ASTERISK-09902: chan_iax2 qualify marks peers unreachable when they are reachable | ||
Reporter: | Simon Kirby (sim) | Labels: | |
Date Opened: | 2007-07-18 16:51:09 | Date Closed: | 2007-08-01 17:06:59 |
Priority: | Major | Regression? | No |
Status: | Closed/Complete | Components: | Channels/chan_iax2 |
Versions: | Frequency of Occurrence | ||
Related Issues: | |||
Environment: | Attachments: | ( 0) 10.50.255.251.log | |
Description: | while running the 1.4 branch (and the 1.0 branch), we have noticed that the IAX2 qualify testing has often been producing false unreachable messages. Calls active at the time continue, pings continue without a single drop, etc. Attached is a trace of an otherwise-idle session between two Asterisk servers approximately 78 ms apart, running 1.4.7.1 locally and 1.4.8 remotely. See the log entries vs the trace timestamps. The trace was run on the 10.100.0.60 host. It seems the received PONG is either ignored or not ACKed properly. The remote end retransmits an identical PONG, to which INVAL is sent. Two seconds after that, UNREACHABLE is logged. The next POKE succeeds and the qualification is satisfied. ****** ADDITIONAL INFORMATION ****** [Jul 18 14:05:06] NOTICE[5035] chan_iax2.c: Peer 'austin' is now UNREACHABLE! Time: 77 [Jul 18 14:05:16] NOTICE[5028] chan_iax2.c: Peer 'austin' is now REACHABLE! Time: 78 1 14:05:02.006271 10.100.0.60 -> 10.50.255.251 IAX2 IAX, source call# 7, timestamp 6ms POKE 2 14:05:02.084248 10.50.255.251 -> 10.100.0.60 IAX2 IAX, source call# 6, timestamp 6ms PONG 3 14:05:04.083658 10.50.255.251 -> 10.100.0.60 IAX2 IAX, source call# 6, timestamp 6ms PONG 4 14:05:04.084658 10.100.0.60 -> 10.50.255.251 IAX2 IAX, source call# 7, timestamp 0ms INVAL 5 14:05:16.007138 10.100.0.60 -> 10.50.255.251 IAX2 IAX, source call# 14, timestamp 7ms POKE 6 14:05:16.085115 10.50.255.251 -> 10.100.0.60 IAX2 IAX, source call# 3, timestamp 7ms PONG 7 14:05:16.086115 10.100.0.60 -> 10.50.255.251 IAX2 IAX, source call# 14, timestamp 7ms ACK 8 14:05:38.880391 10.50.255.251 -> 10.100.0.60 IAX2 IAX, source call# 11, timestamp 11ms POKE 9 14:05:38.881391 10.100.0.60 -> 10.50.255.251 IAX2 IAX, source call# 12, timestamp 11ms PONG 10 14:05:38.958369 10.50.255.251 -> 10.100.0.60 IAX2 IAX, source call# 11, timestamp 11ms ACK | ||
Comments: | By: Russell Bryant (russell) 2007-07-18 17:22:55 Can you post a wireshark dump so I can look at frame details? By: Simon Kirby (sim) 2007-07-18 17:49:19 hey russell, The above log is tshark output, and the attached 10.50.255.251.log file is a pcap file which was the result of a tcpdump running on the 10.100.0.60 host with "host 10.50.255.251" as the BPF arguments. By: Russell Bryant (russell) 2007-07-18 20:36:15 Ah, ok. I didn't know that file was a pcap file. Thanks .. By: pj (pj) 2007-07-18 22:12:23 I have similar issue, as I reported before (0009677). as workaround my issue can be solved, when I completelly restart asterisk (reloading chan_iax has no efect to issue). do you use some non default values in 'qualifyfreq(not)ok'? yesterdays I removed my non default settings and wait, if something changes... By: Simon Kirby (sim) 2007-07-20 18:04:05 Hi PJ... Just using the standard "qualify=yes", and nothing else. The problem does not seem to relate to uptime. By: Digium Subversion (svnbot) 2007-08-01 16:59:11 Repository: asterisk Revision: 77887 ------------------------------------------------------------------------ r77887 | russell | 2007-08-01 16:58:53 -0500 (Wed, 01 Aug 2007) | 23 lines Fix some race conditions which have been causing weird problems in chan_iax2. The most notable problem is that people have been seeing storms of VNAK frames being sent due to really old frames mysteriously being in the retransmission queue and never getting removed. It was possible that a dynamic thread got created, but did not acquire its lock before the thread that created it signals it to perform an action. When this happens, the thread will sleep until it hits a timeout, and then get destroyed. So, the action never gets performed and in some cases, means a frame doesn't get transmitted and never gets freed since the scheduler never gets a chance to reschedule transmission. Another less severe race condition is in the handling of a timeout for a dynamic thread. It was possible for it to be acquired to perform at action at the same time that it hit a timeout. When this occurs, whatever action it was acquired for would never get performed. (patch contributed by Mihai and SteveK) (closes issue ASTERISK-9946) (closes issue ASTERISK-9912) (closes issue ASTERISK-9902) (possibly related to issue ASTERISK-10004) ------------------------------------------------------------------------ By: Digium Subversion (svnbot) 2007-08-01 17:06:59 Repository: asterisk Revision: 77889 ------------------------------------------------------------------------ r77889 | russell | 2007-08-01 17:06:58 -0500 (Wed, 01 Aug 2007) | 31 lines Merged revisions 77887 via svnmerge from https://origsvn.digium.com/svn/asterisk/branches/1.4 ........ r77887 | russell | 2007-08-01 17:16:17 -0500 (Wed, 01 Aug 2007) | 23 lines Fix some race conditions which have been causing weird problems in chan_iax2. The most notable problem is that people have been seeing storms of VNAK frames being sent due to really old frames mysteriously being in the retransmission queue and never getting removed. It was possible that a dynamic thread got created, but did not acquire its lock before the thread that created it signals it to perform an action. When this happens, the thread will sleep until it hits a timeout, and then get destroyed. So, the action never gets performed and in some cases, means a frame doesn't get transmitted and never gets freed since the scheduler never gets a chance to reschedule transmission. Another less severe race condition is in the handling of a timeout for a dynamic thread. It was possible for it to be acquired to perform at action at the same time that it hit a timeout. When this occurs, whatever action it was acquired for would never get performed. (patch contributed by Mihai and SteveK) (closes issue ASTERISK-9946) (closes issue ASTERISK-9912) (closes issue ASTERISK-9902) (possibly related to issue ASTERISK-10004) ........ ------------------------------------------------------------------------ |