ASTERISK-14929: Call abort after wrong behaviour of VNAK transmission

[Home]

Summary: ASTERISK-14929: Call abort after wrong behaviour of VNAK transmission

Reporter: Florian Floimair (ffloimair) Labels:

Date Opened: 2009-10-02 03:22:13 Date Closed: 2011-07-26 14:27:43

Priority: Major Regression? No

Status: Closed/Complete Components: Channels/chan_iax2

Versions: Frequency of
Occurrence

Related
Issues:

Environment: Attachments: ( 0) IAX_VNAK_Bug_Displayed.pcap
( 1) issues.debug
( 2) issues.debug.vnak.gz
( 3) issues.verbose

Description: Asterisk sends VNAK to signal a retransmission of a packet. After retransmission the request is sent again and again... Abort of the call is the result after a few retries.
Overall, asterisk very often sends VNAK after receiving LAGRQ and PING consecutively. Why does asterisk know that packet 2 after 1 is missing if there was no packet 3 so far (numbers referring to ascending sequence numbers in IAX Header)? This is confusing since the RFC states the below text requiring to retransmit all frames with a higher sequence number than the one received with the VNAK.

Please see the attached pcap file.

There was no significant output in the asterisk console, so I do not post this as of now. If required I can post this later.

****** ADDITIONAL INFORMATION ******

6.9.3. VNAK: Voice Negative Acknowledgement Message

A VNAK is sent when a message is received out of order, particularly
when a Mini Frame is received before the first full voice frame on a
call. It is a request for retransmission of dropped messages. A
message is considered out of sequence if the received iseqno is
different than the expected iseqno. On receipt of a VNAK, a peer
MUST retransmit all frames with a higher sequence number than the
VNAK message's iseqno.

Comments: By: Florian Floimair (ffloimair) 2009-10-02 03:23:36

Note: in the above pcap file the Asterisk system has the IP address 172.16.0.11
By: c0rnoTa (c0rnota) 2009-10-14 20:20:39

I have the same issue. Under my control are 2 servers (asterisk 1.4.26 and 1.4.26.1). There are some queues on each server. Operator receives call from queue, that came to station via DAHDI, and transfers it via IAX to another queue on another server. Second operator receives the call. After a call from DAHDI by server 1 and operator on server 2 bridged together + a few seconds, and a call drops.

I'm digging log all night and found what: both my stations time to time have deadlocks:
[Oct 14 13:51:23] DEBUG[11495] channel.c: Avoiding initial deadlock for channel '0xb6605278'
But sometimes deadlocks have something like this:
[Oct 14 11:22:26] DEBUG[11495] channel.c: Avoiding initial deadlock for channel '0xb6105fc0'
[Oct 14 11:22:26] DEBUG[11509] chan_iax2.c: Packet arrived out of order (expecting 3, got 5) (frametype = 2, subclass = 256)
I'm using "trunk=yes" option in my configuration (encryption=yes). Cound deadlock be by chan_iax? while trunk channel locked, we are receiving VNAK message, and then
[Oct 14 11:22:33] DEBUG[11505] chan_iax2.c: Immediately destroying 16096, having received hangup
Could it be true or it's absurd? It's only my theoretical supposition.

I'm so sorry, but VNAK message does not exist permanently in my debug log before drops. But messages about "chan_iax2.c: Packet arrived out of order" does. And there are over 23000 messages "Received VNAK: resending outstanding frames" having different time with call drops.

I'll try to attach my log files. Call drop was at 11:22:33 (amazing!)

By: c0rnoTa (c0rnota) 2009-10-14 20:23:37

sorry for 2 different files (one for verbose and one for debug). i'm storing info in different files, and didn't know how to combine it.

And sorry 4 my english.

By: c0rnoTa (c0rnota) 2009-10-14 20:35:52

Attached debug messages with VNAK. There is another interesting line:
"iseqno 11 now within window 12->17"

And can anybody tell me, what does mean "FRAME_CONTROL (-1)"? It seems, that issue appear after that message.

Thx!
By: Florian Floimair (ffloimair) 2009-10-20 03:36:01

Any news on this issue? Please let me know if you need more information.
By: c0rnoTa (c0rnota) 2009-10-20 03:40:12

It seams that my call drops was not wired with this issue.
By: Thomas Karl (tkarl) 2009-10-27 04:39:01

I think I have the same problem in one of my projects in Spain, where I have an IAX trunk set up.
The asterisk drops all conversations to this trunk after some time with the verbose notice "Max retries exceeded to host...".
It seem like this is happening more often when I have multiple conversations at the same time. When I set up only 2 or 3 conversation I have to keep those conversations for hours to reproduce this problem. When I have more than 16 conversations it is very likely that the problem occurs after a few minutes.
After sniffing the network traffic, I recognized that before the error message is printed, more VNAKs are present so I found this issue and saw the same wrong VNAK behavior.
I don’t think it is necessary to provide my Wireshark trace at this point of time; if you need any more information I will do my best to provide all required data.
Unfortunately my customer is getting a little bit nervous so it would be great to have a time schedule of when this issue will be processed. Thanks a lot.

By: dreamer2000 (dreamer2000) 2009-10-28 05:16:57

I looked through the pcap file and i think i have 2 ideas for possible reasons why the calls are dropped after "max tries exceeded...":

1.
one part sends PING (SEQ-ID) and LAGRQ (SEQ-ID+1), after this asterisk answers with VNAK (SEQ-ID) which looks like a bug at all because of IAX2-RFC: 6.9.3. VNAK: Voice Negative Acknowledgement Message - you must repeat frames with a higher seq-id as in VNAK - in case that PING is lost, the VNAK should say SEQ-ID-1! PING must be the frame that is lost because otherwise asterisk would not know that one frame is lost after there is no third packet.
2.
after receiving 2 frames in a row in a very small time slot it seams that maybe the receiving queue is read out wrong. I do not know how this is handled - is it FIFO ? or is the queue handled via iax-timestamps? in this case the two frames have the same timestamp and may be mixed up all the time until time out error..
By: Leif Madsen (lmadsen) 2011-02-18 14:36:30.000-0600

Since this issue is quite old now, is it still an issue?
By: Leif Madsen (lmadsen) 2011-07-26 14:27:37.268-0500

Suspended due to lack of activity. Please request a bug marshal in #asterisk-bugs on the IRC network irc.freenode.net to reopen the issue should you have the additional information requested. Further information can be found at http://www.asterisk.org/developers/bug-guidelines