[Home]

Summary:ASTERISK-09905: chan_iax2 sometimes incorrectly destroys channel immediately after sending NEW and getting AUTHREQ
Reporter:Steve Davies . (stevedavies)Labels:
Date Opened:2007-07-19 11:38:13Date Closed:2007-08-02 09:50:53
Priority:MinorRegression?No
Status:Closed/CompleteComponents:Channels/chan_iax2
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:
Description:

See this debug log.

[Jul 19 16:40:06] VERBOSE[4509] logger.c: [Jul 19 16:40:06] Tx-Frame Retry[000] -- OSeqno: 000 ISeqno: 000 Type: IAX     Subclass: NEW    
[Jul 19 16:40:06] VERBOSE[4509] logger.c: [Jul 19 16:40:06]    Timestamp: 00009ms  SCall: 00093  DCall: 00000 [196.40.106.60:4569]
[Jul 19 16:40:06] VERBOSE[4509] logger.c: [Jul 19 16:40:06]    VERSION         : 2
[Jul 19 16:40:06] VERBOSE[4509] logger.c: [Jul 19 16:40:06]    CALLED NUMBER   : 0216736730
[Jul 19 16:40:06] VERBOSE[4509] logger.c: [Jul 19 16:40:06]    CODEC_PREFS     : (g729|ilbc|gsm|speex|alaw|ulaw)
[Jul 19 16:40:06] VERBOSE[4509] logger.c: [Jul 19 16:40:06]     -- Called phonehome-1/0216736730
[Jul 19 16:40:06] VERBOSE[4509] logger.c: [Jul 19 16:40:06]    CALLING NUMBER  : 0741003202
[Jul 19 16:40:06] VERBOSE[4509] logger.c: [Jul 19 16:40:06]    CALLING PRESNTN : 3
[Jul 19 16:40:06] VERBOSE[4509] logger.c: [Jul 19 16:40:06]    CALLING TYPEOFN : 1
[Jul 19 16:40:06] VERBOSE[4509] logger.c: [Jul 19 16:40:06]    CALLING TRANSIT : 0
[Jul 19 16:40:06] VERBOSE[4509] logger.c: [Jul 19 16:40:06]    CALLING NAME    :
[Jul 19 16:40:06] VERBOSE[4509] logger.c: [Jul 19 16:40:06]    LANGUAGE        : en
[Jul 19 16:40:06] VERBOSE[4509] logger.c: [Jul 19 16:40:06]    USERNAME        : voipconnect
[Jul 19 16:40:06] VERBOSE[4509] logger.c: [Jul 19 16:40:06]    FORMAT          : 8
[Jul 19 16:40:06] VERBOSE[4509] logger.c: [Jul 19 16:40:06]    CAPABILITY      : 1632014
[Jul 19 16:40:06] VERBOSE[4509] logger.c: [Jul 19 16:40:06]    ADSICPE         : 2
[Jul 19 16:40:06] VERBOSE[4509] logger.c: [Jul 19 16:40:06]    DATE TIME       : 2007-07-19  16:40:06
[Jul 19 16:40:06] VERBOSE[4509] logger.c: [Jul 19 16:40:06]
[Jul 19 16:40:06] VERBOSE[4509] logger.c: [Jul 19 16:40:06] Rx-Frame Retry[ No] -- OSeqno: 000 ISeqno: 001 Type: IAX     Subclass: AUTHREQ
[Jul 19 16:40:06] VERBOSE[4509] logger.c: [Jul 19 16:40:06]    Timestamp: 00010ms  SCall: 00008  DCall: 00093 [196.40.106.60:4569]
[Jul 19 16:40:06] VERBOSE[4509] logger.c: [Jul 19 16:40:06]    AUTHMETHODS     : 2
[Jul 19 16:40:06] VERBOSE[4509] logger.c: [Jul 19 16:40:06]    CHALLENGE       : 227338768
[Jul 19 16:40:06] VERBOSE[4509] logger.c: [Jul 19 16:40:06]    USERNAME        : voipconnect
[Jul 19 16:40:06] VERBOSE[4509] logger.c: [Jul 19 16:40:06]
[Jul 19 16:40:06] DEBUG[4509] chan_iax2.c: Received packet 0, (6, 8)
[Jul 19 16:40:06] DEBUG[4509] chan_iax2.c: Cancelling transmission of packet 0
[Jul 19 16:40:06] DEBUG[4509] chan_iax2.c: Really destroying 93, having been acked on final message
[Jul 19 16:40:06] DEBUG[4509] chan_iax2.c: IAX subclass 8 received
[Jul 19 16:40:06] DEBUG[4509] devicestate.c: Changing state for IAX2/phonehome-1 - state 6 (Ringing)
[Jul 19 16:40:06] DEBUG[4509] chan_iax2.c: For call=93, set last=10
[Jul 19 16:40:06] DEBUG[4509] channel.c: Hanging up channel 'IAX2/phonehome-1-93'
[Jul 19 16:40:06] DEBUG[4509] chan_iax2.c: We're hanging up IAX2/phonehome-1-93 now...
[Jul 19 16:40:06] DEBUG[4509] chan_iax2.c: Really destroying IAX2/phonehome-1-93 now...
[Jul 19 16:40:06] VERBOSE[4509] logger.c: [Jul 19 16:40:06]     -- Hungup 'IAX2/phonehome-1-93'


In it you see a NEW being sent.
Then, when the AUTHREQ arrives, the code is busy marking packets in the send queue as ACKED.
It finds the NEW, and for some reason that NEW is marked as "final", so the code hangs up the channel.

Here's the code loop:

                           /* The acknowledgement is within our window.  Time to acknowledge everything
                              that it says to */
                           for (x=iaxs[fr->callno]->rseqno; x != fr->iseqno; x++) {
                                   /* Ack the packet with the given timestamp */
                                   if (iaxdebug)
                                           ast_debug(1, "Cancelling transmission of packet %d\n", x);
                                   call_to_destroy = 0;
                                   AST_LIST_LOCK(&frame_queue);
                                   AST_LIST_TRAVERSE(&frame_queue, cur, list) {
                                           /* If it's our call, and our timestamp, mark -1 retries */
                                           if ((fr->callno == cur->callno) && (x == cur->oseqno)) {
                                                   cur->retries = -1;
                                                   /* Destroy call if this is the end (final message has been acked) */
                                                   if (cur->final)
                                                           call_to_destroy = fr->callno;
                                           }
                                   }
                                   AST_LIST_UNLOCK(&frame_queue);
                                   if (call_to_destroy) {
                                           if (iaxdebug)
                                                   ast_debug(1, "Really destroying %d, having been acked on final message\n", call_to_destroy);
                                           iax2_destroy(call_to_destroy);
                                   }
                           }

I cannot figure out how the NEW got marked as final, so I suppose we must be looking at some race meaning we are looking at the wrong frame, or else some memory overwrite.

We do have lots of registering hosts, so lots of short REGREQ "calls", so quite a few "final" packets in the queue.

Steve
Comments:By: Steve Davies . (stevedavies) 2007-07-20 01:53:10

Further to this, I added some extra debugging to my source:

[Jul 20 08:43:47] DEBUG[11808] chan_iax2.c: Sending 1 on 84/0 to 196.40.106.60:4569
[Jul 20 08:43:47] VERBOSE[11808] logger.c: [Jul 20 08:43:47] Tx-Frame Retry[000] -- OSeqno: 000 ISeqno: 000 Type: IAX     Subclass: NEW    
[Jul 20 08:43:47] VERBOSE[11808] logger.c: [Jul 20 08:43:47]    Timestamp: 00001ms  SCall: 00084  DCall: 00000 [196.40.106.60:4569]
[Jul 20 08:43:47] VERBOSE[11808] logger.c: [Jul 20 08:43:47]    VERSION         : 2
[Jul 20 08:43:47] VERBOSE[11808] logger.c: [Jul 20 08:43:47]    CALLED NUMBER   : 0216736730
[Jul 20 08:43:47] VERBOSE[11808] logger.c: [Jul 20 08:43:47]    CODEC_PREFS     : (g729|ilbc|gsm|speex|alaw|ulaw)
[Jul 20 08:43:47] VERBOSE[11808] logger.c: [Jul 20 08:43:47]    CALLING NUMBER  : 0741003202
[Jul 20 08:43:47] VERBOSE[11808] logger.c: [Jul 20 08:43:47]    CALLING PRESNTN : 3
[Jul 20 08:43:47] VERBOSE[11808] logger.c: [Jul 20 08:43:47]    CALLING TYPEOFN : 1
[Jul 20 08:43:47] VERBOSE[11808] logger.c: [Jul 20 08:43:47]    CALLING TRANSIT : 0
[Jul 20 08:43:47] VERBOSE[11808] logger.c: [Jul 20 08:43:47]    CALLING NAME    :
[Jul 20 08:43:47] VERBOSE[11808] logger.c: [Jul 20 08:43:47]    LANGUAGE        : en
[Jul 20 08:43:47] VERBOSE[11808] logger.c: [Jul 20 08:43:47]    USERNAME        : voipconnect
[Jul 20 08:43:47] VERBOSE[11808] logger.c: [Jul 20 08:43:47]    FORMAT          : 8
[Jul 20 08:43:47] DEBUG[11808] devicestate.c: Changing state for IAX2/phonehome-1 - state 6 (Ringing)
[Jul 20 08:43:47] VERBOSE[11808] logger.c: [Jul 20 08:43:47]    CAPABILITY      : 1632014
[Jul 20 08:43:47] VERBOSE[11808] logger.c: [Jul 20 08:43:47]    ADSICPE         : 2
[Jul 20 08:43:47] VERBOSE[11808] logger.c: [Jul 20 08:43:47]    DATE TIME       : 2007-07-20  08:43:46
[Jul 20 08:43:47] VERBOSE[11808] logger.c: [Jul 20 08:43:47]
[Jul 20 08:43:47] VERBOSE[11808] logger.c: [Jul 20 08:43:47] Rx-Frame Retry[ No] -- OSeqno: 000 ISeqno: 001 Type: IAX     Subclass: AUTHREQ
[Jul 20 08:43:47] VERBOSE[11808] logger.c: [Jul 20 08:43:47]    Timestamp: 00004ms  SCall: 00012  DCall: 00084 [196.40.106.60:4569]
[Jul 20 08:43:47] VERBOSE[11808] logger.c: [Jul 20 08:43:47]    AUTHMETHODS     : 2
[Jul 20 08:43:47] VERBOSE[11808] logger.c: [Jul 20 08:43:47]    CHALLENGE       : 586202265
[Jul 20 08:43:47] VERBOSE[11808] logger.c: [Jul 20 08:43:47]    USERNAME        : voipconnect
[Jul 20 08:43:47] VERBOSE[11808] logger.c: [Jul 20 08:43:47]
[Jul 20 08:43:47] DEBUG[11808] chan_iax2.c: Received packet 0, (6, 8)
[Jul 20 08:43:47] DEBUG[11808] chan_iax2.c: Cancelling transmission of packet 0
[Jul 20 08:43:47] DEBUG[11808] chan_iax2.c: We think we got acked on 0 (6,3) frame on call 84 with final flag set(arriving frame is for call 84)
[Jul 20 08:43:47] DEBUG[11808] chan_iax2.c: Really destroying 84, having been acked on final message

So frametype 6 is IAX, as expected.  But subtype 3 is a PONG, which is NOT expected because the only frame we sent on this call (call 84) was a NEW (subtype=1)

By: Russell Bryant (russell) 2007-07-21 01:37:14

For better or worse, this is now the second report I have seen where entries in the global frame queue appear to be corrupted.  There is definitely something going on here.  I am interested to look into this in more detail, but it will most likely be a couple weeks before I can ...

By: Steve Davies . (stevedavies) 2007-07-21 10:41:50

Hi Russell,

I added a little more logging, and also added an extra check in this code loop - if the frame in the frame queue is already dead (retries < 0) then I don't kill the call.

This is on a tentative theory that there was a "stale" frame stuck in the queue from a previous use of that call number.  We have about 100 systems registering and qualifying to us, so lots of short "calls".  We've had a few days with lots of problems on the South African Internet, leading to lost packets etc etc - so perhaps there is an issue somewhere with cleaning up those calls which lead to an old packet sticking around long enough to still be there when the call number gets reused.

Since then I tried to recreate the problem and couldn't.

By the way, I have one guy here who says this bug has always been around - even in 1.2.

I might try to "instrument" the frame queue a bit more so I can see exactly what's in it and how old they are.  

This suprises me a bit.  I have no active calls on the box at the moment, but still:

voipconnect*CLI> core show channels
Channel              Location             State   Application(Data)
0 active channels0 of 200 max active calls ( 0.00% of capacity)

voipconnect*CLI> iax2 show stats

   IAX Statistics
---------------------
Outstanding frames: 17 (0 ingress, 17 egress)
0 timed and 0 untimed transmits; MTU 178/178/1240
Packets in transmit queue: 14 dead, 14 final, 17 total

Suprises me there are 17 packets in the transmit queue, many of which must be both dead and final.  

Maybe there is a logic error in the new multi-threaded stuff that results in packets occasionally being left in the transmit queue.

Steve

By: Steve Davies . (stevedavies) 2007-08-02 00:49:00

Steve thinks this issue will probably be solved by the patch on bug 10289.

By: Russell Bryant (russell) 2007-08-02 09:50:53

I'm closing this as fixed as the patch from 10289 probably fixed this problem.  If you still have a problem after upgrading, please let me know.  Thanks!