[Home]

Summary:ASTERISK-13185: [patch] IAXy runs ok for a while, then goes bonkers
Reporter:John Covert (jcovert)Labels:
Date Opened:2008-12-09 09:18:45.000-0600Date Closed:2009-03-09 14:30:44
Priority:MajorRegression?No
Status:Closed/CompleteComponents:Channels/chan_iax2
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) chan_iax2.c.test-patch
( 1) chan_iax2.c.test-patch-2
( 2) chan_iax2.c.test-patch-3
( 3) chan_iax2-svn-167265-debug-option.patch
( 4) iaxy
Description:After up to a month, or as little as a day or two, both the IAXy I originally purchased, and a replacement given to me by Stephen Burcham in October (to try to eliminate a specific piece of hardware) become unreliable.  In some cases it just stops processing calls, in the more likely case, documented in the attached log, it goes into a reachable/unreachable loop like this:

[Dec  9 10:01:45] NOTICE[26712]: chan_iax2.c:8832 __iax2_poke_noanswer: Peer 'x29' is now UNREACHABLE! Time: 5
[Dec  9 10:01:45] NOTICE[26712]: chan_iax2.c:7912 socket_process: Peer 'x29' is now REACHABLE! Time: 10
[Dec  9 10:02:29] NOTICE[26712]: chan_iax2.c:8832 __iax2_poke_noanswer: Peer 'x29' is now UNREACHABLE! Time: 10
[Dec  9 10:02:29] NOTICE[26712]: chan_iax2.c:7912 socket_process: Peer 'x29' is now REACHABLE! Time: 13

Since it provides service to the phone next to the main place at which "she who expects phones to work" generally sits, this causes me a major headache.
Comments:By: Russell Bryant (russell) 2008-12-09 10:29:59.000-0600

Since this is an issue with a Digium product, please pursue this issue through support@digium.com

By: John Covert (jcovert) 2009-01-18 19:54:31.000-0600

Hi Russell.  This has nothing to do with the IAXy.  I've reproduced it talking to another Asterisk.  It's a bug in chan_iax2.c.  Here's what's happening:

We POKE at an inopportune time for the peer.  The peer responds with INVAL.  We don't do the necessary call to ast_sched_del(sched, peer->pokeexpire), and when that fires, we get the Peer ... is now UNREACHABLE message (and condition), which clears as soon as the next Poke/Pong sequence works.

For example, look at the "unreachable" at line 697 in the attachment.  Then work your way back to the POKE at line 670.  The SCall (05627) in the POKE matches the DCall in the INVAL at line 672.  But the code handling an INVAL doesn't delete the scheduled event, which fires with the result at line 697.

Every example of an "unreachable" message in the entire attachment has a similar cause.  It can be slightly more complex.  The "unreachable" at line 1343 was caused by the POKE/INVAL pair at lines 1306/1308, even though there was an intervening POKE/PONG pair at lines 1314/1316.  The scheduled event from the poke at line 1306 is completely independent of any other event (of course), and it fires, and we get marked unreachable, and the peer doesn't work.

Although I for a while thought this was caused by a memory leak in the IAXy, since it takes a while for it to start happening, after I noticed that I could make it go away by forcing provisioning of the IAXy, and also noticed that it does sometimes happen to another non-IAXy IAX2 peer, I realized that what happens is that the Registration sequence and the POKE/PONG sequence drift independently over time, and eventually get into a synchronization where PONGs are likely to get the INVAL response.  Jamming a provision down the IAXy's throat changes the timing, and the problem goes away for awhile.

So, an INVAL to a POKE needs to cancel the scheduled event.

May I please have my Karma points back?  :-)  And maybe a few more for figuring this bug out?

/john

By: John Covert (jcovert) 2009-01-18 23:35:05.000-0600

I'm now running with the test patch (against 1.6.0.3) that I've uploaded.  This is the point in routine socket_process where packets with scallno==0 are about to be thrown away (the "return 1").

/john

By: John Covert (jcovert) 2009-01-19 01:11:41.000-0600

test-patch-2 reschedules the poke.  works better.  there might need to be some adjustment of the statistics in here, too, in case we get in synch with something and keep getting invals; or maybe the pokefreq[not]ok intervals need to have a randomizing factor applied to them.

By: John Covert (jcovert) 2009-01-20 10:55:18.000-0600

I've discovered a related separate problem.  Every restart of asterisk causes asterisk to reprovision the IAXy with provisioning version "1624236442".  The IAXy accepts this, and from then on the IAXy responds to REGACKs with INVAL.

Submitting an IAX2 PROVISION command causes Asterisk to offer up both provisioning version "1624236442" and "3538259848" to the IAXy.  By issuing the IAX2 PROVISION command several times, it's possible to get the IAXy to accept version "3538259848".  Only after this is everything "ok".

I'm debugging this now.  To assist in this, I needed to add the ability to put "debug=yes" into iax.conf, a useful feature that some other modules have.  I'm uploading a patch to trunk which implements this.

/john

By: John Covert (jcovert) 2009-01-20 11:02:51.000-0600

Here's what happens right after startup:

Rx-Frame Retry[ No] -- OSeqno: 000 ISeqno: 000 Type: IAX     Subclass: REGREQ
  Timestamp: 00002ms  SCall: 11939  DCall: 00000 [192.168.0.17:4569]
  USERNAME        : x29
  REFRESH         : 60
  DEVICE TYPE     : iaxy2
  SERVICE IDENT   : 000000000cc8
  PROVISIONG VER  : 3538259848

Tx-Frame Retry[000] -- OSeqno: 000 ISeqno: 001 Type: IAX     Subclass: REGACK
  Timestamp: 00001ms  SCall: 08831  DCall: 11939 [192.168.0.17:4569]
  USERNAME        : x29
  DATE TIME       : 2009-01-20  11:30:44
  REFRESH         : 60
  APPARENT ADDRES : IPV4 192.168.0.17:4569
  CALLING NUMBER  : 29
  CALLING NAME    : John Covert IAXy

Tx-Frame Retry[000] -- OSeqno: 000 ISeqno: 000 Type: IAX     Subclass: PROVISN
  Timestamp: 00005ms  SCall: 00794  DCall: 00000 [192.168.0.17:4569]
  PROVISIONING    :
      USERNAME        : x29
      PASSWORD        : ............
      LANGUAGE        :
      BINDPORT        : 4569
      SERVERIP        : 192.168.0.100
      SERVERPORT      : 4569
      ALTSERVERIP     : 0.0.0.0
      FLAGS           : 1 (register)
      FORMAT          : 4
      TYPEOFSERVICE   : 0
      PROV VERSION    : 1624236442


Rx-Frame Retry[ No] -- OSeqno: 001 ISeqno: 001 Type: IAX     Subclass: ACK
  Timestamp: 00001ms  SCall: 11939  DCall: 08831 [192.168.0.17:4569]
Rx-Frame Retry[ No] -- OSeqno: 000 ISeqno: 001 Type: IAX     Subclass: ACCEPT
  Timestamp: 00002ms  SCall: 06647  DCall: 00794 [192.168.0.17:4569]

By: John Covert (jcovert) 2009-01-20 11:08:26.000-0600

Here is an unsuccessful attempt to get the updated provisioning into the IAXy.  Note that asterisk serves up two different versions, even though there is only one template:

Provisioning '192.168.0.17' with template 'si-000000000cc8'
Tx-Frame Retry[000] -- OSeqno: 000 ISeqno: 000 Type: IAX     Subclass: PROVISN
  Timestamp: 00012ms  SCall: 06920  DCall: 00000 [192.168.0.17:4569]
  PROVISIONING    :
      USERNAME        : x29
      PASSWORD        : ...........
      BINDPORT        : 4569
      SERVERIP        : 192.168.0.100
      SERVERPORT      : 4569
      FLAGS           : 1 (register)
      FORMAT          : 4
      PROV VERSION    : 3538259848


Rx-Frame Retry[ No] -- OSeqno: 000 ISeqno: 000 Type: IAX     Subclass: INVAL  
  Timestamp: 00000ms  SCall: 00000  DCall: 06920 [192.168.0.17:4569]
Rx-Frame Retry[ No] -- OSeqno: 000 ISeqno: 000 Type: IAX     Subclass: POKE  
  Timestamp: 00004ms  SCall: 00006  DCall: 00000 [71.139.142.37:4569]
Rx-Frame Retry[ No] -- OSeqno: 001 ISeqno: 001 Type: IAX     Subclass: ACK    
  Timestamp: 00004ms  SCall: 00006  DCall: 00001 [71.139.142.37:4569]
Rx-Frame Retry[Yes] -- OSeqno: 000 ISeqno: 000 Type: IAX     Subclass: REGREQ
  Timestamp: 00002ms  SCall: 06767  DCall: 00000 [192.168.0.17:4569]
  USERNAME        : x29
  REFRESH         : 60
  DEVICE TYPE     : iaxy2
  SERVICE IDENT   : 000000000cc8
  PROVISIONG VER  : 1624236442

Tx-Frame Retry[000] -- OSeqno: 000 ISeqno: 001 Type: IAX     Subclass: REGACK
  Timestamp: 00010ms  SCall: 01264  DCall: 06767 [192.168.0.17:4569]
  USERNAME        : x29
  DATE TIME       : 2009-01-20  11:33:24
  REFRESH         : 60
  APPARENT ADDRES : IPV4 192.168.0.17:4569
  CALLING NUMBER  : 29
  CALLING NAME    : John Covert IAXy

Tx-Frame Retry[000] -- OSeqno: 000 ISeqno: 000 Type: IAX     Subclass: PROVISN
  Timestamp: 00012ms  SCall: 16169  DCall: 00000 [192.168.0.17:4569]
  PROVISIONING    :
      USERNAME        : x29
      PASSWORD        : ..........
      LANGUAGE        :
      BINDPORT        : 4569
      SERVERIP        : 192.168.0.100
      SERVERPORT      : 4569
      ALTSERVERIP     : 0.0.0.0
      FLAGS           : 1 (register)
      FORMAT          : 4
      TYPEOFSERVICE   : 0
      PROV VERSION    : 1624236442


Rx-Frame Retry[ No] -- OSeqno: 000 ISeqno: 000 Type: IAX     Subclass: INVAL  
  Timestamp: 00000ms  SCall: 06767  DCall: 01264 [192.168.0.17:4569]
Rx-Frame Retry[ No] -- OSeqno: 000 ISeqno: 000 Type: IAX     Subclass: INVAL  
  Timestamp: 00000ms  SCall: 00000  DCall: 16169 [192.168.0.17:4569]

By: John Covert (jcovert) 2009-01-20 11:12:55.000-0600

Here's a successful reprovision, but unfortunately asterisk sends both templates, the good "3xxxxx" template followed by the bad "1xxxxx" template, and the IAXy dutifully accepts both, and end up no better off:

Tx-Frame Retry[003] -- OSeqno: 000 ISeqno: 000 Type: IAX     Subclass: PROVISN
  Timestamp: 00012ms  SCall: 06920  DCall: 00000 [192.168.0.17:4569]
  PROVISIONING    :
      USERNAME        : x29
      PASSWORD        : .........
      BINDPORT        : 4569
      SERVERIP        : 192.168.0.100
      SERVERPORT      : 4569
      FLAGS           : 1 (register)
      FORMAT          : 4
      PROV VERSION    : 3538259848


Rx-Frame Retry[ No] -- OSeqno: 000 ISeqno: 001 Type: IAX     Subclass: ACCEPT
  Timestamp: 00002ms  SCall: 03484  DCall: 06920 [192.168.0.17:4569]
Tx-Frame Retry[-01] -- OSeqno: 001 ISeqno: 001 Type: IAX     Subclass: ACK    
  Timestamp: 00002ms  SCall: 06920  DCall: 03484 [192.168.0.17:4569]
Tx-Frame Retry[003] -- OSeqno: 000 ISeqno: 000 Type: IAX     Subclass: PROVISN
  Timestamp: 00012ms  SCall: 16169  DCall: 00000 [192.168.0.17:4569]
  PROVISIONING    :
      USERNAME        : x29
      PASSWORD        : ...........
      LANGUAGE        :
      BINDPORT        : 4569
      SERVERIP        : 192.168.0.100
      SERVERPORT      : 4569
      ALTSERVERIP     : 0.0.0.0
      FLAGS           : 1 (register)
      FORMAT          : 4
      TYPEOFSERVICE   : 0
      PROV VERSION    : 1624236442


Rx-Frame Retry[ No] -- OSeqno: 000 ISeqno: 001 Type: IAX     Subclass: ACCEPT
  Timestamp: 00002ms  SCall: 07441  DCall: 16169 [192.168.0.17:4569]
Tx-Frame Retry[-01] -- OSeqno: 001 ISeqno: 001 Type: IAX     Subclass: ACK    
  Timestamp: 00002ms  SCall: 16169  DCall: 07441 [192.168.0.17:4569]
Rx-Frame Retry[ No] -- OSeqno: 000 ISeqno: 000 Type: IAX     Subclass: REGREQ
  Timestamp: 00002ms  SCall: 08955  DCall: 00000 [192.168.0.17:4569]
  USERNAME        : x29
  REFRESH         : 60
  DEVICE TYPE     : iaxy2
  SERVICE IDENT   : 000000000cc8
  PROVISIONG VER  : 1624236442

Tx-Frame Retry[000] -- OSeqno: 000 ISeqno: 001 Type: IAX     Subclass: REGACK
  Timestamp: 00013ms  SCall: 08445  DCall: 08955 [192.168.0.17:4569]
  USERNAME        : x29
  DATE TIME       : 2009-01-20  11:33:48
  REFRESH         : 60
  APPARENT ADDRES : IPV4 192.168.0.17:4569
  CALLING NUMBER  : 29
  CALLING NAME    : John Covert IAXy

Rx-Frame Retry[ No] -- OSeqno: 000 ISeqno: 000 Type: IAX     Subclass: INVAL  
  Timestamp: 00000ms  SCall: 08955  DCall: 08445 [192.168.0.17:4569]

By: John Covert (jcovert) 2009-01-20 11:14:24.000-0600

Just for the record, here's the output of

asterisk*CLI> iax2 show provisioning
== si-000000000cc8 ==
Base Templ:   default
Username:     x29
Secret:       ...........
Language:     <unspecified>
Bind Port:    4569
Server:       192.168.0.100
Server Port:  4569
Alternate:    <unspecified>
Flags:        register
Format:       ulaw
TOS:          0x0

== default ==
Base Templ:   <none>
Username:     <unspecified>
Secret:       <unspecified>
Language:     <unspecified>
Bind Port:    4569
Server:       <unspecified>
Server Port:  4569
Alternate:    <unspecified>
Flags:        register
Format:       ulaw
TOS:          0x0

By: John Covert (jcovert) 2009-01-20 11:25:23.000-0600

Eventually we're lucky, and the last PROVISN coming out of Asterisk is the one for version "3538259848", and after that all is quiet until the next reload of Asterisk, when this whole thing starts over again.

Rx-Frame Retry[ No] -- OSeqno: 000 ISeqno: 000 Type: IAX     Subclass: REGREQ
  Timestamp: 00002ms  SCall: 12963  DCall: 00000 [192.168.0.17:4569]
  USERNAME        : x29
  REFRESH         : 60
  DEVICE TYPE     : iaxy2
  SERVICE IDENT   : 000000000cc8
  PROVISIONG VER  : 3538259848

Tx-Frame Retry[000] -- OSeqno: 000 ISeqno: 001 Type: IAX     Subclass: REGACK
  Timestamp: 00019ms  SCall: 14937  DCall: 12963 [192.168.0.17:4569]
  USERNAME        : x29
  DATE TIME       : 2009-01-20  11:38:16
  REFRESH         : 60
  APPARENT ADDRES : IPV4 192.168.0.17:4569
  CALLING NUMBER  : 29
  CALLING NAME    : John Covert IAXy

Rx-Frame Retry[ No] -- OSeqno: 001 ISeqno: 001 Type: IAX     Subclass: ACK    
  Timestamp: 00019ms  SCall: 12963  DCall: 14937 [192.168.0.17:4569]



By: John Covert (jcovert) 2009-01-20 11:33:27.000-0600

Work-around for now: comment out entry in iaxprov.conf.  I need to work on some other things and will return to this later unless someone else figures it out.

By: John Covert (jcovert) 2009-01-23 08:17:47.000-0600

Test-Patch-3 includes a tweak in chunk @@ -8179,6 to avoid a possible crash situation, as well as including the work I did for "immediate=yes" (issue ASTERISK-13389) and "debug=yes".  Relative to 1.6.0.3.



By: David Vossel (dvossel) 2009-02-24 15:10:56.000-0600

It doesn't make any sense to me why we would ever receive an INVAL in response to a poke message.  If we do, something is clearly wrong.  Can you reproduce this between two boxes not involving an iaxy and if so could you provide a packet capture that can be opened in wireshark.  

Please open a separate issue for the provisioning problem.

By: David Vossel (dvossel) 2009-03-09 14:29:18

I haven't heard anything about this in awhile.  Feel free to re-open this issue if there's still a problem and its recreatable without involving an iaxy.