Summary: | ASTERISK-13185: [patch] IAXy runs ok for a while, then goes bonkers | ||
Reporter: | John Covert (jcovert) | Labels: | |
Date Opened: | 2008-12-09 09:18:45.000-0600 | Date Closed: | 2009-03-09 14:30:44 |
Priority: | Major | Regression? | No |
Status: | Closed/Complete | Components: | Channels/chan_iax2 |
Versions: | Frequency of Occurrence | ||
Related Issues: | |||
Environment: | Attachments: | ( 0) chan_iax2.c.test-patch ( 1) chan_iax2.c.test-patch-2 ( 2) chan_iax2.c.test-patch-3 ( 3) chan_iax2-svn-167265-debug-option.patch ( 4) iaxy | |
Description: | After up to a month, or as little as a day or two, both the IAXy I originally purchased, and a replacement given to me by Stephen Burcham in October (to try to eliminate a specific piece of hardware) become unreliable. In some cases it just stops processing calls, in the more likely case, documented in the attached log, it goes into a reachable/unreachable loop like this: [Dec 9 10:01:45] NOTICE[26712]: chan_iax2.c:8832 __iax2_poke_noanswer: Peer 'x29' is now UNREACHABLE! Time: 5 [Dec 9 10:01:45] NOTICE[26712]: chan_iax2.c:7912 socket_process: Peer 'x29' is now REACHABLE! Time: 10 [Dec 9 10:02:29] NOTICE[26712]: chan_iax2.c:8832 __iax2_poke_noanswer: Peer 'x29' is now UNREACHABLE! Time: 10 [Dec 9 10:02:29] NOTICE[26712]: chan_iax2.c:7912 socket_process: Peer 'x29' is now REACHABLE! Time: 13 Since it provides service to the phone next to the main place at which "she who expects phones to work" generally sits, this causes me a major headache. | ||
Comments: | By: Russell Bryant (russell) 2008-12-09 10:29:59.000-0600 Since this is an issue with a Digium product, please pursue this issue through support@digium.com By: John Covert (jcovert) 2009-01-18 19:54:31.000-0600 Hi Russell. This has nothing to do with the IAXy. I've reproduced it talking to another Asterisk. It's a bug in chan_iax2.c. Here's what's happening: We POKE at an inopportune time for the peer. The peer responds with INVAL. We don't do the necessary call to ast_sched_del(sched, peer->pokeexpire), and when that fires, we get the Peer ... is now UNREACHABLE message (and condition), which clears as soon as the next Poke/Pong sequence works. For example, look at the "unreachable" at line 697 in the attachment. Then work your way back to the POKE at line 670. The SCall (05627) in the POKE matches the DCall in the INVAL at line 672. But the code handling an INVAL doesn't delete the scheduled event, which fires with the result at line 697. Every example of an "unreachable" message in the entire attachment has a similar cause. It can be slightly more complex. The "unreachable" at line 1343 was caused by the POKE/INVAL pair at lines 1306/1308, even though there was an intervening POKE/PONG pair at lines 1314/1316. The scheduled event from the poke at line 1306 is completely independent of any other event (of course), and it fires, and we get marked unreachable, and the peer doesn't work. Although I for a while thought this was caused by a memory leak in the IAXy, since it takes a while for it to start happening, after I noticed that I could make it go away by forcing provisioning of the IAXy, and also noticed that it does sometimes happen to another non-IAXy IAX2 peer, I realized that what happens is that the Registration sequence and the POKE/PONG sequence drift independently over time, and eventually get into a synchronization where PONGs are likely to get the INVAL response. Jamming a provision down the IAXy's throat changes the timing, and the problem goes away for awhile. So, an INVAL to a POKE needs to cancel the scheduled event. May I please have my Karma points back? :-) And maybe a few more for figuring this bug out? /john By: John Covert (jcovert) 2009-01-18 23:35:05.000-0600 I'm now running with the test patch (against 1.6.0.3) that I've uploaded. This is the point in routine socket_process where packets with scallno==0 are about to be thrown away (the "return 1"). /john By: John Covert (jcovert) 2009-01-19 01:11:41.000-0600 test-patch-2 reschedules the poke. works better. there might need to be some adjustment of the statistics in here, too, in case we get in synch with something and keep getting invals; or maybe the pokefreq[not]ok intervals need to have a randomizing factor applied to them. By: John Covert (jcovert) 2009-01-20 10:55:18.000-0600 I've discovered a related separate problem. Every restart of asterisk causes asterisk to reprovision the IAXy with provisioning version "1624236442". The IAXy accepts this, and from then on the IAXy responds to REGACKs with INVAL. Submitting an IAX2 PROVISION command causes Asterisk to offer up both provisioning version "1624236442" and "3538259848" to the IAXy. By issuing the IAX2 PROVISION command several times, it's possible to get the IAXy to accept version "3538259848". Only after this is everything "ok". I'm debugging this now. To assist in this, I needed to add the ability to put "debug=yes" into iax.conf, a useful feature that some other modules have. I'm uploading a patch to trunk which implements this. /john By: John Covert (jcovert) 2009-01-20 11:02:51.000-0600 Here's what happens right after startup: Rx-Frame Retry[ No] -- OSeqno: 000 ISeqno: 000 Type: IAX Subclass: REGREQ Timestamp: 00002ms SCall: 11939 DCall: 00000 [192.168.0.17:4569] USERNAME : x29 REFRESH : 60 DEVICE TYPE : iaxy2 SERVICE IDENT : 000000000cc8 PROVISIONG VER : 3538259848 Tx-Frame Retry[000] -- OSeqno: 000 ISeqno: 001 Type: IAX Subclass: REGACK Timestamp: 00001ms SCall: 08831 DCall: 11939 [192.168.0.17:4569] USERNAME : x29 DATE TIME : 2009-01-20 11:30:44 REFRESH : 60 APPARENT ADDRES : IPV4 192.168.0.17:4569 CALLING NUMBER : 29 CALLING NAME : John Covert IAXy Tx-Frame Retry[000] -- OSeqno: 000 ISeqno: 000 Type: IAX Subclass: PROVISN Timestamp: 00005ms SCall: 00794 DCall: 00000 [192.168.0.17:4569] PROVISIONING : USERNAME : x29 PASSWORD : ............ LANGUAGE : BINDPORT : 4569 SERVERIP : 192.168.0.100 SERVERPORT : 4569 ALTSERVERIP : 0.0.0.0 FLAGS : 1 (register) FORMAT : 4 TYPEOFSERVICE : 0 PROV VERSION : 1624236442 Rx-Frame Retry[ No] -- OSeqno: 001 ISeqno: 001 Type: IAX Subclass: ACK Timestamp: 00001ms SCall: 11939 DCall: 08831 [192.168.0.17:4569] Rx-Frame Retry[ No] -- OSeqno: 000 ISeqno: 001 Type: IAX Subclass: ACCEPT Timestamp: 00002ms SCall: 06647 DCall: 00794 [192.168.0.17:4569] By: John Covert (jcovert) 2009-01-20 11:08:26.000-0600 Here is an unsuccessful attempt to get the updated provisioning into the IAXy. Note that asterisk serves up two different versions, even though there is only one template: Provisioning '192.168.0.17' with template 'si-000000000cc8' Tx-Frame Retry[000] -- OSeqno: 000 ISeqno: 000 Type: IAX Subclass: PROVISN Timestamp: 00012ms SCall: 06920 DCall: 00000 [192.168.0.17:4569] PROVISIONING : USERNAME : x29 PASSWORD : ........... BINDPORT : 4569 SERVERIP : 192.168.0.100 SERVERPORT : 4569 FLAGS : 1 (register) FORMAT : 4 PROV VERSION : 3538259848 Rx-Frame Retry[ No] -- OSeqno: 000 ISeqno: 000 Type: IAX Subclass: INVAL Timestamp: 00000ms SCall: 00000 DCall: 06920 [192.168.0.17:4569] Rx-Frame Retry[ No] -- OSeqno: 000 ISeqno: 000 Type: IAX Subclass: POKE Timestamp: 00004ms SCall: 00006 DCall: 00000 [71.139.142.37:4569] Rx-Frame Retry[ No] -- OSeqno: 001 ISeqno: 001 Type: IAX Subclass: ACK Timestamp: 00004ms SCall: 00006 DCall: 00001 [71.139.142.37:4569] Rx-Frame Retry[Yes] -- OSeqno: 000 ISeqno: 000 Type: IAX Subclass: REGREQ Timestamp: 00002ms SCall: 06767 DCall: 00000 [192.168.0.17:4569] USERNAME : x29 REFRESH : 60 DEVICE TYPE : iaxy2 SERVICE IDENT : 000000000cc8 PROVISIONG VER : 1624236442 Tx-Frame Retry[000] -- OSeqno: 000 ISeqno: 001 Type: IAX Subclass: REGACK Timestamp: 00010ms SCall: 01264 DCall: 06767 [192.168.0.17:4569] USERNAME : x29 DATE TIME : 2009-01-20 11:33:24 REFRESH : 60 APPARENT ADDRES : IPV4 192.168.0.17:4569 CALLING NUMBER : 29 CALLING NAME : John Covert IAXy Tx-Frame Retry[000] -- OSeqno: 000 ISeqno: 000 Type: IAX Subclass: PROVISN Timestamp: 00012ms SCall: 16169 DCall: 00000 [192.168.0.17:4569] PROVISIONING : USERNAME : x29 PASSWORD : .......... LANGUAGE : BINDPORT : 4569 SERVERIP : 192.168.0.100 SERVERPORT : 4569 ALTSERVERIP : 0.0.0.0 FLAGS : 1 (register) FORMAT : 4 TYPEOFSERVICE : 0 PROV VERSION : 1624236442 Rx-Frame Retry[ No] -- OSeqno: 000 ISeqno: 000 Type: IAX Subclass: INVAL Timestamp: 00000ms SCall: 06767 DCall: 01264 [192.168.0.17:4569] Rx-Frame Retry[ No] -- OSeqno: 000 ISeqno: 000 Type: IAX Subclass: INVAL Timestamp: 00000ms SCall: 00000 DCall: 16169 [192.168.0.17:4569] By: John Covert (jcovert) 2009-01-20 11:12:55.000-0600 Here's a successful reprovision, but unfortunately asterisk sends both templates, the good "3xxxxx" template followed by the bad "1xxxxx" template, and the IAXy dutifully accepts both, and end up no better off: Tx-Frame Retry[003] -- OSeqno: 000 ISeqno: 000 Type: IAX Subclass: PROVISN Timestamp: 00012ms SCall: 06920 DCall: 00000 [192.168.0.17:4569] PROVISIONING : USERNAME : x29 PASSWORD : ......... BINDPORT : 4569 SERVERIP : 192.168.0.100 SERVERPORT : 4569 FLAGS : 1 (register) FORMAT : 4 PROV VERSION : 3538259848 Rx-Frame Retry[ No] -- OSeqno: 000 ISeqno: 001 Type: IAX Subclass: ACCEPT Timestamp: 00002ms SCall: 03484 DCall: 06920 [192.168.0.17:4569] Tx-Frame Retry[-01] -- OSeqno: 001 ISeqno: 001 Type: IAX Subclass: ACK Timestamp: 00002ms SCall: 06920 DCall: 03484 [192.168.0.17:4569] Tx-Frame Retry[003] -- OSeqno: 000 ISeqno: 000 Type: IAX Subclass: PROVISN Timestamp: 00012ms SCall: 16169 DCall: 00000 [192.168.0.17:4569] PROVISIONING : USERNAME : x29 PASSWORD : ........... LANGUAGE : BINDPORT : 4569 SERVERIP : 192.168.0.100 SERVERPORT : 4569 ALTSERVERIP : 0.0.0.0 FLAGS : 1 (register) FORMAT : 4 TYPEOFSERVICE : 0 PROV VERSION : 1624236442 Rx-Frame Retry[ No] -- OSeqno: 000 ISeqno: 001 Type: IAX Subclass: ACCEPT Timestamp: 00002ms SCall: 07441 DCall: 16169 [192.168.0.17:4569] Tx-Frame Retry[-01] -- OSeqno: 001 ISeqno: 001 Type: IAX Subclass: ACK Timestamp: 00002ms SCall: 16169 DCall: 07441 [192.168.0.17:4569] Rx-Frame Retry[ No] -- OSeqno: 000 ISeqno: 000 Type: IAX Subclass: REGREQ Timestamp: 00002ms SCall: 08955 DCall: 00000 [192.168.0.17:4569] USERNAME : x29 REFRESH : 60 DEVICE TYPE : iaxy2 SERVICE IDENT : 000000000cc8 PROVISIONG VER : 1624236442 Tx-Frame Retry[000] -- OSeqno: 000 ISeqno: 001 Type: IAX Subclass: REGACK Timestamp: 00013ms SCall: 08445 DCall: 08955 [192.168.0.17:4569] USERNAME : x29 DATE TIME : 2009-01-20 11:33:48 REFRESH : 60 APPARENT ADDRES : IPV4 192.168.0.17:4569 CALLING NUMBER : 29 CALLING NAME : John Covert IAXy Rx-Frame Retry[ No] -- OSeqno: 000 ISeqno: 000 Type: IAX Subclass: INVAL Timestamp: 00000ms SCall: 08955 DCall: 08445 [192.168.0.17:4569] By: John Covert (jcovert) 2009-01-20 11:14:24.000-0600 Just for the record, here's the output of asterisk*CLI> iax2 show provisioning == si-000000000cc8 == Base Templ: default Username: x29 Secret: ........... Language: <unspecified> Bind Port: 4569 Server: 192.168.0.100 Server Port: 4569 Alternate: <unspecified> Flags: register Format: ulaw TOS: 0x0 == default == Base Templ: <none> Username: <unspecified> Secret: <unspecified> Language: <unspecified> Bind Port: 4569 Server: <unspecified> Server Port: 4569 Alternate: <unspecified> Flags: register Format: ulaw TOS: 0x0 By: John Covert (jcovert) 2009-01-20 11:25:23.000-0600 Eventually we're lucky, and the last PROVISN coming out of Asterisk is the one for version "3538259848", and after that all is quiet until the next reload of Asterisk, when this whole thing starts over again. Rx-Frame Retry[ No] -- OSeqno: 000 ISeqno: 000 Type: IAX Subclass: REGREQ Timestamp: 00002ms SCall: 12963 DCall: 00000 [192.168.0.17:4569] USERNAME : x29 REFRESH : 60 DEVICE TYPE : iaxy2 SERVICE IDENT : 000000000cc8 PROVISIONG VER : 3538259848 Tx-Frame Retry[000] -- OSeqno: 000 ISeqno: 001 Type: IAX Subclass: REGACK Timestamp: 00019ms SCall: 14937 DCall: 12963 [192.168.0.17:4569] USERNAME : x29 DATE TIME : 2009-01-20 11:38:16 REFRESH : 60 APPARENT ADDRES : IPV4 192.168.0.17:4569 CALLING NUMBER : 29 CALLING NAME : John Covert IAXy Rx-Frame Retry[ No] -- OSeqno: 001 ISeqno: 001 Type: IAX Subclass: ACK Timestamp: 00019ms SCall: 12963 DCall: 14937 [192.168.0.17:4569] By: John Covert (jcovert) 2009-01-20 11:33:27.000-0600 Work-around for now: comment out entry in iaxprov.conf. I need to work on some other things and will return to this later unless someone else figures it out. By: John Covert (jcovert) 2009-01-23 08:17:47.000-0600 Test-Patch-3 includes a tweak in chunk @@ -8179,6 to avoid a possible crash situation, as well as including the work I did for "immediate=yes" (issue ASTERISK-13389) and "debug=yes". Relative to 1.6.0.3. By: David Vossel (dvossel) 2009-02-24 15:10:56.000-0600 It doesn't make any sense to me why we would ever receive an INVAL in response to a poke message. If we do, something is clearly wrong. Can you reproduce this between two boxes not involving an iaxy and if so could you provide a packet capture that can be opened in wireshark. Please open a separate issue for the provisioning problem. By: David Vossel (dvossel) 2009-03-09 14:29:18 I haven't heard anything about this in awhile. Feel free to re-open this issue if there's still a problem and its recreatable without involving an iaxy. |