[Home]

Summary:ASTERISK-16914: Intermittent loss of IP address of SIP peer prevents outbound calls
Reporter:essd (essd)Labels:
Date Opened:2010-11-04 19:06:33Date Closed:
Priority:MinorRegression?No
Status:Open/NewComponents:Channels/chan_sip/General
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) log-snippet-1.txt
( 1) log-snippet-2.txt
( 2) sip-config.txt
( 3) sip-log.txt
( 4) sip-show-peer.txt
Description:Running Asterisk on an embedded ar71xx device (running OpenWRT r20254). This problem occurs under Asterisk 1.6.2.13, although I was also previously running 1.6.2.1 for about six months prior with identical symptoms.

Every 2-3 weeks, the PBX gets stuck in a state where it can receive calls from a peer, but it becomes completely unable to originate outgoing calls to the same peer until I do a "sip reload". After doing the SIP reload, everything immediately begins works perfectly again. I can make outgoing calls (and receive incoming) until it gets stuck again several weeks later. This is a low-traffic system where it is often a few days between outgoing calls.

When in the failure state and trying to place a call, after Asterisk has transmitted the "100 Trying" message to the local calling extension (an ATA), I see the following "Unable to create channel of type SIP" message in the log:

(see log-snippet-1.txt)

The calling extension then receives a "480 Temporarily Unavailable" and a fast busy.

At this point, doing a "sip show peers" also results in seemingly-normal results, as does "sip show registry". When I do a detailed "sip show mypeername", the one anomalous thing I see is that that the "Addr->IP" setting for my peer is listed as "(Unspecified)".

* Name       : voipms
 Secret       : <Set>
[...]
 ToHost       : dallas.voip.ms
 Addr->IP     : (Unspecified) Port 5060
 Defaddr->IP  : 0.0.0.0 Port 5060
 Prim.Transp. : UDP
 Allowed.Trsp : UDP
[...]

In contrast, after I do a "sip reload", outbound calls start working again and the "sip show" output looks identical except for showing the correct address under "Addr->IP":

* Name       : voipms
 Secret       : <Set>
[...]
 ToHost       : dallas.voip.ms
 Addr->IP     : 74.54.54.178 Port 5060
 Defaddr->IP  : 0.0.0.0 Port 5060
 Prim.Transp. : UDP
 Allowed.Trsp : UDP
[...]

(The full text of "sip show peer mypeername", when in the latter good state, is attached as sip-show-peer.txt. The only difference between this and the bad-state version is the Addr->IP.)

It seems like Asterisk has somehow lost the IP address of the peer, but I do not know why.

One thing potentially related is that, in the previous registration to the peer (two minutes prior to my failed call), we do the usual REGISTER/Unauthorized+Nonce/REGISTER+Response/OK business. Immediately after that, we get a NOTIFY from the remote, which Asterisk responds to with a 489 Bad Event:

(see log-snippet-2.txt)

Immediately after that exchange, I saw the following curious set of
messages:

[Oct 27 18:44:58] DEBUG[1676]: chan_sip.c:3608 __sip_xmit: Trying to put
'SIP/2.0 489' onto UDP socket destined for 74.54.54.178:5060
[Oct 27 18:44:58] DEBUG[1676]: chan_sip.c:22105 handle_request_do: Invalid
SIP message - rejected , no callid, len 541

Could those messages be related to my problem? I see this "489 Bad Event" issue may be related to ASTERISK-16133, but it's unclear if this can somehow cause the SIP remote peer address to get lost, as opposed to just being potentially bad manners with our SIP peer.

My sip.conf is as follows:

(see sip-config.txt)

A full debug log of trying to place an outbound call is listed here. The call attempt starts at 19:02:17.

(see sip-log.txt)

Note that I enabled these logs after I discovered the problem, so the system was already in some sort of invalid state from the beginning of the detailed debug log.

Is any additional information needed to look into this? I am trying now to capture full debug logging 24/7 (which is rather tricky given space considerations on the embedded device), just in case I can somehow trace down the source of the problem as it happens.

It may be another couple of weeks before the problem surfaces again, and since I don't know exactly what I am looking for, I'd love to find out if anyone has specific advice or types of debug output to capture. I've already reviewed the issue submission guidelines, and checking in with the asterisk-users list and #asterisk channels did not yield any suggestions.
Comments:By: Paul Belanger (pabelanger) 2010-11-04 19:35:07

You might benefit setting:
qualify=yes

for your voip.ms peer.

From the looks of it, it's loosing registration.

By: essd (essd) 2010-11-04 19:41:54

OK, I will try qualify=yes (with qualifyfreq=300) and see what happens.

Any ideas on what types of things cause Asterisk to lose registration?

By: Paul Belanger (pabelanger) 2010-11-04 21:27:48

At the moment, no.  However qualify should atleast help monitor the issue. I don't see anything specific to the log file you uploaded.  If you are able to reproduce this, you'll need to capture a debug log (see below) which will hopefully show the problem.

However, your solutions may also be to decrease the amount of time between registration attempts.

---
We require a complete debug log to help triage the issue.

This document will provide instructions on how to collect debugging logs from an Asterisk machine for the purpose of helping bug marshals troubleshoot an issue:

https://wiki.asterisk.org/wiki/display/AST/Collecting+Debug+Information

By: essd (essd) 2010-11-07 12:59:07.000-0600

I have added the appropriate instrumentation (including hourly checkpoints of various SIP data) to our Asterisk installation. Hopefully, it will catch something the next time it happens, which may be a few weeks...

By: Gabriel Ortiz Lour (elbriga) 2010-11-17 18:52:00.000-0600

Dont know if has to do with it, but I have problems with dynamic changing IP addresses on SIP trunk hosts.

I have a trunk configured using no-ip's name as host, and when the no-ip's host change IP asterisk don resolve it again, and it became UNREACHEABLE until a sip reload is issued.

By: essd (essd) 2010-11-17 19:54:46.000-0600

Hmm...interesting. Although my SIP trunk host has a permanent IP address, on my side, my ISP forces me to change IPs with annoying frequency (every 12 hours). I wonder if perhaps it gets stuck when the periodic registration interval happens to occur during the time when my DHCP lease is being renewed and I have no public IP address?

By: essd (essd) 2010-12-30 20:07:59.000-0600

The qualify statement did not stop this problem from occurring, since it just happened again today. Regrettably, I had rebooted my device a few days earlier and I forgot to restart the debug logging, so I do not yet have any non-cursory logs that document the issue. I will keep trying to catch it.