Summary:ASTERISK-06381: User/friend registrations timing out prematurely
Reporter:J.R. (jrast)Labels:
Date Opened:2006-02-20 19:54:45.000-0600Date Closed:2006-03-08 01:43:53.000-0600
Versions:Frequency of
Description:I have several phones which are configured as friends with "host=dynamic" and no defaultip.  They register OK, after which, "sip show peers" shows, for example:
1234/1234            D          5060     Unmonitored
A typical registration log is:
   -- Registered SIP '1234' at port 5060 expires 3600
   -- Saved useragent "Grandstream HT286" for peer 1234
However, after much less than the 1 hour registration timeout that was given, I notice that A* has lost the registration, showing:
1234/1234                  (Unspecified)    D          0        Unmonitored
in the "sip show peers" output.  The valid registration lasts only around 10 minutes, or so (I haven't timed it precisely).  Eventually, when the phone reregisters 1 hour after its previous registraion, A* is happy once more, but again, only for a short time.

Obviously, for the majority of the time once the registration has been lost, A* will not pass calls to this phone.  Adding a "defaultip" fixes that, but this is not a proper fix as it doesn't help DHCP-assigned users.


Running on FreeBSD, using port net/asterisk-current, up-to-date A* download; also using Bamby's codec-negotiation patch from bug 4825.

This problem does not occur when using A* 1.2.4 also with codec-nego patch.  On 1.2.4, the registrations remain valid right until reregistration.
Comments:By: Michael Willmott (navgar) 2006-02-21 08:26:48.000-0600

Issue also present here.  Running a clean build from svn trunk revision 10558 (built last night).

By: Olle Johansson (oej) 2006-02-22 08:53:33.000-0600

As required by the bug guidelines, we need a SIP debug output. Set debug to 4, verbose to 4 and enable sip debugging.

By: J.R. (jrast) 2006-02-22 13:19:56.000-0600

I'm not sure what SIP debug output to include.  The problem is precisely that WAY before any SIP is due from the phone, the registration is being expired.  There is no useful SIP debug to include in this bug report.  I do have some other useful info, though.

I am running with debug on and DUMP_SCHEDULER enabled for sched.c at the moment.  I have noticed that old expire_register scheduler events are not being cleared over an A* reload, causing an existing expire_register event to remain scheduled.  Since the phone's registration is reloaded from astdb, that's probably OK because you still want to expire the reloaded registration at some point.  However, after the A* reload, "sip show peer 1234" shows -1 for the "Expire" scheduler event for the phone meaning that we no longer know that an expire_register event is scheduled for this phone.

Now, if the phone is then forced to reregister (e.g., by rebooting it), a second expire_register event is then scheduled for the same phone.  This seems wrong.

In fact, at this very moment, I have four expire_registers scheduled for my test phone, which I have done simply by performing four asterisk reload followed by phone reboot sequences.  And sure enough, I am seeing the phone become "Unregistered" when the first of these scheduler events occurs.

The problem may be that we need to delete any existing expire_register scheduler event when we a processing a new register from a phone.

I am still trying to see if the code here has changed between A*-1.2.4 and A*-current; however, I need to go do something else now, so I'm posting these findings for the time being, in case this helps anyone else realize what the fix to this problem may be.

By: Olle Johansson (oej) 2006-02-23 06:10:27.000-0600

I want to see a full debug of the registrations, to make sure they are ok. Set debug to 4, verbose to 4 and turn on siphistory, dumphistory and sip debugging. Thanks.

By: J.R. (jrast) 2006-02-23 08:40:32.000-0600

Things are working fine on A*-1.2.4 and things have also been fine for ages using earlier versions, ergo, I am inferring that the SIP registration is fine and that the problem lies in a change between 1.2.4 and current.

Having looked further into this, there IS a difference in the behavior between 1.2.4 and current.  On both 1.2.4 and current, when I issue a reload, all scheduler events are retained.  However, on 1.2.4, after the reload, "sip show peer xxxx" shows p->expire remains valid (pointing to the old scheduler event's ID number).  On current, p->expire is cleared to -1, essentially orphaning the expire_register event.  This is the problem.  On 1.2.4, since we can still identify the old scheduler event, it will be deleted and replaced with a new one when the phone reregisters.  However, on current, the orphaned expire_register event is no longer identifiable to that phone, so it will remain scheduled and a duplicate event is created when the phone reregisters.

The two (or more) expire_register events ensure that the phone will remain permanently unregistered because, in the long term, the phone's reregistration will be followed (10 seconds later) by execution of the orphaned expire_register event which unregisters the phone -- and also orphans the current expire_register event too -- ensuring that the condition will be perpetuated for each subsequent reregister.

I'll try to spend some time on comparing the code that performs the reload to figure out where p->expire is being cleared, or not reset, later today.

To repeat this, set up a sip.conf entry with type=friend and host=dynamic.  Set the phone's registation period to something long.  Start A* and allow the phone to register.  Run "sip show peer xxxx" and see "Expire" has the ID number of the expire_register scheduler event.  (Running also with DUMP_SCHEDULER defined when compiling sched.c will confirm this by displaying the scheduler events periodically.)  Issue a "reload" command.  Run "sip show peer xxxx" again.  On 1.2.4, "Expire" will remain the same scheduler_event number.  On current, it is now -1.

By: J.R. (jrast) 2006-02-23 09:36:33.000-0600

OK, here's the problem and the patch.

In build_peer(), set_peer_defaults() is now being called regardless of whether this is an existing peer or a new one, which causes an existing peer's p->expire to be lost.  The fix is to only call set_peer_defaults() if this is a new peer:
--- channels/chan_sip.c.orig    Sun Feb 19 12:01:11 2006
+++ channels/chan_sip.c Thu Feb 23 11:33:44 2006
@@ -12132,8 +12132,8 @@
       peer->lastmsgssent = -1;
       oldha = peer->ha;
       peer->ha = NULL;
-       set_peer_defaults(peer);        /* Set peer defaults */
       if (!found) {
+               set_peer_defaults(peer);        /* Set peer defaults */
               if (name)
                       ast_copy_string(peer->name, name, sizeof(peer->name));
               peer->addr.sin_port = htons(DEFAULT_SIP_PORT);

By: deti (deti) 2006-02-23 17:15:10.000-0600

Thanks, this patch definitely fixes the problem for me.

By: Olle Johansson (oej) 2006-03-07 14:59:42.000-0600

Fixed in trunk. Try latest svn trunk. Thanks.

By: Matt O'Gorman (mogorman) 2006-03-07 15:55:02.000-0600

thanks oej what was the commit

By: Olle Johansson (oej) 2006-03-08 01:43:53.000-0600

This was fixed when fixing another issue - 6627. Revision 12400.