[Home]

Summary:ASTERISK-12869: sip show inuse count is negative
Reporter:mjc (mjc)Labels:
Date Opened:2008-10-10 15:02:29Date Closed:2008-11-18 14:23:55.000-0600
Priority:MinorRegression?No
Status:Closed/CompleteComponents:Channels/chan_sip/General
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) hintfix_rev151414.patch
( 1) hintfix_trunk_rev152649.patch
Description:I have a bunch of snom 360s and asterisk-1.6.0-rc6. The snom BLF LEDs monitor hints. However, the hints are getting confused and inverting state sometimes. That is, unused extensions are being shown as in-use, and vice-versa.

If I run "sip show inuse" during these times, I see entries like this:

steerpike*CLI> sip show inuse
* User name               In use          Limit
mjc_server                0               10
mjc_library               0               10
mjc_lab                   0               10
mjc_office                0               10
mjc_home                  0               10
* Peer name               In use          Limit
mjc_server                0/0/0           10
mjc_library               0/0/0           10
mjc_lab                   -1/0/0          10
mjc_office                0/0/0           10
mjc_home                  0/0/0           10

Note the "-1"! This is nonsense, and results in an "in use" hint:

steerpike*CLI> core show hints
                   601@internal            : SIP/mjc_office&SIP/m  State:InUse           Watchers 11

where the defined hint is:

exten => 601,hint,SIP/mjc_office&SIP/mjc_home&SIP/mjc_lab&SIP/mjc_server&SIP/mjc_library


Somehow, the call count is being decremented by 2 after a hangup (from 1 to -1). Call counts should be clamped at zero, I would suggest.

This problem did not occur on the same system when we had 1.2.x and 1.4.x running.

- Mike
Comments:By: mjc (mjc) 2008-10-10 15:23:48

This also happens on 1.6.0.1.

- Mike

By: Mark Michelson (mmichelson) 2008-10-10 15:44:50

There  are checks in the code to make sure that the inuse count on a sip user/peer does not drop below 0. However, it looks like even though we use atomic operations to change the values, there is no lock around the code, which still allows for race conditions. I think it may be possible to resolve this by making sure to hold a lock around the critical section.

Since there is just one function in all of the SIP code which adjusts this value, it seems to me as though adding a lock is exactly what is needed to get this to work correctly.

I normally like to upload patches and get users to test and report back, but considering how randomly this issue probably occurs and the fact that it may take anywhere from a few days to a few weeks with the patch applied to see if the problem recurs, I'm going to apply the locking idea I described and commit the fix to trunk/1.6.0/1.6.1 and close this issue. If you encounter the issue again after using an updated version, then please do not hesitate to re-open this bug. Thanks for the report.

By: Mark Michelson (mmichelson) 2008-10-10 15:51:43

Wow, ignore my last comment about the value being made to stay above 0. It appears that someone changed the logic in 1.6.0 and didn't realize what they were doing. I still am going to place the lock around the section because it does need it.

By: Mark Michelson (mmichelson) 2008-10-10 16:05:22

Eh, my mind must be off. The atomic additions are exactly what makes it so we don't need a lock here. I'll just add the check to keep the count above 0 exactly as you posed in the original description.

By: Digium Subversion (svnbot) 2008-10-10 16:07:50

Repository: asterisk
Revision: 148373

U   trunk/channels/chan_sip.c

------------------------------------------------------------------------
r148373 | mmichelson | 2008-10-10 16:07:50 -0500 (Fri, 10 Oct 2008) | 8 lines

Make sure that the inUse and inRinging fields for
a sip peer cannot go below zero. This is a regression
from 1.4 and so it will be applied to 1.6.0 as well.

(closes issue ASTERISK-12869)
Reported by: mjc


------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=148373

By: Mark Michelson (mmichelson) 2008-10-22 16:34:47

wolfelectronic on IRC brought up the fact that he still has this issue, so I am re-opening this.

By: Chris Walton (crjw) 2008-10-22 17:24:20

I am using 1.6.0-rc6 on Solaris.
I have not tried the trunk version which has a new fix for the negative counters.
But looking at the new code, it looks like the checking for "less than zero" happens before the critical section.  Methinks this checking should be part of the critical section.

I can consistently generate an "inuse" count of "-1" using the following two line dialplan:
exten => 750,1,Dial(SIP/x300,10);
exten => 750,2,Dial(SIP/x300);

Basically we are dialing an extenstion... timing out after 10 seconds and then dialing the same extension again.  To make the counters go negative, the call must be answered sometime after 10 seconds.  If the call is never answered or it is answered during the first 10 seconds the probelem won't occur.

### BEFORE THE CALL IS PLACED - Counters all show zero
thrip*CLI> sip show inuse
* User name               In use          Limit
x000                      0               999
x300                      0               999
* Peer name               In use          Limit
x000                      0/0/0           999
x300                      0/0/0           999

### RINGING STATE - 1ST DIAL COMMAND - Counters are correct
thrip*CLI> sip show inuse
* User name               In use          Limit
x000                      0               999
x300                      0               999
* Peer name               In use          Limit
x000                      1/0/0           999
x300                      1/1/0           999


### RINGING STATE - 2nd DIAL COMMAND - Inuse counter for x300 0; it should be 1.
* User name               In use          Limit
x000                      0               999
x300                      0               999
* Peer name               In use          Limit
x000                      1/0/0           999
x300                      0/1/0           999

### AFTER THE CALL IS ANSWERED - Inuse counter for x300 is 0; it should be 1.
thrip*CLI> sip show  inuse
* User name               In use          Limit
x000                      0               999
x300                      0               999
* Peer name               In use          Limit
x000                      1/0/0           999
x300                      0/0/0           999


### AFTER THE CALL HAS ENDED - Inuse counter for x300 is -1; it should be 0.
* User name               In use          Limit
x000                      0               999
x300                      0               999
* Peer name               In use          Limit
x000                      0/0/0           999
x300                      -1/0/0          999

Note: placing a wait of 0.1 seconds between the two Dial commands usually makes the problem go away.
e.g.
exten => 750,1,Dial(SIP/x300,10);
exten => 750,2,Wait(.1);
exten => 750,3,Dial(SIP/x300);

By: Helmut Celina (rzuw) 2008-10-23 06:56:54

We are using 1.6.0.1 and this problem is reproducible at our installation as well. It is pretty mission-critical for us, as all of our group-functionality is based on using the Inuse-counter to decide what to do with a call.

By: Bernhard Wolf (wolfelectronic) 2008-10-23 17:56:56

i am preparing a patch for 1.6.0.1 and will upload it after testing at customer site

By: Bernhard Wolf (wolfelectronic) 2008-10-24 18:36:50

I have provided a patch hintfix_rev151414.patch for channels/chan_sip.c of 1.6.0-rc6 which should also work for 1.6.0.1 as the file is identical.
First tests at the customer site showed no problems, but at the moment i cannot say if the issue is fixed.
If tests next week show that the issue is solved i will provide a patch for the latest trunk version.



By: Chris Walton (crjw) 2008-10-24 23:59:32

I upgraded to 1.6.0.1+hintfix_rev151414.patch.
I tried re-testing the sequence of events that used to always give me a negative INUSE counter (see my earlier note in this bug report).
SO FAR SO GOOD... the INUSE counter is now correct for all phases of the test.

Admittedly, the system has only been up for 30 minutes, and being 1:00am it is somewhat idle.  I will post again if I see any problems.

By: Bernhard Wolf (wolfelectronic) 2008-10-29 18:40:09

I just provided a patch hintfix_trunk_rev152649.patch for the trunk revision 152649 of chan_sip.c.
Notes for the reviewer who commits to svn:
1) please be aware that i was not able to test the patch for the trunk revision on a live system. I just checked if it compiles. In my opinion the changes between 1.6.0.1 and the trunk revision in the function update_call_counter (removal of find_user) should not have any side effects to the patch.
2) the main part of the patch expands the locks to be arround the conditions including flag set/clear and counter increment/decrement and not only arround the increment / decrement of the counters.
3) except one part (see 4) i kept the original logic and just modified the locks
4) the part where i changed the original logic is between the /* continue */ and the if(sipdebug). As i found no reason why the flag SIP_INC_COUNT was not checked here i added the check. (i discussed this with putnopvut in IRC).
5) i opted to not just silently correct the counters if something wired happens to the counters but to log a debug message. While this could be (and was) useful for verifing the patch they could be removed when committing to svn
6) i added some additional checks in the lines between /* remember murphy */  and /* end of removable code */. These lines also help in verifing the patch. Before commiting to svn you could just remove these lines, put some #ifdef PESSIMISITIC arround them ore check the debug level before executing them. They should not be left unmodified because they use additional locks.
7) somparing to the svn revision the number of locks increased just by one (without the "murphy lines" of 6) so hopefully there should not arise a perfomance issue from this patch. Only ast_xxx_flag functions are called inside the locks so they should not take much time.
8) the patch hintfix_rev151414.patch works at a customer site for 1 week without any problem.
9) if you have any additional questions please drop a note to this bug or try on IRC.



By: mjc (mjc) 2008-11-17 12:48:34.000-0600

putnopvut, will you be reviewing this patch?

- Mike

By: Mark Michelson (mmichelson) 2008-11-17 17:40:36.000-0600

I gave this patch a review. As far as fixing the bug is concerned, this does the trick. Good job with this, wolfelectric!

The problem I have with the patch is that it does an absolutely horrendous job of conforming to the coding guidelines as outlined in doc/CODING-GUIDELINES in the Asterisk source. Specifically, the mixed use of tabs and spaces at the beginnings of lines and the lack of spaces to separate tokens are things that jumped out at me.

Other things that are frowned upon in the patch include the large block of code that is marked for removal pending successful tests. Patches submitted should be submitted in such a way that they can be committed as-is. Also, referring to bug numbers in debug messages is not encouraged.

All of this being said, the actual implementation of the patch is good, and I will commit it after I have cleaned it up to conform to coding guidelines. I have run out of time today, but I expect to do this when I get in to work tomorrow morning.

By: Digium Subversion (svnbot) 2008-11-18 14:23:54.000-0600

Repository: asterisk
Revision: 157427

U   trunk/channels/chan_sip.c

------------------------------------------------------------------------
r157427 | mmichelson | 2008-11-18 14:23:54 -0600 (Tue, 18 Nov 2008) | 13 lines

* Add a lock to be used in the update_call_counter function.
* Revert logic to mirror 1.4's in the sense that it will not allow
 the call counter to dip below 0.

These two measures prevent potential races that could cause a SIP peer
to appear to be busy forever.

(closes issue ASTERISK-12869)
Reported by: mjc
Patches:
     hintfix_trunk_rev152649.patch uploaded by wolfelectronic (license 586)


------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=157427