|Summary:||ASTERISK-21303: qualifygap SIP general setting appears broken|
|Date Opened:||2013-03-20 03:28:10||Date Closed:||2013-04-22 09:29:56|
|Description:||Running Asterisk 11 with around 1600 realtime peers, all of which are qualified behind remote NAT.
I am seeing an extremely painful issue with sip reloads generating massive storms of OPTIONS messages, most of which can't be responded to within 2000 ms, so the peer goes missing. In the case where SRV endpoints are used, this generates a storm of traffic to secondary/tertiary servers, which puts the entire system in a loop.
My initial solution was to use the qualifygap setting in sip.conf [general]. However, this appears to be broken. With it set to 500 or even 1200, I am still seeing a very large number of OPTIONS messages per time period. I am not sure I have implemented qualifygap correctly, as there is effectively no documentation on it. I am not sure what the "group" referenced in the comments ("Number of milliseconds between each group of peers being qualified") refers to. sip_poke_all_peers does not seem to differentiate anything on the basis of any sort of group.
|Comments:||By: Michael L. Young (elguero) 2013-03-20 12:08:26.589-0500|
JoshE, from looking at the source, the "group" refers to how many peers get poked at a time. The setting for that is "qualifypeers". The default is 1 peer every tenth of a second. So, as an example, you can send a qualify to 10 peers every 100ms, or 10 peers every 500ms, or keep it at the default of 1 peer every 100ms.
In sip_poke_all_peers, the check "if (num == global_qualify_peers)" is the part that handles the "group" mentioned in sip.conf.
1200ms is only 1.2 seconds. Have you tried a higher setting? With your peers being in realtime, hopefully you are not having to do a "sip reload" too often.
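The grouping described above can be modelled roughly as follows. This is a minimal sketch of the documented behaviour (one group of qualifypeers peers, then a pause of qualifygap milliseconds), not a copy of the logic in chan_sip.c; the function names poke_offsets and sweep_ms are hypothetical, and the real scheduler may differ in detail.

```python
def poke_offsets(n_peers, qualifygap_ms=100, qualifypeers=1):
    """Return the delay (ms) at which each peer's first OPTIONS is scheduled.

    Simplified model: peers are taken in groups of `qualifypeers`, and each
    successive group is delayed by a further `qualifygap_ms`.
    """
    if qualifypeers < 1:
        raise ValueError("qualifypeers must be at least 1")
    return [(i // qualifypeers) * qualifygap_ms for i in range(n_peers)]


def sweep_ms(n_peers, qualifygap_ms=100, qualifypeers=1):
    """Time (ms) until the last peer in the sweep is scheduled to be poked."""
    offsets = poke_offsets(n_peers, qualifygap_ms, qualifypeers)
    return offsets[-1] if offsets else 0


if __name__ == "__main__":
    # 1600 peers is the figure from the report; the gaps are the values tried.
    for gap in (100, 500, 1200):
        print(gap, "ms gap ->", sweep_ms(1600, gap) / 1000.0, "s sweep")
```

Under this model, 1600 peers at the default settings are spread over roughly 160 seconds after a reload, and raising qualifygap lengthens the sweep proportionally.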
By: Rusty Newton (rnewton) 2013-03-29 13:55:12.113-0500
;qualifyfreq=60                 ; Qualification: How often to check for the host to be up in seconds
                                ; and reported in milliseconds with sip show settings.
                                ; Set to low value if you use low timeout for NAT of UDP sessions
                                ; Default: 60
;qualifygap=100                 ; Number of milliseconds between each group of peers being qualified
                                ; Default: 100
;qualifypeers=1                 ; Number of peers in a group to be qualified at the same time
                                ; Default: 1
The documentation in the sample seems pretty clear. JoshE - you can provide a documentation patch if there is a way to provide additional clarity.
Other than that, does it appear to be behaving as expected per the documentation and Michael's comments?
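To compare what the three settings imply for OPTIONS traffic, a back-of-the-envelope calculation may help. This sketch assumes that the sweep sends one group of qualifypeers every qualifygap milliseconds, and that each reachable peer is then re-qualified once every qualifyfreq seconds, as the sample comments describe; the function names are hypothetical, and retransmissions and unreachable peers are ignored.

```python
def sweep_rate(qualifygap_ms=100, qualifypeers=1):
    """OPTIONS per second while the staggered sweep is running (assumed model)."""
    return qualifypeers * 1000.0 / qualifygap_ms


def steady_rate(n_peers, qualifyfreq_s=60):
    """Average OPTIONS per second if every peer is qualified once per qualifyfreq."""
    return n_peers / qualifyfreq_s


if __name__ == "__main__":
    print("sweep, defaults:       ", sweep_rate(), "per second")
    print("sweep, qualifygap=500: ", sweep_rate(500), "per second")
    print("steady state, 1600 peers:", round(steady_rate(1600), 1), "per second")
```

Under these assumptions, 1600 peers at qualifyfreq=60 average roughly 27 OPTIONS per second in steady state, which is more than the 10 per second the default sweep produces. If that holds, a larger qualifygap would stretch out the sweep after a reload but would not by itself reduce the steady-state volume.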
By: JoshE (n8ideas) 2013-03-29 14:34:27.595-0500
Yes and no. I think the qualifygap is in fact doing what it's supposed to, but there appears to be an issue with ODBC responsiveness, especially when there are two database handle entries for sippeers in extconfig.
I'll get a patch to explain qualifygap a little bit better. I may need to open another ticket for the way the database responded when reloads happened.
By: Rusty Newton (rnewton) 2013-04-02 16:33:40.158-0500
I'll leave this in "Waiting for Feedback" for the documentation patch then. Yeah, if you find a potential bug in ODBC, open that up in a separate issue. Of course, you may want to run it by others on the asterisk-users list first to help verify.
By: Rusty Newton (rnewton) 2013-04-22 09:29:56.585-0500
Closing this out since it doesn't appear to be a bug. Please file a new issue for the documentation patch.
By: D KULL (kulldominique) 2013-06-20 13:34:39.410-0500
I am experiencing the same issue with a smaller setup (about 500 qualified users). There doesn't seem to be a good solution to this with the configuration options. qualifygap/qualifypeers seems not be able to address the situation. The issue appears randomly and Asterisk is unable to throttle the OPTIONS storm that ensues. The only solution is a complete restart.
By: Rusty Newton (rnewton) 2013-06-20 15:47:23.166-0500
I'd recommend running it by some other users on the asterisk-users list. Then if you feel there is a bug, you could file a new report. When posting on the list you would definitely need to include more detail on two points for anyone to help:
1. "qualifygap/qualifypeers seems not be able to address the situation"
2. "Asterisk is unable to throttle the OPTIONS storm that ensues"
You would probably also want to provide pastebin links to logs that demonstrate the specific issue and example configs showing what you did to try and remedy it.
If you file a new bug, you'll want to be clear whether you are saying that the qualify options are not behaving as "expected" per the documentation, or whether you are claiming that Asterisk has a bug that prevents it from handling OPTIONS requests in an expected way.