[Home]

Summary:ASTERISK-13430: [patch] SIP/realtime problems => 100 % CPU
Reporter:pkempgen (pkempgen)Labels:
Date Opened:2009-01-22 07:50:55.000-0600Date Closed:2009-10-05 11:58:26
Priority:MajorRegression?Yes
Status:Closed/CompleteComponents:Channels/chan_sip/General
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) 20090205__bug14309.diff.txt
( 1) 20090216__bug14309.diff.txt
( 2) 20090504__devstate_speed_optimization__bug14309.diff.txt
( 3) 20090629__issue14309.diff.txt
( 4) 20090924__issue14309.diff.txt
( 5) callgrind.out.18480
( 6) callgrind.out.29393.tgz
Description:As reported on
http://lists.digium.com/pipermail/asterisk-users/2009-January/225274.html :

> Carlos Chavez schrieb:
>> Since 1.4.22 realtime status for sip peers seems to be broken.  If I do
>> a "sip show peers" from the CLI I get this:
>>
>> 2001/2001                  192.168.2.234    D          5060     UNKNOWN
>> Cached RT
>>
>> It is arbitrary which peers will say OK and which will say UNKNOWN and
>> it changes over time.
>
>> This used to work fine until 1.4.21
>> so why did it break and why wasn't it fixed on 1.4.23?  This only
>> happens when using realtime peers, static peers will always show the
>> proper status.
>
> It might be related to this issue (thread in German)
> http://www.mail-archive.com/gemeinschaft-users@lists.amooma.de/msg00083.html
> http://www.mail-archive.com/gemeinschaft-users@lists.amooma.de/msg00152.html
> http://www.mail-archive.com/gemeinschaft-users@lists.amooma.de/msg00153.html
> etc.
>
> Several people, including me, have confirmed that since Asterisk
> 1.4.22 you get hundreds (probably depends on the number of SIP
> peers/users) of SQL queries per second and the CLI is being flooded
> with
> ---cut---
> [Oct  8 12:22:56] NOTICE[2487]: chan_sip.c:16223 sip_poke_noanswer: Peer
> '8280' is now UNREACHABLE!  Last qualify: 0
> ---cut---
> leading to 100 % CPU usage.
>
> 1.4.21.2 didn't have the problem.


****** ADDITIONAL INFORMATION ******

Might be related to ASTERISK-11905
Comments:By: Olle Johansson (oej) 2009-01-23 08:30:55.000-0600

I think there's another bug report covering this issue.

By: Sylvain Boily (sboily_proformatique) 2009-02-04 04:23:22.000-0600

Hello, i have the same issue with 1.4.23.1. I have a loop and my cpu increase to 100%. For resolv this :

CLI> database deltree SIP
CLI> restart now

But after some hours or i maybe after a reload i have the same problem. The issue 12508 is closed but i think this is an issue maybe different. I haven't enabled qualify. I don't find how it's fully reproductible ...

By: Sylvain Boily (sboily_proformatique) 2009-02-04 05:10:01.000-0600

It seems it reproductible :

Realtime with rtcachefriends set to yes, ignoreregexpire set to yes, rtupdate set to yes

Log with a phone, delog your phone, reload asterisk and after some hours or minute (seems depend time to expire) you have some request and after a loop and CPU increase to 100%. For me the defaultexpirey is 120 for the test. After 20 minutes i have many many select like this :

[Feb  4 11:57:23] DEBUG[7447] res_config_mysql.c: MySQL RealTime: Everything is fine.
[Feb  4 11:57:23] DEBUG[7447] res_config_mysql.c: MySQL RealTime: Retrieve SQL: SELECT * FROM usersip WHERE NOT commented AND name = 'r00226' AND host = 'dynamic'

By: Tilghman Lesher (tilghman) 2009-02-04 08:13:01.000-0600

The string 'NOT commented' does not appear in any of the realtime drivers.  Does this occur on an UNMODIFIED Asterisk?

By: Sylvain Boily (sboily_proformatique) 2009-02-04 08:30:20.000-0600

hum yes ! Sorry i resent you with asterisk from asterisk.org sources.

By: Tilghman Lesher (tilghman) 2009-02-05 12:25:17.000-0600

I think the issue is that we're not really ignoring expiration when ignoreregexpire is set.  Patch uploaded.

By: pkempgen (pkempgen) 2009-02-05 12:28:21.000-0600

FWIW:
All of the users on the mailing list mentioned above use
rtcachefriends=yes
rtupdate=no
ignoreregexpire=yes

By: pkempgen (pkempgen) 2009-02-05 12:36:56.000-0600

> Patch uploaded.

Posted to gemeinschaft-users so whoever uses Asterisk >= 1.4.22
can test.

By: pkempgen (pkempgen) 2009-02-14 01:46:05.000-0600

It was reported that the problem affects Asterisk 1.4.23.1
(unpatched) as well.
( http://groups.google.de/group/gemeinschaft-users/browse_thread/thread/f14542402a302c6a )

By: pkempgen (pkempgen) 2009-02-14 02:30:30.000-0600

Ok, as a first step I have reproduced the problem in Asterisk
1.4.23.1 myself. Will now apply the patch ...

By: pkempgen (pkempgen) 2009-02-14 02:45:12.000-0600

The patch didn't solve the problem.

By: pkempgen (pkempgen) 2009-02-14 08:40:37.000-0600

"suspicious" commits between 1.4.21.2 (2008 Jul 23) and
1.4.22 (2008 Oct 02) include:

r133488
http://svn.digium.com/view/asterisk/branches/1.4/channels/chan_sip.c?r1=132790&r2=133488
"Fix rtautoclear and rtcachefriends (Closes issue ASTERISK-12068)"

r138258
http://svn.digium.com/view/asterisk/branches/1.4/channels/chan_sip.c?r1=133572&r2=138258
"More fixes for realtime peers. (closes issue ASTERISK-12253)"

By: pkempgen (pkempgen) 2009-02-14 10:06:29.000-0600

The problem was introduced by r138258.
r138257 works fine.

By: Tilghman Lesher (tilghman) 2009-02-16 14:25:37.000-0600

Okay, here's a patch that directly addresses that issue.  Perhaps we're getting the address needlessly when it has already expired?

By: pkempgen (pkempgen) 2009-02-16 15:00:57.000-0600

Tested 20090216__bug14309.diff.txt against current branches/1.4 (r176212).
Doesn't fix the problem.

By: Tilghman Lesher (tilghman) 2009-02-16 18:06:44.000-0600

Well, then, I'm stumped.

By: Tilghman Lesher (tilghman) 2009-02-16 18:11:35.000-0600

Let's try the output of "valgrind --tool=callgrind /usr/sbin/asterisk -vvvvvc"

By: Tilghman Lesher (tilghman) 2009-03-02 12:48:08.000-0600

Valgrind output required to continue.

By: pkempgen (pkempgen) 2009-03-03 05:26:51.000-0600

Uploaded callgrind.out.18480. I hope it's the correct file as
I have lots of callgrind output files on that machine. :-)

However I guess reverting r138258 would solve the problem
(and reopen bug ASTERISK-12253).



By: Tilghman Lesher (tilghman) 2009-03-03 11:08:54.000-0600

I would remove DEBUG_THREADS and MALLOC_DEBUG from your compiler flags and try the callgrind again.  These two defines are distorting the output of this callgrind.

By: pkempgen (pkempgen) 2009-05-03 10:57:12

Finally: Here's the output from callgrind (callgrind.out.29393.tgz).
Asterisk 1.4.24.1, Addons 1.4.8

By: Tilghman Lesher (tilghman) 2009-05-04 17:42:55

This patch should massively affect the callgrind generated, by reducing the amount of time that SIP spends building realtime peers for device state changes alone.

By: Leif Madsen (lmadsen) 2009-05-11 15:50:28

What is the next step here? Looks like we're close to a resolution.

By: Tilghman Lesher (tilghman) 2009-05-11 16:18:00

Testing and feedback from the reporter is required.

By: pkempgen (pkempgen) 2009-05-12 10:06:46

> Testing and feedback from the reporter is required.

Forwarded to <gemeinschaft-users@googlegroups.com>.
I guess most users have reverted to using Asterisk <= 1.4.21
(or even less because of some device state problems)
but hopefully we'll get some feedback anyways.

By: pkempgen (pkempgen) 2009-05-21 11:24:42

20090504__devstate_speed_optimization__bug14309.diff.txt seems
to work but I think more testing is required.

By: pkempgen (pkempgen) 2009-05-21 11:27:21

This bug is definitely related to issue ASTERISK-12253 because that's
where the regression was committed.
So another approach would be to revert commit 138258 and
to find a better solution for the other issue.



By: Tilghman Lesher (tilghman) 2009-05-27 14:42:33

pkempgen:  if the patch works, why are we suggesting reverting something different?

By: Vadim Sherbakov (vinsik) 2009-05-27 15:11:32

I have run into the same problem on a my live system.
But my problem started from version asterisk-1.4.24.
Runs for max hour, then totally gets hanged and only restart helps.
I believe its related to the new method of storing lastms.

asterisk-1.4.23.1 runs fine... running with 200 realtime sip ua's.

If this helps anything... :)

By: pkempgen (pkempgen) 2009-05-27 15:15:26

> pkempgen: if the patch works, why are we suggesting reverting something different?

Reverting the changeset which caused the regression seemed the
most obvious thing to me but never mind.

Your patch seems to work. I just think at least one or two other
people should test it because I have no idea what side effects it
might have. OTOH if it should break anything else somebody will
complain sooner or later. :-)

By: Tilghman Lesher (tilghman) 2009-05-27 16:04:28

vinsik:  does the patch uploaded to this issue help the situation?

By: Vadim Sherbakov (vinsik) 2009-05-27 16:15:52

tilghman, i have not tested it yet, just stumbled upon it.
Ill try to test maybe this week , ill report back if this issue is still open.

By: Tilghman Lesher (tilghman) 2009-06-09 13:34:36

vinsik:  it is more than a week later.  Have you tested the patch yet?

By: Tilghman Lesher (tilghman) 2009-06-29 17:12:47

Patch updated to current 1.4.  Assigned to lmadsen for testing, as vinsik is unresponsive.

By: Leif Madsen (lmadsen) 2009-07-29 11:05:43

I can't seem to figure out how to reproduce this.

I have:

rtcachefriends=yes
rtupdate=no
ignoreregexpire=yes

in general I have qualify=yes

I have created 100 extensions, loaded them into memory with RT caching, set the registration to expire after 120 seconds.

I then waited for the registrations to expire. They are still loaded in memory, but the IP address is marked as (Unspecified).

What am I missing here for reproducing? This is latest 1.4 branch.

By: Leif Madsen (lmadsen) 2009-07-29 11:13:12

OK, I'm closing this issue because the people who were affected by it seem to be unresponsive, which means I can only come to the conclusion this issue has been resolved in a later version of Asterisk.

I attempted to reproduce as I explained in my note, and have been unable to come up with anything other than what I expected to see.

The original reporter of the issue is more than welcome to reopen this issue if he has additional information which can lead to this issue being reproduced, and thus moved towards resolution.

If anyone else experiences this issue, please open a new issue, and report exact how to reproduce this on the latest 1.4 branch.

Thanks!
Leif.

By: Tilghman Lesher (tilghman) 2009-09-24 09:06:40

vrban tested the patch in ASTERISK-14877 and it solves the issue for him.

By: Leif Madsen (lmadsen) 2009-09-24 09:10:57

Technically he tested the patch in this issue, but opened the duplicate issue 15950.

By: Kristijan Vrban (vrban) 2009-09-24 09:45:12

...solves the issue for him.

Yes, but only tested with 1.4.25, because the patch does not fit with 1.4.27-rc1 or 1.4-svn

By: Tilghman Lesher (tilghman) 2009-09-24 14:32:31

Patch has been updated to current 1.4.  Just to be on the safe side, could you test this reformulated patch?

By: Kristijan Vrban (vrban) 2009-09-24 15:03:09

the updated patch applies to current 1.4 and fix the reported issue. first quick test looks good.

By: Digium Subversion (svnbot) 2009-09-29 13:37:26

Repository: asterisk
Revision: 220873

U   branches/1.4/channels/chan_sip.c

------------------------------------------------------------------------
r220873 | tilghman | 2009-09-29 13:37:25 -0500 (Tue, 29 Sep 2009) | 9 lines

Reduce CPU usage related to building a peer merely for devicestates.
This fixes a 100% CPU problem in the SIP driver, found by profiling
the driver while the problem was occurring.
(closes issue ASTERISK-13430)
Reported by: pkempgen
Patches:
      20090924__issue14309.diff.txt uploaded by tilghman (license 14)
Tested by: pkempgen, vrban

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=220873

By: Digium Subversion (svnbot) 2009-09-29 15:00:20

Repository: asterisk
Revision: 220906

_U  trunk/
U   trunk/channels/chan_sip.c

------------------------------------------------------------------------
r220906 | tilghman | 2009-09-29 15:00:19 -0500 (Tue, 29 Sep 2009) | 16 lines

Merged revisions 220873 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
 r220873 | tilghman | 2009-09-29 12:59:26 -0500 (Tue, 29 Sep 2009) | 9 lines
 
 Reduce CPU usage related to building a peer merely for devicestates.
 This fixes a 100% CPU problem in the SIP driver, found by profiling
 the driver while the problem was occurring.
 (closes issue ASTERISK-13430)
  Reported by: pkempgen
  Patches:
        20090924__issue14309.diff.txt uploaded by tilghman (license 14)
  Tested by: pkempgen, vrban
........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=220906

By: Digium Subversion (svnbot) 2009-09-29 15:47:56

Repository: asterisk
Revision: 220976

_U  branches/1.6.0/
U   branches/1.6.0/channels/chan_sip.c

------------------------------------------------------------------------
r220976 | tilghman | 2009-09-29 15:47:56 -0500 (Tue, 29 Sep 2009) | 23 lines

Merged revisions 220906 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

................
 r220906 | tilghman | 2009-09-29 14:57:37 -0500 (Tue, 29 Sep 2009) | 16 lines
 
 Merged revisions 220873 via svnmerge from
 https://origsvn.digium.com/svn/asterisk/branches/1.4
 
 ........
   r220873 | tilghman | 2009-09-29 12:59:26 -0500 (Tue, 29 Sep 2009) | 9 lines
   
   Reduce CPU usage related to building a peer merely for devicestates.
   This fixes a 100% CPU problem in the SIP driver, found by profiling
   the driver while the problem was occurring.
   (closes issue ASTERISK-13430)
    Reported by: pkempgen
    Patches:
          20090924__issue14309.diff.txt uploaded by tilghman (license 14)
    Tested by: pkempgen, vrban
 ........
................

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=220976

By: Digium Subversion (svnbot) 2009-09-29 17:10:42

Repository: asterisk
Revision: 220998

_U  branches/1.6.1/
U   branches/1.6.1/channels/chan_sip.c

------------------------------------------------------------------------
r220998 | tilghman | 2009-09-29 17:10:41 -0500 (Tue, 29 Sep 2009) | 23 lines

Merged revisions 220906 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

................
 r220906 | tilghman | 2009-09-29 14:57:37 -0500 (Tue, 29 Sep 2009) | 16 lines
 
 Merged revisions 220873 via svnmerge from
 https://origsvn.digium.com/svn/asterisk/branches/1.4
 
 ........
   r220873 | tilghman | 2009-09-29 12:59:26 -0500 (Tue, 29 Sep 2009) | 9 lines
   
   Reduce CPU usage related to building a peer merely for devicestates.
   This fixes a 100% CPU problem in the SIP driver, found by profiling
   the driver while the problem was occurring.
   (closes issue ASTERISK-13430)
    Reported by: pkempgen
    Patches:
          20090924__issue14309.diff.txt uploaded by tilghman (license 14)
    Tested by: pkempgen, vrban
 ........
................

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=220998

By: Digium Subversion (svnbot) 2009-09-29 17:30:42

Repository: asterisk
Revision: 221027

_U  branches/1.6.2/
U   branches/1.6.2/channels/chan_sip.c

------------------------------------------------------------------------
r221027 | tilghman | 2009-09-29 17:30:42 -0500 (Tue, 29 Sep 2009) | 23 lines

Merged revisions 220906 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

................
 r220906 | tilghman | 2009-09-29 14:57:37 -0500 (Tue, 29 Sep 2009) | 16 lines
 
 Merged revisions 220873 via svnmerge from
 https://origsvn.digium.com/svn/asterisk/branches/1.4
 
 ........
   r220873 | tilghman | 2009-09-29 12:59:26 -0500 (Tue, 29 Sep 2009) | 9 lines
   
   Reduce CPU usage related to building a peer merely for devicestates.
   This fixes a 100% CPU problem in the SIP driver, found by profiling
   the driver while the problem was occurring.
   (closes issue ASTERISK-13430)
    Reported by: pkempgen
    Patches:
          20090924__issue14309.diff.txt uploaded by tilghman (license 14)
    Tested by: pkempgen, vrban
 ........
................

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=221027