Summary: | ASTERISK-13430: [patch] SIP/realtime problems => 100 % CPU | ||
Reporter: | pkempgen (pkempgen) | Labels: | |
Date Opened: | 2009-01-22 07:50:55.000-0600 | Date Closed: | 2009-10-05 11:58:26 |
Priority: | Major | Regression? | Yes |
Status: | Closed/Complete | Components: | Channels/chan_sip/General |
Versions: | Frequency of Occurrence | ||
Related Issues: | |||
Environment: | Attachments: | ( 0) 20090205__bug14309.diff.txt ( 1) 20090216__bug14309.diff.txt ( 2) 20090504__devstate_speed_optimization__bug14309.diff.txt ( 3) 20090629__issue14309.diff.txt ( 4) 20090924__issue14309.diff.txt ( 5) callgrind.out.18480 ( 6) callgrind.out.29393.tgz | |
Description: | As reported on http://lists.digium.com/pipermail/asterisk-users/2009-January/225274.html : > Carlos Chavez schrieb: >> Since 1.4.22 realtime status for sip peers seems to be broken. If I do >> a "sip show peers" from the CLI I get this: >> >> 2001/2001 192.168.2.234 D 5060 UNKNOWN >> Cached RT >> >> It is arbitrary which peers will say OK and which will say UNKNOWN and >> it changes over time. > >> This used to work fine until 1.4.21 >> so why did it break and why wasn't it fixed on 1.4.23? This only >> happens when using realtime peers, static peers will always show the >> proper status. > > It might be related to this issue (thread in German) > http://www.mail-archive.com/gemeinschaft-users@lists.amooma.de/msg00083.html > http://www.mail-archive.com/gemeinschaft-users@lists.amooma.de/msg00152.html > http://www.mail-archive.com/gemeinschaft-users@lists.amooma.de/msg00153.html > etc. > > Several people, including me, have confirmed that since Asterisk > 1.4.22 you get hundreds (probably depends on the number of SIP > peers/users) of SQL queries per second and the CLI is being flooded > with > ---cut--- > [Oct 8 12:22:56] NOTICE[2487]: chan_sip.c:16223 sip_poke_noanswer: Peer > '8280' is now UNREACHABLE! Last qualify: 0 > ---cut--- > leading to 100 % CPU usage. > > 1.4.21.2 didn't have the problem. ****** ADDITIONAL INFORMATION ****** Might be related to ASTERISK-11905 | ||
Comments: | By: Olle Johansson (oej) 2009-01-23 08:30:55.000-0600 I think there's another bug report covering this issue. By: Sylvain Boily (sboily_proformatique) 2009-02-04 04:23:22.000-0600 Hello, i have the same issue with 1.4.23.1. I have a loop and my cpu increase to 100%. For resolv this : CLI> database deltree SIP CLI> restart now But after some hours or i maybe after a reload i have the same problem. The issue 12508 is closed but i think this is an issue maybe different. I haven't enabled qualify. I don't find how it's fully reproductible ... By: Sylvain Boily (sboily_proformatique) 2009-02-04 05:10:01.000-0600 It seems it reproductible : Realtime with rtcachefriends set to yes, ignoreregexpire set to yes, rtupdate set to yes Log with a phone, delog your phone, reload asterisk and after some hours or minute (seems depend time to expire) you have some request and after a loop and CPU increase to 100%. For me the defaultexpirey is 120 for the test. After 20 minutes i have many many select like this : [Feb 4 11:57:23] DEBUG[7447] res_config_mysql.c: MySQL RealTime: Everything is fine. [Feb 4 11:57:23] DEBUG[7447] res_config_mysql.c: MySQL RealTime: Retrieve SQL: SELECT * FROM usersip WHERE NOT commented AND name = 'r00226' AND host = 'dynamic' By: Tilghman Lesher (tilghman) 2009-02-04 08:13:01.000-0600 The string 'NOT commented' does not appear in any of the realtime drivers. Does this occur on an UNMODIFIED Asterisk? By: Sylvain Boily (sboily_proformatique) 2009-02-04 08:30:20.000-0600 hum yes ! Sorry i resent you with asterisk from asterisk.org sources. By: Tilghman Lesher (tilghman) 2009-02-05 12:25:17.000-0600 I think the issue is that we're not really ignoring expiration when ignoreregexpire is set. Patch uploaded. By: pkempgen (pkempgen) 2009-02-05 12:28:21.000-0600 FWIW: All of the users on the mailing list mentioned above use rtcachefriends=yes rtupdate=no ignoreregexpire=yes By: pkempgen (pkempgen) 2009-02-05 12:36:56.000-0600 > Patch uploaded. Posted to gemeinschaft-users so whoever uses Asterisk >= 1.4.22 can test. By: pkempgen (pkempgen) 2009-02-14 01:46:05.000-0600 It was reported that the problem affects Asterisk 1.4.23.1 (unpatched) as well. ( http://groups.google.de/group/gemeinschaft-users/browse_thread/thread/f14542402a302c6a ) By: pkempgen (pkempgen) 2009-02-14 02:30:30.000-0600 Ok, as a first step I have reproduced the problem in Asterisk 1.4.23.1 myself. Will now apply the patch ... By: pkempgen (pkempgen) 2009-02-14 02:45:12.000-0600 The patch didn't solve the problem. By: pkempgen (pkempgen) 2009-02-14 08:40:37.000-0600 "suspicious" commits between 1.4.21.2 (2008 Jul 23) and 1.4.22 (2008 Oct 02) include: r133488 http://svn.digium.com/view/asterisk/branches/1.4/channels/chan_sip.c?r1=132790&r2=133488 "Fix rtautoclear and rtcachefriends (Closes issue ASTERISK-12068)" r138258 http://svn.digium.com/view/asterisk/branches/1.4/channels/chan_sip.c?r1=133572&r2=138258 "More fixes for realtime peers. (closes issue ASTERISK-12253)" By: pkempgen (pkempgen) 2009-02-14 10:06:29.000-0600 The problem was introduced by r138258. r138257 works fine. By: Tilghman Lesher (tilghman) 2009-02-16 14:25:37.000-0600 Okay, here's a patch that directly addresses that issue. Perhaps we're getting the address needlessly when it has already expired? By: pkempgen (pkempgen) 2009-02-16 15:00:57.000-0600 Tested 20090216__bug14309.diff.txt against current branches/1.4 (r176212). Doesn't fix the problem. By: Tilghman Lesher (tilghman) 2009-02-16 18:06:44.000-0600 Well, then, I'm stumped. By: Tilghman Lesher (tilghman) 2009-02-16 18:11:35.000-0600 Let's try the output of "valgrind --tool=callgrind /usr/sbin/asterisk -vvvvvc" By: Tilghman Lesher (tilghman) 2009-03-02 12:48:08.000-0600 Valgrind output required to continue. By: pkempgen (pkempgen) 2009-03-03 05:26:51.000-0600 Uploaded callgrind.out.18480. I hope it's the correct file as I have lots of callgrind output files on that machine. :-) However I guess reverting r138258 would solve the problem (and reopen bug ASTERISK-12253). By: Tilghman Lesher (tilghman) 2009-03-03 11:08:54.000-0600 I would remove DEBUG_THREADS and MALLOC_DEBUG from your compiler flags and try the callgrind again. These two defines are distorting the output of this callgrind. By: pkempgen (pkempgen) 2009-05-03 10:57:12 Finally: Here's the output from callgrind (callgrind.out.29393.tgz). Asterisk 1.4.24.1, Addons 1.4.8 By: Tilghman Lesher (tilghman) 2009-05-04 17:42:55 This patch should massively affect the callgrind generated, by reducing the amount of time that SIP spends building realtime peers for device state changes alone. By: Leif Madsen (lmadsen) 2009-05-11 15:50:28 What is the next step here? Looks like we're close to a resolution. By: Tilghman Lesher (tilghman) 2009-05-11 16:18:00 Testing and feedback from the reporter is required. By: pkempgen (pkempgen) 2009-05-12 10:06:46 > Testing and feedback from the reporter is required. Forwarded to <gemeinschaft-users@googlegroups.com>. I guess most users have reverted to using Asterisk <= 1.4.21 (or even less because of some device state problems) but hopefully we'll get some feedback anyways. By: pkempgen (pkempgen) 2009-05-21 11:24:42 20090504__devstate_speed_optimization__bug14309.diff.txt seems to work but I think more testing is required. By: pkempgen (pkempgen) 2009-05-21 11:27:21 This bug is definitely related to issue ASTERISK-12253 because that's where the regression was committed. So another approach would be to revert commit 138258 and to find a better solution for the other issue. By: Tilghman Lesher (tilghman) 2009-05-27 14:42:33 pkempgen: if the patch works, why are we suggesting reverting something different? By: Vadim Sherbakov (vinsik) 2009-05-27 15:11:32 I have run into the same problem on a my live system. But my problem started from version asterisk-1.4.24. Runs for max hour, then totally gets hanged and only restart helps. I believe its related to the new method of storing lastms. asterisk-1.4.23.1 runs fine... running with 200 realtime sip ua's. If this helps anything... :) By: pkempgen (pkempgen) 2009-05-27 15:15:26 > pkempgen: if the patch works, why are we suggesting reverting something different? Reverting the changeset which caused the regression seemed the most obvious thing to me but never mind. Your patch seems to work. I just think at least one or two other people should test it because I have no idea what side effects it might have. OTOH if it should break anything else somebody will complain sooner or later. :-) By: Tilghman Lesher (tilghman) 2009-05-27 16:04:28 vinsik: does the patch uploaded to this issue help the situation? By: Vadim Sherbakov (vinsik) 2009-05-27 16:15:52 tilghman, i have not tested it yet, just stumbled upon it. Ill try to test maybe this week , ill report back if this issue is still open. By: Tilghman Lesher (tilghman) 2009-06-09 13:34:36 vinsik: it is more than a week later. Have you tested the patch yet? By: Tilghman Lesher (tilghman) 2009-06-29 17:12:47 Patch updated to current 1.4. Assigned to lmadsen for testing, as vinsik is unresponsive. By: Leif Madsen (lmadsen) 2009-07-29 11:05:43 I can't seem to figure out how to reproduce this. I have: rtcachefriends=yes rtupdate=no ignoreregexpire=yes in general I have qualify=yes I have created 100 extensions, loaded them into memory with RT caching, set the registration to expire after 120 seconds. I then waited for the registrations to expire. They are still loaded in memory, but the IP address is marked as (Unspecified). What am I missing here for reproducing? This is latest 1.4 branch. By: Leif Madsen (lmadsen) 2009-07-29 11:13:12 OK, I'm closing this issue because the people who were affected by it seem to be unresponsive, which means I can only come to the conclusion this issue has been resolved in a later version of Asterisk. I attempted to reproduce as I explained in my note, and have been unable to come up with anything other than what I expected to see. The original reporter of the issue is more than welcome to reopen this issue if he has additional information which can lead to this issue being reproduced, and thus moved towards resolution. If anyone else experiences this issue, please open a new issue, and report exact how to reproduce this on the latest 1.4 branch. Thanks! Leif. By: Tilghman Lesher (tilghman) 2009-09-24 09:06:40 vrban tested the patch in ASTERISK-14877 and it solves the issue for him. By: Leif Madsen (lmadsen) 2009-09-24 09:10:57 Technically he tested the patch in this issue, but opened the duplicate issue 15950. By: Kristijan Vrban (vrban) 2009-09-24 09:45:12 ...solves the issue for him. Yes, but only tested with 1.4.25, because the patch does not fit with 1.4.27-rc1 or 1.4-svn By: Tilghman Lesher (tilghman) 2009-09-24 14:32:31 Patch has been updated to current 1.4. Just to be on the safe side, could you test this reformulated patch? By: Kristijan Vrban (vrban) 2009-09-24 15:03:09 the updated patch applies to current 1.4 and fix the reported issue. first quick test looks good. By: Digium Subversion (svnbot) 2009-09-29 13:37:26 Repository: asterisk Revision: 220873 U branches/1.4/channels/chan_sip.c ------------------------------------------------------------------------ r220873 | tilghman | 2009-09-29 13:37:25 -0500 (Tue, 29 Sep 2009) | 9 lines Reduce CPU usage related to building a peer merely for devicestates. This fixes a 100% CPU problem in the SIP driver, found by profiling the driver while the problem was occurring. (closes issue ASTERISK-13430) Reported by: pkempgen Patches: 20090924__issue14309.diff.txt uploaded by tilghman (license 14) Tested by: pkempgen, vrban ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=220873 By: Digium Subversion (svnbot) 2009-09-29 15:00:20 Repository: asterisk Revision: 220906 _U trunk/ U trunk/channels/chan_sip.c ------------------------------------------------------------------------ r220906 | tilghman | 2009-09-29 15:00:19 -0500 (Tue, 29 Sep 2009) | 16 lines Merged revisions 220873 via svnmerge from https://origsvn.digium.com/svn/asterisk/branches/1.4 ........ r220873 | tilghman | 2009-09-29 12:59:26 -0500 (Tue, 29 Sep 2009) | 9 lines Reduce CPU usage related to building a peer merely for devicestates. This fixes a 100% CPU problem in the SIP driver, found by profiling the driver while the problem was occurring. (closes issue ASTERISK-13430) Reported by: pkempgen Patches: 20090924__issue14309.diff.txt uploaded by tilghman (license 14) Tested by: pkempgen, vrban ........ ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=220906 By: Digium Subversion (svnbot) 2009-09-29 15:47:56 Repository: asterisk Revision: 220976 _U branches/1.6.0/ U branches/1.6.0/channels/chan_sip.c ------------------------------------------------------------------------ r220976 | tilghman | 2009-09-29 15:47:56 -0500 (Tue, 29 Sep 2009) | 23 lines Merged revisions 220906 via svnmerge from https://origsvn.digium.com/svn/asterisk/trunk ................ r220906 | tilghman | 2009-09-29 14:57:37 -0500 (Tue, 29 Sep 2009) | 16 lines Merged revisions 220873 via svnmerge from https://origsvn.digium.com/svn/asterisk/branches/1.4 ........ r220873 | tilghman | 2009-09-29 12:59:26 -0500 (Tue, 29 Sep 2009) | 9 lines Reduce CPU usage related to building a peer merely for devicestates. This fixes a 100% CPU problem in the SIP driver, found by profiling the driver while the problem was occurring. (closes issue ASTERISK-13430) Reported by: pkempgen Patches: 20090924__issue14309.diff.txt uploaded by tilghman (license 14) Tested by: pkempgen, vrban ........ ................ ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=220976 By: Digium Subversion (svnbot) 2009-09-29 17:10:42 Repository: asterisk Revision: 220998 _U branches/1.6.1/ U branches/1.6.1/channels/chan_sip.c ------------------------------------------------------------------------ r220998 | tilghman | 2009-09-29 17:10:41 -0500 (Tue, 29 Sep 2009) | 23 lines Merged revisions 220906 via svnmerge from https://origsvn.digium.com/svn/asterisk/trunk ................ r220906 | tilghman | 2009-09-29 14:57:37 -0500 (Tue, 29 Sep 2009) | 16 lines Merged revisions 220873 via svnmerge from https://origsvn.digium.com/svn/asterisk/branches/1.4 ........ r220873 | tilghman | 2009-09-29 12:59:26 -0500 (Tue, 29 Sep 2009) | 9 lines Reduce CPU usage related to building a peer merely for devicestates. This fixes a 100% CPU problem in the SIP driver, found by profiling the driver while the problem was occurring. (closes issue ASTERISK-13430) Reported by: pkempgen Patches: 20090924__issue14309.diff.txt uploaded by tilghman (license 14) Tested by: pkempgen, vrban ........ ................ ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=220998 By: Digium Subversion (svnbot) 2009-09-29 17:30:42 Repository: asterisk Revision: 221027 _U branches/1.6.2/ U branches/1.6.2/channels/chan_sip.c ------------------------------------------------------------------------ r221027 | tilghman | 2009-09-29 17:30:42 -0500 (Tue, 29 Sep 2009) | 23 lines Merged revisions 220906 via svnmerge from https://origsvn.digium.com/svn/asterisk/trunk ................ r220906 | tilghman | 2009-09-29 14:57:37 -0500 (Tue, 29 Sep 2009) | 16 lines Merged revisions 220873 via svnmerge from https://origsvn.digium.com/svn/asterisk/branches/1.4 ........ r220873 | tilghman | 2009-09-29 12:59:26 -0500 (Tue, 29 Sep 2009) | 9 lines Reduce CPU usage related to building a peer merely for devicestates. This fixes a 100% CPU problem in the SIP driver, found by profiling the driver while the problem was occurring. (closes issue ASTERISK-13430) Reported by: pkempgen Patches: 20090924__issue14309.diff.txt uploaded by tilghman (license 14) Tested by: pkempgen, vrban ........ ................ ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=221027 |