Summary: | ASTERISK-11038: Prodding channel... causes 100% utilization | ||
Reporter: | pj (pj) | Labels: | |
Date Opened: | 2007-12-14 08:37:35.000-0600 | Date Closed: | 2008-04-09 13:27:37 |
Priority: | Blocker | Regression? | No |
Status: | Closed/Complete | Components: | Channels/chan_sip/General |
Versions: | Frequency of Occurrence | ||
Related Issues: | |||
Environment: | Attachments: | ( 0) asterisk-locks.txt ( 1) asterisk-locks2.txt ( 2) asterisk-locks3.txt ( 3) asterisk-locks4.txt ( 4) gdb.txt ( 5) gdb2.txt ( 6) gdb3.txt ( 7) gdb4.txt ( 8) gdb5.txt ( 9) prodding-channel-crash.txt | |
Description: | from time to time, asterisk is locked (100% cpu utilization, but not crash) with "Prodding channel..." messages... grep "SIP/777-082ae2f8" /var/log/asterisk/messages | wc -l 856979 in my case it happened in entering simple diaplan, calling MoH... 555 => { Set(TIMEOUT(absolute)=60); MusicOnHold(); } user is connected over wifi, so maybe some packet was lost in some critical call stage, that causes sip channel starts looping? unfortunately, I was not able to do some debugging, because asterisk pc was completely locked, due to realtime priority of asterisk process... ****** ADDITIONAL INFORMATION ****** [Dec 14 09:40:21] ERROR[4145] chan_sip.c: We could NOT get the channel lock for SIP/777-082ae2f8! [Dec 14 09:40:21] ERROR[4145] chan_sip.c: SIP transaction failed: 1521191083@10.53.34.221 [Dec 14 09:41:18] WARNING[4145] chan_sip.c: Maximum retries exceeded on transmission 3021042242@10.53.34.221 for seqno 2 (Critical Response) [Dec 14 09:41:21] WARNING[4145] channel.c: Prodding channel 'SIP/777-082ae2f8' failed [Dec 14 09:41:21] WARNING[4145] channel.c: Prodding channel 'SIP/777-082ae2f8' failed [Dec 14 09:41:21] WARNING[4145] channel.c: Prodding channel 'SIP/777-082ae2f8' failed | ||
Comments: | By: Tilghman Lesher (tilghman) 2007-12-14 08:53:21.000-0600 Please try reproducing this, by excluding the "-p" flag when you start Asterisk. By: pj (pj) 2007-12-14 09:11:59.000-0600 OK, I will try to run asterisk without realtime priority, but please tell me, if that happened again, what should I do to debug this issue? By: Tilghman Lesher (tilghman) 2007-12-14 09:14:37.000-0600 Attach to the process with gdb and run: thread apply all bt then upload the output as a file here. By: pj (pj) 2007-12-14 09:23:42.000-0600 so I must recompile with 'debug thread' and "don't optimize"? because when I try this now 'gdb asterisk' and run 'thread apply all bt' it do nothing. By: Joshua C. Colp (jcolp) 2007-12-14 09:30:17.000-0600 Are you using zaptel timing as well? And once you do have it in a state where you can get debug info a core show locks would be useful, and the complete console output with debug enabled. By: Tilghman Lesher (tilghman) 2007-12-14 09:33:01.000-0600 No, the way to attach to a running asterisk is with: gdb -p <pid-of-asterisk> By: pj (pj) 2007-12-14 13:24:16.000-0600 some console output from today's 'proding channel...' issue, it's repeating many times. yes, I'm using ztdummy as timming source and have in asterisk.conf [options] internal_timing = yes it's pure voip gateway, without real isdn hardware. [Dec 14 10:02:19] -- Executing [555@from-pj-test:2] MusicOnHold("SIP/777-082ae2f8", "") in new stack [Dec 14 10:02:19] WARNING[4145]: channel.c:2723 ast_prod: Prodding channel 'SIP/777-082ae2f8' failed [Dec 14 10:02:19] -- Started music on hold, class 'default', on SIP/777-082ae2f8 [Dec 14 10:02:19] -- Stopped music on hold on SIP/777-082ae2f8 [Dec 14 10:02:19] == Spawn extension (from-pj-test, 555, 2) exited non-zero on 'SIP/777-082ae2f8' [Dec 14 10:02:19] -- Executing [555@from-pj-test:2] MusicOnHold("SIP/777-082ae2f8", "") in new stack [Dec 14 10:02:19] WARNING[4145]: channel.c:2723 ast_prod: Prodding channel 'SIP/777-082ae2f8' failed [Dec 14 10:02:19] -- Started music on hold, class 'default', on SIP/777-082ae2f8 [Dec 14 10:02:19] -- Stopped music on hold on SIP/777-082ae2f8 [Dec 14 10:02:19] == Spawn extension (from-pj-test, 555, 2) exited non-zero on 'SIP/777-082ae2f8' [Dec 14 10:02:19] -- Executing [555@from-pj-test:2] MusicOnHold("SIP/777-082ae2f8", "") in new stack By: pj (pj) 2007-12-15 03:35:17.000-0600 today asterisk crashes in next call during "prodding channel messages", so I'm not able to atach to asterisk process when in looping state, but attaching 'bt' from generated core.... By: pj (pj) 2007-12-26 12:36:56.000-0600 trunk-r93763 issue still persist and asterisk crash after many "prodding channel" cycles, so I'm not able to attach gdb to running asterisk process, crash dump is now not generated, even if I have in asterisk.conf [options] dumpcore = yes [Dec 26 15:06:59] ERROR[2914] chan_sip.c: We could NOT get the channel lock for SIP/777-0821fbd8! [Dec 26 15:06:59] ERROR[2914] chan_sip.c: SIP transaction failed: 936481778@10.53.34.221 [Dec 26 15:07:59] WARNING[2914] channel.c: Prodding channel 'SIP/777-0821fbd8' failed ....... ....... [Dec 26 15:09:52] WARNING[2914] channel.c: Prodding channel 'SIP/777-0821fbd8' failed [Dec 26 15:12:53] NOTICE[3675] cdr.c: CDR simple logging enabled. [Dec 26 15:12:54] NOTICE[3675] loader.c: 86 modules will be loaded. grep "SIP/777-0821fbd8" /var/log/asterisk/messages | wc -l 65028 By: Russell Bryant (russell) 2007-12-26 13:37:58.000-0600 There is a good chance this has already been fixed. Please update and try again. Also, if it still happens, try to grab "core show locks" output while it is freaking out. $ sudo asterisk -rx "core show locks" > showlocks.txt By: pj (pj) 2007-12-30 13:27:53.000-0600 SVN-trunk-r95139 "prodding channel" loop appears again, 'thread apply all bt' output attached... By: Tilghman Lesher (tilghman) 2007-12-31 10:18:33.000-0600 pj: were you able to grab a "core show locks" as requested? By: pj (pj) 2008-01-02 06:57:54.000-0600 I recompiled SVN-trunk-r95313 with DONT_OPTIMIZE and DEBUG_THREADS turned on, I will wait to another deadlock and will try to grab core show locks output. Do you still need also "thread apply all bt" from deadlocked asterisk, or it was useless? I don't know, if it's related, today this build crashed after calling party hangup ringing, but unanswered channel (maybe some issue with storing cdr in plain files?). bt, bt full attached... By: pj (pj) 2008-01-03 06:38:08.000-0600 after DEBUG_THREADS enabled, now I'm experiencing this messagess: NOTICE[12340]: chan_iax2.c:2105 iax2_destroy: Avoiding IAX destroy deadlock grep "Avoiding IAX destroy deadlock" /var/log/asterisk/messages | wc -l 1136 By: Tilghman Lesher (tilghman) 2008-01-03 12:20:18.000-0600 Actually, the gdb output suggests that you're getting memory corruption, as well, so you might want to follow the instructions in doc/valgrind.txt to get us that debugging information, as well. By: pj (pj) 2008-01-05 07:53:17.000-0600 another gdb attached, I'm not running under valgrind yet. Asterisk SVN-trunk-r96174 By: pj (pj) 2008-01-05 10:52:21.000-0600 I was finaly able to grab "core show locks" during prodding channel loop, log attached... By: pj (pj) 2008-01-07 05:40:39.000-0600 Asterisk SVN-trunk-r96645 uploading another deadlock asterisk-locks3.txt asterisk-locks4.txt please delete asterisk-locks2.txt it's mistake/duplicate. related console messages during last locks: [Jan 7 12:36:52] NOTICE[29929] chan_iax2.c: Out of idle IAX2 threads for I/O, pausing! [Jan 7 12:36:53] NOTICE[29929] chan_iax2.c: Out of idle IAX2 threads for I/O, pausing! [Jan 7 12:36:54] NOTICE[29929] chan_iax2.c: Out of idle IAX2 threads for I/O, pausing! [Jan 7 12:36:55] NOTICE[29929] chan_iax2.c: Out of idle IAX2 threads for I/O, pausing! [Jan 7 12:36:56] NOTICE[29929] chan_iax2.c: Out of idle IAX2 threads for I/O, pausing! [Jan 7 12:36:57] NOTICE[29929] chan_iax2.c: Out of idle IAX2 threads for I/O, pausing! [Jan 7 12:36:57] NOTICE[29929] chan_iax2.c: Out of idle IAX2 threads for scheduling! [Jan 7 12:36:58] NOTICE[29929] chan_iax2.c: Out of idle IAX2 threads for I/O, pausing! [Jan 7 12:36:58] NOTICE[29929] chan_iax2.c: Out of idle IAX2 threads for scheduling! [Jan 7 12:36:59] NOTICE[29929] chan_iax2.c: Out of idle IAX2 threads for I/O, pausing! [Jan 7 12:36:59] NOTICE[29929] chan_iax2.c: Out of idle IAX2 threads for scheduling! [Jan 7 12:37:00] NOTICE[29929] chan_iax2.c: Out of idle IAX2 threads for I/O, pausing! By: pj (pj) 2008-01-08 11:25:38.000-0600 currently almost every call makes asterisk locks, as I posted before, disabling jitterbuffer has no efect, often audible click in sound over iax should I recompile with another debug options? By: pj (pj) 2008-01-12 14:15:45.000-0600 I was changed all my voip trunks from iax to sip, now it's works _much_ better, no crashes, no deathlocks, no sound clicks, it seems, that chan_iax is somehow buggy in trunk, I will monitor my asterisk if also "prodding channel.." will be gone. By: pj (pj) 2008-01-14 02:57:13.000-0600 crash again today, after call hangup, pure sip2sip environment, gdb4.txt attached Asterisk SVN-trunk-r98083, I will try to recompile without DEBUG_THREADS, maybe it's source of all my crashes. By: pj (pj) 2008-01-14 13:42:59.000-0600 Asterisk SVN-trunk-r98676 compiled without DEBUG_THREADS, but crashed again, after call hangup (congestion), atached gdb5.txt By: pj (pj) 2008-01-15 05:35:27.000-0600 it seems, that asterisk have some memory leak, it starts about 20MB memory, but after several hours with minimal call processing load (about 30 calls) eats >120MB! this is also indication for oncomming crash. I don't see this behaviour before. Asterisk SVN-trunk-r98676 memory usage, potential memory leak: root 28233 0.0 0.7 3812 884 pts/2 S Jan14 0:00 /bin/sh /etc/init.d/asterisk start asterisk 28456 1.2 74.9 123704 94556 pts/2 Sl Jan14 13:06 \_ asterisk -vvv -c memory usage after restart: root 28928 0.0 0.6 3812 844 pts/2 S 12:29 0:00 /bin/sh /etc/init.d/asterisk start asterisk 28935 0.3 3.9 11552 4968 pts/2 Sl 12:29 0:01 \_ asterisk -vvv -c By: pj (pj) 2008-01-17 15:25:50.000-0600 how can I debug potential memory leak in asterisk, as I mentioned above? By: pj (pj) 2008-02-06 02:46:00.000-0600 my first issue 'prodding channel' was avoided when I migrate from iax to sip and throw iax away. my second issue with memory leaks and crashes no longer remaining in last asterisk trunk revisions so, I think, you can close this tiket, thanks. By: jmls (jmls) 2008-02-17 13:05:41.000-0600 at reporter's request By: pj (pj) 2008-03-01 05:14:07.000-0600 I would like to reopen this issue, I found source of the 'prodding channel..." issue, it can be easy reproduced: I'm using softphone to call this simple dialplan: '555' => 1. Set(TIMEOUT(absolute)=60) [pbx_ael] 2. MusicOnHold() [pbx_ael] after call is established, I kill softphone to simulate broken connection, after absolute timeout expires asterisk starts looping in cycle below... when I remove 'Set(TIMEOUT(absolute)=60)' and repeat, channel remains open, even if I have 'rtptimeout=15' in sip.conf, this is related to # 0011562: "rtptimeout" doesn't terminate channel if RTP is lost during "Echo()" [Mar 1 12:00:47] WARNING[27792]: channel.c:2883 ast_prod: Prodding channel 'SIP/324z-082731b8' failed [Mar 1 12:00:47] -- Started music on hold, class 'default', on SIP/324z-082731b8 [Mar 1 12:00:47] -- Stopped music on hold on SIP/324z-082731b8 [Mar 1 12:00:47] == Spawn extension (from-pj-test, 555, 2) exited non-zero on 'SIP/324z-082731b8' [Mar 1 12:00:47] -- Executing [555@from-pj-test:2] MusicOnHold("SIP/324z-082731b8", "") in new stack [Mar 1 12:00:47] WARNING[27792]: channel.c:2883 ast_prod: Prodding channel 'SIP/324z-082731b8' failed [Mar 1 12:00:47] -- Started music on hold, class 'default', on SIP/324z-082731b8 [Mar 1 12:00:47] -- Stopped music on hold on SIP/324z-082731b8 [Mar 1 12:00:47] == Spawn extension (from-pj-test, 555, 2) exited non-zero on 'SIP/324z-082731b8' By: Digium Subversion (svnbot) 2008-04-09 12:43:37 Repository: asterisk Revision: 113836 U trunk/main/pbx.c ------------------------------------------------------------------------ r113836 | mmichelson | 2008-04-09 12:43:36 -0500 (Wed, 09 Apr 2008) | 14 lines There was a subtle logical difference between 1.4 and trunk with regards to how timeouts were handled. In 1.4, if the absolute timeout were reached on a call, no matter what the return value of ast_spawn_extension was, the pbx would attempt to go to the 'T' extension or hangup otherwise. The rearrangement of this function in trunk made this check only happen in the case that ast_spawn_extension returned 0. If ast_spawn_extension returned 1, then the fact that the timeout expired resulted in a no-op, and would cause an infinite loop to occur in __ast_pbx_run. This change fixes this problem. Now timeouts will behave as they did in 1.4 (closes issue ASTERISK-11038) Reported by: pj Tested by: putnopvut ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=113836 By: Digium Subversion (svnbot) 2008-04-09 12:45:04 Repository: asterisk Revision: 113837 _U branches/1.6.0/ U branches/1.6.0/main/pbx.c ------------------------------------------------------------------------ r113837 | mmichelson | 2008-04-09 12:45:03 -0500 (Wed, 09 Apr 2008) | 22 lines Merged revisions 113836 via svnmerge from https://origsvn.digium.com/svn/asterisk/trunk ........ r113836 | mmichelson | 2008-04-09 12:48:33 -0500 (Wed, 09 Apr 2008) | 14 lines There was a subtle logical difference between 1.4 and trunk with regards to how timeouts were handled. In 1.4, if the absolute timeout were reached on a call, no matter what the return value of ast_spawn_extension was, the pbx would attempt to go to the 'T' extension or hangup otherwise. The rearrangement of this function in trunk made this check only happen in the case that ast_spawn_extension returned 0. If ast_spawn_extension returned 1, then the fact that the timeout expired resulted in a no-op, and would cause an infinite loop to occur in __ast_pbx_run. This change fixes this problem. Now timeouts will behave as they did in 1.4 (closes issue ASTERISK-11038) Reported by: pj Tested by: putnopvut ........ ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=113837 By: Digium Subversion (svnbot) 2008-04-09 13:27:37 Repository: asterisk Revision: 113872 _U team/group/codec_bits/ U team/group/codec_bits/channels/chan_h323.c U team/group/codec_bits/channels/chan_skinny.c U team/group/codec_bits/contrib/scripts/astcli U team/group/codec_bits/main/pbx.c ------------------------------------------------------------------------ r113872 | tilghman | 2008-04-09 13:27:35 -0500 (Wed, 09 Apr 2008) | 44 lines Merged revisions 113834,113836,113838,113840 via svnmerge from https://origsvn.digium.com/svn/asterisk/trunk ........ r113834 | qwell | 2008-04-09 12:41:09 -0500 (Wed, 09 Apr 2008) | 7 lines Move all messages wrapped in skinnydebug from debug to verbose. (closes issue ASTERISK-11651) Reported by: DEA Patches: chan_skinny-debug-log.txt uploaded by DEA (license 3) ........ r113836 | mmichelson | 2008-04-09 12:48:33 -0500 (Wed, 09 Apr 2008) | 14 lines There was a subtle logical difference between 1.4 and trunk with regards to how timeouts were handled. In 1.4, if the absolute timeout were reached on a call, no matter what the return value of ast_spawn_extension was, the pbx would attempt to go to the 'T' extension or hangup otherwise. The rearrangement of this function in trunk made this check only happen in the case that ast_spawn_extension returned 0. If ast_spawn_extension returned 1, then the fact that the timeout expired resulted in a no-op, and would cause an infinite loop to occur in __ast_pbx_run. This change fixes this problem. Now timeouts will behave as they did in 1.4 (closes issue ASTERISK-11038) Reported by: pj Tested by: putnopvut ........ r113838 | qwell | 2008-04-09 12:56:07 -0500 (Wed, 09 Apr 2008) | 2 lines Fix a small file handle "leak" pointed out by jjshoe on #asterisk. ........ r113840 | file | 2008-04-09 13:05:40 -0500 (Wed, 09 Apr 2008) | 4 lines Enable enough RTP bridging to allow P2P to work. (closes issue ASTERISK-11355) Reported by: pj ........ ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=113872 |