[Home]

Summary:ASTERISK-11038: Prodding channel... causes 100% utilization
Reporter:pj (pj)Labels:
Date Opened:2007-12-14 08:37:35.000-0600Date Closed:2008-04-09 13:27:37
Priority:BlockerRegression?No
Status:Closed/CompleteComponents:Channels/chan_sip/General
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) asterisk-locks.txt
( 1) asterisk-locks2.txt
( 2) asterisk-locks3.txt
( 3) asterisk-locks4.txt
( 4) gdb.txt
( 5) gdb2.txt
( 6) gdb3.txt
( 7) gdb4.txt
( 8) gdb5.txt
( 9) prodding-channel-crash.txt
Description:from time to time, asterisk is locked (100% cpu utilization, but not crash) with "Prodding channel..." messages...
grep "SIP/777-082ae2f8" /var/log/asterisk/messages | wc -l
856979

in my case it happened in entering simple diaplan, calling MoH...
555 => { Set(TIMEOUT(absolute)=60); MusicOnHold(); }
user is connected over wifi, so maybe some packet was lost in some critical call stage, that causes sip channel starts looping?

unfortunately, I was not able to do some debugging, because asterisk pc was completely locked, due to realtime priority of asterisk process...

****** ADDITIONAL INFORMATION ******

[Dec 14 09:40:21] ERROR[4145] chan_sip.c: We could NOT get the channel lock for SIP/777-082ae2f8!
[Dec 14 09:40:21] ERROR[4145] chan_sip.c: SIP transaction failed: 1521191083@10.53.34.221
[Dec 14 09:41:18] WARNING[4145] chan_sip.c: Maximum retries exceeded on transmission 3021042242@10.53.34.221 for seqno 2 (Critical Response)

[Dec 14 09:41:21] WARNING[4145] channel.c: Prodding channel 'SIP/777-082ae2f8' failed
[Dec 14 09:41:21] WARNING[4145] channel.c: Prodding channel 'SIP/777-082ae2f8' failed
[Dec 14 09:41:21] WARNING[4145] channel.c: Prodding channel 'SIP/777-082ae2f8' failed
Comments:By: Tilghman Lesher (tilghman) 2007-12-14 08:53:21.000-0600

Please try reproducing this, by excluding the "-p" flag when you start Asterisk.

By: pj (pj) 2007-12-14 09:11:59.000-0600

OK, I will try to run asterisk without realtime priority, but please tell me, if that happened again, what should I do to debug this issue?

By: Tilghman Lesher (tilghman) 2007-12-14 09:14:37.000-0600

Attach to the process with gdb and run:  thread apply all bt
then upload the output as a file here.

By: pj (pj) 2007-12-14 09:23:42.000-0600

so I must recompile with 'debug thread' and "don't optimize"? because when I try this now 'gdb asterisk' and run 'thread apply all bt' it do nothing.

By: Joshua C. Colp (jcolp) 2007-12-14 09:30:17.000-0600

Are you using zaptel timing as well? And once you do have it in a state where you can get debug info a core show locks would be useful, and the complete console output with debug enabled.

By: Tilghman Lesher (tilghman) 2007-12-14 09:33:01.000-0600

No, the way to attach to a running asterisk is with:

gdb -p <pid-of-asterisk>

By: pj (pj) 2007-12-14 13:24:16.000-0600

some console output from today's 'proding channel...' issue, it's repeating many times.
yes, I'm using ztdummy as timming source and have in asterisk.conf
[options]
internal_timing = yes
it's pure voip gateway, without real isdn hardware.


[Dec 14 10:02:19]     -- Executing [555@from-pj-test:2] MusicOnHold("SIP/777-082ae2f8", "") in new stack
[Dec 14 10:02:19] WARNING[4145]: channel.c:2723 ast_prod: Prodding channel 'SIP/777-082ae2f8' failed
[Dec 14 10:02:19]     -- Started music on hold, class 'default', on SIP/777-082ae2f8
[Dec 14 10:02:19]     -- Stopped music on hold on SIP/777-082ae2f8
[Dec 14 10:02:19]   == Spawn extension (from-pj-test, 555, 2) exited non-zero on 'SIP/777-082ae2f8'
[Dec 14 10:02:19]     -- Executing [555@from-pj-test:2] MusicOnHold("SIP/777-082ae2f8", "") in new stack
[Dec 14 10:02:19] WARNING[4145]: channel.c:2723 ast_prod: Prodding channel 'SIP/777-082ae2f8' failed
[Dec 14 10:02:19]     -- Started music on hold, class 'default', on SIP/777-082ae2f8
[Dec 14 10:02:19]     -- Stopped music on hold on SIP/777-082ae2f8
[Dec 14 10:02:19]   == Spawn extension (from-pj-test, 555, 2) exited non-zero on 'SIP/777-082ae2f8'
[Dec 14 10:02:19]     -- Executing [555@from-pj-test:2] MusicOnHold("SIP/777-082ae2f8", "") in new stack

By: pj (pj) 2007-12-15 03:35:17.000-0600

today asterisk crashes in next call during "prodding channel messages", so I'm not able to atach to asterisk process when in looping state, but attaching 'bt' from generated core....

By: pj (pj) 2007-12-26 12:36:56.000-0600

trunk-r93763
issue still persist and asterisk crash after many "prodding channel" cycles, so I'm not able to attach gdb to running asterisk process,
crash dump is now not generated, even if I have in asterisk.conf
[options]
dumpcore = yes


[Dec 26 15:06:59] ERROR[2914] chan_sip.c: We could NOT get the channel lock for SIP/777-0821fbd8!
[Dec 26 15:06:59] ERROR[2914] chan_sip.c: SIP transaction failed: 936481778@10.53.34.221
[Dec 26 15:07:59] WARNING[2914] channel.c: Prodding channel 'SIP/777-0821fbd8' failed
.......
.......
[Dec 26 15:09:52] WARNING[2914] channel.c: Prodding channel 'SIP/777-0821fbd8' failed
[Dec 26 15:12:53] NOTICE[3675] cdr.c: CDR simple logging enabled.
[Dec 26 15:12:54] NOTICE[3675] loader.c: 86 modules will be loaded.


grep "SIP/777-0821fbd8" /var/log/asterisk/messages | wc -l
65028



By: Russell Bryant (russell) 2007-12-26 13:37:58.000-0600

There is a good chance this has already been fixed.  Please update and try again.

Also, if it still happens, try to grab "core show locks" output while it is freaking out.

$ sudo asterisk -rx "core show locks" > showlocks.txt

By: pj (pj) 2007-12-30 13:27:53.000-0600

SVN-trunk-r95139
"prodding channel" loop appears again,
'thread apply all bt' output attached...

By: Tilghman Lesher (tilghman) 2007-12-31 10:18:33.000-0600

pj:  were you able to grab a "core show locks" as requested?

By: pj (pj) 2008-01-02 06:57:54.000-0600

I recompiled SVN-trunk-r95313 with DONT_OPTIMIZE and DEBUG_THREADS turned on, I will wait to another deadlock and will try to grab core show locks output. Do you still need also "thread apply all bt" from deadlocked asterisk, or it was useless?

I don't know, if it's related, today this build crashed after calling party hangup ringing, but  unanswered channel (maybe some issue with storing cdr in plain files?).
bt, bt full attached...



By: pj (pj) 2008-01-03 06:38:08.000-0600

after DEBUG_THREADS enabled, now I'm experiencing this messagess:
NOTICE[12340]: chan_iax2.c:2105 iax2_destroy: Avoiding IAX destroy deadlock

grep "Avoiding IAX destroy deadlock" /var/log/asterisk/messages | wc -l
1136

By: Tilghman Lesher (tilghman) 2008-01-03 12:20:18.000-0600

Actually, the gdb output suggests that you're getting memory corruption, as well, so you might want to follow the instructions in doc/valgrind.txt to get us that debugging information, as well.

By: pj (pj) 2008-01-05 07:53:17.000-0600

another gdb attached, I'm not running under valgrind yet.
Asterisk SVN-trunk-r96174

By: pj (pj) 2008-01-05 10:52:21.000-0600

I was finaly able to grab "core show locks" during prodding channel loop,
log attached...

By: pj (pj) 2008-01-07 05:40:39.000-0600

Asterisk SVN-trunk-r96645
uploading another deadlock
asterisk-locks3.txt
asterisk-locks4.txt
please delete asterisk-locks2.txt it's mistake/duplicate.

related console messages during last locks:

[Jan  7 12:36:52] NOTICE[29929] chan_iax2.c: Out of idle IAX2 threads for I/O, pausing!
[Jan  7 12:36:53] NOTICE[29929] chan_iax2.c: Out of idle IAX2 threads for I/O, pausing!
[Jan  7 12:36:54] NOTICE[29929] chan_iax2.c: Out of idle IAX2 threads for I/O, pausing!
[Jan  7 12:36:55] NOTICE[29929] chan_iax2.c: Out of idle IAX2 threads for I/O, pausing!
[Jan  7 12:36:56] NOTICE[29929] chan_iax2.c: Out of idle IAX2 threads for I/O, pausing!
[Jan  7 12:36:57] NOTICE[29929] chan_iax2.c: Out of idle IAX2 threads for I/O, pausing!
[Jan  7 12:36:57] NOTICE[29929] chan_iax2.c: Out of idle IAX2 threads for scheduling!
[Jan  7 12:36:58] NOTICE[29929] chan_iax2.c: Out of idle IAX2 threads for I/O, pausing!
[Jan  7 12:36:58] NOTICE[29929] chan_iax2.c: Out of idle IAX2 threads for scheduling!
[Jan  7 12:36:59] NOTICE[29929] chan_iax2.c: Out of idle IAX2 threads for I/O, pausing!
[Jan  7 12:36:59] NOTICE[29929] chan_iax2.c: Out of idle IAX2 threads for scheduling!
[Jan  7 12:37:00] NOTICE[29929] chan_iax2.c: Out of idle IAX2 threads for I/O, pausing!



By: pj (pj) 2008-01-08 11:25:38.000-0600

currently almost every call makes asterisk locks, as I posted before,
disabling jitterbuffer has no efect,
often audible click in sound over iax
should I recompile with another debug options?

By: pj (pj) 2008-01-12 14:15:45.000-0600

I was changed all my voip trunks from iax to sip,
now it's works _much_ better, no crashes, no deathlocks, no sound clicks,
it seems, that chan_iax is somehow buggy in trunk,
I will monitor my asterisk if also "prodding channel.." will be gone.

By: pj (pj) 2008-01-14 02:57:13.000-0600

crash again today, after call hangup, pure sip2sip environment, gdb4.txt attached
Asterisk SVN-trunk-r98083,
I will try to recompile without DEBUG_THREADS, maybe it's source of all my crashes.



By: pj (pj) 2008-01-14 13:42:59.000-0600

Asterisk SVN-trunk-r98676
compiled without DEBUG_THREADS, but crashed again, after call hangup (congestion),
atached gdb5.txt

By: pj (pj) 2008-01-15 05:35:27.000-0600

it seems, that asterisk have some memory leak,
it starts about 20MB memory, but after several hours with minimal call processing load (about 30 calls) eats >120MB! this is also indication for oncomming crash.
I don't see this behaviour before.
Asterisk SVN-trunk-r98676
memory usage, potential memory leak:
root     28233  0.0  0.7   3812   884 pts/2    S    Jan14   0:00 /bin/sh /etc/init.d/asterisk start
asterisk 28456  1.2 74.9 123704 94556 pts/2    Sl   Jan14  13:06  \_ asterisk -vvv -c
memory usage after restart:
root     28928  0.0  0.6   3812   844 pts/2    S    12:29   0:00 /bin/sh /etc/init.d/asterisk start
asterisk 28935  0.3  3.9  11552  4968 pts/2    Sl   12:29   0:01  \_ asterisk -vvv -c



By: pj (pj) 2008-01-17 15:25:50.000-0600

how can I debug potential memory leak in asterisk, as I mentioned above?

By: pj (pj) 2008-02-06 02:46:00.000-0600

my first issue 'prodding channel' was avoided when I migrate from iax to sip and throw iax away.
my second issue with memory leaks and crashes no longer remaining in last asterisk trunk revisions
so, I think, you can close this tiket, thanks.

By: jmls (jmls) 2008-02-17 13:05:41.000-0600

at reporter's request

By: pj (pj) 2008-03-01 05:14:07.000-0600

I would like to reopen this issue,
I found source of the 'prodding channel..." issue, it can be easy reproduced:
I'm using softphone to call this simple dialplan:
 '555' =>          1. Set(TIMEOUT(absolute)=60)                  [pbx_ael]
                   2. MusicOnHold()                              [pbx_ael]

after call is established, I kill softphone to simulate broken connection, after absolute timeout expires asterisk starts looping in cycle below...

when I remove 'Set(TIMEOUT(absolute)=60)' and repeat, channel remains open, even if I have 'rtptimeout=15' in sip.conf, this is related to
# 0011562: "rtptimeout" doesn't terminate channel if RTP is lost during "Echo()"




[Mar  1 12:00:47] WARNING[27792]: channel.c:2883 ast_prod: Prodding channel 'SIP/324z-082731b8' failed
[Mar  1 12:00:47]     -- Started music on hold, class 'default', on SIP/324z-082731b8
[Mar  1 12:00:47]     -- Stopped music on hold on SIP/324z-082731b8
[Mar  1 12:00:47]   == Spawn extension (from-pj-test, 555, 2) exited non-zero on 'SIP/324z-082731b8'
[Mar  1 12:00:47]     -- Executing [555@from-pj-test:2] MusicOnHold("SIP/324z-082731b8", "") in new stack
[Mar  1 12:00:47] WARNING[27792]: channel.c:2883 ast_prod: Prodding channel 'SIP/324z-082731b8' failed
[Mar  1 12:00:47]     -- Started music on hold, class 'default', on SIP/324z-082731b8
[Mar  1 12:00:47]     -- Stopped music on hold on SIP/324z-082731b8
[Mar  1 12:00:47]   == Spawn extension (from-pj-test, 555, 2) exited non-zero on 'SIP/324z-082731b8'

By: Digium Subversion (svnbot) 2008-04-09 12:43:37

Repository: asterisk
Revision: 113836

U   trunk/main/pbx.c

------------------------------------------------------------------------
r113836 | mmichelson | 2008-04-09 12:43:36 -0500 (Wed, 09 Apr 2008) | 14 lines

There was a subtle logical difference between 1.4 and trunk with regards to how timeouts
were handled. In 1.4, if the absolute timeout were reached on a call, no matter what
the return value of ast_spawn_extension was, the pbx would attempt to go to the 'T'
extension or hangup otherwise. The rearrangement of this function in trunk made this check
only happen in the case that ast_spawn_extension returned 0. If ast_spawn_extension returned
1, then the fact that the timeout expired resulted in a no-op, and would cause an infinite
loop to occur in __ast_pbx_run. This change fixes this problem. Now timeouts will
behave as they did in 1.4

(closes issue ASTERISK-11038)
Reported by: pj
Tested by: putnopvut


------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=113836

By: Digium Subversion (svnbot) 2008-04-09 12:45:04

Repository: asterisk
Revision: 113837

_U  branches/1.6.0/
U   branches/1.6.0/main/pbx.c

------------------------------------------------------------------------
r113837 | mmichelson | 2008-04-09 12:45:03 -0500 (Wed, 09 Apr 2008) | 22 lines

Merged revisions 113836 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

........
r113836 | mmichelson | 2008-04-09 12:48:33 -0500 (Wed, 09 Apr 2008) | 14 lines

There was a subtle logical difference between 1.4 and trunk with regards to how timeouts
were handled. In 1.4, if the absolute timeout were reached on a call, no matter what
the return value of ast_spawn_extension was, the pbx would attempt to go to the 'T'
extension or hangup otherwise. The rearrangement of this function in trunk made this check
only happen in the case that ast_spawn_extension returned 0. If ast_spawn_extension returned
1, then the fact that the timeout expired resulted in a no-op, and would cause an infinite
loop to occur in __ast_pbx_run. This change fixes this problem. Now timeouts will
behave as they did in 1.4

(closes issue ASTERISK-11038)
Reported by: pj
Tested by: putnopvut


........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=113837

By: Digium Subversion (svnbot) 2008-04-09 13:27:37

Repository: asterisk
Revision: 113872

_U  team/group/codec_bits/
U   team/group/codec_bits/channels/chan_h323.c
U   team/group/codec_bits/channels/chan_skinny.c
U   team/group/codec_bits/contrib/scripts/astcli
U   team/group/codec_bits/main/pbx.c

------------------------------------------------------------------------
r113872 | tilghman | 2008-04-09 13:27:35 -0500 (Wed, 09 Apr 2008) | 44 lines

Merged revisions 113834,113836,113838,113840 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

........
r113834 | qwell | 2008-04-09 12:41:09 -0500 (Wed, 09 Apr 2008) | 7 lines

Move all messages wrapped in skinnydebug from debug to verbose.

(closes issue ASTERISK-11651)
Reported by: DEA
Patches:
     chan_skinny-debug-log.txt uploaded by DEA (license 3)

........
r113836 | mmichelson | 2008-04-09 12:48:33 -0500 (Wed, 09 Apr 2008) | 14 lines

There was a subtle logical difference between 1.4 and trunk with regards to how timeouts
were handled. In 1.4, if the absolute timeout were reached on a call, no matter what
the return value of ast_spawn_extension was, the pbx would attempt to go to the 'T'
extension or hangup otherwise. The rearrangement of this function in trunk made this check
only happen in the case that ast_spawn_extension returned 0. If ast_spawn_extension returned
1, then the fact that the timeout expired resulted in a no-op, and would cause an infinite
loop to occur in __ast_pbx_run. This change fixes this problem. Now timeouts will
behave as they did in 1.4

(closes issue ASTERISK-11038)
Reported by: pj
Tested by: putnopvut


........
r113838 | qwell | 2008-04-09 12:56:07 -0500 (Wed, 09 Apr 2008) | 2 lines

Fix a small file handle "leak" pointed out by jjshoe on #asterisk.

........
r113840 | file | 2008-04-09 13:05:40 -0500 (Wed, 09 Apr 2008) | 4 lines

Enable enough RTP bridging to allow P2P to work.
(closes issue ASTERISK-11355)
Reported by: pj

........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=113872