[Home]

Summary:ASTERISK-17407: [patch] Seems Like ast_read / timerfd_timer_ack Causes lockups and resource drain
Reporter:Gregory Hinton Nietsky (irroot)Labels:
Date Opened:2011-02-15 02:31:46.000-0600Date Closed:2011-06-04 10:06:52
Priority:MajorRegression?No
Status:Closed/CompleteComponents:Resources/res_timing_timerfd
Versions:1.6.2.15 Frequency of
Occurrence
Related
Issues:
is related toASTERISK-17867 [patch] Random entire asterisk deadlock when use builtin_atxfer when use res_timing_timerfd module
is related toASTERISK-17512 Asterisk-deadlocks - followup to #18497
Environment:Attachments:( 0) backtrace_for_lock3.txt
( 1) lock3.txt
( 2) res_timing_timerfd.issue18811.patch
Description:The system locks up eventually and as it is holding a channel lock things
like pickup and walking channels locked will cause a deadlock ...

i have reverted to dahdi timer that is not prone to this problem.



****** ADDITIONAL INFORMATION ******


Thread 52 (process 14822):
#0  0xb76e4430 in __kernel_vsyscall ()
No symbol table info available.
#1  0xb741921b in read () from /lib/libpthread.so.0
No symbol table info available.
#2  0xafd69e0e in timerfd_timer_ack (handle=64, quantity=1) at res_timing_timerfd.c:86
       expirations = 0
       read_result = 0
       __PRETTY_FUNCTION__ = "timerfd_timer_ack"
#3  0x0817573d in ast_timer_ack (handle=0xae2415d0, quantity=1) at timing.c:76
No locals.
#4  0x080a40b6 in __ast_read (chan=0xae24b528, dropaudio=0) at /mnt/x86_64/usr/src/voip/asterisk-1.6.2.17-rc2/include/asterisk/lock.h:236
       res = AST_TIMING_EVENT_EXPIRED
       f = (struct ast_frame *) 0x0
       blah = 1
       prestate = 6
       count = 8
       cause = 0
       __PRETTY_FUNCTION__ = "__ast_read"
ASTERISK-1  0x080a5adf in ast_read (chan=0xae24b528) at /mnt/x86_64/usr/src/voip/asterisk-1.6.2.17-rc2/include/asterisk/lock.h:236
No locals.
ASTERISK-2  0xb380b093 in wait_for_answer

=======================================================================
=== Currently Held Locks ==============================================
=======================================================================
===
=== <pending> <lock#> (<file>): <lock type> <line num> <function> <lock name> <lock addr> (times locked)
===
=== Thread ID: -1382241424 (pbx_thread           started at [ 4627] pbx.c ast_pbx_start())
=== ---> Lock #0 (channel.c): MUTEX 2744 __ast_read &chan->lock_dont_use 0xae24b5c0 (1)
       /usr/sbin/asterisk(ast_bt_get_addresses+0x19) [0x8107e33]
       /usr/sbin/asterisk() [0x809c674]
       /usr/sbin/asterisk() [0x80a3de0]
       /usr/sbin/asterisk(ast_read+0x19) [0x80a5adf]
       /usr/lib/asterisk/modules-1.6/app_queue.so(+0xd093) [0xb380b093]
       /usr/lib/asterisk/modules-1.6/app_queue.so(+0xf825) [0xb380d825]
       /usr/lib/asterisk/modules-1.6/app_queue.so(+0x16099) [0xb3814099]
       /usr/sbin/asterisk(pbx_exec+0x1a5) [0x811b433]
       /usr/sbin/asterisk() [0x8122aca]
       /usr/sbin/asterisk(ast_spawn_extension+0x53) [0x8124264]
       /usr/sbin/asterisk() [0x8124970]
       /usr/sbin/asterisk() [0x8125e39]
       /usr/sbin/asterisk() [0x81811b6]
       /lib/libpthread.so.0(+0x5afe) [0xb7411afe]
       /lib/libc.so.6(clone+0x5e) [0xb698764e]
=== -------------------------------------------------------------------
===
=======================================================================
Comments:By: Gregory Hinton Nietsky (irroot) 2011-02-15 03:51:20.000-0600

Here is a alternate path same effect ...

=======================================================================
=== Currently Held Locks ==============================================
=======================================================================
===
=== <pending> <lock#> (<file>): <lock type> <line num> <function> <lock name> <lock addr> (times locked)
===
=== Thread ID: -1414575248 (pbx_thread           started at [ 4627] pbx.c ast_pbx_start())
=== ---> Lock #0 (channel.c): MUTEX 2744 __ast_read &chan->lock_dont_use 0xa2e02f8 (1)
       /usr/sbin/asterisk(ast_bt_get_addresses+0x19) [0x8107e33]
       /usr/sbin/asterisk() [0x809c674]
       /usr/sbin/asterisk() [0x80a3de0]
       /usr/sbin/asterisk(ast_read+0x19) [0x80a5adf]
       /usr/sbin/asterisk(ast_safe_sleep_conditional+0xc0) [0x80a07b4]
       /usr/sbin/asterisk(ast_safe_sleep+0x28) [0x80a094e]
       /usr/lib/asterisk/modules-1.6/res_musiconhold.so(+0x4d54) [0xafbf2d54]
       /usr/sbin/asterisk(pbx_exec+0x1a5) [0x811b433]
       /usr/sbin/asterisk() [0x8122aca]
       /usr/sbin/asterisk(ast_spawn_extension+0x53) [0x8124264]
       /usr/lib/asterisk/modules-1.6/app_macro.so(+0x2ff8) [0xb3670ff8]
       /usr/lib/asterisk/modules-1.6/app_macro.so(+0x4109) [0xb3672109]
       /usr/sbin/asterisk(pbx_exec+0x1a5) [0x811b433]
       /usr/sbin/asterisk() [0x8122aca]
       /usr/sbin/asterisk(ast_spawn_extension+0x53) [0x8124264]
       /usr/sbin/asterisk() [0x8124970]
       /usr/sbin/asterisk() [0x8125e39]
       /usr/sbin/asterisk() [0x81811b6]
       /lib/libpthread.so.0(+0x5afe) [0xb75a2afe]
       /lib/libc.so.6(clone+0x5e) [0xb688564e]
=== -------------------------------------------------------------------
===
=======================================================================


Thread 47 (process 21863):
#0  0xb7875430 in __kernel_vsyscall ()
No symbol table info available.
#1  0xb75aa21b in read () from /lib/libpthread.so.0
No symbol table info available.
#2  0xafbdee0e in timerfd_timer_ack (handle=62, quantity=1) at res_timing_timerfd.c:86
       expirations = 0
       read_result = 0
       __PRETTY_FUNCTION__ = "timerfd_timer_ack"
#3  0x0817573d in ast_timer_ack (handle=0x9fdd228, quantity=1) at timing.c:76
No locals.
#4  0x080a40b6 in __ast_read (chan=0xa2e0260, dropaudio=0) at /mnt/x86_64/usr/src/voip/asterisk-1.6.2.17-rc2/include/asterisk/lock.h:236
       res = AST_TIMING_EVENT_EXPIRED
       f = (struct ast_frame *) 0x0
       blah = 1
       prestate = 6
       count = 7
       cause = 0
       __PRETTY_FUNCTION__ = "__ast_read"
ASTERISK-1  0x080a5adf in ast_read (chan=0xa2e0260) at /mnt/x86_64/usr/src/voip/asterisk-1.6.2.17-rc2/include/asterisk/lock.h:236
No locals.
ASTERISK-2  0x080a07b4 in ast_safe_sleep_conditional (chan=0xa2e0260, ms=2808, cond=0, data=0x0)
   at /mnt/x86_64/usr/src/voip/asterisk-1.6.2.17-rc2/include/asterisk/lock.h:236
       dup_f = (struct ast_frame *) 0x0
       f = (struct ast_frame *) 0x81fac00
       silgen = (struct ast_silence_generator *) 0x0
       res = 0
       deferred_frames = {first = 0x0, last = 0x0}
       __PRETTY_FUNCTION__ = "ast_safe_sleep_conditional"
ASTERISK-3  0x080a094e in ast_safe_sleep (chan=0xa2e0260, ms=10000) at /mnt/x86_64/usr/src/voip/asterisk-1.6.2.17-rc2/include/asterisk/lock.h:236
No locals.
ASTERISK-4  0xafbf2d54 in play_moh_exec (chan=0xa2e0260, data=0x81d2ec0) at /mnt/x86_64/usr/src/voip/asterisk-1.6.2.17-rc2/include/asterisk/lock.h:236
       parse = 0xabaeaaa0 ""
       class = 0x0
       timeout = -1
       res = 0
       args = {argc = 0, argv = 0xabaeaac0, class = 0x0, duration = 0x0}
       __PRETTY_FUNCTION__ = "play_moh_exec"

By: Gregory Hinton Nietsky (irroot) 2011-02-15 09:16:19.000-0600

see also ASTERISK-17388 this caused that resolved dead lock.

By: Leif Madsen (lmadsen) 2011-02-16 15:46:03.000-0600

irroot: I don't understand what you mean by your last note?

By: Gregory Hinton Nietsky (irroot) 2011-02-16 22:10:48.000-0600

Sorry @lmadsen the problem described above is the root of the deadlock experienced in ASTERISK-17388 blocking with a chan lock held caused this and other problems took me a while to debug and find the root cause .

By: Michel Verbraak (astmiv) 2011-04-12 14:16:22

Added patch resolves this deadlock problem.

Problem was that at some time the timing rate was set to 0 (zero) and after this a call to ast_timer_ack was done which in its turn would do a call to read().

Setting the rate of the timerfd to 0 stops the timerfd.

The patch cheks if the timer rate is set to 0. If so return otherwise continue and read ticks.

This problem also exists in 1.8.3.2 and trunk

Tested with sipp. Before this patch I got the deadlock between 5 to 10 calls. After the patch not anymore.

The other timing resources, pthread and dahdi, do not have this bug. Dahdi uses an ioctl to the dahdi kernel module and the pthread uses the polling meganism.



By: Gregory Hinton Nietsky (irroot) 2011-04-13 05:58:56

Thx ill try it out ...

By: JoshE (n8ideas) 2011-04-19 18:46:31

Just to give a little bit of information here, this patch on 1.6.2.17.2 does not fully resolve the deadlock issue.  I am running that version with timerfd and am still experiencing a deadlock path, which seems to occur around once weekly on a system that processes around 10,000 calls / day.

Unfortunately, I don't have the correct setup right now to get full backtraces, but I will do my best to supply them as soon as possible.



By: Gregory Hinton Nietsky (irroot) 2011-04-27 04:19:23

I have had no problems since apply in it but there has been low volume so not sure yet im giving it a week or two and then trying it on a higher vol site.

By: Gregory Hinton Nietsky (irroot) 2011-05-07 03:24:43

Ok this has not happened again on the asterisk at the office ill be introducing timmerfd into a few more higher use enviroments ... looking good so far ....

i have seen a couple of the warnings generated so good it seems to do ...

By: zvision (zvision) 2011-05-11 08:52:43

I have the same issue with the current Asterisk 1.6.2 SVN version. Launching MeetMe with a few hundreds users at a single conference almost always leaves a few deadlocked MeetMe users (usually 1-4). Attaching GDB to a running Asterisk show deadlocked MeetMe threads waiting in the timerfd_timer_ack read call. The dealock occurs when these users are about to hang up (VoIP channels are hung up correctly, only MeetMe users in a conference are left deadlocked).

By: Gregory Hinton Nietsky (irroot) 2011-05-11 09:41:30

This does not deadlock but it breaks timing somehow all back to dahdi will stay there

By: Leif Madsen (lmadsen) 2011-05-24 12:04:15

I'm closing this in favor of ASTERISK-17867 per a discussion on the asterisk-dev mailing list. Thanks!