Summary: ASTERISK-17832: [regression] Deadlock in chan_sip
Reporter: Clod Patry (junky)
Labels:
Date Opened: 2011-05-10 15:39:05
Date Closed: 2011-07-15 14:12:22
Priority: Major
Regression: Yes
Status: Closed/Complete
Components: Channels/chan_sip/General
Versions: 1.8.4
Frequency of Occurrence:
Related Issues:
Environment:
Attachments:
Description:
Hi,
While testing 1.8.4 this morning, I got a deadlock after 48 minutes.
That machine only receives SIP calls and launches MeetMe(), nothing else.



yankee*CLI> core show locks

=======================================================================
=== Currently Held Locks ==============================================
=======================================================================
===
=== <pending> <lock#> (<file>): <lock type> <line num> <function> <lock name> <lock addr> (times locked)
===
=== Thread ID: 140112174364944 (do_monitor           started at [24712] chan_sip.c restart_monitor())
=== ---> Lock #0 (chan_sip.c): MUTEX 24684 do_monitor &monlock 0x7f6e6e8c9fe0 (1)
/usr/sbin/asterisk(ast_bt_get_addresses+0x1d) [0x4ef2a4]
/usr/sbin/asterisk(__ast_pthread_mutex_lock+0xd9) [0x4e7df8]
/usr/lib/asterisk/modules/chan_sip.so [0x7f6e6e67dda9]
/usr/sbin/asterisk [0x570f25]
/lib/libpthread.so.0 [0x7f6e76e25a04]
/lib/libc.so.6(clone+0x6d) [0x7f6e7766ed4d]
=== ---> Tried and failed to get Lock #1 (chan_sip.c): MUTEX 3756 __sip_autodestruct p->owner 0x24195f8 (0)
/usr/sbin/asterisk(ast_bt_get_addresses+0x1d) [0x4ef2a4]
/usr/sbin/asterisk(__ast_pthread_mutex_trylock+0xd9) [0x4e81b6]
/usr/sbin/asterisk(__ao2_trylock+0x5a) [0x44884e]
/usr/lib/asterisk/modules/chan_sip.so [0x7f6e6e610c49]
/usr/sbin/asterisk(ast_sched_runq+0x18e) [0x5540fc]
/usr/lib/asterisk/modules/chan_sip.so [0x7f6e6e67ddbb]
/usr/sbin/asterisk [0x570f25]
/lib/libpthread.so.0 [0x7f6e76e25a04]
/lib/libc.so.6(clone+0x6d) [0x7f6e7766ed4d]
=== -------------------------------------------------------------------
===
=== Thread ID: 140111955966224 (pbx_thread           started at [ 5038] pbx.c ast_pbx_start())
=== ---> Lock #0 (channel.c): MUTEX 3661 __ast_read chan 0x24195f8 (1)
/usr/sbin/asterisk(ast_bt_get_addresses+0x1d) [0x4ef2a4]
/usr/sbin/asterisk(__ast_pthread_mutex_lock+0xd9) [0x4e7df8]
/usr/sbin/asterisk(__ao2_lock+0x5a) [0x44878c]
/usr/sbin/asterisk [0x47681a]
/usr/sbin/asterisk(ast_read+0x1d) [0x478d77]
/usr/lib/asterisk/modules/app_meetme.so [0x7f6e6b560e25]
/usr/lib/asterisk/modules/app_meetme.so [0x7f6e6b56753a]
/usr/sbin/asterisk(pbx_exec+0x1fb) [0x508b59]
/usr/sbin/asterisk [0x512cc2]
/usr/sbin/asterisk(ast_spawn_extension+0x65) [0x51479c]
/usr/sbin/asterisk [0x51520b]
/usr/sbin/asterisk [0x516e2b]
/usr/sbin/asterisk [0x570f25]
/lib/libpthread.so.0 [0x7f6e76e25a04]
/lib/libc.so.6(clone+0x6d) [0x7f6e7766ed4d]
=== -------------------------------------------------------------------
===
=== Thread ID: 140111957489936 (netconsole           started at [ 1344] asterisk.c listener())
=== ---> Waiting for Lock #0 (cli.c): MUTEX 900 handle_chanlist c 0x24195f8 (1)
/usr/sbin/asterisk(ast_bt_get_addresses+0x1d) [0x4ef2a4]
/usr/sbin/asterisk(__ast_pthread_mutex_trylock+0xd9) [0x4e81b6]
/usr/sbin/asterisk(__ao2_trylock+0x5a) [0x44884e]
/usr/lib/asterisk/modules/chan_sip.so [0x7f6e6e610c49]
/usr/sbin/asterisk(ast_sched_runq+0x18e) [0x5540fc]
/usr/lib/asterisk/modules/chan_sip.so [0x7f6e6e67ddbb]
/usr/sbin/asterisk [0x570f25]
/lib/libpthread.so.0 [0x7f6e76e25a04]
/lib/libc.so.6(clone+0x6d) [0x7f6e7766ed4d]
=== --- ---> Locked Here: channel.c line 3661 (__ast_read)
=== -------------------------------------------------------------------
===
=======================================================================
Comments:
By: Leif Madsen (lmadsen) 2011-05-10 16:38:49

What was the previous version that didn't exhibit the deadlock? If this happens again, can you provide a backtrace of the running process? Thanks!

By: Clod Patry (junky) 2011-05-10 16:48:14

I used 1.8.3 without any issue.
I noticed that 1.8.4-rc2 also caused a deadlock, but I'm not sure it was the same one.

Since this is a production system, I had to roll back to 1.8.3 to stay stable for customers.

By: Igor Nikolaev (microlana) 2011-05-16 13:24:06

See issue ASTERISK-1905304. This happens because ast_read() gets stuck indefinitely in the read() system call on a disarmed timer (when the res_timing_timerfd module is used as the timing source).

By: Gregory Hinton Nietsky (irroot) 2011-05-17 03:07:51

I'm with @microlana: remove timerfd and use dahdi ... ASTERISK-17407. This is not a "deadlock", it's a "block".
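The "block" versus "deadlock" distinction can be sketched as follows, assuming the diagnosis in the comments above is right. One thread takes a lock and then blocks in read() while still holding it, so every other thread that wants the lock fails or waits, even though there is no circular wait. This is an illustration only: chan_lock, the two pipes, and the function names are hypothetical stand-ins, with a pipe playing the role of the timer descriptor that never fires.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <unistd.h>

static pthread_mutex_t chan_lock = PTHREAD_MUTEX_INITIALIZER;
static int ready_pipe[2]; /* holder -> caller: "I now hold the lock" */
static int timer_pipe[2]; /* stand-in for a timer fd that has not fired */

/* Plays the role of the thread stuck in read() with the lock held. */
static void *holder_thread(void *arg)
{
    char byte = 1;
    ssize_t n;
    int rc;

    (void)arg;
    rc = pthread_mutex_lock(&chan_lock);
    assert(rc == 0);
    (void)rc;

    n = write(ready_pipe[1], &byte, 1); /* announce that the lock is held */
    (void)n;
    n = read(timer_pipe[0], &byte, 1);  /* blocks here, lock still held */
    (void)n;

    pthread_mutex_unlock(&chan_lock);
    return NULL;
}

/* Records what a trylock returns while the holder is blocked and again
 * after it has been released. Returns 0 if the demo itself ran. */
int run_block_demo(int *while_blocked, int *after_release)
{
    pthread_t tid;
    char byte = 1;

    if (pipe(ready_pipe) != 0 || pipe(timer_pipe) != 0)
        return -1;
    if (pthread_create(&tid, NULL, holder_thread, NULL) != 0)
        return -1;

    /* Wait until the holder really owns the lock. */
    if (read(ready_pipe[0], &byte, 1) != 1)
        return -1;

    /* Like the scheduler thread in the dump: try, fail, move on. */
    *while_blocked = pthread_mutex_trylock(&chan_lock);

    /* "Fire the timer" so the holder's read() returns and it unlocks. */
    if (write(timer_pipe[1], &byte, 1) != 1)
        return -1;
    pthread_join(tid, NULL);

    *after_release = pthread_mutex_trylock(&chan_lock);
    if (*after_release == 0)
        pthread_mutex_unlock(&chan_lock);

    close(ready_pipe[0]); close(ready_pipe[1]);
    close(timer_pipe[0]); close(timer_pipe[1]);
    return 0;
}
```

Build with `-pthread`. While the holder is stuck in read(), the trylock reports EBUSY. Once the read returns, the lock becomes available again, which is why fixing the blocking read clears the apparent deadlock.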

By: Clod Patry (junky) 2011-05-18 22:43:15

I've been running 1.8.4 for
1 day, 9 hours, 41 minutes, 29 seconds

With res_timing_timerfd disabled (and res_timing_dahdi enabled), my issues seem to be fixed too.
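For anyone wanting to try the same workaround, a modules.conf fragment along these lines is one way to keep the timerfd timing source from loading. This is a sketch that assumes module autoloading is enabled and that DAHDI is installed so res_timing_dahdi can load; adjust it to the existing configuration.

```ini
[modules]
autoload=yes
; keep the timerfd timing source from loading
noload => res_timing_timerfd.so
```

After restarting Asterisk, `module show like res_timing` at the CLI should list the timing modules that are actually loaded.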

Good job microlana & irroot.