Summary: | ASTERISK-17832: [regression] Deadlock in chan_sip | ||
Reporter: | Clod Patry (junky) | Labels: | |
Date Opened: | 2011-05-10 15:39:05 | Date Closed: | 2011-07-15 14:12:22 |
Priority: | Major | Regression? | Yes |
Status: | Closed/Complete | Components: | Channels/chan_sip/General |
Versions: | 1.8.4 | Frequency of Occurrence | |
Related Issues: | |||
Environment: | Attachments: | ||
Description: | Hi, while testing 1.8.4 this morning, I got a deadlock after 48 minutes. That machine is just receiving SIP calls and launching MeetMe(), nothing else.

yankee*CLI> core show locks
=======================================================================
=== Currently Held Locks ==============================================
=======================================================================
===
=== <pending> <lock#> (<file>): <lock type> <line num> <function> <lock name> <lock addr> (times locked)
===
=== Thread ID: 140112174364944 (do_monitor started at [24712] chan_sip.c restart_monitor())
=== ---> Lock #0 (chan_sip.c): MUTEX 24684 do_monitor &monlock 0x7f6e6e8c9fe0 (1)
    /usr/sbin/asterisk(ast_bt_get_addresses+0x1d) [0x4ef2a4]
    /usr/sbin/asterisk(__ast_pthread_mutex_lock+0xd9) [0x4e7df8]
    /usr/lib/asterisk/modules/chan_sip.so [0x7f6e6e67dda9]
    /usr/sbin/asterisk [0x570f25]
    /lib/libpthread.so.0 [0x7f6e76e25a04]
    /lib/libc.so.6(clone+0x6d) [0x7f6e7766ed4d]
=== ---> Tried and failed to get Lock #1 (chan_sip.c): MUTEX 3756 __sip_autodestruct p->owner 0x24195f8 (0)
    /usr/sbin/asterisk(ast_bt_get_addresses+0x1d) [0x4ef2a4]
    /usr/sbin/asterisk(__ast_pthread_mutex_trylock+0xd9) [0x4e81b6]
    /usr/sbin/asterisk(__ao2_trylock+0x5a) [0x44884e]
    /usr/lib/asterisk/modules/chan_sip.so [0x7f6e6e610c49]
    /usr/sbin/asterisk(ast_sched_runq+0x18e) [0x5540fc]
    /usr/lib/asterisk/modules/chan_sip.so [0x7f6e6e67ddbb]
    /usr/sbin/asterisk [0x570f25]
    /lib/libpthread.so.0 [0x7f6e76e25a04]
    /lib/libc.so.6(clone+0x6d) [0x7f6e7766ed4d]
=== -------------------------------------------------------------------
===
=== Thread ID: 140111955966224 (pbx_thread started at [ 5038] pbx.c ast_pbx_start())
=== ---> Lock #0 (channel.c): MUTEX 3661 __ast_read chan 0x24195f8 (1)
    /usr/sbin/asterisk(ast_bt_get_addresses+0x1d) [0x4ef2a4]
    /usr/sbin/asterisk(__ast_pthread_mutex_lock+0xd9) [0x4e7df8]
    /usr/sbin/asterisk(__ao2_lock+0x5a) [0x44878c]
    /usr/sbin/asterisk [0x47681a]
    /usr/sbin/asterisk(ast_read+0x1d) [0x478d77]
    /usr/lib/asterisk/modules/app_meetme.so [0x7f6e6b560e25]
    /usr/lib/asterisk/modules/app_meetme.so [0x7f6e6b56753a]
    /usr/sbin/asterisk(pbx_exec+0x1fb) [0x508b59]
    /usr/sbin/asterisk [0x512cc2]
    /usr/sbin/asterisk(ast_spawn_extension+0x65) [0x51479c]
    /usr/sbin/asterisk [0x51520b]
    /usr/sbin/asterisk [0x516e2b]
    /usr/sbin/asterisk [0x570f25]
    /lib/libpthread.so.0 [0x7f6e76e25a04]
    /lib/libc.so.6(clone+0x6d) [0x7f6e7766ed4d]
=== -------------------------------------------------------------------
===
=== Thread ID: 140111957489936 (netconsole started at [ 1344] asterisk.c listener())
=== ---> Waiting for Lock #0 (cli.c): MUTEX 900 handle_chanlist c 0x24195f8 (1)
    /usr/sbin/asterisk(ast_bt_get_addresses+0x1d) [0x4ef2a4]
    /usr/sbin/asterisk(__ast_pthread_mutex_trylock+0xd9) [0x4e81b6]
    /usr/sbin/asterisk(__ao2_trylock+0x5a) [0x44884e]
    /usr/lib/asterisk/modules/chan_sip.so [0x7f6e6e610c49]
    /usr/sbin/asterisk(ast_sched_runq+0x18e) [0x5540fc]
    /usr/lib/asterisk/modules/chan_sip.so [0x7f6e6e67ddbb]
    /usr/sbin/asterisk [0x570f25]
    /lib/libpthread.so.0 [0x7f6e76e25a04]
    /lib/libc.so.6(clone+0x6d) [0x7f6e7766ed4d]
=== --- ---> Locked Here: channel.c line 3661 (__ast_read)
=== -------------------------------------------------------------------
===
======================================================================= | ||
Comments: | By: Leif Madsen (lmadsen) 2011-05-10 16:38:49
What was the previous version that didn't exhibit the deadlock? If this happens again, can you provide a backtrace of the running process? Thanks!

By: Clod Patry (junky) 2011-05-10 16:48:14
I used 1.8.3 without any issue. I noticed that 1.8.4-rc2 also caused a deadlock, but I'm not sure it's the same one. Since this is a production system, I had to roll back to 1.8.3 to stay stable for customers.

By: Igor Nikolaev (microlana) 2011-05-16 13:24:06
See issue ASTERISK-1905304. This happens because ast_read() ends up in an infinite loop in the read() system call on a disarmed timer (when the res_timerfd module is used as the timing source).

By: Gregory Hinton Nietsky (irroot) 2011-05-17 03:07:51
I'm with @microlana: remove timerfd and use dahdi ... ASTERISK-17407. This is not a "deadlock", it's a "block".

By: Clod Patry (junky) 2011-05-18 22:43:15
I have been running 1.8.4 for 1 day, 9 hours, 41 minutes, 29 seconds. With res_timing_timerfd disabled (and res_timing_dahdi enabled), my issues seem to be fixed too. Good job microlana & irroot. |