Summary: ASTERISK-14222: deadlock in res_timing_pthread and chan_sip do_monitor/retransmit
Reporter: Tim Ringenbach at Asteria Solutions Group (tim_ringenbach)
Labels:
Date Opened: 2009-05-28 19:01:22
Date Closed: 2011-06-07 14:08:11
Versions:
Frequency of
Environment:
Attachments:
( 0) backtrace.log
( 1) locks3.txt
Description: I ran into this deadlock while sending about 250 channels of ulaw fax between two Asterisk boxes. Removing res_timing_pthread.so and replacing it with res_timing_dahdi.so seemed to stop the deadlocks.

I'll attach the complete backtrace and 'core show locks' as files. But the deadlock seems to be between these two places:

__owner = 15306,
(gdb) bt
#0  0xb7ee7410 in __kernel_vsyscall ()
#1  0xb7cd8881 in select () from /lib/tls/i686/cmov/libc.so.6
#2  0xb70fc4ca in read_pipe (rd_fd=1038, quantity=1, clear=1) at res_timing_pthread.c:378
#3  0xb70fc146 in pthread_timer_disable_continuous (handle=1038) at res_timing_pthread.c:247
#4  0x08187f56 in ast_timer_disable_continuous (handle=0x8f2c0d8) at timing.c:185
#5  0x080a8167 in __ast_read (chan=0x8676368, dropaudio=0) at channel.c:2644
#6  0x080a9e57 in ast_read (chan=0x8676368) at channel.c:2993
#7  0xb35e57e7 in wait_for_answer (in=0xae00e668, outgoing=0x87ba0b0, to=0xb16b3604, peerflags=0xb16b3dac, pa=0xb16b361c, num_in=0xb16b3428,
   result=0xb16b35e8) at app_dial.c:887
#8  0xb35ebb12 in dial_exec_full (chan=0xae00e668, data=0xb16b60c8, peerflags=0xb16b3dac, continue_exec=0x0) at app_dial.c:1846
#9  0xb35ee3c1 in dial_exec (chan=0xae00e668, data=0xb16b60c8) at app_dial.c:2252
#10 0x08123e32 in pbx_exec (c=0xae00e668, app=0x826a090, data=0xb16b60c8) at pbx.c:1348
#11 0x0812d218 in pbx_extension_helper (c=0xae00e668, con=0x0, context=0xae00eee8 "outcontext", exten=0xae00ef38 "5555555555", priority=3,
   label=0x0, callerid=0xad3ee170 "2567050287", action=E_SPAWN, found=0xb16b8228, combined_find_spawn=1) at pbx.c:3690
#12 0x0812e7d2 in ast_spawn_extension (c=0xae00e668, context=0xae00eee8 "outcontext", exten=0xae00ef38 "5555555555", priority=3,
   callerid=0xad3ee170 "2567050287", found=0xb16b8228, combined_find_spawn=1) at pbx.c:4143
#13 0x0812efc5 in __ast_pbx_run (c=0xae00e668, args=0x0) at pbx.c:4233
#14 0x0813077b in pbx_thread (data=0xae00e668) at pbx.c:4520
#15 0x0819353d in dummy_start (data=0xae26e050) at utils.c:968
#16 0xb7ad24fb in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
#17 0xb7cdfe5e in clone () from /lib/tls/i686/cmov/libc.so.6

Thread 526 (Thread 0xb6fb8b90 (LWP 14285)):
#6  0xb7cf35f6 in backtrace () from /lib/tls/i686/cmov/libc.so.6
#7  0x0811025d in ast_bt_get_addresses (bt=0x867652c) at logger.c:1201
#8  0xb6fd8019 in __ast_pthread_mutex_trylock (filename=0xb704bc14 "chan_sip.c", lineno=3481, func=0xb704d15c "retrans_pkt",
   mutex_name=0xb704d400 "&pkt->owner->owner->lock_dont_use", t=0x8676400) at /usr/src/asterisk-
#9  0xb6fd7b4a in retrans_pkt (data=0x8895b08) at chan_sip.c:3481
#10 0x0817817d in ast_sched_runq (con=0xb71899a8) at sched.c:620
#11 0xb7036899 in do_monitor (data=0x0) at chan_sip.c:21330
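
For readers unfamiliar with the pattern in the second backtrace, where retrans_pkt calls a trylock on the channel lock from the scheduler thread, here is a minimal sketch of trylock-based back-off. It is not the chan_sip code: `demo_channel`, `demo_retransmit`, and the 10 ms retry value are hypothetical names and numbers chosen for illustration, and what chan_sip does when the trylock fails is not shown in this report.

```c
#include <pthread.h>

/* Hypothetical stand-in for a channel; not an Asterisk structure. */
struct demo_channel {
    pthread_mutex_t lock;
    int retransmits;
};

/*
 * Try to do the retransmit work without ever blocking on the channel lock.
 * Returns 0 when the work was done, or a retry delay in milliseconds when
 * another holder owns the lock (assumed back-off policy, for illustration).
 */
static int demo_retransmit(struct demo_channel *chan)
{
    if (pthread_mutex_trylock(&chan->lock) != 0) {
        /* Lock is busy: give up for now instead of waiting on it. */
        return 10;
    }

    chan->retransmits++;
    pthread_mutex_unlock(&chan->lock);
    return 0;
}
```

A caller that only ever uses trylock cannot block on the lock itself, but it can still make no progress for as long as the lock holder is stuck elsewhere, which is the kind of situation the two backtraces above suggest.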

Comments:

By: Russell Bryant (russell) 2009-05-29 12:02:54

Is it really sitting in select?  That code tells select() to return immediately ...
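
To illustrate what "return immediately" means here: passing select() a timeout of zero seconds and zero microseconds turns it into a non-blocking poll. The following is a minimal sketch of that idiom, not the res_timing_pthread source; `fd_readable_now` is a hypothetical helper name.

```c
#include <sys/select.h>
#include <unistd.h>

/*
 * Hypothetical helper: report whether fd has data to read, without blocking.
 * Returns 1 if readable, 0 if not, -1 on error or if fd cannot be used
 * with an fd_set.
 */
static int fd_readable_now(int fd)
{
    fd_set rfds;
    struct timeval tv = { 0, 0 };   /* zero timeout: poll and return */

    /* POSIX leaves FD_SET undefined for descriptors outside this range. */
    if (fd < 0 || fd >= FD_SETSIZE) {
        return -1;
    }

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);

    return select(fd + 1, &rfds, NULL, NULL, &tv);
}
```

With a zero timeout, a thread caught inside select() by a debugger is most likely just passing through the call rather than sleeping in it, which is the point of the question above.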

By: Tim Ringenbach at Asteria Solutions Group (tim_ringenbach) 2009-05-29 12:35:27

I don't know. I guess I could have just caught it in there. I probably won't have time to do it again and check until at least Tuesday.

By: Russell Bryant (russell) 2009-05-29 17:38:22

Alright.  Also, if you try again, please try the latest code in the 1.6.2 branch, as I just committed some fixes to res_timing_pthread that could be related.

By: Leif Madsen (lmadsen) 2009-06-16 14:03:58

Just pinging this issue again to see if we can get a report back from the reporter on whether the issue is now resolved, so we can close it. Thanks!

By: Leif Madsen (lmadsen) 2009-06-24 13:58:34

Closing this issue for now. The reporter is free to reopen the issue should this still be a problem. Thanks!