Summary: ASTERISK-18835: res_monitor causing deadlock with no calls coming through
Reporter: Shaun Clark (shaunc869)
Labels:
Date Opened: 2011-11-07 11:09:28.000-0600
Date Closed: 2011-11-29 14:05:32.000-0600
Priority: Critical
Regression?:
Status: Closed/Complete
Components: Channels/chan_sip/General
Versions: 1.8.7.0
Frequency of Occurrence: Frequent
Related Issues:
Environment:
Attachments: (0) backtrace-threads.txt
(1) core-show-locks.txt
Description: Seemingly at random, our Asterisk server stops taking calls, sometimes for a few minutes and sometimes for hours. It seems that a lock is being held around restart_monitor() and everything else blocks on it. Sometimes it clears by itself; sometimes it does not clear even after we try to restart it a few times. We're not sure what's causing it. We've run the show locks command and captured a backtrace (see attached files).
Comments:
By: Shaun Clark (shaunc869) 2011-11-07 11:23:35.703-0600

=======================================================================
=== Currently Held Locks ==============================================
=======================================================================
===
=== <pending> <lock#> (<file>): <lock type> <line num> <function> <lock name> <lock addr> (times locked)
===
=== Thread ID: 0x2b93d464b950 (do_monitor           started at [25233] chan_sip.c restart_monitor())
=== ---> Lock #0 (chan_sip.c): MUTEX 24737 handle_request_do &netlock 0x2b93b5258fa0 (1)
asterisk(ast_bt_get_addresses+0x1a) [0x4edc70]
asterisk(__ast_pthread_mutex_lock+0xd4) [0x4e695e]
/usr/lib/asterisk/modules/chan_sip.so [0x2b93b500b224]
/usr/lib/asterisk/modules/chan_sip.so [0x2b93b500b019]
asterisk(ast_io_wait+0x1ba) [0x4e0b14]
/usr/lib/asterisk/modules/chan_sip.so [0x2b93b500cb62]
asterisk [0x57042b]
/lib/libpthread.so.0 [0x2b93a19333ba]
/lib/libc.so.6(clone+0x6d) [0x2b93a113e02d]
=== ---> Lock #1 (chan_sip.c): MUTEX 7763 sip_pvt_lock_full pvt 0x1268460 (1)
asterisk(ast_bt_get_addresses+0x1a) [0x4edc70]
asterisk(__ast_pthread_mutex_lock+0xd4) [0x4e695e]
asterisk(__ao2_lock+0x53) [0x446064]
/usr/lib/asterisk/modules/chan_sip.so [0x2b93b4fb2446]
/usr/lib/asterisk/modules/chan_sip.so [0x2b93b500b300]
/usr/lib/asterisk/modules/chan_sip.so [0x2b93b500b019]
asterisk(ast_io_wait+0x1ba) [0x4e0b14]
/usr/lib/asterisk/modules/chan_sip.so [0x2b93b500cb62]
asterisk [0x57042b]
/lib/libpthread.so.0 [0x2b93a19333ba]
/lib/libc.so.6(clone+0x6d) [0x2b93a113e02d]
=== -------------------------------------------------------------------
===
=======================================================================
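
For triage, a dump like the one above can be reduced to a per-thread list of held locks. The following is a minimal Python sketch and not part of the original report: `parse_locks` and `SAMPLE` are hypothetical names, the regular expressions are inferred only from the header line and the entries shown in this dump, and any text appearing before `Lock #` is assumed to be the pending marker.

```python
import re

# Thread header, e.g.
# === Thread ID: 0x2b93d464b950 (do_monitor   started at [25233] chan_sip.c restart_monitor())
THREAD_RE = re.compile(r"^=== Thread ID: (?P<tid>\S+) \((?P<name>\S+)")

# Lock entry, following the dump's own legend:
# <pending> <lock#> (<file>): <lock type> <line num> <function> <lock name> <lock addr> (times locked)
LOCK_RE = re.compile(
    r"^=== ---> (?P<pending>.*?)Lock #(?P<num>\d+) \((?P<file>[^)]+)\): "
    r"(?P<type>\S+) (?P<line>\d+) (?P<func>\S+) (?P<lock>\S+) (?P<addr>\S+) "
    r"\((?P<count>\d+)\)"
)


def parse_locks(text):
    """Return a list of threads, each with the lock entries listed under it."""
    threads = []
    for raw in text.splitlines():
        line = raw.strip()
        m = THREAD_RE.match(line)
        if m:
            threads.append({"tid": m.group("tid"), "name": m.group("name"), "locks": []})
            continue
        m = LOCK_RE.match(line)
        if m and threads:
            threads[-1]["locks"].append({
                # Assumption: a non-empty prefix means the lock is not yet held.
                "pending": m.group("pending").strip(),
                "file": m.group("file"),
                "type": m.group("type"),
                "line": int(m.group("line")),
                "func": m.group("func"),
                "lock": m.group("lock"),
                "addr": m.group("addr"),
                "count": int(m.group("count")),
            })
    return threads


# Lines copied from the dump above (backtrace frames omitted; they are ignored anyway).
SAMPLE = """\
=== Thread ID: 0x2b93d464b950 (do_monitor           started at [25233] chan_sip.c restart_monitor())
=== ---> Lock #0 (chan_sip.c): MUTEX 24737 handle_request_do &netlock 0x2b93b5258fa0 (1)
=== ---> Lock #1 (chan_sip.c): MUTEX 7763 sip_pvt_lock_full pvt 0x1268460 (1)
"""

for thread in parse_locks(SAMPLE):
    held = ", ".join(f"{l['lock']} ({l['func']}:{l['line']})" for l in thread["locks"])
    print(f"{thread['name']} [{thread['tid']}]: {held}")
```

On this dump the sketch reports a single thread, do_monitor, listing both &netlock and a pvt lock. Because no other thread appears in the output, the dump by itself does not show a second party to a classic two-thread deadlock.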


By: Shaun Clark (shaunc869) 2011-11-07 12:00:26.255-0600

Peer             User/ANR         Call ID          Format           Hold     Last Message    Expiry
67.231.4.195     +15033410103     654444383_33135  0x4 (ulaw)       No       Rx: INVITE                
67.231.4.195     +15128329145     1090891562_5772  0x4 (ulaw)       No       Rx: INVITE                
67.231.4.195     +15033410103     1611443936_1264  0x4 (ulaw)       No       Rx: INVITE                
67.231.4.195     +15033410103     1611337330_1289  0x4 (ulaw)       No       Rx: INVITE                
67.231.4.195     +15033410103     1527263115_1996  0x4 (ulaw)       No       Rx: INVITE                
67.231.8.195     +18186029669     1611341935_6192  0x4 (ulaw)       No       Rx: ACK                  
67.231.4.195     Restricted       654995102_80320  0x4 (ulaw)       No       Rx: INVITE                
67.231.4.195     +15033410103     655270848_56222  0x4 (ulaw)       No       Rx: INVITE                
67.231.4.195     +15033410103     654995244_24745  0x4 (ulaw)       No       Rx: INVITE                
67.231.4.195     +15033410103     654881505_11134  0x4 (ulaw)       No       Rx: INVITE                
67.231.8.195     (None)           59c989ed0e721da  0x0 (nothing)    No       Init: OPTIONS            
67.231.8.195     (None)           6610e9311413b5d  0x0 (nothing)    No       Init: OPTIONS            
67.231.8.195     (None)           3095ea6d0caaac7  0x0 (nothing)    No       Init: OPTIONS            
67.231.8.195     +15033410103     1610828497_1229  0x4 (ulaw)       No       Rx: INVITE                
67.231.4.195     +15033410103     655164292_12810  0x4 (ulaw)       No       Rx: INVITE                
67.231.8.195     (None)           4c561182062cd7e  0x0 (nothing)    No       Init: OPTIONS            
67.231.8.195     +15033410103     1611342003_7980  0x0 (nothing)    No       Rx: INVITE                
67.231.4.195     +15033410103     1527267806_3214  0x4 (ulaw)       No       Rx: INVITE                
67.231.4.195     +15033410103     1611443887_1285  0x4 (ulaw)       No       Rx: INVITE                
67.231.4.195     +15033410103     1610771542_1239  0x4 (ulaw)       No       Rx: INVITE                
67.231.4.195     +15033410103     1527263083_1164  0x4 (ulaw)       No       Rx: INVITE                
21 active SIP dialogs


By: Leif Madsen (lmadsen) 2011-11-10 14:42:28.545-0600

You're using res_timing_pthread, which is highly likely the cause of the problem. That timing module has an inherent problem where it "appears" things are deadlocking, when really it's just taking a very, very long time. I suggest you switch to res_timing_dahdi.

By: Shaun Clark (shaunc869) 2011-11-10 15:43:01.272-0600

We have removed the timing module altogether for now. Do we need to have timing?

As for our answer to this issue: we found our database was locking up and, in turn, causing the system to lock up. So it may have been timing or it may have been the database; since we solved both problems in the course of debugging this, we're not sure what the cause was.

For reference, this is all on an Amazon EC2 instance with 100% SIP for our inbound/outbound telco. Thanks!