Summary: | ASTERISK-18835: res_monitor causing deadlock with no calls coming through | ||
Reporter: | Shaun Clark (shaunc869) | Labels: | |
Date Opened: | 2011-11-07 11:09:28.000-0600 | Date Closed: | 2011-11-29 14:05:32.000-0600 |
Priority: | Critical | Regression? | |
Status: | Closed/Complete | Components: | Channels/chan_sip/General |
Versions: | 1.8.7.0 | Frequency of Occurrence | Frequent |
Related Issues: | |||
Environment: | Attachments: | ( 0) backtrace-threads.txt ( 1) core-show-locks.txt | |
Description: | Our Asterisk server stops taking calls at seemingly random times, sometimes for a few minutes and sometimes for hours. The system appears to hit a lock around restart_monitor(). Everything stalls on this lock; sometimes it clears by itself, and sometimes it does not clear even after several restart attempts. We're not sure what's causing it. We've run the show locks commands and captured a backtrace (see attached files). | ||
Comments: | By: Shaun Clark (shaunc869) 2011-11-07 11:23:35.703-0600

=======================================================================
=== Currently Held Locks ==============================================
=======================================================================
===
=== <pending> <lock#> (<file>): <lock type> <line num> <function> <lock name> <lock addr> (times locked)
===
=== Thread ID: 0x2b93d464b950 (do_monitor started at [25233] chan_sip.c restart_monitor())
=== ---> Lock #0 (chan_sip.c): MUTEX 24737 handle_request_do &netlock 0x2b93b5258fa0 (1)
	asterisk(ast_bt_get_addresses+0x1a) [0x4edc70]
	asterisk(__ast_pthread_mutex_lock+0xd4) [0x4e695e]
	/usr/lib/asterisk/modules/chan_sip.so [0x2b93b500b224]
	/usr/lib/asterisk/modules/chan_sip.so [0x2b93b500b019]
	asterisk(ast_io_wait+0x1ba) [0x4e0b14]
	/usr/lib/asterisk/modules/chan_sip.so [0x2b93b500cb62]
	asterisk [0x57042b]
	/lib/libpthread.so.0 [0x2b93a19333ba]
	/lib/libc.so.6(clone+0x6d) [0x2b93a113e02d]
=== ---> Lock #1 (chan_sip.c): MUTEX 7763 sip_pvt_lock_full pvt 0x1268460 (1)
	asterisk(ast_bt_get_addresses+0x1a) [0x4edc70]
	asterisk(__ast_pthread_mutex_lock+0xd4) [0x4e695e]
	asterisk(__ao2_lock+0x53) [0x446064]
	/usr/lib/asterisk/modules/chan_sip.so [0x2b93b4fb2446]
	/usr/lib/asterisk/modules/chan_sip.so [0x2b93b500b300]
	/usr/lib/asterisk/modules/chan_sip.so [0x2b93b500b019]
	asterisk(ast_io_wait+0x1ba) [0x4e0b14]
	/usr/lib/asterisk/modules/chan_sip.so [0x2b93b500cb62]
	asterisk [0x57042b]
	/lib/libpthread.so.0 [0x2b93a19333ba]
	/lib/libc.so.6(clone+0x6d) [0x2b93a113e02d]
=== -------------------------------------------------------------------
===
=======================================================================

By: Shaun Clark (shaunc869) 2011-11-07 12:00:26.255-0600

Peer          User/ANR      Call ID           Format         Hold  Last Message   Expiry
67.231.4.195  +15033410103  654444383_33135   0x4 (ulaw)     No    Rx: INVITE
67.231.4.195  +15128329145  1090891562_5772   0x4 (ulaw)     No    Rx: INVITE
67.231.4.195  +15033410103  1611443936_1264   0x4 (ulaw)     No    Rx: INVITE
67.231.4.195  +15033410103  1611337330_1289   0x4 (ulaw)     No    Rx: INVITE
67.231.4.195  +15033410103  1527263115_1996   0x4 (ulaw)     No    Rx: INVITE
67.231.8.195  +18186029669  1611341935_6192   0x4 (ulaw)     No    Rx: ACK
67.231.4.195  Restricted    654995102_80320   0x4 (ulaw)     No    Rx: INVITE
67.231.4.195  +15033410103  655270848_56222   0x4 (ulaw)     No    Rx: INVITE
67.231.4.195  +15033410103  654995244_24745   0x4 (ulaw)     No    Rx: INVITE
67.231.4.195  +15033410103  654881505_11134   0x4 (ulaw)     No    Rx: INVITE
67.231.8.195  (None)        59c989ed0e721da   0x0 (nothing)  No    Init: OPTIONS
67.231.8.195  (None)        6610e9311413b5d   0x0 (nothing)  No    Init: OPTIONS
67.231.8.195  (None)        3095ea6d0caaac7   0x0 (nothing)  No    Init: OPTIONS
67.231.8.195  +15033410103  1610828497_1229   0x4 (ulaw)     No    Rx: INVITE
67.231.4.195  +15033410103  655164292_12810   0x4 (ulaw)     No    Rx: INVITE
67.231.8.195  (None)        4c561182062cd7e   0x0 (nothing)  No    Init: OPTIONS
67.231.8.195  +15033410103  1611342003_7980   0x0 (nothing)  No    Rx: INVITE
67.231.4.195  +15033410103  1527267806_3214   0x4 (ulaw)     No    Rx: INVITE
67.231.4.195  +15033410103  1611443887_1285   0x4 (ulaw)     No    Rx: INVITE
67.231.4.195  +15033410103  1610771542_1239   0x4 (ulaw)     No    Rx: INVITE
67.231.4.195  +15033410103  1527263083_1164   0x4 (ulaw)     No    Rx: INVITE
21 active SIP dialogs

By: Leif Madsen (lmadsen) 2011-11-10 14:42:28.545-0600

You're using res_timing_pthread, which is very likely the cause of the problem. That timing module has an inherent problem where it "appears" that things are deadlocking, when really they are just taking a very, very long time. I suggest you switch to res_timing_dahdi.

By: Shaun Clark (shaunc869) 2011-11-10 15:43:01.272-0600

We have removed timing altogether for now. Do we need to have timing? As for our answer to this issue: we found that our database was locking up and in turn causing the system to lock up. So it may have been timing or it may have been the database; since we fixed both problems in the course of debugging this, we're not sure which one was the cause.
This is all on an Amazon EC2 instance for reference with 100% SIP for our inbound/outbound telco. Thanks! |
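
For reference, the module switch suggested above might look like the following in modules.conf. This is only a sketch under assumptions not stated in the ticket: that the file lives at /etc/asterisk/modules.conf, that autoload is enabled, and that DAHDI is installed with its kernel modules loaded on the host (which may not hold on an EC2 instance).

```ini
; /etc/asterisk/modules.conf (path is an assumption; adjust to your install)
[modules]
autoload=yes

; Keep the pthread-based timing source from loading.
noload => res_timing_pthread.so

; Load the DAHDI-based timing source instead.
; Assumes DAHDI is installed and its kernel modules are loaded.
load => res_timing_dahdi.so
```

After restarting Asterisk, `module show like timing` at the CLI should list which timing modules actually loaded, and `timing test` can be used to check that the selected timing source is delivering ticks.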