Summary: | ASTERISK-18487: Daily deadlock issue | ||||
Reporter: | Jason Legault (jlegault) | Labels: | |||
Date Opened: | 2011-09-08 13:41:26 | Date Closed: | 2011-09-15 10:46:41 | ||
Priority: | Critical | Regression? | No | ||
Status: | Closed/Complete | Components: | Applications/app_queue | ||
Versions: | 1.8.6.0 | Frequency of Occurrence | Frequent | ||
Related Issues: |
| ||||
Environment: | Linux Debian 2.6.26-2-amd64 x86_64 | Attachments: | ( 0) deadlock_gdb.txt | ||
Description: | I have a deadlock issue with Asterisk 1.8.6.0 that happens about once a day at peak times (150-200 calls or so being recorded). The problem has been present since 1.6.2.9 or so; I have upgraded to every version since, and they all have the same problem. RTP continues to work for existing calls, but new calls cannot be made. "netstat -anp | grep 5060" shows a Recv-Q of 124680. I attached gdb to the running PID and ran "info thread" and "thread apply all bt". The output is attached. | ||||
Comments: | By: Paul Belanger (pabelanger) 2011-09-13 00:23:50.531-0500
We really need the output of 'core show locks' to help trace this down.

By: Leif Madsen (lmadsen) 2011-09-13 11:15:20.296-0500
Requesting feedback from the reporter.

By: Jason Legault (jlegault) 2011-09-13 12:15:27.728-0500
I enabled DEBUG_THREADS, and the system resources were at capacity after reaching 30 concurrent calls. This system usually deadlocks at 200+ calls. I'm not sure what to try next. I tried the patch from issue ASTERISK-18101 and haven't had a deadlock yet. Do you think it could be the same issue?
210 active calls
16978 calls processed
System uptime: 23 hours, 50 minutes, 43 seconds

By: Gregory Hinton Nietsky (irroot) 2011-09-13 12:42:01.849-0500
It's quite likely this is related to 18101.

Thread 390 (Thread 0x44fd3950 (LWP 7495)):
#0 0x00007fa298926384 in __lll_lock_wait () from /lib/libpthread.so.0
#1 0x00007fa298921c0b in _L_lock_312 () from /lib/libpthread.so.0
#2 0x00007fa298921631 in pthread_mutex_lock () from /lib/libpthread.so.0
#3 0x00000000004df614 in __ast_pthread_mutex_lock (filename=0x591fa4 "astobj2.c", lineno=842, func=0x592150 "internal_ao2_iterator_next", mutex_name=0x59216b "a->c", t=0x300a290) at lock.c:244
#4 0x000000000044487c in __ao2_lock (user_data=0x300a2e8, file=0x591fa4 "astobj2.c", func=0x592150 "internal_ao2_iterator_next", line=842, var=0x59216b "a->c") at astobj2.c:157
#5 0x0000000000445e74 in internal_ao2_iterator_next (a=0x44fcaef0, q=0x44fcae80) at astobj2.c:842
#6 0x00000000004462bc in __ao2_iterator_next (a=0x44fcaef0) at astobj2.c:920
#7 0x00007fa279927440 in update_queue (q=0x3ca6698, member=0x5fbd248, callcompletedinsl=0, newtalktime=1554) at app_queue.c:4019
#8 0x00007fa27992c5b3 in try_calling (qe=0x44fcd1f0, options=0x44fcd12d "", announceoverride=0x44fcd12f "", url=0x44fcd12e "",

Thread 187 (Thread 0x452bb950 (LWP 1341)):
#0 0x00007fa298926384 in __lll_lock_wait () from /lib/libpthread.so.0
#1 0x00007fa298921c0b in _L_lock_312 () from /lib/libpthread.so.0
#2 0x00007fa298921631 in pthread_mutex_lock () from /lib/libpthread.so.0
#3 0x00000000004df614 in __ast_pthread_mutex_lock (filename=0x591fa4 "astobj2.c", lineno=842, func=0x592150 "internal_ao2_iterator_next", mutex_name=0x59216b "a->c", t=0x300a290) at lock.c:244
#4 0x000000000044487c in __ao2_lock (user_data=0x300a2e8, file=0x591fa4 "astobj2.c", func=0x592150 "internal_ao2_iterator_next", line=842, var=0x59216b "a->c") at astobj2.c:157
#5 0x0000000000445e74 in internal_ao2_iterator_next (a=0x452b2ee0, q=0x452b2e70) at astobj2.c:842
#6 0x00000000004462bc in __ao2_iterator_next (a=0x452b2ee0) at astobj2.c:920
#7 0x00007fa279927440 in update_queue (q=0x3a9c688, member=0x7b2b858, callcompletedinsl=0, newtalktime=501) at app_queue.c:4019

By: Jason Legault (jlegault) 2011-09-13 14:16:27.327-0500
So far no deadlock after applying the patch.

By: Leif Madsen (lmadsen) 2011-09-13 15:41:44.005-0500
OK, I'm actually going to mark this as a duplicate of ASTERISK-18101 then. |
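Both quoted backtraces block in __ast_pthread_mutex_lock on the same mutex address (t=0x300a290). With several hundred threads in a "thread apply all bt" dump, that kind of pattern is easier to spot mechanically. The following is a minimal Python sketch, not part of the ticket or of Asterisk, that groups blocked threads by the mutex they wait on. The function name group_waiters, the regular expressions, and the abbreviated SAMPLE text are assumptions made for illustration; the sketch only sees threads whose backtrace contains an __ast_pthread_mutex_lock frame and does not identify which thread holds the lock.

```python
import re
from collections import defaultdict

# Matches a gdb thread header such as "Thread 390 (Thread 0x44fd3950 (LWP 7495)):".
THREAD_RE = re.compile(r"Thread (\d+) \(Thread 0x[0-9a-f]+ \(LWP \d+\)\):")

# Captures the mutex name and mutex address from an __ast_pthread_mutex_lock frame.
LOCK_RE = re.compile(
    r'__ast_pthread_mutex_lock \(.*?mutex_name=0x[0-9a-f]+ "([^"]*)", t=(0x[0-9a-f]+)\)',
    re.S,
)


def group_waiters(backtrace_text):
    """Map (mutex address, mutex name) to the thread numbers blocked on it."""
    waiters = defaultdict(list)
    # Splitting on a pattern with one capture group yields
    # [preamble, id1, body1, id2, body2, ...].
    parts = THREAD_RE.split(backtrace_text)
    for thread_id, body in zip(parts[1::2], parts[2::2]):
        match = LOCK_RE.search(body)
        if match:
            name, address = match.group(1), match.group(2)
            waiters[(address, name)].append(int(thread_id))
    return dict(waiters)


# Abbreviated excerpt of the two backtraces quoted in the comments (illustrative).
SAMPLE = '''
Thread 390 (Thread 0x44fd3950 (LWP 7495)):
#0 0x00007fa298926384 in __lll_lock_wait () from /lib/libpthread.so.0
#3 0x00000000004df614 in __ast_pthread_mutex_lock (filename=0x591fa4 "astobj2.c", lineno=842, func=0x592150 "internal_ao2_iterator_next", mutex_name=0x59216b "a->c", t=0x300a290) at lock.c:244
Thread 187 (Thread 0x452bb950 (LWP 1341)):
#0 0x00007fa298926384 in __lll_lock_wait () from /lib/libpthread.so.0
#3 0x00000000004df614 in __ast_pthread_mutex_lock (filename=0x591fa4 "astobj2.c", lineno=842, func=0x592150 "internal_ao2_iterator_next", mutex_name=0x59216b "a->c", t=0x300a290) at lock.c:244
'''

if __name__ == "__main__":
    for (address, name), threads in group_waiters(SAMPLE).items():
        print(f"{name} @ {address}: threads {threads}")
```

For the excerpt above, threads 390 and 187 end up under the same key (0x300a290, "a->c"). Finding the thread that actually holds that mutex still needs the output of 'core show locks', as requested in the first comment.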