Summary: ASTERISK-18166: Deadlock: asterisk isn't responding to any sip package anymore
Reporter: Jacco van Tuijl (jacco)
Labels:
Date Opened: 2011-07-22 03:23:08
Date Closed: 2011-08-17 14:19:31
Versions: Frequency of
is duplicated by ASTERISK-17775 Deadlock with Dahdi
must be merged before resolving ASTERISK-18300 Merge: ASTERISK 18166
is related to ASTERISK-18142 unresponsive to sip requests
Environment:
Attachments:
( 0) astdebug_2011.07.17_1333.log
( 1) bt2011-07-28T221834+0200.txt
( 2) tshark-log_2011.07.17_1333.rar
Description: The Asterisk process is still running and the Asterisk CLI still responds to commands.
However, Asterisk no longer responds to any SIP packets.

Comments:
By: Jacco van Tuijl (jacco) 2011-07-22 03:25:56.059-0500

This file contains:
gdb output, netstat, Asterisk log, top, process list, df, and monit log

By: Jacco van Tuijl (jacco) 2011-07-22 03:31:35.221-0500

This is a Wireshark trace (look at the end to see how Asterisk stops responding).

By: Gregory Hinton Nietsky (irroot) 2011-07-22 05:11:23.340-0500

I've looked at the BT. There is no "core show locks" or "thread apply all bt full" output, so it is not easy to tell, but it does look like it could be a state change issue; see Review Board 1313.

By: Ole Kaas (ole.kaas) 2011-07-24 04:52:15.375-0500

This could be the same as bug 18142

By: Leif Madsen (lmadsen) 2011-07-26 09:53:43.635-0500

We'll definitely need a 'core show locks' to move this forward. Please supply.
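
The two pieces of information asked for in this thread, the "core show locks" output and a full backtrace of every thread, could be gathered along these lines. This is only a sketch: the function names, the DRY_RUN switch and the output file names are made up for illustration, it assumes asterisk and gdb are on the PATH, and "core show locks" is only available when Asterisk was built with DEBUG_THREADS.

```shell
#!/bin/sh
# Hypothetical helper for collecting deadlock information from a hung Asterisk.
# Assumptions: asterisk and gdb are on PATH; Asterisk was built with DEBUG_THREADS.

run() {
    # With DRY_RUN=1, print the command instead of executing it.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

collect_deadlock_info() {
    pid="$1"      # PID of the running asterisk process
    outdir="$2"   # existing directory that receives the output (file names are illustrative)

    # Lock state as reported by Asterisk itself.
    run asterisk -rx "core show locks" > "$outdir/core-show-locks.txt"

    # Full backtrace of every thread, taken by attaching gdb to the live process.
    run gdb -p "$pid" -batch -ex "thread apply all bt full" > "$outdir/backtrace-threads.txt"
}
```

It would be called as something like `collect_deadlock_info "$(pidof asterisk)" /tmp/ast-deadlock` while the process is still hung, and the two resulting files attached to the issue.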

By: Leif Madsen (lmadsen) 2011-07-26 09:56:17.377-0500

This is likely a res_timing_timerfd issue, so the workaround for now is to use res_timing_dahdi. Hopefully we'll get res_timing_timerfd fixed up shortly.
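
As a rough sketch of that workaround, a modules.conf along these lines keeps res_timing_timerfd from loading, so that another timing module is picked up instead. The autoload=yes line and the usual file location (/etc/asterisk/modules.conf) are assumptions about the setup; the noload line is the one quoted later in this thread.

```
[modules]
; Assumed default: load every module that is not explicitly excluded.
autoload=yes

; Keep the timerfd timing module out, so that another timing source
; (res_timing_dahdi in this thread) is used instead.
noload => res_timing_timerfd.so
```

After restarting Asterisk, "module show like res_timing" on the CLI should list which timing modules actually got loaded.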

By: caspy (caspy) 2011-07-27 07:48:14.943-0500

I'm getting a deadlock like this with res_timing_dahdi.
The next time it locks up I'll try to supply a BT.

By: Ole Kaas (ole.kaas) 2011-07-28 15:59:53.922-0500

After adding "noload => res_timing_timerfd.so" asterisk now uses dahdi for timing. No crash/deadlock for almost 2 days until now. This seems to be another issue though. BT attached (asterisk compiled with all the debug stuff).

EDIT: My memory is a bit vague about this right now, but to clarify: Asterisk failed to respond to SIP requests as before, but "core show channels" reported active calls. There was no time to verify whether there were RTP streams; the process was killed with -11 to get a core dump to make a backtrace from. I was hoping the BT could reveal something.

EDIT2: I suspect this to be a "full reload under load" issue. This server is not quite as busy as the other server where I've posted a backtrace (bug 18142 backtrace2.txt) - so the deadlock is "unreliable" on this server.

By: Leif Madsen (lmadsen) 2011-08-05 15:38:09.827-0500

I don't understand. If there is no crash/deadlock after using res_timing_dahdi how is there a backtrace being provided? What is the issue?

By: Leif Madsen (lmadsen) 2011-08-11 13:22:21.204-0500

Assigned to reporter for feedback.

By: Jacco van Tuijl (jacco) 2011-08-16 07:28:22.086-0500

After deleting res_timing_timerfd.so from disk, I have had no more problems with Asterisk failing to respond to SIP packets.

By: Terry Wilson (twilson) 2011-08-17 14:19:31.688-0500

Fix committed to 1.8 r332320