Summary: ASTERISK-18302: System Deadlock, No calls inbound or outbound
Reporter: Justin Phelps (ecnf)
Labels:
Date Opened: 2011-08-20 03:37:28
Date Closed: 2011-08-30 11:08:58
Priority: Major
Regression?
Status: Closed/Complete
Components:
Versions: 1.8.5.0
Frequency of Occurrence: One Time
Related Issues:
Environment: Debian 6.0.2 Dahdi Complete 2.5.0+2.5.0 Libpri 1.4.12 FreePBX 2.9
Attachments:
( 0) backtrace.txt
( 1) core-show-locks.txt
( 2) full.log
Description: Creating this issue following the discussion on [ASTERISK-18154|https://issues.asterisk.org/jira/browse/ASTERISK-18154]. Some of the history on that ticket may be relevant.


I just had a solid lockup on the server. The CLI wasn't responding to any commands.
pri show channels showed channel 1 at call level Proceeding with an empty Channel Name field, like this:

{noformat}
PRI       B    Chan Call       PRI  Channel
Span Chan Chan Idle Level      Call Name
  1    1 Yes  No   Proceeding Yes  
  1    2 Yes  No   Connect    Yes  DAHDI/i1/8506749597-b
{noformat}

I had to kill Asterisk and restart it to get calls working again.

Richard Mudgett has suggested the following:

Debugging deadlocks: Please select DEBUG_THREADS and DONT_OPTIMIZE in the Compiler Flags section of menuselect. Recompile and install Asterisk (i.e. make install). This will then give you the console command "core show locks". When the symptoms of the deadlock present themselves again, please provide output of the deadlock via:

{noformat}
asterisk -rx "core show locks" | tee /tmp/core-show-locks.txt
gdb -se "asterisk" <pid of asterisk> | tee /tmp/backtrace.txt
gdb> bt
gdb> bt full
gdb> thread apply all bt
{noformat}

Then attach the core-show-locks.txt and backtrace.txt files to this issue. Thanks!
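
The collection steps above can also be run non-interactively. The following is a minimal sketch, not part of the original instructions: `collect_deadlock_info` is a hypothetical helper name, the output directory defaults to /tmp as in the commands above, and it assumes gdb is installed and that you have permission to attach to the Asterisk process (usually root). It uses gdb's `-batch`, `-ex`, and `-p` options so that the PID is passed explicitly.

```shell
#!/bin/sh
# Sketch: gather "core show locks" output and gdb backtraces in one step.
# Hypothetical helper; names and defaults are assumptions, not from the ticket.
# Usage: collect_deadlock_info <pid of asterisk> [output directory]
collect_deadlock_info() {
    pid="$1"
    outdir="${2:-/tmp}"

    # Reject an empty or non-numeric PID before calling gdb.
    case "$pid" in
        ''|*[!0-9]*)
            echo "usage: collect_deadlock_info <pid> [outdir]" >&2
            return 2
            ;;
    esac

    # Lock information (requires a build with DEBUG_THREADS).
    asterisk -rx "core show locks" > "$outdir/core-show-locks.txt"

    # Attach to the running process, dump the backtraces, then detach.
    gdb -batch -ex "bt" -ex "bt full" -ex "thread apply all bt" \
        -p "$pid" > "$outdir/backtrace.txt" 2>&1
}
```

For example, `collect_deadlock_info "$(cat /var/run/asterisk/asterisk.pid)"` would work if the PID file lives at that path, which is an assumption about the install. If the CLI is unresponsive during the deadlock, the `core show locks` step may itself block, in which case the gdb step can be run on its own.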

I have created a new ticket per Richard's request.
Comments:

By: Justin Phelps (ecnf) 2011-08-20 03:38:56.131-0500

Tail of /var/log/asterisk/full
Last 10,000 lines right after lockup.

By: Justin Phelps (ecnf) 2011-08-20 03:41:42.734-0500

I'm waiting for the system to deadlock again so I can collect the backtrace.

By: Justin Phelps (ecnf) 2011-08-23 16:43:56.206-0500

gdb would not accept the PID of Asterisk when I tried to collect the backtrace.

By: Richard Mudgett (rmudgett) 2011-08-23 17:31:19.914-0500

The deadlock looks like the timerfd problem, which was recently fixed in v1.8 by commit r332320.

If you are using timerfd, you can either switch to dahdi timing or apply the change from the revision mentioned above.

By: Justin Phelps (ecnf) 2011-08-23 17:44:36.758-0500

I entered the following into /etc/asterisk/modules.conf

{code}
; Don't load res_timing_timerfd.so per https://issues.asterisk.org/jira/browse/ASTERISK-18302
; Testing to see if deadlocks are a timerfd issue.
noload => res_timing_timerfd.so
{code}

By: Justin Phelps (ecnf) 2011-08-23 17:45:22.477-0500

I then restarted Asterisk. The module listing confirms that timerfd isn't loaded, and DAHDI is the current timing source. I will continue monitoring for the next few days.
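
A check like the one described above can be scripted. This is a minimal sketch under stated assumptions: `timerfd_unloaded` is a hypothetical helper that reads a module listing on standard input, and the listing is assumed to come from the Asterisk CLI command `module show like timing`. It only looks for the module file name and makes no assumption about the rest of the output format.

```shell
#!/bin/sh
# Sketch: succeed (exit 0) when the module listing on stdin does not
# mention res_timing_timerfd.so. Hypothetical helper, not from the ticket.
timerfd_unloaded() {
    ! grep -q 'res_timing_timerfd\.so'
}

# Example use against a running system (assumes the asterisk binary is in PATH):
#   asterisk -rx "module show like timing" | timerfd_unloaded \
#       && echo "timerfd is not loaded"
```

The helper reads standard input instead of calling Asterisk itself, so it can also be run against a saved copy of the module listing.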

By: Justin Phelps (ecnf) 2011-08-30 11:08:52.367-0500

I haven't seen a deadlock since changing the timing source.

It seems that solved the problem. Thank you and I hope you enjoyed the cookies :)

By: Richard Mudgett (rmudgett) 2011-08-30 11:20:23.507-0500

Yes, they were good.  Thank you.