Summary: ASTERISK-08764: Some of extensively used zaptel channels got blocked (become silent until asterisk restart)
Reporter: Anton Vazir (vazir)
Labels:
Date Opened: 2007-02-10 02:12:22.000-0600
Date Closed: 2011-06-07 14:00:27
Priority: Blocker
Regression?: No
Status: Closed/Complete
Components: Core/General
Versions:
Frequency of Occurrence:
Related Issues:
Environment:
Attachments:
( 0) asterisk_btinfo.22may2007.txt.bz2
( 1) astfulloutput.log.bz2
( 2) backtraces.bz2
Description: Channel bank on a T1 (24 FXS lines in use).
Heavily used channels become blocked once every few days. Asterisk does not stop on the "stop now" command and has to be killed with kill -9.
The bug exists in 1.2.x and 1.4.0, including SVN. I have been seeing this behaviour ever since I started using Asterisk.

The debug info I was able to capture, with all debug and lock-detection options compiled in, is included.

****** ADDITIONAL INFORMATION ******

In the log output, the lines of interest are those for the call made from Zap/24 to Zap/18.

Zap/18 is the deadlocked channel, which responds as Busy until Asterisk is restarted.
Zap/18 is the receptionist's extension, so it is the most heavily used.

Asterisk SVN No is: 53222
Zaptel Svn No is: 2087
Comments:

By: Paul Cadach (pcadach) 2007-02-10 04:30:36.000-0600

Can you please provide a log file from around the time when the receptionist's extension hangs?
Running "core set debug 2" (or just "set debug 2") first will make the log a bit more helpful.

By: Serge Vecher (serge-v) 2007-02-10 18:54:26.000-0600

Also, please test the patch in 8957 and report the results.

By: ewieling (ewieling) 2007-02-12 19:15:05.000-0600

As I understand it the patch in bug 8957 only applies to PRI.  We have experienced the issue in this bug in a non-PRI environment with channel banks.

By: Serge Vecher (serge-v) 2007-03-05 14:04:59.000-0600

ewieling: can you please provide the debug info requested by pcadach? Use 1.2.16 for testing ...

By: ewieling (ewieling) 2007-03-05 16:54:33.000-0600

Unfortunately, I do not have access to a non-production system for testing.  Also this issue happens at random times for me.  Sometimes we can go weeks with no stuck channels, then have 15 stuck channels in 1 day.  Maybe vazir can test this, as he can reproduce this issue easily.

By: Serge Vecher (serge-v) 2007-03-26 12:44:03

vazir: what's the status?

By: Anton Vazir (vazir) 2007-03-27 10:44:32

The same, though after upgrading to 1.4.1 it happens a little less often. Digium support is of no use at all. They seem obliged to ask me "how can we help" from time to time, but have never given a reasonable answer about how and whom to contact when the issue occurs (though I asked them twice), since I can't keep the system in the failed state for weeks waiting for their response.

By: Anton Vazir (vazir) 2007-05-03 22:29:01

New info:

Channels are still locking up. One of the channels rings continuously, some work, and some are silent.

SVN revision: 62263

LOGS:

please see attached



By: Paul Cadach (pcadach) 2007-05-04 03:26:10

Anton, could you please show bt full for threads 7 (LWP 11568) and 6 (LWP 11589)? Looks like they are deadlocked, and I would like to see full call path for those threads.

By: Anton Vazir (vazir) 2007-05-04 05:05:48

I have to wait for the deadlock to reappear. I had to kill Asterisk to restore service. It will take 2-3 working days.

PS: Could you please give me a hint on how to get backtraces for those threads?

By: Anton Vazir (vazir) 2007-05-16 04:58:35

Please find the backtraces in backtraces.bz2. It looks like there are more deadlocks: some in ZAP, some in IAX.

By: Anton Vazir (vazir) 2007-05-18 07:47:41

[May 16 14:43:35] ERROR[32323] /usr/src/AST_SVN/asterisk-1.4/include/asterisk/lock.h: chan_zap.c line 4593 (zt_read): Deadlock? waited 5 sec for mutex '&p-
[May 16 14:43:35] ERROR[32323] /usr/src/AST_SVN/asterisk-1.4/include/asterisk/lock.h: chan_zap.c line 4578 (zt_exception): '&p->lock' was locked here.

and so on
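
For readers unfamiliar with this kind of message: it is the sort of report a lock wrapper prints when it polls a mutex and the mutex stays busy for too long. The sketch below illustrates that general pattern with plain POSIX threads. It is not the code from asterisk/lock.h; the names lock_with_report and demo_stuck_lock, the 1 ms polling interval, and the millisecond timeout are all assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical helper (not an Asterisk API): poll `m` with trylock and
 * give up after roughly `timeout_ms` milliseconds.
 * Returns 0 once the lock is taken, ETIMEDOUT if it stayed busy,
 * or whatever other error pthread_mutex_trylock reports. */
static int lock_with_report(pthread_mutex_t *m, const char *name, int timeout_ms)
{
    const struct timespec pause = { 0, 1000000 }; /* sleep about 1 ms per attempt */
    int waited_ms = 0;                            /* approximate: counts sleeps */

    assert(m != NULL && timeout_ms >= 0);
    for (;;) {
        int res = pthread_mutex_trylock(m);
        if (res != EBUSY)
            return res;                           /* acquired, or a real error */
        if (waited_ms >= timeout_ms) {
            fprintf(stderr, "Deadlock? waited about %d ms for mutex '%s'\n",
                    waited_ms, name);
            return ETIMEDOUT;
        }
        nanosleep(&pause, NULL);
        waited_ms++;
    }
}

/* Simulates a lock that is never released: the calling thread already
 * holds the (non-recursive) mutex, so every trylock returns EBUSY. */
static int demo_stuck_lock(void)
{
    pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    int res;

    pthread_mutex_lock(&lock);
    res = lock_with_report(&lock, "&p->lock", 20);
    pthread_mutex_unlock(&lock);
    pthread_mutex_destroy(&lock);
    return res;
}
```

Calling demo_stuck_lock() from a test program should print one "Deadlock?" line to stderr and return ETIMEDOUT, while lock_with_report() on a free mutex returns 0 right away. A report like this only says that a lock was held too long and by whom it was last taken; finding out why it was never released still needs the thread backtraces requested above.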



By: Anton Vazir (vazir) 2007-05-22 04:08:06

New deadlock. I've made a fresh backtrace snapshot. Please see asterisk_btinfo.22may2007.txt.bz2

By: Russell Bryant (russell) 2007-06-07 16:04:26

I have been trying to figure this out from the backtrace you have provided but have been having some trouble.  Would you mind letting me log in to your machine to look through the core dump with gdb?  Just make sure you have DONT_OPTIMIZE and DEBUG_THREADS enabled.  You can email me (russell@digium.com) or find me on IRC: russellb.

By: Anton Vazir (vazir) 2007-06-08 05:30:09

Russell,

there is no core dump; the channels just deadlock, so you will have to debug a live process. Another problem is that it's a live office PBX, and if this happens during working hours I have to restart * to restore functionality; plus, we're in different time zones. I'll try to catch you on IRC to discuss the possibilities!

By: Anton Vazir (vazir) 2007-06-08 05:36:22

BTW: is there a way to force * to dump core for further debugging?

By: Russell Bryant (russell) 2007-06-08 09:56:24

There is a script in the contrib/scripts directory called ast_grab_core.  That will grab a core dump from a running Asterisk process.  After running the script, you can restart Asterisk.

By: Russell Bryant (russell) 2007-06-29 09:49:41

I'm suspending this issue due to a lack of feedback.  If you still have a problem and would like to further debug the issue, feel free to reopen this issue.  Thanks!

By: Anton Vazir (vazir) 2007-07-14 09:20:21

I'm planning to be back in the office in 1 week. I've already grabbed the core and will provide you with ssh access. Issue is critical.

By: Russell Bryant (russell) 2007-08-09 10:35:28

If you compile the latest version with DEBUG_THREADS, then the output of "core show locks" when it is locked up may be useful here.

By: Anton Vazir (vazir) 2007-08-09 22:23:18

Will do so today

By: Anton Vazir (vazir) 2007-08-13 02:27:39

Installing a fresh SVN to see if problem still persists.

By: Russell Bryant (russell) 2007-08-23 12:11:05

Feel free to reopen this issue when you can reproduce this and provide debug output ...