[Home]

Summary:ASTERISK-07815: Zap channels locked when Agents are involved
Reporter:Andres Junge (ajunge)Labels:
Date Opened:2006-09-24 23:33:12Date Closed:2007-03-05 13:50:10.000-0600
Priority:BlockerRegression?No
Status:Closed/CompleteComponents:Core/General
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) bt
( 1) bt.sep.25.15.43
( 2) bt.sep.28.11
( 3) extensions.conf
( 4) full.sep.25.15.30-59.gz
( 5) WARNING[5095]_chan_zap.c_Ring_requested_on_channel_1_already_in_use_on_span_1.__Hanging_up_owner._Asterisk_1.2.5_09-11-06.zip
Description:The configuration is the following. We have small call-center with queues and agents. The agents use SIP softphones (SJ_Phone) and they log dynamicaly to the queue with AddQueueMember. When they answer the call almost all the times they call to another outside number and then they transfer the call to that number (Zap to Zap bridge). Everything is being monitored.

The problem is that after some calls, some Zap channels get blocked. I have gdb the process and looks like they are mutex_locked. The problem with this is that the telco thinks the channel is avalilable and send a new call through it, but asterisk thinks that the channel is still in use and rejects the call as I see in the logs:

WARNING[4443]: Ring requested on channel 0/3 already in use on span 1. Hanging up owner.

The only way to unlock the channels is to restart asterisk.
Comments:By: Andres Junge (ajunge) 2006-09-25 01:36:08

I have detected some pattern.

Everytime I get this two lines:

channel.c: Generator got voice, switching to phase locked mode
channel.c: Scheduling timer at 0 sample intervals

After that I see no activity on the thread, and the Zap channel that the thread was using, gets locked.

By: Andres Junge (ajunge) 2006-09-25 01:45:44

in channel.c:1725

Why ast_settimeout does not lock-unlock the channel?

By: seb7 (seb7) 2006-09-25 07:04:08

We had the same problem on various versions of Asterisk / Zaptel from 1.0.x through to 1.2.x when there was a heavy load of inbound calls. The call centre we were maintaining no longer has a heavy load of calls, so we don't get the problem. We have also had this on servers that do not use AddQueueMember.

By: Andres Junge (ajunge) 2006-09-25 08:26:44

Seb7, did you find a workaround for this?

By: seb7 (seb7) 2006-09-25 08:34:25

Er, well the two workarounds we had were:-
(a) for the call centre: to put divert on busy and fail to their secondary asterisk server, except for our test number, and then when our alarm went off, to kill -9 safe_asterisk and asterisk (since stop now would put it into a worse state where divert on busy/fail wouldn't work), stop zaptel, restart zaptel and asterisk.
(b) for the asterisk servers connected to our switches: block out the individual b-channels on our switch so that calls are not sent to asterisk on these channels, and then restart asterisk using method (a) at a later time when there are no calls up on it.

So probably not much help! I haven't looked at this recently though.

By: Serge Vecher (serge-v) 2006-09-25 09:30:01

hmmm, this looks like a redux of bug 6997, which I've recently closed on a presumption it was fixed after one report. Caspy has narrowed the issue down there pretty good, can you please review the note 0048116 and see if that's the cause in your case too?

Also, you didn't report the specific version of 1.2 branch used. Make sure you are testing on 1.2.12.1

By: Andres Junge (ajunge) 2006-09-25 10:58:13

I have seen this before and I think is different. The problem persist. I'm working with svn branch 1.2 and testing with revision 43618

By: Serge Vecher (serge-v) 2006-09-25 11:13:39

ajunge: ok, I won't reopen or relate to 6997 then, unless there is a concrete link established. The next question is, are these backtraces from asterisk built with 'make don-optimize'? If they are not, asterisk will need to be rebuilt and new bt's produced.

By: Andres Junge (ajunge) 2006-09-25 11:17:28

mmm. not. I will rebuild with "make dont-optimize" and send the bt.

By: Serge Vecher (serge-v) 2006-09-25 11:29:25

also, I've briefly looked at the log files, and it looks like you've posted the 'debug' file... Can you please enable the 'debug' output for console in logger.conf and post that instead? (it contains much more information)... I'm wandering what the deal is with all the Call-Ids not being matched ...

By: Andres Junge (ajunge) 2006-09-25 11:31:53

you mean this:

console => notice,warning,error,debug

By: Serge Vecher (serge-v) 2006-09-25 13:50:12

exactamundo

By: Andres Junge (ajunge) 2006-09-25 15:15:58

Ok, just uploaded fresh new full logs and bt made with "make dont-optimize" asterisk

By: Serge Vecher (serge-v) 2006-09-25 15:20:59

ok, the logs are not from the console :(

By: Andres Junge (ajunge) 2006-09-25 18:50:05

how do I capture the console to a file?

By: seb7 (seb7) 2006-09-26 05:07:15

Logging console output:

Well, if you use PuTTY (a Windows SSH client) to connect to your asterisk machine, you can set it to log all printable output, and then connect to the asterisk console.

There's probably a better way though!

By: Serge Vecher (serge-v) 2006-09-26 08:44:52

I prefer piping output to 'tee', which allows interactivity on the console ...
asterisk -Tvvvvvdddddgc | tee /tmp/bug8027debug.txt

By: Andres Junge (ajunge) 2006-09-27 01:46:44

finally i'm running "script", when I get the logs i'll post it here.

By: Andres Junge (ajunge) 2006-09-28 10:12:40

Ok. I have the log files, but I can't uploaded here. So I put it in:

http://ayke.totexa.cl/ast-problem/console.sep.28.8-11.gz
http://ayke.totexa.cl/ast-problem/full.sep.28.8-11.gz
http://ayke.totexa.cl/ast-problem/bt.sep.28.11

By: Serge Vecher (serge-v) 2006-09-28 10:17:51

what modifications have you done to the sources?

By: Andres Junge (ajunge) 2006-09-28 10:19:20

none

By: BJ Weschke (bweschke) 2006-09-28 11:08:17

How are you doing the transfers? With the atxfer or other features.conf option or with the SIP transfer functionality?

By: Serge Vecher (serge-v) 2006-09-28 11:16:30

the debug you've posted indicates there were changes made to the sources. Please delete the Asterisk/zaptel/libpri sources and reinstall Asterisk from clean tarballs.

By: Andres Junge (ajunge) 2006-09-28 13:34:55

We do transfers with SIP transfer

By: advanceddataservices (advanceddataservices) 2006-10-06 11:46:21

We're having the same trouble when transfering a call from a zap channel to a zap channel. If you hang up one line, you get DT on the other though the zap bridge. This is a problem when our clients transf incomming calls to their cell phones, 2 lines never hang up when the call is complete.

By: Ketema Harris (ketema) 2006-10-09 21:29:07

I would like to help/assist with this problem in any way necessary.  This issue is happening frequently on my servers when we have heaviest call volume.  I can provide debug log from asterisk, detailed descriptions of how our call center is using the software, softclient debug log(from xmeeting and others), anything to help.  Attached is the debug log from today up to about 10 pm est.  we noticed the Zap channel being locked around 2 pm est (14:00).  Todays debug log:

http://www.ketema.net/10_9_2006_debug.tar.gz



By: advanceddataservices (advanceddataservices) 2006-10-11 10:44:37

Does anybody know if there is a previous version of zaptel that we all could roll back to, and still be compitable with the latest and greatest of everything else?

By: seb7 (seb7) 2006-10-11 11:23:21

Well given that this problem seems to have been around for years, I'd say the answer would be 'No', even if the problem is in the zaptel drivers. I think it is not a problem in the zaptel drivers though - it is an issue in the chan_zap or something similar in asterisk (maybe libpri I suppose - is everyone here using PRI? We were). So perhaps this bug is mis-filed...



By: Mike Purvis (nrw) 2006-10-18 13:00:31

I know that more debug information will be required before this is linked specifically to issue 6997, however I can not duplicate it to get debug information on our development system, it only happens under load on our production system during sip transfers in the exact same manner that occured in bug id 6997 once we upgraded from 1.2.7.1 to 1.2.12.1 after 6997 was closed. Due to the way it brings thing to a screeching halt, we had to revert back.



By: Serge Vecher (serge-v) 2006-10-18 15:13:33

anybody try 1.4-beta3?

By: Ahsan Masood (ahsan) 2006-11-09 05:24:10.000-0600

hi nrw. Just to be clear can you confirm that you got the issue in 1.2.12.1 in production and that you consequently had to move back to 1.2.7.1

By: seb7 (seb7) 2006-11-09 13:23:44.000-0600

I have uploaded a log with PRI intense debugging on from yesterday. I had this logging on to detect a problem with the PRI flapping (dropping briefly), so the logging was switched on before, during and after the situation started. Unfortunately, though, I didn't have Asterisk core debug level logging on. It is on Asterisk 1.2.5, but I don't think that should make a lot of difference in tracking down the cause of the issue, if there is anything useful in this log.

Also this bug is misfiled - I believe it should be filed under Channel Drivers, chan_zap, as the issue appears to be in chan_zap or at least within Asterisk and not in the Zaptel drivers.

By: Mike Purvis (nrw) 2006-11-10 17:34:33.000-0600

Yes when we upgraded to 1.2.12.1 on our production pbx we got the same errors that occurred in bug 6997.

It is entirely possible and I am willing to admit that it may not be specifically bug 6997 involved, but the end result was exactly the same.

I wish I could duplicate it on the dev box, but the dev box only has 4 sip phones, and 3 sip atas attached to it, no zap devices and it seems to be triggered most often under load.

If this bug is stuck waiting for debug information I will bite the bullet and let our call center take one for the team so to speak by moving over to 1.2.12.1 for the last half of a friday or something when we do not tend to be busy to get relevant debug information to move this further.

By: seb7 (seb7) 2006-11-14 10:30:41.000-0600

Ah, I see there was quite a discussion on this issue already in:
http://bugs.digium.com/view.php?id=6147



By: Serge Vecher (serge-v) 2007-01-09 12:55:13.000-0600

is this reproducible with the latest 1.2 release?

By: Henning Holtschneider (hehol) 2007-02-08 02:43:53.000-0600

I cannot reproduce this with Asterisk 1.2.14

By: Serge Vecher (serge-v) 2007-02-08 09:31:09.000-0600

In what I believe is a related issue (same messages at least), JerJer was able to narrow down a bit more.

By: Serge Vecher (serge-v) 2007-02-10 18:57:11.000-0600

and a patch has been posted.

By: Serge Vecher (serge-v) 2007-03-05 13:50:06.000-0600

feel free to reopen if this bug is not fixed in 1.2.16