Summary: ASTERISK-04229: deadlock when using internal queues
Reporter: mleahy (mleahy)
Labels:
Date Opened: 2005-05-19 11:10:30
Date Closed: 2011-06-07 14:00:26
Versions:
Frequency of
Environment:
Attachments: ( 0) screenlog.txt
Description: We have a development Asterisk system set up so that each extension has a queue associated with it, so that only one call is sent to the phone at a time. Several times we have done some stress testing, simply calling back and forth between phones internally, and after about 10 - 15 minutes of testing with calls into the queues, transfers via SIP phones, and MeetMe conferences, the system deadlocks. We have not been able to find a specific activity that causes the deadlock, but we have been able to make it happen every time we have tried. It just takes a little time performing normal call operations.


I know this is really generic and I apologize in advance for that. I have attached a backtrace from the latest deadlock. We read the online docs on how to perform a backtrace, but they didn't make much sense to us, as we are not real Linux pros. If there are more logs we can create that would be helpful, we are more than willing to recreate the problem to produce them.

The reason for the internal queues is that we are also building a desktop client to be used with the system. We would like multiple calls to be able to alert an extension via the client, while still having only one call at the phone itself. Using a queue for each extension made this possible.
Comments:

By: Kevin P. Fleming (kpfleming) 2005-05-19 11:20:29

Your backtrace looks fine, you did a nice job :-)

Is there a reason why you are using chan_agent for single-agent queues? Why not just put the agent's phone directly into the queue? I'd like to see if you can reproduce this problem without using chan_agent.

I would also recommend not loading any modules that you do not need; some of them create extra threads that are just wasting resources on your system (chan_phone, for example). If you need help figuring out how to change your modules.conf file to only load the modules you need, get on the #asterisk IRC channel and someone will be able to help you.

By: mleahy (mleahy) 2005-05-19 11:44:09

The reason for using the agent channels is that they enforce only one call to the phone at a time. We are using Polycom SIP phones that accommodate multiple call appearances, so if a SIP phone is a member of a queue and there is more than one call in the queue, two calls will alert the SIP phone, because the SIP channel will still allow more connections, unlike the agent channel, which stops at one. I have changed the configuration so that all the personal queues have SIP phones as members rather than agents, and I will check whether we can still cause the deadlock.

Also, as far as the modules go, we already have modules.conf set up to be fairly empty. I found a page on the voip-info wiki that showed how to slim down the modules config file, so it is running pretty lean.

By: Kevin P. Fleming (kpfleming) 2005-05-19 12:33:53

OK, yeah, that's a pretty heavy way to work around the lack of call-waiting control in the phones... You can accomplish a similar result by registering each phone into its queue using a Local channel. Then, in the dialplan context that you send the calls to, set a group on the channel and check the group count for that phone: if it is more than one, return 'busy' back to the queue; if the count is one, deliver the call to the phone.
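
A minimal sketch of the Local-channel arrangement described above, not a tested configuration. The queue name, context name, and extension pattern are hypothetical, and the syntax shown (the `GROUP()` and `GROUP_COUNT()` dialplan functions, `n` priorities, comma-separated arguments) follows later Asterisk releases; the release in use when this report was filed may need different syntax.

```
; queues.conf (hypothetical personal queue for extension 100)
[queue-100]
member => Local/100@queue-member

; extensions.conf (hypothetical context reached by the Local channel)
[queue-member]
; Put this call into a group named after the extension.
exten => _X.,1,Set(GROUP()=${EXTEN})
; The count includes this call, so more than one means the phone is already in use.
exten => _X.,n,GotoIf($[${GROUP_COUNT(${EXTEN})} > 1]?busy)
; Only call in the group: deliver it to the phone.
exten => _X.,n,Dial(SIP/${EXTEN})
exten => _X.,n,Hangup()
; Report busy back to the queue so the caller keeps waiting there.
exten => _X.,n(busy),Busy()
```

Because the group is attached to the Local channel itself, the count should drop again when that call ends, so no explicit cleanup step is shown.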

Rumor has it the Polycom 1.5.x firmware will also allow for call-waiting disable, but it's not currently available.

By: Kevin P. Fleming (kpfleming) 2005-05-19 12:35:52

I see now that the only extraneous channel module you have loaded is chan_phone, so you are right, you are already running pretty lean (but I'm sure you don't have chan_phone-supported hardware on your server <G>).

There are probably quite a few application modules you could avoid loading, but those won't have any effect on this problem.

By: mleahy (mleahy) 2005-05-19 13:01:44

After moving all the queue members to SIP phones instead of agents, we were not able to recreate the deadlock, but as described before, that removed the ability to limit each phone to one call. I have now changed it to use Local channels as queue members, and at the Local extension we run ChanIsAvail on that SIP channel with the 's' option to see whether the SIP phone is free; if it is not, we return busy. This gives us the functionality we need. Now we will see whether this method also produces a deadlock.

I set chan_phone to noload in modules.conf to make sure that isn't a factor. In the event that we are NOT able to create a deadlock when using Local channels as opposed to agent channels, would you still like to pursue the deadlock originally discovered with agent channels, or do you think we were just using that feature in a way it was not meant to be used?
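
For illustration, the ChanIsAvail check described above might look roughly like the following. This is a sketch under assumptions, not the reporter's actual dialplan: the context name and extension pattern are hypothetical, and the syntax (comma-separated arguments, `n` priorities, testing the `AVAILCHAN` variable that ChanIsAvail sets) follows later Asterisk releases and may differ in the release used here.

```
; extensions.conf (hypothetical context reached via Local/<exten>@personal-queue)
[personal-queue]
; 's' treats the channel as unavailable if it is in use at all.
exten => _X.,1,ChanIsAvail(SIP/${EXTEN},s)
; AVAILCHAN is left empty when no channel was available.
exten => _X.,n,GotoIf($["${AVAILCHAN}" = ""]?busy)
exten => _X.,n,Dial(SIP/${EXTEN})
exten => _X.,n,Hangup()
; Phone already has a call: report busy back to the queue.
exten => _X.,n(busy),Busy()
```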
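
As a point of reference, keeping a module from loading in modules.conf typically looks like the fragment below. The `autoload` line is an assumption about the rest of the file, which is not shown in this report.

```
; modules.conf
[modules]
; Assumed setting: load everything not excluded below.
autoload=yes
; Do not load the chan_phone channel driver.
noload => chan_phone.so
```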

By: Kevin P. Fleming (kpfleming) 2005-05-19 13:11:28

Yes, it still needs to be pursued, but there is at least one other open issue related to locking in chan_agent, so we may need to merge the issues together.

By: mleahy (mleahy) 2005-05-19 13:49:09

It appears that using Local channels for queue members will work; we were unable to cause the deadlock. If any more information would help in locating the issue with agent channels, let us know and we will do what we can to help.

By: Kevin P. Fleming (kpfleming) 2005-05-19 14:15:53

Let's close this one, and open a new bug specifically for chan_agent entitled 'deadlocks under heavy load'. That should be adequate to describe what you were seeing, and we'll just have to figure out a way to get this arrangement set up on a lab system and replicate the problem.

If you have any testing scripts or other tools you were using to exercise the system and replicate the problem, please attach them to that new bug.