ASTERISK-14541: [patch] Queue member considered available when paused, causing high weight queue to block low weight queue

[Home]

Summary: ASTERISK-14541: [patch] Queue member considered available when paused, causing high weight queue to block low weight queue

Reporter: David Woolley (davidw) Labels:

Date Opened: 2010-05-18 10:31:20 Date Closed: 2011-08-29 11:57:30

Priority: Minor Regression? No

Status: Closed/Complete Components: Applications/app_queue

Versions: Frequency of
Occurrence

Related
Issues:
is duplicated by ASTERISK-14961 [patch] Paused members in queue with higher weight

Environment: Attachments: ( 0) Issue17354-fix1.diff.txt
( 1) queuepause-CLI-scrape.txt
( 2) queuepause-configs.diff.txt
( 3) queuepause-full.log

Description: Issue ASTERISK-12504 addresses a problem in which a call ringing on a member on a high weight queue and who is also in a low weight queue blocks the low weight queue. I believe the fix for this is incomplete in that it fails to treat a paused member as being unavailable. This can happen even if only the one member is on both queues.

Scenario. Given a high weight queue and a low weight queue both with the same single member, and the member is paused on the high weight queue, present a call first to the high weight queue and then to the low weight queue.

Expect. The first call is stalled on the queue. The second call is answered.

Get. Both calls are stalled until the first call is abandoned.

If the roles of low and high weight queue are reversed, this works as expected.

****** ADDITIONAL INFORMATION ******

In the real world case, the queue members are agents, and we are using 1.6.1.0. However the relevant code seems unchanged between that and trunk, so I have used trunk for the controlled test. I have also used local channels running the echo application for the members, as it simplified the set up and didn't tie up more phones than I had.

I started with make samples and used one X-Lite, which could access anonymously, and one Polycom, which needed to register. I configured Queue1 as weight 0, Queue 2 as weight 1, both leastrecent strategy, both with:

member => Local/600@default,,Echo

as the only member (this is the echo test from the demo context).

I then did:

queue pause member Local/600@default queue Queue2

and made a call to Queue2, which stalled, followed by a call to Queue1 which also stalled, until I terminated the first call.

I then unpaused the member on Queue2 and paused them on Queue1, and made the calls with the queues switched. The second call was answered.

Three attachments to follow.

Comments: By: David Woolley (davidw) 2010-07-07 12:41:36

I've uploaded a patch against trunk to handle this problem. When it is checking for a member being able to take work in a higher priority queue, it now first tests whether it is paused in that queue, a relatively cheap test, then, after determining how many available agents there are in the queue, it discounts a conflict if there are none.

This is a belt and braces approach, as the only reason (except maybe a race) I can see for having the number of available members be zero, after we have established that the member is nominally available for the first queue and is also in the second queue, is that they are paused in the second queue.

If one really does go to the second test and the member really isn't available, but others are, and there is more work than members, the presumption is that those other members will take up work that is on offer until there are no available members left, and a retry on the first queue will then succeed.

This patch also applies to 1.6.2, but Local channels, which I used to simplify the test configuration, are not well behaved, as they have an invalid status. Or real world case is with agents.

Also, although it doesn't invalidate that the test shows a problem with the unpatched code, the test behaves more realistically if one sets ringinuse=no for both queues.
By: Atis Lezdins (atis) 2010-07-08 10:01:27

Duplicate of ASTERISK-14961

I just don't understand why you need extra variable.

Could You test my patch and report results?
By: David Woolley (davidw) 2010-07-08 10:21:44

It would have been nice if someone had spotted this was a duplicate before I got back to work out the patch - I left it for a while in the hope of not having to do the patch myself :-( ; it would have been even nicer if your patch had been committed.

As the comments say, I've used a belt and braces approach. Your patch covers the case where the member is paused. The extra logic in mine will handle the same case, but less efficiently, as it will count the number available. It will also handle any other case where the member is not actually considered available in q, even though I cannot think of any such case. I still kept the direct test for paused, as it is much more efficient.

The extra variable was to avoid calling a non-const function twice.

I have no doubt that your patch will work where the problem is that the member is paused, which is the only case that I know can definitely happen.
By: David Woolley (davidw) 2010-11-12 08:57:39.000-0600

Just a reminder, this duplicates ASTERISK-14961 (although it includes some additional defensive coding).
By: Terry Wilson (twilson) 2011-08-24 15:40:32.698-0500

I have verified that both Asterisk 1.8 and trunk work properly here. Created queue1, weight 1; and queue2, weight 0. Added SIP/6001 to both queue1 and queue2 and paused SIP/6001 in queue1. Made call to queue1, SIP/6001 did not ring. Made call to queue2, SIP/6001 correctly rang. This looks like it was fixed here:

If you look at the function num_available_members() you can see that the return value is only incremented if !mem->paused, so paused members should already be filtered out.

If you could verify by testing a checkout from Asterisk 1.8 branch or trunk, I would appreciate it. Otherwise, I'll just close this. Thanks.
By: Terry Wilson (twilson) 2011-08-29 11:57:30.473-0500

Appears to already have been fixed.