Summary:ASTERISK-19726: [patch][bug] ConfBridge - Users listening to MoH, and who should be muted, are often unmuted and recorded
Reporter:Neil Tallim (flan)Labels:
Date Opened:2012-04-13 18:11:06Date Closed:2012-10-08 08:48:27
Versions:SVN 10.0.1 10.1.0 10.1.1 10.1.2 10.1.3 10.2.0 10.2.1 10.3.0 10.4.0 10.5.0 Frequency of
Environment:Attachments:( 0) allow_forced_mute.diff
( 1) bugASTERISK-19562_ASTERISK-19726_ASTERISK-20181.patch
( 2) moh_crash_fix_-_apply_afterwards.patch
( 3) moh_mute.patch
( 4) moh_prompt_fix_-_apply_after_crash_fix.patch
( 5) muting.patch
Description:app_confbridge has something of a mind of its own when it comes to deciding when to mute users and when to put them on hold. It also has a tendency to not mute users while they're on hold.

While working on some other patches to correct this inconsistent behaviour (without negatively altering the user experience), my team discovered something shocking: we were being recorded while on-hold, before a conference started. Further testing showed that this happened while on-hold in a number of cases, and even when we thought we were muted. We did not observe this behaviour without recording because we had no users without WAITMARKED set in our testing scenarios.

Further exploration showed that this issue affects unpatched versions of ConfBridge, thus justifying why it has been submitted as a critical issue.

This patch works by introducing a "forced mute" flag on participants, which prevents app_confbridge's whimsical unmuting of participants who were explicitly muted via AMI/CLI and ensuring that all MoH is accompanied by appropriate muting logic.

IMPORTANT: This patch builds upon other patches, which must be applied first, for it to be effective. These patches introduce corrective behavioural changes, but should have no unintended side-effects and have been in production in my environment for weeks with no observed problems:
Comments:By: Neil Tallim (flan) 2012-04-13 18:17:07.926-0500

Typos, typos everywhere.

By: Neil Tallim (flan) 2012-04-24 17:07:41.552-0500

It was discovered that, since a more recent SVN checkout (the only thing that changed since our last test of the flow), calling ast_moh_start() while it was already active would lead to a segfault.

This patch addresses that issue, by ensuring that it cannot be enabled twice.

By: Neil Tallim (flan) 2012-04-27 17:26:17.450-0500

The last crash-fix patch mistakenly left participants listening to MoH while being unmuted when disabled. That was a silly oversight, considering the issue this patch is meant to address.

By: Neil Tallim (flan) 2012-04-27 19:30:11.125-0500

The previous patches didn't consider what would happen if a prompt were played while MoH state was being explicitly set. This one should ensure that MoH and muting behaviour is consistent, even in the face of interruptions.

By: Terry Wilson (twilson) 2012-06-06 14:32:15.860-0500

I'm currently going through the code and taking a look at the patch.

One thing that stands out, though, is that there seems to be a flawed assumption. Confbridge users are not ever put on hold. They are connected to the conference and sometimes are played music. It happens to be called "hold music" but that is just because it is the music's normal purpose.

I'm not completely convinced that playing music automatically infers that one should be muted. I can imagine scenarios where it would be quite handy to be able to record the audio that was going on while a MOH was being played. If users are supposed to be muted, then the 'startmuted' option should be set.

By: Neil Tallim (flan) 2012-06-06 18:03:41.526-0500

While there will certainly be cases where recording participants who believe themselves to be on-hold (which is the psychological implication of MoH) is necessary, I feel, rather strongly, that such cases will be the exception, rather than the rule.

For those cases, explicitly monitoring the interesting channels, rather than the entire bridge, will work with existing Asterisk resources (AMI, AGI, and CLI all provide methods for this, so it's not even anything exotic). Muting a bridge member has no impact on the channel itself.

By: Terry Wilson (twilson) 2012-06-06 18:12:43.842-0500

I have never assumed I was on hold when joining a conference. I always considered the music something more like "You're the only one here, since we know silence is boring...here is some music."

I would be vehemently opposed to changing this behavior in a release branch (it is a behavior change, not a bug) and moderately opposed to changing the behavior in trunk (as it removes functionality).

For me, it seems much better to use the existing startmuted option if you want people to actually be muted at start. :-) It would be possible to add another option for unmuting all members when the conference was "ready" as defined by there being other conversation partners, marked users, etc.

By: Neil Tallim (flan) 2012-06-06 19:45:11.630-0500

There's inconsistency in leaving it alone, though, since the music is often, but not always, accompanied by muting, and startmuted has another role to play, preventing it from being used as a factor in making this decision.

Conferences don't always end when people start leaving, but based on my testing, it looks like that was the assumption made when this application was being built, because  it works fine in that case, but becomes unpredictable upon deviation, hence why I consider it a bug.

By: Neil Tallim (flan) 2012-06-06 19:51:33.207-0500

Sorry if that came across as accusitive. I just wanted to make the point that it looks like what's in the release pipeline might have unmapped edge cases, which allows for the possibility that what was implemented is not what was intended.

By: Terry Wilson (twilson) 2012-06-12 13:55:53.745-0500

The biggest issue with the patches (here and elsewhere) is that there is no benefit to creating lots of different issues when the patches for them all depend on each other. It is nearly impossible to review any of these issues separately because none of the patches are separate.

I'm still not convinced that music implies muting. If a user has wait_marked=yes, is not marked, and there are no marked users in the conference then they will be muted because wait_marked does imply muting in this case. Whether music is playing at the time, though, is currently irrelevant.

I can see that unmuting a user that has specifically been muted via AMI/CLI because a marked user has joined might be considered improper, though. It's just very hard to look at pieces of functionality in the patches when they are dependent on a bunch of other patches.

Also, when fixing a patch, don't post a patch that is a patch that depends on the previous patch, just post a patch against the branch you are working on with all of the changes. That way no one has to download several different patches and apply them all in the right order.

I'm still looking at stuff and trying to get it all figured out.

By: Terry Wilson (twilson) 2012-06-12 16:57:43.374-0500

I am attaching a patch for just adding the ability to force mute without making the assumption that music == muted.

The main reason I'm adding the patch is because it will apply, by itself, without any other patches and it solves the main issue that I see that this issue brings up, namely that joins/leaves can affect administratively muted conference users when they shouldn't. Also, it makes sure to unmute non-WAITMARKED users when the last marked user is leaving. I'm not sure why everyone is muted automatically when that happens. It may be better to remove the loop marking everyone muted, and just specifically mute WAITMARKED users.

Neil Tallim: Does this look right to you? It is obviously very similar to what you were doing.

By: Neil Tallim (flan) 2012-06-13 15:59:43.696-0500

I'll need to do a search through the file to make sure none of the mute toggle-points were missed. (Your patch looks a bit light, but it's very possible my memory has just faded over the past month)

While this would work, for it to be usable as a means of safeguarding against people being recorded when MoH is active, AMI messages would need to be emitted when MoH is toggled on or off so an aware controller, which would need to be implemented externally (the system integrator's problem), could send a forced mute/unmute request in response.

Since AMI messages are, as far as I know, emitted serially, this shouldn't cause any sort of race-condition, but it would almost certainly require refactoring all of the MoH sections into a common function, which was something I planned to do in a later patch, but is not particularly trivial, because they are not all the same.

Perhaps a better approach would be to modify the patch I've submitted and make it reference a new user profile option, "moh_implies_mute" or something, defaulting to false, that toggles muting around MoH sections. I can probably find time to build that early next week, if you'd be interested in that spin on this issue.

If you're following the patches on JIRA, then those found at the two links in the description should be applied in the order listed (Jonathan Rose's "moh_everything_spaces_fix.diff" is the only one needed from the first link, and there's only one found in the second). If you'd prefer that I assemble a unified patch with all three together, I can submit that easily enough by applying all three in succession, then diffing against a clean copy of the code.

I do apologise for, and as an open library developer myself, feel the pain of, the complexity that having successively dependent patches introduces. As I mentioned to Matt Jordan in a Review Board comment, I'm used to working on smaller, faster-moving projects, where acceptance, rejection, or a modified merge happens in a short timeframe and it's easier to build another patch on the outcome of that. The need to advance a project at work, which began during pre-release stages of ConfBridge's life, didn't allow me much time to wait between changes. While I've kept business logic from creeping into Asterisk code (yay, abstraction), I wasn't able to wait for feedback before proceeding and found issues, like this one, late in the development cycle, after other enhancements were made and had influence. (I'd prefer to avoid just forking Asterisk for internal use, not only because that's bad due to backporting overhead, but also because I believe these changes are generic and of benefit to all potential users)

In keeping with my argumentative nature, I still see the vast majority of ConfBridge's potential userbase wanting muting by default and believe the only reason this hasn't become an issue is that, if everything progresses in the order of people join -> marked user leaves -> conference ends (implied by common config options, like 'end_marked', since it's the common case), no unusual behaviour is observed, and this patch wouldn't change that. Being the common case, and given that Digium doesn't have an infinite time-budget, I wouldn't be surprised if testing was focused there; it was the sole focus here for quite a while, due to constraints of our own.

By: Terry Wilson (twilson) 2012-06-13 17:47:43.091-0500

The whole point of my patch, though, is that I believe that it is the original design decision to *not* ensure that MoH and mute state are linked. I even verified this with the original author of app_confbridge. That is why the settings are separate.

The patch I added is specifically to avoid the problem of confbridge unmuting via automatic behavior a user who was muted via AMI/CLI.

To ensure that someone is not recorded when they aren't supposed to be, all that the external application has to do when using the patch is to use wait_marked (if the condition for muting is based on marked users being in the conference) and/or specifically mute them. With something similar to your AMI/CLI MOH patch, one would send AMI/CLI commands for both muting and moh when one wanted to do both. It would also be easy to add a MusicOnHold: yes line to the AMI mute/unmute commands to handle everything at once I suppose, though I'm not sure that is really any better than just sending both AMI commands.

With my patch, the only time I think that someone would be recorded when music is playing is if wait_marked=no and music_on_hold_when_empty=yes and start_muted=no. The obvious thing to do if one wanted no wait_marked=yes, but did want moh when empty and muting would be to do start_muted=no. It might be that if everyone leaves but a single user with wait_marked=no and moh=yes that recording would still be done, though.

I am not opposed to having an moh_implies_mute option and helper functions being used for starting/stopping moh. If you can work on such a thing, please try to make it a standalone patch, though. Since there is no guarantee that all dependent patches would be accepted, it makes it *a lot* easier to review and get things committed if they are completely independent of each other.

By: Terry Wilson (twilson) 2012-06-14 09:44:03.140-0500

Also, I wouldn't have any problem making the moh_implies_mute option default to no in 10, but default to yes in trunk.

By: Neil Tallim (flan) 2012-06-14 11:24:11.831-0500

I'm making time to work on this today, so, with luck, I'll have something ready for review soonish.

Whatever I submit will conflict with ASTERISK-19526, since they have to touch the same parts of the code. Resolution of conflicts will probably be clearer if this patch passes first, but that one's already in Review Board, so... whatever happens happens, I guess.

Per your comments, I'll hold off on revising ASTERISK-19526 until this patch has reached its final state, since its form could change (or not) depending on what happens here. Backporting diffs to that one will be the easier of the two merging tasks to happen along this path of modifications.

By: Terry Wilson (twilson) 2012-06-14 11:45:37.919-0500

Thanks! I appreciate all of your work on this.

By: Neil Tallim (flan) 2012-06-14 13:38:07.825-0500

Around line 1080, there's a pretty obvious flaw that I addressed in ASTERISK-19562, which is somewhat relevant to this patch. The problem is that, when the last marked user leaves, everyone is muted, including users who should probably be left alone (non-wait_marked, non-end_marked).

Ordinarily, I'd be inclined to leave this alone, since it's not this patch's responsibility, having nothing to do with MoH specifically, but you addressed it in your version (which I'm incorporating where it makes sense to do so), albeit with a retroactive correction, where my approach was to check conditions before acting. So, with that said, which approach makes more sense: apply your fix, then consider replacing it with mine (or leave yours in place if it's clearer) when restructuring things as the patches start to get merged or leave the problem untouched because it's a quasi-known issue at this point?

By: Terry Wilson (twilson) 2012-06-14 14:18:41.358-0500

I wasn't sure which way to go when I wrote the patch either. It *seems* like it makes the most sense to just mute only those who need muting. I didn't know if I was missing anything, so I just added the fix at the end.

Feel free to leave it out of this patch altogether and handle it elsewhere.

By: Neil Tallim (flan) 2012-06-14 19:00:43.950-0500

I'm not nearly as confident about the coverage and behaviour of this patch (muting.patch) as I am about the one submitted before, since that one had over a month of pre-release testing and has been in production for over a month, too, but based on tests (described below), it seems to mute in-step with MoH consistently and doesn't seem to have broken the old behaviour in any negative ways. I'm comfortable submitting it for scrutiny, but definitely think it needs practical evaluation.

One point that must be stressed is that this patch does not go out of its way to correct unrelated issues. For example, while the race condition that makes MoH play if a marked user joins while a participant is still listening to the "now placed into" message, which itself finishes while the marked user is listening to the same thing, is handled by unifying the operations into a single post-playback step, but many other cases in which MoH can be triggered unexpectedly are ignored, since they're out of scope.

Tests performed (but which should be repeated):
- moh_implies_mute = yes and no
-- One participant joins, then a marked user joins
-- Two participants join, then a marked user joins
-- One participant joins, then a marked user joins, then a participant joins
-- A marked user joins, then a participant joins
-- A marked user joins, then two participants join
-- Two non-wait_marked users join
-- One non-wait_marked user and one participant join
-- One non-wait_marked user and one moderator join
-- Three non-wait_marked users join, two leave, then they join again

In all cases, a recording was made and observed for correctness. More complex cases were not attempted because every meaningful MoH toggle (those surrounding prompts don't count for this patch) is connected to a mute-check, so how they're reached is ASTERISK-19562's problem.

If you'd be able to create a branch for evaluating complex changes, I'd like to merge all three patches and introduce the enhancements made in this patch and derived from review comments (moh_implies_mute; your central mute-forcing function; other things like those) and let that soak for a little while to see if there are any flaws or inconsistencies. The other patches I've submitted are, as I recall, far smaller in scope and side-effects (AMI extensions and new functions that use existing methods), so I'm pretty comfortable with just letting them progress independently.

By: Neil Tallim (flan) 2012-06-14 19:03:58.607-0500

I accidentally left an unused variable in the user_profile structure.

By: Matt Jordan (mjordan) 2012-10-08 08:21:16.202-0500

Final patch by Terry Wilson (bugASTERISK-19562_ASTERISK-19726_ASTERISK-20181.patch) from review board https://reviewboard.asterisk.org/r/2072/ attached.