Summary: ASTERISK-21859: Confbridge doesn't tear down an empty conference bridge when all users were kicked via end_marked=yes. Also, side effect crashes.
Reporter: Chris Gentle (gentlec)
Labels:
Date Opened: 2013-06-03 14:35:02
Date Closed: 2013-09-17 09:27:03
Versions: 11.4.0
Frequency of
duplicates ASTERISK-22581 AMI: ConfbridgeList has race condition causing crashes
is duplicated by ASTERISK-22454 Confbridge leaves channel and room open if hang up during name recording
is related to ASTERISK-22740 [patch] - Confbridge fails to destroy conference on hangup leading to Asterisk segfault
Environment: Asterisk 11.4.0 on Raspberry Pi with 2-9-2013-debian-wheezy and 11.3.0 running on an Intel Atom w/ Ubuntu 12.04.
Attachments:
( 0) alsa.conf
( 1) backtrace.txt
( 2) confbridge.conf
( 3) conferences.txt
( 4) extensions.conf
( 5) full
( 6) full.txt
( 7) leader_connect.sh
( 8) leader_disconnect.sh
Description: Please reference:


I've set up a conference where I'm relying on end_marked=yes to kick all participants when the leader exits.  This results in an error as shown below:

<< Hangup on console >>
   -- <Bridge/0x2364be4-input> Playing 'confbridge-leave.slin' (language 'en')
   -- Stopped music on hold on SIP/gent_2880-00000002
   -- <SIP/gent_2880-00000002> Playing 'custom/thank-you.ulaw' (language 'en')
   -- Executing [1000@conferences:2] Hangup("SIP/gent_2880-00000002", "") in new stack
== Spawn extension (conferences, 1000, 2) exited non-zero on

In this case, the conference leader was the alsa console channel.  Once it was hung up, a non-zero exit status caused the confbridge to go into a bad state showing 0 users:

confbridge list

Conference Bridge Name           Users  Marked Locked?
================================ ====== ====== ========
1000                                  0      0 unlocked

Asterisk has to be restarted to clear this.  In a normal exit, a "confbridge list" would not show conference 1000 because it would have been destroyed.

If no participants are dialed into the conference, everything closes cleanly when the leader exits.

*The same scenario can be reproduced with only two SIP channels, one marked and one not marked. See comments*
Comments:
By: Rusty Newton (rnewton) 2013-06-04 19:06:19.230-0500

I'm unable to reproduce this as described.

I used your dialplan from the mailing list conversation and just default settings for the confbridge.conf other than setting end_marked=yes for the user definition.

Using your dialplan, my basic test was

* A user calls into 1000_admin
* B user calls into 1000
* Hangup user A
* Asterisk hangs up user B playing the appropriate messages due to end_marked=yes
* check "confbridge list" to see if the conference is still up

I tried with SIP, Console and Local channel technologies, swapping them around between user and admin. For the several permutations I tried, it worked fine. The conference closes cleanly each time.

If I read the mailing list conversation correctly, you were able to reproduce with SIP and IAX channel technologies, correct? Did it matter which was admin or user?

By the way, the "Spawn extension" ... "exited non-zero" line is normal and expected. It is simply a debug message, not the cause of your error.

Please attach (more actions > attach files) your confbridge.conf file, relevant extensions.conf excerpt and step by step instructions for how to reproduce the issue. Include any relevant config files, notes on how you compiled Asterisk, etc.

By: Chris Gentle (gentlec) 2013-06-04 21:15:22.695-0500

Hey, thanks for your help on this Rusty.  Maybe I've got something else going on.  I'll attach all the relevant stuff.  As far as compilation, I build from source running the standard make menuselect, make, make install.  Nothing special.

There may be a better way to do this.  The conference leader is ALWAYS the alsa channel because I'm using the mic input to feed the conference.  Everyone else is muted and just listening.  To automate the entire process, I use two VERY SIMPLE shell scripts called leader_connect.sh and leader_disconnect.sh which will be called from cron.  When leader_connect.sh runs, the alsa channel connects to the conference as the leader and also starts the ices participant via chan_local.  Then, some time later, leader_disconnect.sh will run and this should stop the leader and kick everyone else to end the conference cleanly.

The following steps get the conference into a bad state for me requiring a restart of asterisk:

1.  Dial in from SIP or IAX and enter the conference.  Should hear moh until leader joins.
2.  confbridge list should show 1 user
3.  From a shell prompt, run leader_connect.sh.  This will connect the leader and feed the conference with the mic input.  The ices participant will also be connected via chan_local.
4.  Run confbridge list.  Should show 3 users.
5.  Run leader_disconnect.sh
6.  confbridge list now shows 1 active conference with 0 users.  If things had exited cleanly, the conference would have been destroyed and the list would be empty.  Conference appears to be in a bad state at this point.  Note the ices process has been stopped as expected.
7.  Run leader_connect.sh
8.  confbridge list shows 2 users (leader and ices).
9.  Run leader_disconnect.sh
10.  confbridge list shows 1 user. Apparently the Local channel (ices) did not get kicked, and the ices process is still running.
11.  Restart asterisk to clear.

By: Rusty Newton (rnewton) 2013-06-05 19:04:08.972-0500

Thanks for providing all of that and the scripts. I can't quickly test with chan_alsa at the moment, but this sounds like a bug that only occurs with chan_alsa as the conference leader.

Are you able to reproduce with any other channel technology as the marked user and chan_alsa not involved?

By: Chris Gentle (gentlec) 2013-06-06 17:09:04.526-0500

Yes, I am able to reproduce it without chan_alsa.

Simplifying things even further, I got rid of chan_alsa and chan_local.  I set up a simple [conferences] context and put in the 11.4.0 confbridge.conf sample file (with no edits).  I made two extensions to allow me to call into the conference from two different SIP phones:

 ; admin conference user
 exten => 600,1,Goto(conferences,1000,1)
 ; normal conference user
 exten => 601,1,Goto(conferences,1001,1)

See conferences.txt for my [conferences] context.

Dial extension 600 to add the conference leader.  Then dial 601 to add a normal participant.  Then hang up the leader.  The other participant will hear a message that they have been kicked but confbridge list will show that the conference has not been torn down.

Conference Bridge Name           Users  Marked Locked?
================================ ====== ====== ========
1000                                  0      0 unlocked
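
For readers without access to conferences.txt, a two-extension context along these lines would match the scenario described above. This is only a sketch under assumptions, not the attached file: the options set through the CONFBRIDGE() dialplan function and the priority layout are guesses for illustration. Both extensions join the same conference, 1000.

```
[conferences]
; 1000: marked (leader) user. Option choices here are assumptions.
exten => 1000,1,Set(CONFBRIDGE(user,marked)=yes)
 same => n,ConfBridge(1000)
 same => n,Hangup()

; 1001: normal user, joins the same conference 1000.
exten => 1001,1,Set(CONFBRIDGE(user,wait_marked)=yes)
 same => n,Set(CONFBRIDGE(user,end_marked)=yes)
 same => n,ConfBridge(1000)
 same => n,Hangup()
```

Setting the user options in the dialplan keeps the sample confbridge.conf unedited, which fits the setup described; the same options could equally live in a user profile in confbridge.conf.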

By: Rusty Newton (rnewton) 2013-06-25 16:43:17.942-0500

With the simpler scenario I was able to reproduce the confbridge failing to go away. I also found that Asterisk consistently crashes when you try to "confbridge list ..." the confbridge that was not torn down.

Attaching the full log with VERBOSE 5 and DEBUG 5 (full.txt) and the backtrace from the crash (backtrace.txt).

By: Jonathan Gibert (JoKoT3) 2013-08-09 10:33:09.359-0500


I had this bug on 11.4.0 and I'm able to reproduce it with 11.5.0 too (but it does not crash asterisk anymore).

I think I have narrowed the root cause down to the following two options:
- end_marked
- wait_marked

Whenever BOTH options are enabled, the bug described by Chris occurs in the following scenario:
- enter room with normal user and marked user (order does not matter)
- leave the conference with marked user
- normal user is kicked out
- bug occurs

The bug does not occur when wait_marked is not enabled.
The bug does not occur when both options are activated and the normal user leaves first.
The bug happens with more than one normal user, and with more than one marked user.
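
For reference, the combination described above might look like this in confbridge.conf. This is a minimal sketch, not anyone's attached configuration: the profile names marked_user and normal_user are hypothetical, and only the options relevant to the bug are shown.

```
; Hypothetical profile names, for illustration only.
[marked_user]
type=user
marked=yes        ; this user counts as a marked user (the leader)

[normal_user]
type=user
wait_marked=yes   ; wait (on hold) until a marked user joins
end_marked=yes    ; get kicked when the last marked user leaves
```

With both wait_marked and end_marked set on the normal user, the marked user leaving first is the sequence that leaves the conference listed with 0 users; removing wait_marked=yes from the sketch corresponds to the case where the bug does not occur.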

By: Jeffrey C. Ollie (jcollie) 2013-09-03 11:36:48.048-0500

I'm seeing this as well. The problem hit me doubly because I was recording the conferences: since the confbridge never went away, the recording never stopped and the disk eventually filled up.

By: Rusty Newton (rnewton) 2013-09-11 14:18:28.594-0500

Jeffrey's issue was also seen on duplicate ASTERISK-22454 by another user.