
Summary: ASTERISK-07510: Redirecting Local channels to MeetMe causes a crash when executing "core show channels"
Reporter: James Terhune (bigjimmy)
Labels:
Date Opened: 2006-08-10 14:42:39
Date Closed: 2007-10-01 16:15:01
Priority: Critical
Regression?: No
Status: Closed/Complete
Components: Channels/chan_local
Versions:
Frequency of Occurrence:
Related Issues:
Environment:
Attachments:
( 0) 2007-03-01-bt.txt
( 1) 2007-03-01-bt-full.zip
( 2) bt-2007-03-02-1.txt
( 3) full-2007-03-02-1.bz2
( 4) howto_reproduce_7706.txt
( 5) locks-1.4-82325.txt
( 6) run1-console.txt.gz
Description: We have an Agent channel in a conversation with a Local channel.  Using the Redirect command from the manager, we take the Local channel and send it to a MeetMe application.  We then initiate two calls with the Originate command, which adds the agent and the third party to the conference.  We don't use the ExtraChannel parameter of the Redirect command, as we need to set different options for the Local channel.

Once the agent wishes to drop out of the call, they hit '*' to disconnect.  They don't hear any hold music, and the agent channel is no longer able to receive calls.  Having a call sent to the agent causes Asterisk to deadlock almost completely.

This is reproducible in both version 1.2.10 and trunk; however, the debug log is from trunk with DEBUG_CHANNEL_LOCKS, DEBUG_THREADS, DETECT_DEADLOCKS and DONT_OPTIMIZE enabled.
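
For reference, a minimal sketch of how those debug options are enabled at build time on 1.4/trunk-era sources (the menuselect location is an assumption; the option names are as listed above):

./configure
make menuselect    # under "Compiler Flags", enable DONT_OPTIMIZE, DEBUG_THREADS, DETECT_DEADLOCKS and DEBUG_CHANNEL_LOCKS
make && make install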

****** ADDITIONAL INFORMATION ******

For the test, I have the Zap/g1 PRI looped into span 4, which calls the testin context.

The manager commands used were:

Make the initial call:
Action: Originate
Channel: Local/$leadid@outbounddirect/n
MaxRetries: 0
Timeout: 18000
Async: True
Context: sipin
Variable: DIALNUMB=$phone
Variable: DEST=$stnid
Variable: Campid=$unid
Exten: $ext
Priority: 1

Make the conference:
Action: Redirect
Channel: $otherend
Exten: c$confno
Context: vericonf
Priority: 1

Action: Originate
Channel: Local/$outbound@verifyout
MaxRetries: 0
Timeout: 36000
Async: True
Callerid: Verification Call <2345679876>
Context: vericonf
Exten: v$confno
Priority: 1

Action: Originate
Channel: Agent/$stnid
MaxRetries: 0
Timeout: 36000
Async: True
Callerid: Verification Call <2345679876>
Context: vericonf
Exten: a$confno
Priority: 1

Relevant portions of the dial plan:

[outbounddirect]

exten => _X.,1,Answer()
exten => _X.,n,Wait(.5)
exten => _X.,n,Dial(Zap/g1/${DIALNUMB}|18|M(connect^${EXTEN}^${DEST}))
exten => _X.,n,Wait(.5)
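; jump to an f<cause> extension based on the ${HANGUPCAUSE} left by the Dial above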
exten => _X.,n,Goto(f${HANGUPCAUSE},1)
exten => _X.,n,Hangup()

exten => f17,1,Playtones(busy)
exten => f17,n,Wait(2)
exten => f17,n,Hangup()
exten => f1,1,Playtones(info)
exten => f1,n,Wait(2)
exten => f1,n,Hangup()
exten => f0,1,Playback(number-not-answering,noanswer)
exten => f0,n,Set(PRI_CAUSE=129)
exten => f0,n,Hangup()
exten => _fX.,1,Playtones(congestion)
exten => _fX.,n,Wait(2)
exten => _fX.,n,Hangup()

exten => failed,1,Hangup()

[macro-connect]

exten => s,1,UserEvent(CONNECTMADE|AppData: ${ARG1}^${ARG2})
[vericonf]
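; MeetMe options: q = quiet (no enter/leave prompts), d = dynamically create
; the conference, x = close the conference when the last marked user exits,
; A = set this caller as a marked user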
exten => _aX.,1,MeetMe(${EXTEN:1},qdxA)
exten => _cX.,1,MeetMe(${EXTEN:1},qdx)
exten => _vX.,1,MeetMe(${EXTEN:1},qdxA)

[verifyout]

exten => 92,1,Dial(Zap/51)
exten => _X.,1,Dial(Zap/g1/${DIALNUMB})
exten => _X.,n,Hangup()


[agentdirect]
exten => _X.,1,Dial(Agent/${EXTEN})


[testin]
exten => _X.,1,Ringing()
exten => _X.,2,Wait(6)
exten => _X.,3,Playback(vm-options)
exten => _X.,4,Goto(3)

Comments:By: Joshua C. Colp (jcolp) 2006-08-16 12:45:57

Can you explain your test scenario a bit more? Also, what type of agents are you using, and does this happen only when you use agents?

By: James Terhune (bigjimmy) 2006-08-16 14:26:50

It is using regular agents that are defined in /etc/asterisk/agents.conf; they are logged in through the dialplan.

The scenario:
A call is connected between an agent and a local channel.  This local channel is basically just a Dial command that dials over a Zap channel; this is used in an outbound telemarketing application (this is the [outbounddirect] context).  The other end of the Originate goes to the [sipin] context, which I forgot to include above:
[sipin]
exten => 5001,1,Dial(Agent/1)

The agent then needs to join a third party to the call.  Their agent screens call a manager interface which issues the Redirect and two Originates seen in the "Additional Information" section above.  There is a one-second sleep between each Originate.

Once the conversation has finished, all parties will disconnect, the order in which they disconnect doesn't appear to matter in this case.

The agent will then attempt to proceed to the next call.  At this point Asterisk deadlocks to the point that it's unusable.

By: Joshua C. Colp (jcolp) 2006-08-16 14:31:02

There are two types of agents: regular and callback.  One uses AgentLogin to log in, and the other uses AgentCallbackLogin.  Which do you use? (I want to get this picture clear, as callback agents have been known to cause weird issues.)

By: James Terhune (bigjimmy) 2006-08-16 14:47:35

Sorry, it's the regular agent (which uses AgentLogin), not the callback kind.

By: jmls (jmls) 2006-10-31 11:21:47.000-0600

There have been some (!) changes to trunk over the past few months.  Can you confirm that this is still a problem?

By: James Terhune (bigjimmy) 2006-10-31 12:14:35.000-0600

It's still a problem with SVN rev#46695.

By: James Terhune (bigjimmy) 2006-11-09 11:46:20.000-0600

I attempted to find a workaround for this, so I removed all Local/ channels and just called on the Zap channels directly.  The lockup still happened, which probably means it doesn't have anything to do with the Local channels.  I am suspecting a problem with the AMI Redirect command.

By: Tony Mountifield (softins) 2006-11-15 08:43:17.000-0600

If you do Redirect on just one leg of a call, you will lose the other leg, and I suspect you can't just do an Originate on an Agent channel.

Set up your dialplan so that each leg of the call has a different value in a particular channel variable (so that you can tell them apart), and then in your Redirect, use both Channel and ExtraChannel to direct both legs to the same place. At this place, have a GotoIf based on the channel variable to steer each leg to a different part of the dialplan.
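
For illustration, a minimal dialplan sketch of that approach (the context name [prejoin] and the variable LEG are hypothetical; the conference number and MeetMe options are taken from the report above):

[prejoin]
exten => s,1,GotoIf($["${LEG}" = "agent"]?agentside|1:partyside|1)
exten => agentside,1,MeetMe(555|qdxA)
exten => partyside,1,MeetMe(555|qdx)

The Redirect would point both Channel and ExtraChannel at s@prejoin, with LEG set to "agent" on the agent's leg beforehand, so each leg joins MeetMe with different options.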

By: James Terhune (bigjimmy) 2006-11-15 09:16:50.000-0600

I actually did just that (redirecting both legs) and got the same results (the conference worked, but Asterisk deadlocked on the next call to the agent).

I did find a workaround, though: I redirect the physical channel that the agent is logged in from (in this case Zap/49) into the MeetMe, and then when they have finished, I redirect that channel back to AgentLogin.  This seems to work, as it destroys the Agent/ channel.  I am now wondering if the problem is with chan_agent.
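
For reference, that workaround expressed as a pair of manager actions, assuming the agent's base channel is Zap/49-1 and a hypothetical [agents] context whose 'login' extension runs AgentLogin:

Action: Redirect
Channel: Zap/49-1
Exten: a555
Context: vericonf
Priority: 1

and, when the conference is finished:

Action: Redirect
Channel: Zap/49-1
Exten: login
Context: agents
Priority: 1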

By: Bhishm (shonki) 2006-11-16 01:23:41.000-0600

If you observe carefully, the "Agent/<ID>" channel is created only when the channel is bridged, not when the agent is logged in.

When you do a redirect, the call is disconnected in the queue, so the "Agent/<ID>" channel should be destroyed.  Theoretically, a redirect on "Agent/<ID>" should either be blocked, or changes are required to support it, as redirecting an agent into a MeetMe room is a valid and important requirement.

By: Serge Vecher (serge-v) 2007-02-28 13:41:11.000-0600

BigJimmy, are you still experiencing this with the latest 1.4 checked out from SVN?

By: James Terhune (bigjimmy) 2007-02-28 15:11:55.000-0600

It's still a problem with the latest SVN, but now if I type 'core show channels' on the console, I get a core dump.  I will recompile and provide a backtrace tomorrow morning.

Here's the console output:

   -- Created MeetMe conference 1023 for conference '555'
   -- Executing [a555@vericonf:1] MeetMe("Agent/1", "555|qdxA") in new stack
[Feb 28 17:06:00] WARNING[11160]: channel.c:1817 ast_waitfor_nandfds: Thread -1228547168 Blocking 'Zap/pseudo-1700355906', already blocked by thread -1229284448 in procedure ast_waitfor_nandfds
fredericton*CLI> core show channels
Channel              Location             State   Application(Data)
Zap/pseudo-170035590 s@testin:1           Rsrvd   (None)
Zap/pseudo-211293408 s@testin:1           Rsrvd   (None)
Agent/1              a555@vericonf:1      Up      MeetMe(555|qdxA)
Zap/49-1             a555@vericonf:1      Up      MeetMe(555|qdxA)
fredericton*CLI>
Disconnected from Asterisk server
Executing last minute cleanups
/usr/sbin/safe_asterisk: line 111: 11155 Segmentation fault      (core dumped) nice -n $PRIORITY ${ASTSBINDIR}/asterisk ${CLIARGS} ${ASTARGS} >&/dev/${TTY} </dev/${TTY}
Asterisk ended with exit status 139
Asterisk exited on signal 11.
Automatically restarting Asterisk.
root@fredericton:/usr/src/svn/asterisk#

By: Serge Vecher (serge-v) 2007-02-28 15:17:47.000-0600

OK, please do; also, don't forget to specify the exact revision number.

By: James Terhune (bigjimmy) 2007-03-01 09:44:54.000-0600

I have uploaded a new backtrace; this is for revision SVN-trunk-r57147.

By: Serge Vecher (serge-v) 2007-03-01 16:34:39.000-0600

BigJimmy: have the changes in 9175 resolved the issue for you?

By: James Terhune (bigjimmy) 2007-03-02 07:49:05.000-0600

That did not fix the problem.  Uploaded bt and full log for SVN-trunk-r57438.

By: Serge Vecher (serge-v) 2007-03-02 09:52:18.000-0600

Josh, would you please look at this one too? ;)

By: James Terhune (bigjimmy) 2007-03-20 08:05:45

This is still a problem in SVN-trunk-r59043

By: James Terhune (bigjimmy) 2007-04-12 13:51:28

This is still a problem in SVN-trunk-r61599.

This bug is also older than many of the bugs that are getting discussed in the "top 15 oldest bugs" discussions.

By: Dan Turner (dan turner) 2007-04-23 19:49:30

Serge, I am also having real issues with this bug.  How can I get it fixed?  Is this issue resolved in Business Edition?

By: Dwayne Hubbard (dhubbard) 2007-09-12 15:52:56

This may be fixed in 82286 (1.4 branch) and 82287 (trunk).  Can someone please try the latest and greatest and then report back on this issue?

By: James Terhune (bigjimmy) 2007-09-13 09:36:06

Still doesn't work in the latest 1.4 branch.  'core show channels' doesn't crash it any more, but it still gets deadlocked.

By: James Terhune (bigjimmy) 2007-09-13 10:44:11

I uploaded a 'core show locks' output.

By: Dwayne Hubbard (dhubbard) 2007-09-13 22:13:42

BigJimmy,
 Is the uploaded 'core show locks' output the only reason you believe your channels are deadlocked?  If not, what other side effects of a deadlock are you experiencing since the crashes were fixed?
  I'm duplicating the steps you gave to cause the deadlock via AMI, and I'm not seeing your problem.  Would it be possible for you to provide a script that I can use to definitively reproduce your problem?  You don't have to provide the script here if you want to get it to me another way.

By: James Terhune (bigjimmy) 2007-09-14 11:41:17

The main symptom that I'm seeing is that none of the Zap channels work when it is deadlocked.  I know that the people in the BE support department have looked at it, but I've uploaded a file which details how to reproduce the problem.

By: Dwayne Hubbard (dhubbard) 2007-09-14 12:38:29

I used the files that you provided to BE support to create a Tcl script that implements the steps you detail, but I do not see the issue.  This is most likely a race condition, which is why I was requesting the script that you are using to reproduce the issue.

Don't let the output of 'core show locks' (locks-1.4-82325.txt) make you think that the two agent threads waiting for app_lock are actually deadlocked.  I believe this is how the chan_agent implementation works: while agents are in a call, they lock their thread with app_lock so that the other channel thread can handle the call the agent is in.

Does the 'core show channels' CLI operation that you say still deadlocks actually complete its output and return you to a CLI prompt?

By: James Terhune (bigjimmy) 2007-09-14 12:58:11

I am able to reproduce the problem without a script; I just paste the commands into the manager session as needed.  However, I do notice a few things:
- Doing a 'core show channels' will usually segfault Asterisk if I do it directly after the Redirect, but before I hang anybody up.
- If I swap which channel is in the Channel and ExtraChannel parameters (see the sketch after this list), it will deadlock, but won't crash on a 'core show channels'.
- If I redirect the non-agent channel (without using ExtraChannel) and do an Originate to the agent channel that points it to the conference, it won't deadlock.
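
For clarity, the swapped variant from the second point would look like this (channel names taken from the redirect example later in this report):

Action: Redirect
Channel: Zap/49-1
ExtraChannel: Agent/1
Exten: a555
Context: vericonf
Priority: 1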

By: Dwayne Hubbard (dhubbard) 2007-09-14 13:02:09

OK, I'll try some of these things and provide an update.  thanks!

By: Dwayne Hubbard (dhubbard) 2007-09-14 15:27:45

I'm now able to reproduce 'the' crash, or at least a crash, but it's going to take some time next week to figure out what is really happening and how to fix it.  I'll provide an update as soon as I have something interesting to report.

By: Dwayne Hubbard (dhubbard) 2007-09-27 18:01:48

I'm almost positive that this issue is fixed in revision 84018 of the 1.4 branch and 84019 of trunk.  Please try the latest and let me know if you still have problems.

By: James Terhune (bigjimmy) 2007-09-28 08:43:36

I tried version 82309, and now when you redirect an Agent/ channel it doesn't crash or deadlock, which is good, but you lose the agent channel.

The test scenario has an agent logged in on Zap/51 and a call on Zap/49 connected to Agent/1.  I redirect as below:

Action: Redirect
Channel: Agent/1
ExtraChannel: Zap/49-1
Exten: a555
Context: vericonf
Priority: 1

And I see this over the manager (note the Agentlogoff event):

Response: Success
Message: Dual Redirect successful

Event: Unlink
Privilege: call,all
Channel1: Zap/49-1
Channel2: Agent/1
Uniqueid1: 1190987965.28
Uniqueid2: 1190987967.29
CallerID1: 200
CallerID2: 4001

Event: Hangup
Privilege: call,all
Channel: Agent/1
Uniqueid: 1190987967.29
Cause: 0
Cause-txt: Unknown

Event: Newexten
Privilege: call,all
Channel: Zap/49-1
Context: vericonf
Extension: a555
Priority: 1
Application: MeetMe
AppData: 555|dx
Uniqueid: 1190987965.28

Event: Newchannel
Privilege: call,all
Channel: Zap/pseudo-1466374569
State: Rsrvd
CallerIDNum: 200
CallerIDName: <unknown>
Uniqueid: 1190988034.30

Event: Agentlogoff
Privilege: agent,all
Agent: 1
Logintime: 71
Uniqueid: 1190987960.27

Event: Newexten
Privilege: call,all
Channel: Zap/51-1
Context: vericonf
Extension: a555
Priority: 1
Application: MeetMe
AppData: 555|dx
Uniqueid: 1190987960.27

Event: MeetmeJoin
Privilege: call,all
Channel: Zap/51-1
Uniqueid: 1190987960.27
Meetme: 555
Usernum: 2

Event: MeetmeJoin
Privilege: call,all
Channel: Zap/49-1
Uniqueid: 1190987965.28
Meetme: 555
Usernum: 1


If I do a single redirect with the agent channel (no ExtraChannel), I also lose the Agent channel.

By: Dwayne Hubbard (dhubbard) 2007-09-28 09:37:21

BigJimmy,
 I need you to please try revision 84018+ of the 1.4 branch, or 84019+ of trunk.  I think your problem is resolved in these revisions.  Revision 82309 does not include my latest fix.  I definitely saw 'core show channels' crashes and dead channel problems go away with my latest fix.

By: James Terhune (bigjimmy) 2007-09-28 09:53:14

I just checked again, and the version I reported was what "core show version" gave me, which is incorrect.  The actual version I tested was 1.4-84080.

Sorry about the confusion.

By: Dwayne Hubbard (dhubbard) 2007-09-28 16:03:54

BigJimmy,
  You lose the agent channel because an agent's base channel is what really must be redirected, not the agent.  A Redirect does an asynchronous goto on a channel to a new context, extension, and priority in the dialplan.  In order to accomplish this, the channel must leave its existing context, extension, and priority; thus the AgentLogoff event.
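
As a sketch of that, using the test scenario above where the agent is logged in on Zap/51, the redirect would target the base channel rather than the Agent/ channel (whether this keeps the agent logged in is not confirmed here):

Action: Redirect
Channel: Zap/51-1
ExtraChannel: Zap/49-1
Exten: a555
Context: vericonf
Priority: 1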

  If you mean something else by "you lose the agent channel", please explain.  Otherwise, I believe it's time to close this issue if you are no longer seeing crashes, getting deadlocks, or accumulating stuck channels.

By: James Terhune (bigjimmy) 2007-10-01 13:34:55

Well, I am not seeing any more deadlocks, crashes or anything else undesirable, so I would say that the issue is fixed.  As for the loss of the Agent/ channel, that's another issue for another time.

By: Dwayne Hubbard (dhubbard) 2007-10-01 16:15:01

The final fix is committed to revision 84274 of the 1.4 branch and 84275 of trunk.  Thank you for your patience with this issue and for all your quick feedback.