ASTERISK-10374: Asterisk segfaults after an attended transfer to a queue using "Eyebeam" softphone.

Summary: ASTERISK-10374: Asterisk segfaults after an attended transfer to a queue using "Eyebeam" softphone.

Reporter: Ted Brown (ted brown)
Labels:
Date Opened: 2007-09-23 16:13:44
Date Closed: 2007-11-05 08:51:12.000-0600
Priority: Critical
Regression? No
Status: Closed/Complete
Components: Applications/app_queue
Versions:
Frequency of Occurrence:
Related Issues:
Environment:
Attachments: ( 0) 170907_info_threads.txt
( 1) 170907_info_variables.txt
( 2) 170907_thread_apply_all.txt
( 3) agents.conf
( 4) bt_full_crash_on_invite.txt
( 5) bt_full_071004.txt
( 6) configuration071009.tar.gz
( 7) core_show_channels.txt
( 8) extensions.conf
( 9) queues.conf
(10) sip.conf

Description:

Platform: Suse Linux Enterprise Server 10
Machine: IBM xSeries 226
Asterisk version: 1.4.11

Bug description:

Asterisk crashes (segfault) when an attended transfer to a queue is performed and the EYEBEAM softphone is used to make the transfer. This crash can be easily reproduced as follows:

****** STEPS TO REPRODUCE ******

- New call is placed to a queue "A"
- Call is passed to an agent registered in queue "A"
- Agent takes a new line in his softphone and starts a new call to another queue "B"
- Another agent takes the call out of the queue "B"
- First agent transfers the call to the second agent (by pressing the XFER button).
- Transfer is correctly performed.

After that, Asterisk will crash after a "show channels" command or when processing an INVITE. The crash doesn't occur when using a different phone to transfer the call (tested in our labs with Linksys and SNOM).

Comments:

By: Russell Bryant (russell) 2007-09-25 02:42:17

Please provide a backtrace. See doc/backtrace.txt for more information
By: Ted Brown (ted brown) 2007-09-25 05:02:54

It seems that the last part of my description got lost, so here it is again:

The procedure to reproduce the crash is as follows, which can be summarized as performing an attended transfer to a queue using "Eyebeam" softphone:

- User A receives a call in line #1 of his softphone.
- User A takes a new line #2 in his softphone, and makes a call to a queue using that line
- User B (agent) takes this call out from the queue (no matter which phone/softphone he's been using)
- User A transfers the call in line #1 to line #2, pressing the XFER button

After that event, Asterisk will segfault as soon as a "show channels" command is issued.

As a workaround, we've taught users not to make attended transfers to queues, only blind transfers.
By: Dmitry Andrianov (dimas) 2007-09-25 05:23:07

Out of curiosity, isn't it simpler to tell admins not to use the "show channels" command instead of telling users not to do atxfer?

(Of course it is still a workaround and the bug needs to be solved, but at least it lets users work the way they are used to...)
By: Ted Brown (ted brown) 2007-09-25 05:33:43

Well, I forgot to mention that Asterisk also hangs even if no "show channels" is performed, but without a clear pattern (while processing SIP INVITEs).

By: Ted Brown (ted brown) 2007-09-25 08:50:46

I've uploaded a very simple configuration that can be used to reproduce the crash (agents.conf, queues.conf, extensions.conf and sip.conf). I've only modified these files after a clean installation and "make samples".

The steps to make Asterisk crash with this configuration are:

- Register 3 phones with extensions 302, 202 and 402. One of them must be an eyebeam (extension 402).
- Make a call from the phone with extension 302 to 790 (to register the user as an agent).
- Make a call from extension 202 to extension 402.
- Answer the call.
- Put the call on hold in the eyebeam.
- Call from another line of the eyebeam to number 300 (the queue's number).
- Answer the call in extension 302 (as it is the only agent online).
- In the eyebeam, press the 'XFER' button and the line 1 button. At this point the transfer has actually been made (the eyebeam's screen should read "Transfer succeeded") and both parties can talk to each other.
- In the CLI, type "core show channels".

At this point you will have been disconnected from Asterisk, and a core dump will have been generated in the /tmp/ directory.

Hope it helps.

By: Ted Brown (ted brown) 2007-09-27 05:53:56

Has anybody been able to reproduce this problem?
By: Russell Bryant (russell) 2007-09-27 08:27:21

I'm marking this as related to 7706; it sounds like both are symptoms of the same core problem.
By: Ted Brown (ted brown) 2007-09-28 05:20:28

I wonder if it would be possible to make a workaround via dialplan, not allowing attended transfers when the transferee is a queue. Could we somehow distinguish between blind and attended transfers on the dialplan, so we can make blind transfers avoiding this crash?
By: Tilghman Lesher (tilghman) 2007-09-28 15:46:43

Would it be possible to get just a 'bt full' here? It's not obvious which thread is crashing.
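
For readers following along, a 'bt full' can be captured non-interactively from a core file. The sketch below only builds the gdb command line and does not run it. The binary path /usr/sbin/asterisk and the core file name are assumptions for illustration, not values taken from this report, and the helper name gdb_backtrace_command is hypothetical. The -batch and -ex options and the "bt full" and "thread apply all bt full" commands are standard gdb.

```python
import shlex


def gdb_backtrace_command(binary, core, all_threads=True):
    """Build a gdb argv that prints a full backtrace from a core file.

    binary and core are paths supplied by the caller (assumed values
    in the example below, not taken from the bug report).
    """
    # -batch makes gdb exit after running the -ex commands.
    cmd = ["gdb", "-batch", "-ex", "bt full"]
    if all_threads:
        # Also dump every thread, since it may not be obvious which one crashed.
        cmd += ["-ex", "thread apply all bt full"]
    cmd += [binary, core]
    return cmd


if __name__ == "__main__":
    # Hypothetical paths; adjust to the local installation and core file.
    argv = gdb_backtrace_command("/usr/sbin/asterisk", "/tmp/core.12345")
    print(shlex.join(argv))
```

The printed command can be pasted into a shell and its output redirected to a text file for upload.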
By: Ted Brown (ted brown) 2007-09-29 07:16:30

I've just uploaded a bt full from a crash on an INVITE request.

I want to point out that extension 107 is the transferer and 403 is the transferee, but the transfer was made several minutes before the crash.

Would it help to upload a SIP debug?
By: Dwayne Hubbard (dhubbard) 2007-10-01 16:33:05

Ted,
This sounds just like issue 7706, which is now fixed in revision 84274 of the 1.4 branch, and 84275 of trunk.
I just fixed an issue with an AMI redirect to a meetme using Agents that caused a crash if you did 'core show channels' and it hung if you didn't type 'core show channels' but the call was torn down normally.
Can you please try this again on the latest and greatest and verify that you are still seeing this problem? Otherwise, I think this issue should be fixed.
By: Dwayne Hubbard (dhubbard) 2007-10-01 16:59:41

Waiting to see if the problem exists using revision 84274 or later.
By: Ted Brown (ted brown) 2007-10-02 09:00:09

I'm running SVN-branch-1.4-r84291 now on my little test machine, as I cannot do any tests on the production machine until tomorrow.

It looks like it doesn't crash anymore on a 'core show channels'.

But there is a small problem. A few days ago I found a weird way to make my test Asterisk crash on an INVITE request; the bt full in this case is similar to the backtraces from when Asterisk crashes on an INVITE on my production machine. I use a Linksys with only one line for extension A and call extension B (the eyebeam), which calls extension A back and makes the attended transfer. Asterisk crashes, even with this new release, but now the backtrace shows different things.

This may be a different issue (and a very irrelevant one). I don't know, but I tested this doing the attended transfer with another Linksys and Asterisk didn't crash.

As I said, tomorrow I'll test this release on my production machine and let you know.
By: Ted Brown (ted brown) 2007-10-04 10:46:13

Hi

We've just had a crash on the production machine. The bt full (uploaded) shows different things now, but they are the same things it shows on my little test machine when I make it crash the weird way. So I guess it's the same problem with a different result.
By: Dwayne Hubbard (dhubbard) 2007-10-04 13:38:53

Ted,
Can you please redescribe your problem? I believe the 'core show channels' aspect of your issue is resolved, but that you have other [maybe related] issues. From looking at your backtrace, I'm going to have to duplicate the issue here before I can really help with it.

Please describe, if you can, the absolute minimum config/steps required to duplicate your latest issue. If you can provide all the configs and everything, it would be ideal.
By: Ted Brown (ted brown) 2007-10-05 08:01:41

Sorry, I have found no clear pattern to reproduce this. I have tested several ways and made it crash, but not in a deterministic way. There must be some factor I am not noticing. It will crash after an attended transfer, sooner or later, and on the production machine there are dozens of transfers a day. You can try to duplicate it as I explained before, with a transfer from an eyebeam to the same extension, and with the configuration that I have already uploaded.

The difference between crashing or not crashing can be seen on the console. When it doesn't crash it looks like this:

-- SIP/302-08238dd0 answered Local/302@internal-1b3b,2
[Oct 5 14:58:38] DEBUG[10917]: app_queue.c:2166 wait_for_answer: Dunno what to do with control type -1
-- Agent/302 answered SIP/129-08232218
-- Stopped music on hold on SIP/129-08232218
[Oct 5 14:58:38] DEBUG[10917]: chan_agent.c:538 agent_read: Bridge on 'SIP/302-08238dd0' being set to 'Agent/302' (3)
[Oct 5 14:58:38] DEBUG[10917]: chan_agent.c:446 agent_read: Native formats changing from 4 to 524292
[Oct 5 14:58:38] DEBUG[10917]: chan_agent.c:446 agent_read: Resetting read to 4 and write to 4
== Spawn extension (internal, 302, 3) exited non-zero on 'Local/302@internal-1b3b,2'
-- Started music on hold, class 'default', on SIP/302-08238dd0
-- Stopped music on hold on SIP/302-08238dd0
-- Stopped music on hold on SIP/202-0822b098
[Oct 5 14:58:40] DEBUG[10664]: chan_sip.c:13026 attempt_transfer: SIP transfer: Succeeded to masquerade channels.
[Oct 5 14:58:40] DEBUG[10855]: chan_agent.c:815 agent_hangup: Hungup, howlong is 0, autologoff is 0
[Oct 5 14:58:40] DEBUG[10917]: chan_agent.c:461 agent_read: Bridge on 'Agent/129<ZOMBIE>' being cleared (2)
== Spawn extension (departamento3, s, 4) exited non-zero on 'SIP/129-08232218'

When it crashes, it never gets as far as showing the 'Agent/XXX<ZOMBIE>' line. (By the way: transferred: 202, transferer: 129, transferee: 302.)
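
The difference described above can be checked mechanically in a saved console log. The following is a minimal sketch, assuming the log has been read into a list of lines: it looks for each "Succeeded to masquerade channels" line and reports the ones that are not followed by an "Agent/NNN<ZOMBIE> ... being cleared" line. The function name and the window of 20 lines are assumptions chosen for illustration; the two message patterns are copied from the console output above.

```python
import re

# Patterns taken from the console output quoted in this comment.
MASQ_RE = re.compile(r"attempt_transfer: SIP transfer: Succeeded to masquerade channels")
ZOMBIE_RE = re.compile(r"agent_read: Bridge on 'Agent/\d+<ZOMBIE>' being cleared")


def transfers_missing_zombie_cleanup(lines, window=20):
    """Return indexes of masquerade lines with no ZOMBIE-cleared line after them.

    Only the next `window` lines are inspected; the window size is an
    arbitrary assumption, not something Asterisk guarantees.
    """
    missing = []
    for i, line in enumerate(lines):
        if MASQ_RE.search(line):
            following = lines[i + 1 : i + 1 + window]
            if not any(ZOMBIE_RE.search(later) for later in following):
                missing.append(i)
    return missing


if __name__ == "__main__":
    sample = [
        "[Oct 5 14:58:40] DEBUG[10664]: chan_sip.c:13026 attempt_transfer: SIP transfer: Succeeded to masquerade channels.",
        "[Oct 5 14:58:40] DEBUG[10855]: chan_agent.c:815 agent_hangup: Hungup, howlong is 0, autologoff is 0",
        "[Oct 5 14:58:40] DEBUG[10917]: chan_agent.c:461 agent_read: Bridge on 'Agent/129<ZOMBIE>' being cleared (2)",
    ]
    print(transfers_missing_zombie_cleanup(sample))
```

An empty result means every transfer in the log was followed by the cleanup line; any index returned points at a transfer worth inspecting.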
By: Ted Brown (ted brown) 2007-10-09 05:21:30

I'm testing now SVN-branch-1.4-r85057. A couple of things to tell you:

Found a way to make * crash. I made several attended transfers to queues which didn't crash, then tried this: made a call to a queue A, picked up the call with an eyebeam A, then did an attended transfer to another queue B, picked up this second call with another eyebeam, then did an attended transfer to another queue C, and then it crashed. Just after the crash I tried this again and Asterisk didn't crash.
This scenario may look quite complicated, but on our production machine it is perfectly possible and plausible.

I also tried the latest patch for bug 10406, and asterisk will crash after every attended transfer to queues with the eyebeam. Tested it with linksys phones, and it doesn't crash.

I also noticed that the number of channels used for the transfer changes from one phone model and version to another.

I'm uploading my current configuration and the core show channels for different tests
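
To compare how many channels each test leaves behind, the channel names can be grouped by technology. This is a minimal sketch, assuming the channel names have already been extracted from the "core show channels" output into a list; the function name is hypothetical, and the sample names are the ones that appear in the console output earlier in this thread.

```python
from collections import Counter


def count_by_technology(channel_names):
    """Count channels per technology, i.e. the part before the first '/'."""
    return Counter(
        name.split("/", 1)[0] for name in channel_names if "/" in name
    )


if __name__ == "__main__":
    # Channel names copied from the console log quoted in this thread.
    names = [
        "SIP/302-08238dd0",
        "Local/302@internal-1b3b,2",
        "Agent/302",
        "SIP/129-08232218",
    ]
    for tech, count in sorted(count_by_technology(names).items()):
        print(tech, count)
```

Running this once per phone model makes it easier to see whether extra Local or Agent channels appear only with certain softphones.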
By: Ted Brown (ted brown) 2007-10-10 06:15:29

Testing latest svn. NO AUDIO IN SIP CALLS TO QUEUES. In fact, the behavior is the same as it was with 85057 with the patch for bug 10406, but I hadn't noticed this before.

Also, attended transfers to queues with zoiper 2.09 now crash too.

More details in http://bugs.digium.com/view.php?id=10406#71752
By: callguy (callguy) 2007-10-10 07:20:59

Ted Brown: I think we are both running into a similar set of issues. Look at the first diff in bug 10571 posted by Corydon76 (it turns off local channel optimization); you may want to try your scenarios with that and see if you experience the same behavior.

We found a lot of problems caused by auto-creating local channels and did some dialplan work to avoid it wherever possible to good effect (though there are obviously still issues separate from that). Your output of core show channels looks suspiciously similar.
By: Ted Brown (ted brown) 2007-10-10 10:17:15

callguy: I tried corydon's diff from bug 10571, but there is no noticeable change. I agree with you that there is something very wrong here related to local channels and redirections. If you can take a look at the extensions.conf uploaded here and tell me if there is something I can change to avoid the use of local channels, I'd appreciate it very much.
By: callguy (callguy) 2007-10-23 07:24:18

Ted Brown - we are seeing the same thing as you, no audio on calls into queues, though it seems to be related to certain queue settings.

Edit: This appears to happen only on queues that have agents; persistent members work fine. It is 100% reproducible: no audio on SIP calls into any agent in a queue.

By: Ted Brown (ted brown) 2007-10-29 12:13:37

I have just tested attended transfers to queues with the code lines callguy mentions in bug 11071 commented out. With the eyebeam and that fix applied, it doesn't crash. Our eyebeam project has been cancelled due to this bug, so I can't test this in a production environment. Sorry.
I can also tell you that the zoiper problem is still there even with those lines commented out.
By: Joshua C. Colp (jcolp) 2007-11-02 14:32:22

Please try the patch in bug 11071 to see if it solves this issue.
By: Ted Brown (ted brown) 2007-11-05 05:09:28.000-0600

I've tested the patch for bug 11071. Now there is no crash with eyebeam or with zoiper. Everything is working fine.

I can't assure you this issue won't reappear, as I can't test it under heavy load. But if nobody disagrees, I think you can close this issue.

By: Dwayne Hubbard (dhubbard) 2007-11-05 08:51:12.000-0600

It looks like the fixes for issues 7706 and 11071, as well as some others, may have fixed this issue. The reporter is no longer seeing crashes. Thanks to all who contributed to this issue.