Summary: ASTERISK-08448: Crash on blind transfer of an incoming call (queue)
Reporter: cajus (cajus)
Labels:
Date Opened: 2006-12-28 12:06:10.000-0600
Date Closed: 2011-06-07 14:00:18
Priority: Critical
Regression?: No
Status: Closed/Complete
Components: Core/General
Versions:
Frequency of Occurrence:
Related Issues:
Environment:
Attachments:
( 0) bt-full_20070102.txt
( 1) bt-full-1.2.14-1.txt
( 2) bt-full-4.txt
( 3) crash-1
( 4) crash-2.bz2
( 5) verbosedebug.txt
( 6) verbosedebug2.txt
( 7) verbosedebug2-trimmed.txt
( 8) verbosedebug4-trimmed.txt
Description: Under higher load, Asterisk crashes frequently when transferring incoming calls that are collected in a queue to another SIP phone. Sadly, this happens only sometimes and is not reproducible every time.

System is: Debian GNU/Linux (etch), uP "Intel(R) Xeon(TM) CPU 3.00GHz" system with "Eicon Networks Corporation Diva Server PRI Rev 3 (rev 01)". I'm using the realtime extension with mysql.

****** ADDITIONAL INFORMATION ******

Debian Packages of 1.2.14 built with complete debugging have been placed on http://www.naasa.net/stuff.
Comments:
By: Serge Vecher (serge-v) 2006-12-28 13:20:34.000-0600

A. Is this a backtrace from Asterisk built with "make dont-optimize"? If not, can you please rebuild Asterisk and produce a new bt?
B. Do you think you could produce a SIP debug trace illustrating the problem? If yes, please do as per following:
1) Prepare test environment (reduce the amount of unrelated traffic on the server);
2) Make sure your logger.conf has the following line:
  console => notice,warning,error,debug
3) restart Asterisk with the following command:
  'asterisk -Tvvvvvdddddngc | tee /tmp/verbosedebug.txt'
4) Enable SIP transaction logging with the following CLI commands:
set debug 4
set verbose 4
sip debug
5) Trim startup information and attach verbosedebug.txt to the issue.

By: cajus (cajus) 2006-12-29 05:11:41.000-0600

A. It is ported from the 1.2.13 debian package with -O0 and dont-optimize in this case.
B. I can't reproduce it on the test system. Either it's not enough load, or it doesn't reproduce timings, so:
1) is not useful at the moment
2) is done
3) is nearly done
4) is done
5) because this is a production system, you only get the couple of lines that pop up right before the crash; the full output would be *very* large otherwise. Just let me know if you need more. It's in crash-1



By: Serge Vecher (serge-v) 2006-12-31 11:10:33.000-0600

I wonder if the crash is due to the peer audio being on the 0.0.0.0 address in SDP. Can you please produce the debug log *exactly* as per my note, not a snippet of the debug file?

By: cajus (cajus) 2007-01-01 08:28:09.000-0600

As said before, it is a production system and I don't want to run asterisk from within the shell without forking in this case. I can't send you the complete log: between startup and the first core dump it grew to 478MB. crash-2 is a 2MB, 10-minute snapshot around the core mentioned.
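
A log of that size can be cut down to a window around the crash without loading it all into memory. The following is a minimal Python sketch, not part of the original report. It assumes that log entries start with a bracketed timestamp such as `[Dec  4 17:45:22]`, that month names are in English, and that the year (which the stamp does not carry) is supplied by the caller. `parse_stamp` and `window` are hypothetical helper names.

```python
from datetime import datetime, timedelta
import re

# Assumed prefix format, e.g. "[Dec  4 17:45:22]" (the day may be space-padded).
STAMP = re.compile(r"^\[(\w{3})\s+(\d{1,2}) (\d{2}:\d{2}:\d{2})\]")


def parse_stamp(line, year):
    """Return the line's timestamp, or None if the line has no stamp prefix."""
    m = STAMP.match(line)
    if not m:
        return None
    month, day, clock = m.groups()
    return datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")


def window(lines, crash_time, minutes=5, year=2006):
    """Yield the lines stamped within +/- `minutes` of `crash_time`.

    Lines without a stamp (for example SIP message bodies) follow the
    decision made for the last stamped line before them.
    """
    delta = timedelta(minutes=minutes)
    keep = False
    for line in lines:
        stamp = parse_stamp(line, year)
        if stamp is not None:
            keep = abs(stamp - crash_time) <= delta
        if keep:
            yield line
```

Passing an open file object as `lines` keeps memory use flat, and opening the file with `errors="replace"` avoids decode failures on corrupted entries.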

Here are some additional hints, that may be related to the problem:
* There are several lines marked with ZOMBIE. I'm not sure if this happens
 "by design" or if it is a bug.
* Looking around Dec 29 11:38:49 shows some really screwed up entries
 that look like the result of a buffer overflow. The database entries
 are clean. Maybe this is another issue.
* Debugging around the locations shown in the backtrace, I noticed that the
 bridged->tech pointer contains 0x00 at the time of the crash. This is quite
 strange and I have not been able to find the reason for it yet.

I don't know, if there's a difference between the debug.log and the console output. If so, let me know how to work around this fact. Thanks!

By: Fred Schroeder (apten) 2007-01-02 09:38:12.000-0600

I was able to reproduce the error submitted by cajus in a lab environment.
Setup:
-3 Snom360 named snom1, snom2, snom3
-snom1 & snom2 are callback agents and members of a queue named edv
-snom2 has set up a redirection on busy to an extension that enters a queue
-snom2 is talking -> line is busy
-a call from snom3 is routed to snom1 via edv-queue
-snom1 accepts call and tries to make a blind transfer to snom2

=> * crashes with segfault
(channel id at this point looks very weird including strange characters)

I added a verbosedebug as requested and the corresponding core file.

Although the scenario looks quite unusual, I do not think that such a scenario should make Asterisk crash. ;)



By: Serge Vecher (serge-v) 2007-01-02 12:25:28.000-0600

Apten: was your Asterisk installation compiled with make dont-optimize? If not, can you please recompile and redo the bt?

Also, please produce a bit more informative trace log as per:
1) Prepare test environment (reduce the amount of unrelated traffic on the server);
2) Make sure your logger.conf has the following line:
  console => notice,warning,error,debug
3) restart Asterisk with the following command:
  'asterisk -Tvvvvvdddddngc | tee /tmp/verbosedebug.txt'
4) Enable SIP transaction logging with the following CLI commands:
set debug 4
set verbose 4
sip debug
5) Trim startup information and attach verbosedebug.txt to the issue.

By: Fred Schroeder (apten) 2007-01-02 12:51:40.000-0600

Everything has been done according to your description (as you probably can see from the output in the files).
1) check
2) check
3) check
4) check
5) check but file renamed to verbosedebug2-trimmed.txt

If you need further information, pls let me know.

By: Serge Vecher (serge-v) 2007-01-02 13:08:06.000-0600

strange; there is no verbose output in verbosedebug2-trimmed.txt. I would double-check the console => line in logger.conf.

I see that the weirdness starts when we are trying to handle the REFER. AFAIK, this kind of transfer was not well implemented in the 1.2 branch. Are you able to test the hot-off-the-press 1.4.0 release?

By: Fred Schroeder (apten) 2007-01-02 13:25:02.000-0600

That is really strange. I will double-check the configuration in a few minutes, but everything has been set up the way you described it. I thought that the debug statement at each line of the file and the number of lines were a hint that everything is right and very verbose. ;)

I am afraid that we will not be able to test 1.4.0 version right now. (We upgraded to 1.2.14 just about a week ago due to hints from IRC.)

Could the weird channel ids be a hint of some kind of overflow or uninitialized variable, as cajus suggested earlier?

By: Fred Schroeder (apten) 2007-01-02 13:43:25.000-0600

Just double-checked: here are the lines from logger.conf
debug => debug
console => notice,warning,error,debug
messages => notice,warning,error


Any idea?
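
A check like the one done by hand above can be automated. This is a minimal Python sketch under stated assumptions: the configuration uses `name => level,level` lines with `;` comments, and the required set is taken from the `console => notice,warning,error,debug` line requested earlier in this issue. `channel_levels` and `missing_levels` are hypothetical names, not Asterisk tooling. It covers only the logger.conf line, not CLI settings such as `sip debug`.

```python
# Levels requested for the console channel in the debugging instructions.
REQUIRED = {"notice", "warning", "error", "debug"}


def channel_levels(conf_text, channel):
    """Return the set of levels on the `channel => ...` line, or None."""
    for raw in conf_text.splitlines():
        line = raw.split(";", 1)[0].strip()  # drop ';' comments
        if "=>" not in line:
            continue
        name, levels = (part.strip() for part in line.split("=>", 1))
        if name == channel:
            return {lvl.strip() for lvl in levels.split(",") if lvl.strip()}
    return None


def missing_levels(conf_text, channel="console"):
    """Return the required levels that the channel does not log."""
    found = channel_levels(conf_text, channel)
    return set(REQUIRED) if found is None else REQUIRED - found
```

An empty result for the three lines quoted above means the logger.conf side of the instructions is satisfied, so the missing output has to come from somewhere else.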

By: Serge Vecher (serge-v) 2007-01-02 13:48:27.000-0600

if you will, please post the untrimmed file to get some clues from there...

By: Serge Vecher (serge-v) 2007-01-02 13:52:25.000-0600

cajus: I've examined crash-2.bz2 and found out that you are using chan_capi. FYI, it is an unsupported third-party ISDN driver; you should really use chan_mISDN distributed with Asterisk. We will attempt to take a crack at this issue with the help of Apten, who seems to be using stock components.

By: Fred Schroeder (apten) 2007-01-02 14:08:21.000-0600

Since this is a lab-environment it is not fully configured. And there might be unnecessary modules etc. loaded at startup.
But the error is the same as in a production environment we are using.
Nevertheless I added the untrimmed verbosedebug with all start-up information.

By: Fred Schroeder (apten) 2007-01-02 14:10:58.000-0600

Additional info: in the lab-environment only SIP-devices are used. (3 Snom360)

By: Serge Vecher (serge-v) 2007-01-02 14:17:15.000-0600

it looks like you've forgotten step 4 completely ;)

By: Fred Schroeder (apten) 2007-01-02 14:25:39.000-0600

It was the second time I reproduced it - I might have forgotten that step while restarting.
I will check whether I still have the debug output from the first run. If not, I will have to reproduce the error tomorrow. Sorry. :(

By: Fred Schroeder (apten) 2007-01-02 15:08:27.000-0600

Found this file from the first crash.
In this case we do not see the weird channel-ids but the system crashed, too.

I will produce another log tomorrow.

By: Fred Schroeder (apten) 2007-01-03 03:13:19.000-0600

I was able to reproduce the crash several more times and added new debug-information in file verbosedebug4-trimmed.txt and bt-full-4.txt .

By: Serge Vecher (serge-v) 2007-01-03 08:58:16.000-0600

good work, Apten: I believe all of the information needed for a developer has been gathered.

By: Olle Johansson (oej) 2007-02-15 15:12:31.000-0600

A crash in malloc() ?

There have been similar strange bug reports that were solved by upgrading libc

By: Olle Johansson (oej) 2007-02-15 15:13:42.000-0600

[Dec  4 17:45:22]     -- Stopped music on hold on ?G???G??1@default-38fd,1<ZOMBIE>
[Dec  4 17:45:22] WARNING[13800]: channel.c:2828 ast_channel_masquerade: Can't masquerade channel 'AsyncGoto/AsyncGoto/A' into itself!

...there's some weird stuff going on in this file (verbosedebug4-trimmed.txt)
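
Entries like the first one quoted above can be located in a large log with a short scan. This is a minimal Python sketch, not an Asterisk tool. It assumes a zombie channel is printed as its name immediately followed by `<ZOMBIE>`, and it uses a heuristic of my own for "garbled": any character outside printable ASCII, or a `?`, which is how unprintable bytes often appear after a lossy text conversion. `scan` is a hypothetical name.

```python
import re

# Channel name directly followed by the <ZOMBIE> marker.
ZOMBIE = re.compile(r"(\S+)<ZOMBIE>")
# Heuristic only (an assumption): non-printable-ASCII characters or '?'.
GARBLED = re.compile(r"[^\x20-\x7e]|\?")


def scan(lines):
    """Return (line_number, channel_name, looks_garbled) per ZOMBIE channel."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for match in ZOMBIE.finditer(line):
            name = match.group(1)
            hits.append((number, name, bool(GARBLED.search(name))))
    return hits
```

The heuristic will also flag a legitimate channel name that happens to contain a `?`, so the results are a starting point for reading the log, not a verdict.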

By: Olle Johansson (oej) 2007-02-15 15:18:52.000-0600

Apten said:
---
-3 Snom360 named snom1, snom2, snom3
-snom1 & snom2 are callback agents and members of a queue named edv
-snom2 has set up a redirection on busy to an extension that enters a queue
-snom2 is talking -> line is busy
-a call from snom3 is routed to snom1 via edv-queue
-snom1 accepts call and tries to make a blind transfer to snom2
----
Seems like snom1 makes a transfer to snom2 that does redirection to a queue.
How is that redirection done?

By: Olle Johansson (oej) 2007-02-15 15:26:03.000-0600

Found a 302 redirect (answering my own questions)
-- Now forwarding Local/106@default-adf7,2 to 'Local/111@default' (thanks to SIP/snom2-081b8690)

Two local channels involved. This bug report has everything. Local channel, queues and SIP transfers and redirects.

The Invite/hold and REFER are matched to a Zombie channel. That's bad. It should be gone. Need to go through this in detail and find out where that zombie channel comes from. There's a masquerade in the middle of it all, probably caused by the use of chan_local.

By: Serge Vecher (serge-v) 2007-03-07 12:53:19.000-0600

Apten, there were changes to chan_local handling in 1.2.16. Can you please test?

By: Serge Vecher (serge-v) 2007-03-26 12:49:25

cajus, apten: what's the status?

By: Jason Parker (jparker) 2007-04-25 13:26:17

Closing.  Once you are able to test the latest version of Asterisk (there have been 4 releases since), if this still fails, please contact a bug marshal to reopen this issue.