|Summary:||ASTERISK-06116: Asterisk crashes when a call is blind-transferred to Parking|
|Date Opened:||2006-01-17 14:03:39.000-0600||Date Closed:||2006-06-01 15:10:30|
|Environment:||Attachments:||( 0) AsteriskCrash20060120-1114.zip|
( 1) Channel_deadlock_12h00.txt
( 2) Channel_deadlock_23h28.txt
( 3) deadlock.zip
( 4) deadlock_20060120-1425_gdb_Output.txt
( 5) deadlock_20060122-1406_Full_Log.txt
( 6) deadlock_20060122-1406_Full_Log_Zap39.txt
( 7) deadlock_20060122-1406_gdb_Output.txt
( 8) deadlock_20060124-2119_gdb_Output.txt
( 9) deadlock_20060124-2143_gdb_Output.txt
|Description:||Once or twice a day, ever since our implementation of Asterisk (version 1.08 up to the latest release), we have been experiencing a deadlock in Asterisk. According to the users, someone was always parking a call at the moment the deadlock occurred.|
****** ADDITIONAL INFORMATION ******
I have a core dump available on the server. I have included the results from gdb as well as snippets from the CLI, FULL and MESSAGE logs.
25 minutes before Asterisk deadlocked today, it crashed and produced a core dump automatically (safe_asterisk). Again, it happened while parking a call.
|Comments:||By: Tilghman Lesher (tilghman) 2006-01-17 18:06:56.000-0600|
Please upgrade to the latest 1.2 tree and see if you still get this problem. This is NOT a deadlock.
By: Shaun (sdaigle) 2006-01-18 09:11:33.000-0600
This is a production machine and cannot be upgraded during the day. I will perform the upgrade this evening, after our regular business hours, and report back my findings.
By: Shaun (sdaigle) 2006-01-19 07:24:53.000-0600
I upgraded to Asterisk 1.2.2 last night. I also noticed that the "timing" in /etc/zaptel.conf was set wrong. Prior to my change, the master timing originated from span 1 (spans 1, 2 and 3 are channel banks and span 4 is a PRI from the telco). The documentation clearly says that I should be getting my master timing from the telco, and never from a channel bank. My new config looks like this:
span=1,0,0,esf,b8zs ; used to be span=1,1,0,esf,b8zs
span=4,1,0,esf,b8zs ; used to be span=4,0,0,esf,b8zs
I don't know if this will make a change or not, but it can't hurt I guess.
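For readers following the timing change, here is a minimal Python sketch that reads the two span lines above and reports which span is the primary sync source. It assumes the usual zaptel.conf field order, `span=<span num>,<timing>,<LBO>,<framing>,<coding>`, where a timing value of 0 means the span is not used as a sync source and 1 means primary. The names `Span`, `parse_span` and `primary_sync_spans` are hypothetical helpers for illustration and are not part of Zaptel.

```python
from typing import NamedTuple


class Span(NamedTuple):
    number: int
    timing: int   # assumed: 0 = not a sync source, 1 = primary, 2 = secondary, ...
    lbo: int      # line build-out
    framing: str
    coding: str


def parse_span(line: str) -> Span:
    """Parse one 'span=' line, ignoring any trailing ';' comment."""
    body = line.split(";", 1)[0].strip()
    key, _, value = body.partition("=")
    if key.strip() != "span":
        raise ValueError(f"not a span line: {line!r}")
    fields = [f.strip() for f in value.split(",")]
    if len(fields) < 5:
        raise ValueError(f"expected at least 5 fields: {line!r}")
    return Span(int(fields[0]), int(fields[1]), int(fields[2]),
                fields[3], fields[4])


def primary_sync_spans(lines):
    """Return the span numbers configured with timing == 1."""
    return [s.number for s in map(parse_span, lines) if s.timing == 1]


config = [
    "span=1,0,0,esf,b8zs ; used to be span=1,1,0,esf,b8zs",
    "span=4,1,0,esf,b8zs ; used to be span=4,0,0,esf,b8zs",
]
print(primary_sync_spans(config))
```

With the new configuration, only span 4 (the telco PRI) is reported as the primary timing source.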
A few hours after upgrading to 1.2.2, the console "froze" on the server, and I could no longer connect to it via putty. This is new behaviour that I have never experienced before. However, Asterisk was still functioning normally. I have since rebooted the server and the console is operational again.
I will report back as soon as anything new develops.
By: Shaun (sdaigle) 2006-01-20 09:53:33.000-0600
I have not yet experienced the "locking" of Asterisk; however, it crashed once more this morning.
The current version of asterisk as shown on the CLI is:
Again, while parking a call (call originating from IAX channel), Asterisk died and safe_asterisk automatically restarted Asterisk.
See the zipped logs and gdb output for more info.
By: Tilghman Lesher (tilghman) 2006-01-20 10:40:23.000-0600
Please do not upload zipfiles. They require us to download, unzip, and view the files, instead of just viewing them online. Just upload the text files.
By: Shaun (sdaigle) 2006-01-20 10:46:11.000-0600
I will from now on... sorry.
By: Shaun (sdaigle) 2006-01-20 11:47:44.000-0600
I managed to deadlock Asterisk twice in a row using Call Parking of an IAX channel.
I can reproduce this behaviour anytime. Here's how I do it:
1) Have someone call me through an IAX channel
2) I park the call using my SIP phone. It may or may not deadlock. After picking up the call and parking it again a few times, it deadlocks every time.
I'm attaching my gdb log.
By: Shaun (sdaigle) 2006-01-22 11:19:35.000-0600
It deadlocked again during the weekend. I've attached a new file showing gdb output.
By: Shaun (sdaigle) 2006-01-23 07:39:08.000-0600
I attached two more files with data originating from Asterisk's full log. I included data pertaining to threads that were waiting for a lock. I also included data relating to Zap/39 as it appears that it was the first channel that did not release the lock... in the log file, I marked "odd" entries with ***.
By: Shaun (sdaigle) 2006-01-24 19:24:50.000-0600
I reinstalled Asterisk from scratch, installed it onto another machine, installed a new Digium TE410P with the 2nd generation firmware, and I'm still experiencing the same issue. I can reproduce the issue when parking any type of channel (Zap or IAX)... it is not only related to IAX as per my previous post. I managed to freeze Asterisk three times at will in a matter of minutes, simply by continuously parking calls... after parking a number of calls (approx between 20 and 30 times), Asterisk freezes and displays a message such as this on the CLI when doing a show channels:
Jan 24 21:18:33 WARNING: channel.c:787 channel_find_locked: Avoided deadlock for '0x82a00c8', 10 retries!
I'm attaching output from the three separate gdb dumps.
Asterisk SVN-branch-1.2-r8194M built by root @ imgast2 on a i686 running Linux on 2006-01-19 17:37:09 UTC
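The "Avoided deadlock ... 10 retries!" warning suggests a lock attempt that is retried a bounded number of times without blocking and then abandoned. Below is a minimal Python sketch of that general pattern. It is an illustration only and not Asterisk's actual code; the function name `lock_with_retries`, the delay value and the simulated stuck lock are all assumptions.

```python
import threading
import time


def lock_with_retries(lock, retries=10, delay=0.001):
    """Try to take `lock` without blocking, up to `retries` times.

    Returns True if the lock was acquired, False if every attempt failed.
    """
    for _ in range(retries):
        if lock.acquire(blocking=False):
            return True
        time.sleep(delay)  # brief pause before the next attempt
    return False


channel_lock = threading.Lock()

# Simulate a lock holder that never releases the channel lock.
channel_lock.acquire()

if not lock_with_retries(channel_lock):
    # The caller gives up instead of hanging forever.
    print("Avoided deadlock, 10 retries!")
```

The design choice here is that the caller (for example, a CLI command listing channels) reports the problem and moves on rather than blocking forever. The thread that holds the lock is still stuck, which is consistent with the rest of the system remaining frozen.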
By: Shaun (sdaigle) 2006-01-25 20:12:13.000-0600
I upgraded to 1.2.3 this morning and the issue still persists.
By: Shaun (sdaigle) 2006-01-26 10:49:33.000-0600
Something I just thought of... the way we do call parking is via blind transfers, which is not the proper way to do it. I believe the proper way of parking a call using a SIP phone would be to use attended transfers, because Asterisk tries to say where the call will be parked. With a blind transfer, Asterisk starts to "say" where the call is parked, but since the call is hung up (because of the blind transfer), it does not succeed.
Any chance that this is the reason why the channels are deadlocking?
By: Matt O'Gorman (mogorman) 2006-01-30 14:43:40.000-0600
Can you provide any more info about how to reproduce this bug? I tried for an extended period of time and was unable to duplicate it.
By: Shaun (sdaigle) 2006-01-31 08:53:42.000-0600
Here are the exact steps I use to reproduce the issue:
1) I call my own extension (a sip phone, either via another SIP phone, Zap channel, IAX channel, it doesn't matter).
2) I answer the call.
3) Using the transfer button on my phone, I blind transfer the call to the "park extension". (NOTE: Asterisk tries to "say" to me where the call is parked, but since I performed a blind transfer, the channel is hungup on my end and looking at the CLI, I can see that Asterisk simply stops "telling" me where the call is parked)
4) I pickup the parked call on my phone.
5) I park the call again (blind transfer)
6) I repeat steps 4-5 until it deadlocks, which takes about 5 to 10 minutes.
By: Serge Vecher (serge-v) 2006-01-31 09:16:05.000-0600
sdaigle: can you please repeat the exercise with the following commands entered prior to capturing the log: 1) set verbose 4; 2) set verbose 4; 3) sip debug. Please capture the full output from the beginning until Asterisk deadlocks and post it as an attachment. Thanks!
By: Shaun (sdaigle) 2006-02-01 20:53:56.000-0600
I attached the SIP debug data as requested. Sorry for the zip file... the log was too big to attach as plain text.
This particular time, the channels deadlocked after parking 99 calls.
By: Shaun (sdaigle) 2006-02-01 21:26:48.000-0600
I just repeated the exercise from a few minutes ago, and this time it deadlocked after parking 122 calls.
I'm attaching the gdb output from both exercises.
By: Olle Johansson (oej) 2006-03-07 14:57:00.000-0600
This needs to be looked at.
By: jharragi (jharragi) 2006-03-15 13:48:44.000-0600
Does the crash occur when parking or when retrieving the parked call? The reason I ask is that I had crashes upon retrieving parked calls on a recent svn HEAD. I backed off the svn about a month and it no longer crashes, but it gets one-way audio while spewing...
Mar 15 14:11:52 WARNING: chan_sip.c:2571 sip_write: Asked to transmit frame type 64, while native formats is 4 (read/write = 4/4)
...until the call drops...
-- Stopped music on hold on SIP/6601-73fb
I'm wondering if this is related to bug ASTERISK-4004?
By: Shaun (sdaigle) 2006-03-17 07:08:06.000-0600
The problem occurs while parking the call, not when retrieving it. I'm able to work around the problem by performing an "attended transfer" instead of a "blind transfer".
Usually, when we park a call, the "line" hangs up on the "parker's" phone, and the call is retrievable on any phone. When the problem occurs, the line does NOT hang up on the "parker's" phone... the call "seems" to be on hold, but in reality, the call is lost. When this occurs, all SIP activity stops. Existing SIP calls are OK, as long as they don't do anything (like transfers or placing calls on hold), and ZAP calls are OK too. People can still make ZAP to ZAP calls, but no new SIP calls can be made.
I hope this extra information will help identify the issue.
By: Samy Kamkar (samyk) 2006-03-22 01:45:11.000-0600
I'm able to reproduce this problem in Asterisk 1.0.9 with two SIP phones.
Steps to reproduce:
1) Joe (SIP phone) calls Samy (SIP phone)
2) Samy answers call
3) Samy blind transfers Joe to parking extension 9000 and has no more active calls
4) Samy picks up the parked call by dialing 9001
5) Repeat steps 3-4 between 3 and 20 times until a permanent channel lock occurs
"show channels" and "show parkedcalls" then result in "ast_channel_walk_locked: Avoided deadlock for ..." messages for the SIP channel that performed the blind transfer.
I've attached CLI output of the problem occurring and the result upon executing "show channels". Note that I've changed the ast_channel_walk_locked to retry 100 times rather than 10 for kicks.
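When comparing logs from several runs, it can help to count these warnings per channel. The following Python sketch does that, assuming the line format of the warning quoted earlier in this thread (`WARNING: channel.c:787 channel_find_locked: Avoided deadlock for '0x82a00c8', 10 retries!`). The `ast_channel_walk_locked` variant is assumed to have the same shape, since its full text is elided above. `WARNING_RE` and `stuck_channels` are hypothetical names.

```python
import re
from collections import Counter

# Assumed format, based on the warning line quoted earlier in this thread.
WARNING_RE = re.compile(
    r"WARNING: (?P<file>[\w.]+):(?P<line>\d+) (?P<func>\w+): "
    r"Avoided deadlock for '(?P<chan>0x[0-9a-fA-F]+)', (?P<retries>\d+) retries!"
)


def stuck_channels(log_lines):
    """Count 'Avoided deadlock' warnings per channel address."""
    counts = Counter()
    for line in log_lines:
        match = WARNING_RE.search(line)
        if match:
            counts[match.group("chan")] += 1
    return counts


sample = [
    "Jan 24 21:18:33 WARNING: channel.c:787 channel_find_locked: "
    "Avoided deadlock for '0x82a00c8', 10 retries!",
    "    -- Stopped music on hold on SIP/6601-73fb",
]
print(stuck_channels(sample))
```

A channel address that keeps reappearing across repeated "show channels" invocations is a likely candidate for the channel whose lock was never released.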
By: Olle Johansson (oej) 2006-03-22 02:02:47.000-0600
Please repeat this with full debugging and also run "sip show channels" as well as "show channels". Thanks.
By: Samy Kamkar (samyk) 2006-03-22 02:15:19.000-0600
I restarted Asterisk in order for everything to be "fresh" and reproduced the problem.
Attached is a typescript with verbose+debug data, along with "show channels" and "sip show channels".
Within the debug data, I noted when I actually executed the "show channels" and "sip show channels". I also show just a verbose snapshot of when the problem occurred.
By: Samy Kamkar (samyk) 2006-03-23 18:05:04.000-0600
I also cannot reproduce this with attended transfers.
By: opsys (opsys) 2006-04-30 01:06:33
Is this still occurring? I have not come across this problem.
Can we get a debug log if one is available?
By: Andrey S Pankov (casper) 2006-05-05 13:18:15
If we were back in the times when markster was managing the tree, the issue
would be resolved, or at least it would be clearer what's happening.
Someone needs to access the hung box and gdb it (as was done in the above-mentioned
times). We'll be waiting for ages for reporters' feedback like this... :(
By: Shaun (sdaigle) 2006-05-08 08:13:42
We are still on 1.2.4. We are testing 22.214.171.124 in our lab and will most likely be upgrading to the latest tree mid-week this week. As soon as this is done, I will submit a new gdb output (I could submit one tonight, but I'm sure it would not look any different than the ones I submitted before).
By: Serge Vecher (serge-v) 2006-05-16 10:06:48
sdaigle: did you get a chance to get a new backtrace? 1.2.8 was out yesterday, so you may want to get a BT from there. Thanks.
By: Juan Pablo Abuyeres (jpabuyer) 2006-05-16 15:11:39
I don't see 1.2.8 on the web, nor on svn... are you sure?
By: Serge Vecher (serge-v) 2006-05-16 15:24:49
positive -- it's so new that it is not on the web yet.
However, it's on svn:
By: Olle Johansson (oej) 2006-05-16 15:27:32
vechers: No new release is out before we have tarballs released on the ftp server.
By: Serge Vecher (serge-v) 2006-05-24 11:59:49
sdaigle: can you please test the patch 20060524_bug7053_trail.patch in 7053? Thanks.
By: Serge Vecher (serge-v) 2006-06-01 15:10:13
Alright, since the original posters are not responding and work is ongoing in 7053 to address this, I'm closing this issue. If anybody decides to come back to this issue, please report the results of testing in 7053. Thank you.