Summary: ASTERISK-06978: SIP call blind-transfered into Parking makes it stop working
Reporter: David Cornewell (dcornewell)
Labels:
Date Opened: 2006-05-15 13:37:03
Date Closed: 2007-01-09 12:53:13.000-0600
Priority: Major
Regression?: No
Status: Closed/Complete
Components: Resources/res_features
Versions:
Frequency of Occurrence:
Related Issues:
Environment:
Attachments: (0) asterisk_messages.txt
(1) djc.txt
(2) parkprob.txt
Description: About once a week, transferring a call to our parking extension (100) fails, and the call is lost in limbo. After that, all attempts to place additional calls into parking fail, and calls previously parked cannot be retrieved.

We cannot duplicate this with any known method reliably. It happens throughout the work day, and once it has happened, is only fixed by restarting asterisk (see below).

We cannot unload res_features.so and reload it when this happens either.

Everything else seems to function normally. We also use the automon feature, and that continues to work after parking dies.

We can provide any additional details you may require in order to fix this problem. Thanks.



****** ADDITIONAL INFORMATION ******

We're currently using 1.2.7.1, and have had this bug since at least 1.2.2, but our usage of parking has increased as well, so I'm unsure of the exact version in which this bug appeared.

We currently stop and restart Asterisk every night at 5AM with a 'asterisk -rx "stop now"' command in cron.

Comments:By: Serge Vecher (serge-v) 2006-05-15 14:07:17

Hmm, there are a couple of bugs open now (7053, 7090) involving the parking feature, but they result in a lockup of chan_sip, not res_features. What channels are involved in your case?

By: Andrey S Pankov (casper) 2006-05-15 14:59:14

Maybe you have some logs from asterisk for that?
set verbose 4
set debug 4
Do not forget to enable debug output to a log file or * console.
Thanks!

By: David Cornewell (dcornewell) 2006-05-15 15:27:39

vechers -
We have Zap channels on a Digium 1 port PRI card and 40 polycom SIP phones.  We receive a call on the Zap channel and transfer to 100 (our parking lot). When it breaks, we won't hear the announcement of "101".  Going in to the console and typing "show channels" shows a parked call that doesn't exist. We cannot retrieve it.  Also, typing "show parkedcalls" will cause the console to stop responding to commands.  It shows the parked calls header, then no data. If I type "show channels" I just get back a command prompt.  Quitting and "asterisk -r" gets me back to where it works again though.  Everything else works.  We receive calls, transfer between SIP phones, call out.  We just can't park.

casper -
I uploaded some of my log file from around the time it stopped working today.  There are some warnings and errors.  I am going to set up logging to log more stuff for the next time it crashes.

Thank you for your help.

By: Andrey S Pankov (casper) 2006-05-15 16:02:17

Oops... it seems like that bug was already fixed somewhere... vechers?

By: Serge Vecher (serge-v) 2006-05-15 16:21:06

casper: I can't recall that this bug was fixed. I've just reviewed the notes for newly released 1.2.8 and see nothing fixing this from 1.2.7-1. In fact, all the "usual suspects" I have identified in note 0046202 are still open. I believe bweschke is currently working on this (recalling from the notes in those bug reports). I guess we have to wait for a verbose console log to be sure ...

By: Serge Vecher (serge-v) 2006-05-22 15:42:35

dcornewell: let's try to narrow down the issue by testing the relationship (or lack thereof) to 7053. How do you transfer a call to parking from the Polycom:
a) *8
b) blind transfer
c) attended transfer

By: David Cornewell (dcornewell) 2006-05-22 16:04:16

I appreciate you keeping on this.  Our "cheat sheets" told everyone to use an attended transfer, but we found that there were people using blind transfer.  We've told them to stop doing that and I don't think we have lost parking since.  I don't want to say that fixed it, but it may have.  We noticed there were a couple bugs reported that had that as the cause.

If there is any other info you want, I will be happy to provide it.  I may be slow to respond as I am out of town right now, but I would like to make sure we don't lose parking any more.

thank you.

By: BJ Weschke (bweschke) 2006-05-22 17:54:56

It sounds like the parking thread/lot is going into deadlock. I'd be interested to see what the threads are up to (by attaching to the running process with gdb) when/if this happens again.

By: Serge Vecher (serge-v) 2006-05-22 18:24:37

linking this to 7053, as the scenario and workaround are the same. lkimes has been investigating the problem in detail, so hopefully with a concerted effort, we will have a patch soon.

By: David Cornewell (dcornewell) 2006-05-24 09:15:12

"I'd be interested to see what the threads are up to"

The last time it happened, I attached to asterisk with strace.  I did "show parkedcalls" to make the console stop responding.  Nothing showed up.  If it happens again, I will attach with GDB.

So far we seem to be avoiding this by not doing blind transfer.  I think our plan for now is to wait for the next release or a patch to see if that fixes it.

Thank you again for all your help/input.

By: Serge Vecher (serge-v) 2006-05-24 11:58:08

dcornewell: flyinwill has just posted a patch in 7053 that should fix the issue. Can you please apply it (20060524_bug7053_trail.patch) and report whether that solves your problem? Thanks,

By: Serge Vecher (serge-v) 2006-05-25 20:48:27

dcornewell: any luck testing flynwill's patch?

By: David Cornewell (dcornewell) 2006-05-26 06:43:32

Sorry, I haven't had the time.  I've been in Baltimore and we haven't crashed since we stopped using blind transfer, so it hasn't been a pressing issue.  Actually, where can I download the patch?  I couldn't find it.

By: Serge Vecher (serge-v) 2006-05-26 08:37:10

dcornewell: go to
http://bugs.digium.com/view.php?id=7053 and apply patch named 20060524_bug7053_trail.patch to your Asterisk sources.

By: Serge Vecher (serge-v) 2006-05-30 16:56:57

any luck?

By: David Cornewell (dcornewell) 2006-06-05 12:50:01

Sorry, I have been on vacation and haven't tried this yet.  Is this patch a part of 1.2.8?  If so, I will just upgrade.

By: Serge Vecher (serge-v) 2006-06-05 12:56:34

dcornewell: it is not part of 1.2.8. You will need to manually apply it. It's quite simple, you can just swap two lines -- see flynwill's comments in 7053.

By: David Cornewell (dcornewell) 2006-06-06 18:14:33

I applied the patch tonight.  Sorry it took so long.  I stuck with 1.2.7.1 just to keep things the same.  Before I applied I tried blind transferring a bunch of calls to park.  I wasn't able to lock up call parking, but I did make chan_sip lock.  I don't think it was answering calls on zap channels either.  I couldn't call from my polycom at least.

After applying the patch, I blind transferred some more calls and it is still working.  We'll see how it goes in production tomorrow.  Since it is so hard to reproduce, it is hard to tell whether this is fixed or not.

I was a little concerned by the comment in the patch code.  "Don't really know if this is safe" I think is what it said.  At least you're honest ;]  I suppose that's better than my usual /**** BIG HACK ****/  comments.

I will post another comment if it breaks again.  Thank you again for your time.

By: Serge Vecher (serge-v) 2006-08-02 13:58:33

any updates here?

By: David Cornewell (dcornewell) 2006-08-02 14:20:16

Actually, it is locked up right now.  This is only the second time since the patch so I haven't put any time into it.  I have asterisk with verbosity=50 and core debug=1.  I attached a log (djc.txt) of what has happened today if anyone wants to pore through it.  There is so much though...  Someone was put on hold at approx. 12:15 and it worked.  At 12:30 it was noticed to be broken.  The times are relative to what is in the log file.

If there is anything I can provide, let me know.  We are on 1.2.7.1.  It just hasn't been enough of a pain to require my time.



By: Serge Vecher (serge-v) 2006-08-02 14:25:16

any chance of trying 1.2.10 here?

By: David Cornewell (dcornewell) 2006-08-03 08:54:19

Yeah, I think we can do that.  It may be a while before we upgrade.  We have people on the system at night sometimes.  I will keep you posted.

By: David Cornewell (dcornewell) 2006-08-04 12:51:04

We switched to asterisk 1.2.10, libpri 1.2.3, zaptel 1.2.7 last night.  Call parking hasn't locked up, but we are losing calls now.  We will probably be downgrading.  Weirdest thing: my cell phone got hung up on mid-ring, but my polycom keeps ringing.  We might get a test system up to get this problem resolved, then we'll upgrade.  I'll let you know if we get to the point where we can test the call parking problem.

By: David Cornewell (dcornewell) 2006-08-09 08:55:56

Ok, so our dropped calls problem turned out to be our telco. We figured that out before we downgraded, so we are still on asterisk 1.2.10, libpri 1.2.3, zaptel 1.2.7.  Parking is locked up right now.  I uploaded today's log (parkprob.txt) for info. I have core debug on 1, verbose at 50.

I have one idea.  I noticed an error about an AGI running on a dead channel. I wrote some dictation software that doctors call in to, record, and then upload dictation to a website. It's in C. It captures SIGHUP and SIGPIPE so that it can upload the file before dying. All it does with the signals is set a flag saying, "hey, we're done here", then it uploads the file and exits.

Anyway, I remember having to carefully check return values for invalid stdin/stdout.  I also added this software about the time we upgraded from 1.0 CVS HEAD to 1.2 stable and started having this parking problem.  Any chance my AGI hanging around a few seconds might be pissing off asterisk/call parking?
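
The flag-setting approach described above can be sketched roughly as follows. This is a minimal illustration, not the actual dictation program: the names `agi_done`, `agi_install_handlers`, `agi_should_exit` and `agi_command` are hypothetical, and the only AGI detail assumed is that commands are written to stdout and replies are read from stdin.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <string.h>

/* Set by the signal handler and polled by the main loop (hypothetical name). */
static volatile sig_atomic_t agi_done = 0;

/* The handler only sets a flag, which is safe to do from signal context. */
static void on_hangup(int signo)
{
    (void)signo;
    agi_done = 1;
}

/* Install the same flag-setting handler for SIGHUP and SIGPIPE.
 * Returns 0 on success, -1 on failure. */
int agi_install_handlers(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_hangup;
    sigemptyset(&sa.sa_mask);
    /* SA_RESTART is deliberately not set, so a blocked read can be
     * interrupted and the caller gets a chance to check the flag. */
    if (sigaction(SIGHUP, &sa, NULL) != 0)
        return -1;
    if (sigaction(SIGPIPE, &sa, NULL) != 0)
        return -1;
    return 0;
}

/* Non-zero once a hangup or broken pipe has been seen. */
int agi_should_exit(void)
{
    return agi_done != 0;
}

/* Write one command line to `out` and read one reply line from `in`.
 * Every I/O return value is checked; returns 0 on success and -1 if the
 * flag is set or either stream has gone away. */
int agi_command(FILE *in, FILE *out, const char *cmd, char *reply, size_t len)
{
    if (agi_done)
        return -1;
    if (fprintf(out, "%s\n", cmd) < 0 || fflush(out) == EOF)
        return -1;
    if (fgets(reply, (int)len, in) == NULL)
        return -1;
    return agi_done ? -1 : 0;
}
```

A main loop built on this would call `agi_command(stdin, stdout, ...)` until it returns -1, then do the upload and exit promptly, so the process does not linger on a channel that Asterisk already considers dead.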

also, "asterisk -x stop now" will not shut it down.  we have to kill -9.

As always, let me know if you need anything else.  Appreciate the help.



By: Serge Vecher (serge-v) 2006-08-09 10:22:54

1) Well, do you want to patch the 1.2.10 source with flynwill's change from
http://bugs.digium.com/view.php?id=7053 and see what happens?
2) Not too well versed in AGI, so I will skip that.
3) afaik, you would need to issue 'asterisk -rx stop now', not '-x'

By: David Cornewell (dcornewell) 2006-08-09 10:41:30

I applied the patch. I will get it in place tonight when I restart asterisk. I had a typo on the stop now command.  I do use -rx.  

-Thanks

By: Serge Vecher (serge-v) 2006-08-09 11:16:55

ok, I haven't seen the 'asterisk -rx stop now' problem with 1.2.10 reported yet -- see if you can do some more digging and if you think it's a bug, please open a new report for that.

By: David Cornewell (dcornewell) 2006-08-09 11:29:34

Well, I believe the stop now problem is only when call parking is locked up.  I guess we don't stop it when it is working though.  It could happen at other times.  I meant it as symptom of the current problem.  I'll try it when asterisk is running normally and see if it still happens.

By: Serge Vecher (serge-v) 2006-08-25 11:15:45

I take it flynwill's patch fixes the issue?

By: David Cornewell (dcornewell) 2006-09-12 13:02:55

We had another lock up on 1.2.11 with the patch.  Another guy here dug through the code and added a -DDETECT_DEADLOCKS in our compiled version.  It gave us the following errors.  They repeat, but not always in this order.  I can give a larger dump if needed.

Sep 12 13:54:36 ERROR[1882]: include/asterisk/lock.h:245 __ast_pthread_mutex_lock: channel.c line 1810 (ast_read): Deadlock? waited 825 sec for mutex '&chan->lock'?
Sep 12 13:54:36 ERROR[1882]: include/asterisk/lock.h:248 __ast_pthread_mutex_lock: chan_sip.c line 6755 (get_sip_pvt_byid_locked): '&chan->lock' was locked here.
Sep 12 13:54:36 ERROR[9124]: ../include/asterisk/lock.h:245 __ast_pthread_mutex_lock: res_features.c line 1708 (park_exec): Deadlock? waited 810 sec for mutex '&parking_lock'?
Sep 12 13:54:36 ERROR[9124]: ../include/asterisk/lock.h:248 __ast_pthread_mutex_lock: res_features.c line 1505 (do_parking_thread): '&parking_lock' was locked here.
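
For readers unfamiliar with this kind of message: the general idea behind a "Deadlock? waited N sec" report can be sketched as a lock wrapper that polls with `pthread_mutex_trylock` and complains once too much time has passed. The code below is a simplified illustration of that technique and is not the code from include/asterisk/lock.h. The function name is hypothetical, and giving up with `ETIMEDOUT` is an assumption made to keep the example short; the messages above show the real build continuing to wait.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <time.h>

/* Try to take `m` by polling. If it is still busy after `warn_after_sec`
 * seconds, print a report and give up.
 * Returns 0 with the mutex held, ETIMEDOUT after reporting, or another
 * error code passed through from pthread_mutex_trylock(). */
int lock_with_deadlock_report(pthread_mutex_t *m, const char *name,
                              int warn_after_sec)
{
    const struct timespec pause = { 0, 1000000 }; /* 1 ms between attempts */
    time_t start = time(NULL);

    for (;;) {
        int rc = pthread_mutex_trylock(m);
        if (rc == 0)
            return 0;               /* lock acquired */
        if (rc != EBUSY)
            return rc;              /* unexpected error */

        long waited = (long)(time(NULL) - start);
        if (waited >= warn_after_sec) {
            fprintf(stderr, "Deadlock? waited %ld sec for mutex '%s'\n",
                    waited, name);
            return ETIMEDOUT;
        }
        nanosleep(&pause, NULL);
    }
}
```

A wrapper like this only reports that a thread is stuck. Finding the cause still requires knowing where the mutex was last taken, which is what the "was locked here" lines in the log provide.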

--EDIT
Should I go to the new version? I noticed a "look ma, no more deadlocks" comment.



By: Serge Vecher (serge-v) 2006-09-13 11:34:21

well: that particular fix is for deadlocks related to usage of transfer with agents in callback mode. I would upgrade to 1.2.12.1 anyway, as the 1.2.11 release had some issues... Also, when reporting, please specify whether this is with or without flyinwill's patch.

By: David Cornewell (dcornewell) 2006-09-13 15:03:21

when I said "1.2.11 with the patch", I meant with flyinwill's patch.

We upgraded to 1.2.12.1 last night.  We did not apply the patch.  If parking locks again, i'll apply the patch and try again.

By: Serge Vecher (serge-v) 2006-09-13 15:05:57

dcornewell: sorry, I've overlooked that ... Been looking at too many bugs today ...

By: David Cornewell (dcornewell) 2006-10-06 11:43:02

Ok, we had parking lock again, so I patched it and it locked again the next day.  V1.2.12.1.  When I log onto the console, there are a lot of the messages below. I also noticed there were 3 AGI applications that seemed locked up.  They weren't in the list when I did an 'asterisk -rx "show channels"', but had a process in ps.  We've noticed them before, but always when parking is locked, never when it is working. Related?  I may move them to another box and use FASTAGI. They are in C and I just finished fastcagi so I can do it.

Messages on console:
Oct  5 20:10:31 ERROR[17200]: ../include/asterisk/lock.h:248 __ast_pthread_mutex_lock: res_features.c line 1505 (do_parking_thread): '&parking_lock' was locked here.
Oct  5 20:10:31 ERROR[5195]: include/asterisk/lock.h:245 __ast_pthread_mutex_lock: channel.c line 1843 (ast_read): Deadlock? waited 13090 sec for mutex '&chan->lock'?
Oct  5 20:10:31 ERROR[5195]: include/asterisk/lock.h:248 __ast_pthread_mutex_lock: chan_sip.c line 6762 (get_sip_pvt_byid_locked): '&chan->lock' was locked here.
Oct  5 20:10:31 ERROR[17200]: ../include/asterisk/lock.h:245 __ast_pthread_mutex_lock: res_features.c line 1708 (park_exec): Deadlock? waited 12965 sec for mutex '&parking_lock'?
Oct  5 20:10:31 ERROR[17200]: ../include/asterisk/lock.h:248 __ast_pthread_mutex_lock: res_features.c line 1505 (do_parking_thread): '&parking_lock' was locked here.
Oct  5 20:10:33 ERROR[17386]: ../include/asterisk/lock.h:245 __ast_pthread_mutex_lock: res_features.c line 291 (ast_park_call): Deadlock? waited 12885 sec for mutex '&parking_lock'?
Oct  5 20:10:33 ERROR[17386]: ../include/asterisk/lock.h:248 __ast_pthread_mutex_lock: res_features.c line 1505 (do_parking_thread): '&parking_lock' was locked here.
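
These messages involve two mutexes, `parking_lock` and a channel lock, with several threads waiting a very long time. Whether the threads hold them in opposite order is not shown by the log alone, but if they do, one common way to avoid that class of hang is to take the second lock with `pthread_mutex_trylock` and back off when it is busy. The sketch below illustrates that generic technique only. It is not the change made in any Asterisk release or in the patch from 7053, and the function name and `max_attempts` parameter are hypothetical.

```c
#include <pthread.h>
#include <sched.h>

/* Acquire `first`, then try `second` without blocking. If `second` is
 * busy, release `first`, yield, and retry, so this thread never sleeps
 * while holding one lock and waiting for the other.
 * Returns 0 with both mutexes held, or -1 with neither held after
 * `max_attempts` failed tries. */
int lock_pair_with_backoff(pthread_mutex_t *first, pthread_mutex_t *second,
                           int max_attempts)
{
    for (int i = 0; i < max_attempts; i++) {
        pthread_mutex_lock(first);
        if (pthread_mutex_trylock(second) == 0)
            return 0;               /* both locks held */
        pthread_mutex_unlock(first);
        sched_yield();              /* let the current holder make progress */
    }
    return -1;
}
```

The caller is responsible for unlocking both mutexes after a successful return. The trade-off is that a thread may spin for a while under contention, in exchange for never blocking while holding `first`.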



By: Serge Vecher (serge-v) 2006-10-06 11:58:31

dcornewell: Asterisk 1.2.12.1 had a known problem with features ... Can you please try the tests with the latest 1.2 branch code, where the problem was fixed?

By: jmls (jmls) 2006-11-05 12:48:54.000-0600

dcornewell: were you able to try with the latest 1.2 branch code ? If so, have you had problems since ?

By: David Cornewell (dcornewell) 2006-11-06 09:08:44.000-0600

We are unwilling to go to the latest SVN branch. We decided that losing parking once every week or two is not bad enough to risk it. I see that 1.2.13 is out.  Would that contain the changes?  If so, we'll try out a release.  Our parking just locked up today so it would be a good time.

By: Serge Vecher (serge-v) 2006-11-09 11:14:59.000-0600

dcornewell: yes, 1.2.13 has the fix I've mentioned. Please update and report back.

By: Serge Vecher (serge-v) 2007-01-09 12:53:11.000-0600

Please reopen the bug if you are able to reproduce this problem with the latest 1.2 or 1.4 release. Thanks.