[Home]

Summary:ASTERISK-07412: [patch] coredump on blind transfer unless compiled with DEBUG_CHANNEL_LOCKS
Reporter:Kaloyan Kovachev (knk)Labels:
Date Opened:2006-07-28 10:46:12Date Closed:2007-01-10 13:36:21.000-0600
Priority:CriticalRegression?No
Status:Closed/CompleteComponents:Channels/chan_sip/Transfers
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) console_crash.txt
( 1) console_ok.txt
( 2) fix.diff
( 3) nochanges_bt_full.txt
( 4) with_debug_locks.txt.gz
Description:I don't think this is related in any way with 7557, as i have succeded to reproduce it without using local channel (it just happends more frequently i think and the bt shows something diferent in my case) and please don't hate me as this is the third coredump i have managed to cause this week :)
It crashes immedeately on transfer. I have setup a very simple dialplan - only Dial command with options TWHKtwhk

****** ADDITIONAL INFORMATION ******

Not sure yet if it is related or candidate(s) for another bug report, but ...
without DEBUG_CHANNEL_LOCKS

ERROR[16924]: ../include/asterisk/lock.h:405 __ast_pthread_mutex_unlock: chan_sip.c line 13414 (handle_request_refer): mutex '&current.chan2->lock' freed more times than we've locked!
ERROR[16924]: ../include/asterisk/lock.h:418 __ast_pthread_mutex_unlock: chan_sip.c line 13414 (handle_request_refer): Error releasing mutex: Operation not permitted

with DEBUG_CHANNEL_LOCKS

ERROR[32202]: include/asterisk/lock.h:405 __ast_pthread_mutex_unlock: channel.c line 4325 (ast_channel_unlock): mutex '&chan->lock' freed more times than we've locked!
ERROR[32202]: include/asterisk/lock.h:418 __ast_pthread_mutex_unlock: channel.c line 4325 (ast_channel_unlock): Error releasing mutex: Operation not permitted

sometimes (when not crashed) i get
WARNING[19890]: chan_sip.c:11875 handle_response: Notify answer on an owned channel?

i don't get (in this case) DTMF logged to the console even it is on in logger.conf and there is no audio at all (during the first call), but when transfered (without crashing) i can hear the other end ringing and later the audio (and also get the DTMFs), when transfered to MOH (or dialed directly) i can hear the music and DTMFs are shown to the console.
Comments:By: Serge Vecher (serge-v) 2006-07-28 11:18:27

KNK: do you want to post the snippet of the dialplan and also say how do you make a sip transfer (via native transfer button or a feature). Perhaps a sip debug would be useful too...

By: Kaloyan Kovachev (knk) 2006-07-29 09:49:33

the dialplan is simple:
exten => 1005,1,Dial(SIP/1005||TWHKtwhk)
exten => 1005,n,Hangup()

exten => 1006,1,Dial(Local/1007@Internal)
exten => 1006,n,Hangup()

exten => 1007,1,Dial(SIP/1006||TWHKtwhk)
exten => 1007,n,Hangup()
______
as DTMFs are not recognised i have used the forward button of ATcom's AT320 (ext 1070) to make the transfers. It makes blind transfer and first i was testing by transefering the call 1070-1005 to 1006, but to exclude the local channel the coredump attached is while transefring to 1007 (expect new bridge after transfer SIP/1005-SIP/1006). 1005 and 1006 are the two ports of a SPA2100.
it happends randomly, but will grab the sip debug (with and without crash) on tuesday.

By: Kaloyan Kovachev (knk) 2006-07-31 10:01:04

i was lucky in causing a crash from the first atempt, so here is the SIP debug

By: Kaloyan Kovachev (knk) 2006-07-31 10:22:15

and a full log file for a succeeded transfer

By: Serge Vecher (serge-v) 2006-07-31 10:23:12

Hmm, I don't see the [DEBUG] output in this log. Can you please make sure your logger.conf has the following line:
  console => notice,warning,error,debug

Restart asterisk and see if you can reproduce the problem again?
Also, is that bt from non-optimized build?

By: Kaloyan Kovachev (knk) 2006-07-31 10:32:15

just posted from file instead of console ... should i repeat it from console?
yes it is non-optimized built with the folowing config:
[ ] 1.  DEBUG_CHANNEL_LOCKS
[*] 2.  DEBUG_SCHEDULER
[*] 3.  DEBUG_THREADS
[ ] 4.  DETECT_DEADLOCKS
[*] 5.  DONT_OPTIMIZE
[ ] 6.  DUMP_SCHEDULER
[ ] 7.  LOW_MEMORY
[*] 8.  MALLOC_DEBUG
[ ] 9.  MTX_PROFILE
[ ] 10. RADIO_RELAX
[ ] 11. TRACE_FRAMES
and the version is still 38389 (not updated yet)

By: Serge Vecher (serge-v) 2006-07-31 10:37:15

KNK: console output is preferrable, as some info is not written to a file even with the same logger settings.

By: Kaloyan Kovachev (knk) 2006-08-01 10:19:30

and here is the console output in both cases with debug and verbosity set to 100
from the logs - asterisk is crashing exactly during
"ERROR[1310]: ../include/asterisk/lock.h:405 __ast_pthread_mutex_unlock:  chan_sip.c line 13414 (handle_request_refer): mutex '&current.chan2->lock' freed more times than we've locked!"
and the last file with_debug_locks.txt - when DEBUG_CHANNEL_LOCKS is enabled asterisk is not crashing

By: Serge Vecher (serge-v) 2006-08-01 16:29:17

interesting ... Please always attach logs as text files for easier review. Thanks.

By: Kaloyan Kovachev (knk) 2006-08-02 08:14:05

sorry, the last file was over 1Mb, so i have decided to compress it.

Looking at the code the second channel is never locked. In fact it may not even exist, so it is safe to delete the line completely (too simple, but attaching the patch). Tested and there are no crashes till now, but there is still no audio during the first call and even DTMFs are not recognized, so i will svn update and post another bug if it still the same.

vechers: is it OK, if just point to the logs attached here or they should be attached once again (with the fix applied).

By: Serge Vecher (serge-v) 2006-08-02 08:57:00

1. Alright, I'm moving this to SIP Transfers.
2. I've noticed that your revision indicates code mods. Could you please elaborate... Perhaps you want to do distclean and make a fresh checkout (save menuselect preferences first)?
3. I think the logs you've provided are sufficient for now. If you get an opportunity, please redo the log with DEBUG_CHANNEL_LOCKS with a limited amount of endpoints involved, so it's easier to process ;)

By: Kaloyan Kovachev (knk) 2006-08-02 09:39:09

yes there changes made for the bug 6335 (got this crash while testing it), but except the change in the config structure the code is not involved in the call at all, because timelimit_recheck is 0.
i have tested the clean code (without changes) before opening the bug and there was a crash, but will add the coredump in few minutes without the changes (the revision is now 38730 and there is still no audio during the first call, but this is for the next bugid)
there are just 3 endpoints (1070, 1005 and 1006) and it is the minimum possible!? as this is not a production server they are also the only endpoints registering to this server.

By: Serge Vecher (serge-v) 2006-09-06 10:36:25

KNK: is this still an issue in latest trunk or are you running this with that little patch?

By: Kaloyan Kovachev (knk) 2006-09-07 03:07:17

I am running patched asterisk without the second unlock, but will try again and will post the results

By: Kaloyan Kovachev (knk) 2006-09-07 05:34:31

yes it is still the same (it crashed in about 20 attempts), but with other DEBUG options (SCHEDULER and THREADS) turned on.
with DEBUG_CHANNEL_LOCKS and DETECT_DEADLOCKS i get:
ERROR[2746]: ../include/asterisk/lock.h:405 __ast_pthread_mutex_unlock: channel.c line 4401 (ast_channel_unlock): mutex '&chan->lock' freed more times than we've locked!
ERROR[2746]: ../include/asterisk/lock.h:418 __ast_pthread_mutex_unlock: channel.c line 4401 (ast_channel_unlock): Error releasing mutex: Operation not permitted

without DEBUG_CHANNEL_LOCKS, but with DETECT_DEADLOCKS:
ERROR[7356] ../include/asterisk/lock.h: chan_sip.c line 13499 (handle_request_refer): mutex '&current.chan2->lock' freed more times than we've locked!
ERROR[7356] ../include/asterisk/lock.h: chan_sip.c line 13499 (handle_request_refer): Error releasing mutex: Operation not permitted

couldn't crash it with all DEBUG options off and with optimization yet.

By: jmls (jmls) 2006-11-01 10:29:08.000-0600

ping. Any news or comments people ?

By: John Lange (johnlange) 2006-11-13 12:37:48.000-0600

This patch fixed the unlock error message but does not fix the underlying problem with transfers.

If you blind transfer a few times then do a "show channels" you get:

*CLI> sip show channels
Peer             User/ANR    Call ID      Seq (Tx/Rx)  Form  Hold     Last Message
192.168.5.170    2049757118  34e6d35a3dd  00102/00000  ulaw  No       Tx: ACK
192.168.5.180    2049757116  3fb64ab8146  00104/00102  unkn  No       Rx: BYE         Done
192.168.5.160    2049757113  ce4270a5-bb  00101/00102  ulaw  No       Rx: ACK
192.168.5.180    2049757117  6f9295f1610  00124/00102  unkn  No       Tx: BYE         Done
192.168.5.180    2049757117  48a77de8698  00126/00102  unkn  No       Tx: BYE         Done
192.168.5.180    2049757116  27f4b930481  00128/00102  unkn  No       Tx: BYE         Done
6 active SIP channels

-------

The broken call legs never go away and thereafter you get a flood of:

[Nov 13 12:27:29] WARNING[7641]: chan_sip.c:12033 handle_response: Remote host can't match request BYE to call '27f4b9304812d02b7d57d1967378f22d@192.168.5.101'. Giving up.
[Nov 13 12:27:38] WARNING[7641]: chan_sip.c:12033 handle_response: Remote host can't match request BYE to call '48a77de86985725f14477916478cc515@192.168.5.101'. Giving up.
[Nov 13 12:27:39] WARNING[7641]: chan_sip.c:12033 handle_response: Remote host can't match request BYE to call '3fb64ab8146d4186225319381f6a7362@192.168.5.101'. Giving up.

-----
which continues indefinately.

See also:

http://bugs.digium.com/view.php?id=8336

http://bugs.digium.com/view.php?id=8340

By: Serge Vecher (serge-v) 2006-11-17 09:47:13.000-0600

johnlange: can you please enable sip debug and attach the output here from the recent 1.4 / trunk?

By: Anthony LaMantia (alamantia) 2007-01-04 16:41:35.000-0600

johnlange: do you have the output serge-v asked for?

By: John Lange (johnlange) 2007-01-04 18:22:59.000-0600

I'm sorry but I no longer have this test environment setup or any machines running 1.4. I'm side tracked on other projects at the moment but once i get a chance I'll test this again.

By: Serge Vecher (serge-v) 2007-01-05 08:30:29.000-0600

KNK, are you still seeing this issue? Can you upload the debug log?

By: Kaloyan Kovachev (knk) 2007-01-05 08:49:32.000-0600

i will be able to test this during the next week (not earlier, sorry) and will post the results here.
as a side note in my setup i always use 4+ -v options at startup and never had left over channels as 'stop gracefully' always suceeded, but will check this too with and without the patch

By: Kaloyan Kovachev (knk) 2007-01-10 04:37:52.000-0600

Tested 1.4 release and couldn't cause a crash (in about 20-30 transfers), but there are other problems now ...

A device (confirmed with PAP2 and IP phone - AT320) can't register if it listens on port 5060, but changing the port (to 5062 or any other) solves the problem. By attaching netcat to UDP 5060, it shows that the SIP packets are received from the device, but SIP debug does not show the packet at all.

Every transfer (blind or attended, from the device itself or with functions.conf feature code) completes 10 seconds after hangup from the transferee and falls to the "t" extension in the context of the transferrer (tested with 3 different contexts).
I will repeat the tests with the latest 1.4 revision later and will post the bugs if confirmed.

By: Serge Vecher (serge-v) 2007-01-10 10:47:34.000-0600

ok, since the original issue is gone, let's close this report. The linksys port issue is a new one, so I'd suggesting opening a new report for that with a sip debug and packet capture attached.

There are a couple of reports open with transfer not working properly in 1.4, please supply the sip debugs in one of those reports.

Thanks for testing and reporting!