Summary: | ASTERISK-02985: Crash on masquerading <ZOMBIE> channels | ||
Reporter: | paradise (paradise) | Labels: | |
Date Opened: | 2004-12-14 03:20:48.000-0600 | Date Closed: | 2008-01-15 15:22:07.000-0600 |
Priority: | Critical | Regression? | No |
Status: | Closed/Complete | Components: | Core/General |
Versions: | Frequency of Occurrence | ||
Related Issues: | |||
Environment: | Attachments: | ( 0) BackTrace5.txt ( 1) BackTrace6.txt ( 2) BtFull6.txt ( 3) CrachDebug6.txt ( 4) CrashDebug.txt ( 5) CrashDebug2.txt ( 6) CrashDebug3.txt ( 7) CrashDebug4.txt ( 8) CrashDebug5.txt ( 9) SipDebug5.txt (10) SipDebug6.txt | |
Description: | I've recently faced numerous crashes(at least once per day) on my * box which is running in a real world environment. by capturing the debug information and logging the crashes. I found that at the crash time * tries to run the function ast_channel_masquerade() on a sip channel which is ZOMBIE. ****** ADDITIONAL INFORMATION ****** by reading the code in channel.c it seems that the mentioned function is aware of zombie channels. at this point i'm unable to find out what's going on there. the log for my two recent crashes with gdb result of core dump files is attached. (/usr/sbin/asterisk is not stripped) | ||
Comments: | By: paradise (paradise) 2004-12-14 11:35:49.000-0600 got another crash! with the same debug logs at the same point... By: Mark Spencer (markster) 2004-12-14 14:11:50.000-0600 you should run gdb ./asterisk core.foo that should give us some symbols. Without symbols the backtrace is entirely useless. Also, it might be helpful to try to narrow down the sequence of events that is causing this to occur. By: paradise (paradise) 2004-12-15 00:49:57.000-0600 This is a more detailed gdb output. I'm going to learn: how to narrow the sequence of events. thanks. By: paradise (paradise) 2004-12-15 02:15:35.000-0600 as the crash always occurs on SIP ZOMBIE Channels, by following the code i found that the only ast_mutex_lock() is run from sip_fixup() in chan_sip.c and seems that the crash happens at that point. By: paradise (paradise) 2004-12-15 13:53:58.000-0600 Sorry! the last uploaded file has no useful info. please use CrashDebug3.txt which contains the last 4 crashes gdb bt info. hope that, it's a real bug not a support issue! thanks edited on: 12-15-04 16:00 By: Mark Spencer (markster) 2004-12-15 14:45:29.000-0600 Have you patched your sources in any way? By: Mark Spencer (markster) 2004-12-15 15:16:12.000-0600 Please build with "make clean ; make valgrind ; make install", be sure you're on latest cvs head, and try to collect other information about the sequence of events which triggers this problem. It would appear to be related to an unusual timing of events (perhaps a hangup and a transfer occuring at nearly identical times)? By: paradise (paradise) 2004-12-15 15:54:19.000-0600 I'm sure that my * is not patched. does it make sense to use CVS HEAD or STABLE(which i prefer) branch? By: Mark Spencer (markster) 2004-12-15 16:56:55.000-0600 Use stable to confirm that the problem hasn't already been fixed. By: paradise (paradise) 2004-12-15 17:18:13.000-0600 done with latest CVS Stable and set debug to 10. is it enough? By: Mark Spencer (markster) 2004-12-15 18:28:41.000-0600 Sorry I meant use "head", my fault! By: paradise (paradise) 2004-12-15 23:23:14.000-0600 OK is it necessary to use "make valgrind" instead of "make"? By: Mark Spencer (markster) 2004-12-17 10:13:16.000-0600 Yes, that way the backtrace will hopefully be more meaningful that what you have suppplied so far. By: paradise (paradise) 2004-12-17 10:22:22.000-0600 done with latest CVS HEAD. now i'm waiting for a crash... ;-) edited on: 12-17-04 10:23 By: paradise (paradise) 2004-12-17 11:22:09.000-0600 got another crash! gdb bt info is attached as CrashDebug4.txt thanks By: paradise (paradise) 2004-12-17 13:30:34.000-0600 is it still needed to stay on CVS HEAD? i feel more safe on CVS Stable! edited on: 12-18-04 01:39 By: paradise (paradise) 2004-12-18 03:18:41.000-0600 got another crash! (i'm switched to cvs stable again) this crash cause differs from last ones: should i open a new bug for this? #0 0x08086dd2 in get_sample (buffer=0xaddca464, len=0xaddca4b4) at fskmodem.c:34 34 retval = (float) **buffer / 256; (gdb) bt #0 0x08086dd2 in get_sample (buffer=0xaddca464, len=0xaddca4b4) at fskmodem.c:34 #1 0x08086ae7 in fsk_serie (fskd=0xadb99008, buffer=0xb5300000, len=0xaddca4b4, outbyte=0xaddca4ac) at fskmodem.c:211 #2 0x08083959 in callerid_feed (cid=0xadb99008, ubuf=0xaddca8d0 ")/;GTL????????????????????????????)\0169??????????????????????????????? ?????????????????????????????????}??q????????\235\226\216\212\211\210\211 \211\212\214\221??1!\032\025\006\002\002\002\002\002\004\016\0353\237\207 \216?\221\225\216\216\216\216\216\217\221\221\225\230\231\234", len=160, codec=4) at callerid.c:258 #3 0xb6f1f586 in ss_thread (data=0xb3e2fc20) at chan_zap.c:5390 #4 0xb7fb998c in start_thread () from /lib/tls/libpthread.so.0 ASTERISK-1 0xb7eaa7da in clone () from /lib/tls/libc.so.6 edited on: 12-18-04 03:20 By: Mark Spencer (markster) 2004-12-18 08:27:11.000-0600 Please, if you want my help in fixing your problem, try to do the diagnostics I ask you to perform, using the versions I ask you to run and the compile commands, debugging commands and backtracing commands that I ask you to execute. Run on CVS head, with a "make clean ; make valgrind ; make install" and be sure you have clean zaptel and libpri builds and get me a backtrace of the original problem you are trying to solve. And one last time, please try to give a better description of the timing of events at the time of the crash -- clearly there appears to be something unusual here such as a transfer and hangup occuring at the same time. By: paradise (paradise) 2004-12-18 08:38:14.000-0600 The CrashDebug4.txt is what you want... this crash is occured when i was using latest CVS HEAD and the code was compile with the options you said. this is the debug logs at the crash time: (please let me know if more debuging and logging is needed) Dec 17 20:33:43 DEBUG[2446]: Got RTCP report of 44 bytes Dec 17 20:33:43 DEBUG[2446]: Generator got voice, switching to phase locked mode Dec 17 20:33:43 DEBUG[2446]: Scheduling timer at 0 sample intervals Dec 17 20:33:47 DEBUG[2446]: Got RTCP report of 68 bytes Dec 17 20:33:47 DEBUG[2446]: Read 480 bytes of audio while expecting 640 Dec 17 20:33:47 DEBUG[2446]: Set channel SIP/2020-2a12 to write format slin Dec 17 20:33:47 DEBUG[2446]: Scheduling timer at 160 sample intervals Dec 17 20:33:47 DEBUG[2446]: Set channel SIP/2020-9f57 to write format ulaw Dec 17 20:33:47 DEBUG[2446]: Scheduling timer at 0 sample intervals Dec 17 20:33:47 DEBUG[2446]: Stopping retransmission on '763e854a3ff4ddc47793a72970b3ee06@172.22.22. 1' of Response 7283245: Found Dec 17 20:33:47 DEBUG[2446]: Stopping retransmission on '722272257225VhGZ-2020--2020@172.22.22.11' o f Response 7267348: Found Dec 17 20:33:47 DEBUG[2446]: Got RTCP report of 44 bytes Dec 17 20:33:47 DEBUG[2446]: Generator got voice, switching to phase locked mode Dec 17 20:33:47 DEBUG[2446]: Scheduling timer at 0 sample intervals Dec 17 20:33:47 DEBUG[2446]: Difference is 35848, ms is 4501 Dec 17 20:33:51 DEBUG[2446]: Got RTCP report of 68 bytes Dec 17 20:33:52 DEBUG[2446]: Got RTCP report of 68 bytes Dec 17 20:33:54 DEBUG[2446]: We found a REFER! Dec 17 20:33:54 DEBUG[2446]: Assigning Replace-Call-ID Info 722272257225VhGZ-2020--2020@172.22.22.11 to REPLACE_CALL_ID Dec 17 20:33:54 DEBUG[2446]: 202 Accepted (supervised) Dec 17 20:33:54 DEBUG[2446]: Set channel SIP/2020-2a12 to write format ulaw Dec 17 20:33:54 DEBUG[2446]: Scheduling timer at 0 sample intervals Dec 17 20:33:54 DEBUG[2446]: Planning to masquerade SIP/2020-2a12 into the structure of SIP/2020-2a1 2 Dec 17 20:33:54 DEBUG[2446]: Done planning to masquerade SIP/2020-2a12 into the structure of SIP/202 0-2a12 Dec 17 20:33:54 DEBUG[2446]: Actually Masquerading SIP/2020-2a12(6) into the structure of SIP/2020-2 a12(6) Dec 17 20:33:54 DEBUG[2446]: Got clone lock on 'SIP/2020-2a12' at 0x9d3d7240 Dec 17 20:33:54 DEBUG[2446]: sip_hangup(SIP/2020-2a12<MASQ>) Dec 17 20:33:54 DEBUG[2446]: update_user_counter(2020) - decrement inUse counter Dec 17 20:33:54 DEBUG[2446]: Set channel SIP/2020-2a12<ZOMBIE> to write format ulaw Dec 17 20:33:54 DEBUG[2446]: Set channel SIP/2020-2a12<ZOMBIE> to read format ulaw Dec 17 20:33:54 DEBUG[2446]: Putting channel SIP/2020-2a12<ZOMBIE> in 4/4 formats By: Mark Spencer (markster) 2004-12-18 17:55:15.000-0600 I would also like to see the corresponding SIP debug. Look at the events that are going on: First, we get a REFER... But then, before the REFER can even complete, we get a hangup on the masqueraded channel. It would be great if threre was just a bit more info there. By: paradise (paradise) 2004-12-18 23:58:15.000-0600 It's almost impossible to generate the crash situation for me. but i can log sip debug messages too. is it enough? is there any way to send sip debug messages to a specific file? thanks By: paradise (paradise) 2004-12-19 10:19:53.000-0600 another crash! this time i've brought more information: CrashDebug5.txt: which contains the * debug log. SipDebug5.txt: contains sip debug at crash time. BackTrace5.txt: the backtrace of core dump file. By: Mark Spencer (markster) 2004-12-19 12:51:37.000-0600 Clearly your line numbers in your backtrace indicate you are not running CVS head or that you have it patched in some way. I have made a few locking changes in CVS head you should probably upgrade to, but I will be a little surprised if they are related to the problem you're seeing. If you need to, just totally erase everything you have and check out a clean copy, but whatever you have to do, get a clean version of CVS head running, don't add any patches to it whatsoever, build it with the "make clean ; make valgrind ; make install" and make the problem happen. By: Mark Spencer (markster) 2004-12-22 22:27:15.000-0600 One more time here, please test this with latest CVS and no patches (make clean ; make valgrind ; make install) and provide us an additional backtrace. Thanks. By: paradise (paradise) 2004-12-22 23:56:11.000-0600 I've just got the CVS HEAD and will run my box with it. thanks By: Mark Spencer (markster) 2004-12-24 03:55:14.000-0600 Any update here? By: paradise (paradise) 2004-12-24 08:52:15.000-0600 just waiting for a crash ;-) i will let you know the result. thanks By: Mark Spencer (markster) 2004-12-28 15:53:30.000-0600 No crash yet? Do you think the problem was a phantom that is already gone? By: paradise (paradise) 2004-12-30 08:29:15.000-0600 Phantom Back again! i'm using CVS-HEAD-12/22/04 CrashDebug6.txt, SipDebug6.txt, BackTrace6.txt. By: Mark Spencer (markster) 2004-12-30 18:57:09.000-0600 If you haven't CVS updated, I'd like to login to your machine and mess around with your core dump. Please find me on IRC. By: paradise (paradise) 2004-12-31 00:20:26.000-0600 I've updated to latest CVS-HEAD when(12-22-04 22:27) you said. is it needed to update my * everyday by each CVS change? btw, there is no internet link to the running box, but a dialup connection to the building which i run the box there. i can put the core dump file any where you want. is it enough? By: paradise (paradise) 2004-12-31 00:29:50.000-0600 BTW, i've also added a bt full of core dump file. By: Mark Spencer (markster) 2005-01-09 03:37:27.000-0600 Ah ha. This is happening because somehow we are trying to masquerade a channel into itself. This behavior should now be fixed in CVS, although I'm still not sure of the root cause. Look for: "Supervised transfer attempted to transfer into same call id" OR "Can't masquerade channel" in your log files even if it doesn't crash, after updating to latest CVS. By: paradise (paradise) 2005-01-09 08:13:23.000-0600 thanks! will be added to CVS Stable? edited on: 01-09-05 08:15 By: Mark Spencer (markster) 2005-01-09 12:18:54.000-0600 Well let me know if this fixes it first. By: paradise (paradise) 2005-01-12 02:11:36.000-0600 from last three days i just got one log regarding this bug: Jan 12 10:19:43 WARNING[2445]: Can't masquerade channel 'SIP/2218-88a6' into its elf! Jan 12 10:19:43 WARNING[2445]: Failed to masquerade SIP/2218-88a6 into SIP/2218- 88a6 at this event my box hasn't crashed. :-) By: Mark Spencer (markster) 2005-01-16 02:06:49.000-0600 Okay great, this last change should make it not even show the error message. Find me on IRC or reopen if you still get it after updating to latest CVS head. By: Russell Bryant (russell) 2005-01-16 18:46:43.000-0600 fixed in 1.0 By: Digium Subversion (svnbot) 2008-01-15 15:20:44.000-0600 Repository: asterisk Revision: 4726 U trunk/channel.c U trunk/channels/chan_sip.c ------------------------------------------------------------------------ r4726 | markster | 2008-01-15 15:20:43 -0600 (Tue, 15 Jan 2008) | 2 lines Don't allow masquerading into oneself (bug ASTERISK-2985) ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=4726 By: Digium Subversion (svnbot) 2008-01-15 15:21:38.000-0600 Repository: asterisk Revision: 4788 U branches/v1-0/channels/chan_sip.c ------------------------------------------------------------------------ r4788 | russell | 2008-01-15 15:21:38 -0600 (Tue, 15 Jan 2008) | 2 lines Don't allow masquerading into oneself (bug ASTERISK-2985) ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=4788 By: Digium Subversion (svnbot) 2008-01-15 15:22:02.000-0600 Repository: asterisk Revision: 4811 U trunk/channels/chan_sip.c ------------------------------------------------------------------------ r4811 | markster | 2008-01-15 15:22:02 -0600 (Tue, 15 Jan 2008) | 2 lines Don't even attempt to masquerade a channel into itself (bug ASTERISK-2985) ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=4811 By: Digium Subversion (svnbot) 2008-01-15 15:22:07.000-0600 Repository: asterisk Revision: 4817 U branches/v1-0/channels/chan_sip.c ------------------------------------------------------------------------ r4817 | russell | 2008-01-15 15:22:06 -0600 (Tue, 15 Jan 2008) | 2 lines Don't even attempt to masquerade a channel into itself (bug ASTERISK-2985) ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=4817 |