[Home]

Summary:ASTERISK-02985: Crash on masquerading <ZOMBIE> channels
Reporter:paradise (paradise)Labels:
Date Opened:2004-12-14 03:20:48.000-0600Date Closed:2008-01-15 15:22:07.000-0600
Priority:CriticalRegression?No
Status:Closed/CompleteComponents:Core/General
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) BackTrace5.txt
( 1) BackTrace6.txt
( 2) BtFull6.txt
( 3) CrachDebug6.txt
( 4) CrashDebug.txt
( 5) CrashDebug2.txt
( 6) CrashDebug3.txt
( 7) CrashDebug4.txt
( 8) CrashDebug5.txt
( 9) SipDebug5.txt
(10) SipDebug6.txt
Description:I've recently faced numerous crashes(at least once per day) on my * box which is running in a real world environment.
by capturing the debug information and logging the crashes.
I found that at the crash time * tries to run the function ast_channel_masquerade() on a sip channel which is ZOMBIE.


****** ADDITIONAL INFORMATION ******

by reading the code in channel.c it seems that the mentioned function is aware of zombie channels.
at this point i'm unable to find out what's going on there.
the log for my two recent crashes with gdb result of core dump files is attached. (/usr/sbin/asterisk is not stripped)
Comments:By: paradise (paradise) 2004-12-14 11:35:49.000-0600

got another crash! with the same debug logs at the same point...

By: Mark Spencer (markster) 2004-12-14 14:11:50.000-0600

you should run gdb ./asterisk core.foo

that should give us some symbols.  Without symbols the backtrace is entirely useless.

Also, it might be helpful to try to narrow down the sequence of events that is causing this to occur.

By: paradise (paradise) 2004-12-15 00:49:57.000-0600

This is a more detailed gdb output.
I'm going to learn: how to narrow the sequence of events.
thanks.

By: paradise (paradise) 2004-12-15 02:15:35.000-0600

as the crash always occurs on SIP ZOMBIE Channels,
by following the code i found that the only ast_mutex_lock() is run from
sip_fixup() in chan_sip.c and seems that the crash happens at that point.

By: paradise (paradise) 2004-12-15 13:53:58.000-0600

Sorry! the last uploaded file has no useful info.
please use CrashDebug3.txt which contains the last 4 crashes gdb bt info.
hope that, it's a real bug not a support issue!
thanks

edited on: 12-15-04 16:00

By: Mark Spencer (markster) 2004-12-15 14:45:29.000-0600

Have you patched your sources in any way?

By: Mark Spencer (markster) 2004-12-15 15:16:12.000-0600

Please build with "make clean ; make valgrind ; make install", be sure you're on latest cvs head, and try to collect other information about the sequence of events which triggers this problem.  It would appear to be related to an unusual timing of events (perhaps a hangup and a transfer occuring at nearly identical times)?

By: paradise (paradise) 2004-12-15 15:54:19.000-0600

I'm sure that my * is not patched.
does it make sense to use CVS HEAD or STABLE(which i prefer) branch?

By: Mark Spencer (markster) 2004-12-15 16:56:55.000-0600

Use stable to confirm that the problem hasn't already been fixed.

By: paradise (paradise) 2004-12-15 17:18:13.000-0600

done with latest CVS Stable and set debug to 10.
is it enough?

By: Mark Spencer (markster) 2004-12-15 18:28:41.000-0600

Sorry I meant use "head", my fault!

By: paradise (paradise) 2004-12-15 23:23:14.000-0600

OK
is it necessary to use "make valgrind" instead of "make"?

By: Mark Spencer (markster) 2004-12-17 10:13:16.000-0600

Yes, that way the backtrace will hopefully be more meaningful that what you have suppplied so far.

By: paradise (paradise) 2004-12-17 10:22:22.000-0600

done with latest CVS HEAD.
now i'm waiting for a crash... ;-)

edited on: 12-17-04 10:23

By: paradise (paradise) 2004-12-17 11:22:09.000-0600

got another crash!
gdb bt info is attached as CrashDebug4.txt
thanks

By: paradise (paradise) 2004-12-17 13:30:34.000-0600

is it still needed to stay on CVS HEAD?
i feel more safe on CVS Stable!

edited on: 12-18-04 01:39

By: paradise (paradise) 2004-12-18 03:18:41.000-0600

got another crash! (i'm switched to cvs stable again)
this crash cause differs from last ones:
should i open a new bug for this?

#0  0x08086dd2 in get_sample (buffer=0xaddca464, len=0xaddca4b4) at fskmodem.c:34
34              retval = (float) **buffer / 256;
(gdb) bt
#0  0x08086dd2 in get_sample (buffer=0xaddca464, len=0xaddca4b4) at fskmodem.c:34
#1  0x08086ae7 in fsk_serie (fskd=0xadb99008, buffer=0xb5300000, len=0xaddca4b4, outbyte=0xaddca4ac) at fskmodem.c:211
#2  0x08083959 in callerid_feed (cid=0xadb99008,
   ubuf=0xaddca8d0 ")/;GTL????????????????????????????)\0169???????????????????????????????
?????????????????????????????????}??q????????\235\226\216\212\211\210\211
\211\212\214\221??1!\032\025\006\002\002\002\002\002\004\016\0353\237\207
\216?\221\225\216\216\216\216\216\217\221\221\225\230\231\234", len=160, codec=4) at callerid.c:258
#3  0xb6f1f586 in ss_thread (data=0xb3e2fc20) at chan_zap.c:5390
#4  0xb7fb998c in start_thread () from /lib/tls/libpthread.so.0
ASTERISK-1  0xb7eaa7da in clone () from /lib/tls/libc.so.6

edited on: 12-18-04 03:20

By: Mark Spencer (markster) 2004-12-18 08:27:11.000-0600

Please, if you want my help in fixing your problem, try to do the diagnostics I ask you to perform, using the versions I ask you to run and the compile commands, debugging commands and backtracing commands that I ask you to execute.

Run on CVS head, with a "make clean ; make valgrind ; make install" and be sure you have clean zaptel and libpri builds and get me a backtrace of the original problem you are trying to solve.

And one last time, please try to give a better description of the timing of events at the time of the crash -- clearly there appears to be something unusual here such as a transfer and hangup occuring at the same time.

By: paradise (paradise) 2004-12-18 08:38:14.000-0600

The CrashDebug4.txt is what you want...
this crash is occured when i was using latest CVS HEAD and the code was compile with the options you said.
this is the debug logs at the crash time:
(please let me know if more debuging and logging is needed)

Dec 17 20:33:43 DEBUG[2446]: Got RTCP report of 44 bytes
Dec 17 20:33:43 DEBUG[2446]: Generator got voice, switching to phase locked mode
Dec 17 20:33:43 DEBUG[2446]: Scheduling timer at 0 sample intervals
Dec 17 20:33:47 DEBUG[2446]: Got RTCP report of 68 bytes
Dec 17 20:33:47 DEBUG[2446]: Read 480 bytes of audio while expecting 640
Dec 17 20:33:47 DEBUG[2446]: Set channel SIP/2020-2a12 to write format slin
Dec 17 20:33:47 DEBUG[2446]: Scheduling timer at 160 sample intervals
Dec 17 20:33:47 DEBUG[2446]: Set channel SIP/2020-9f57 to write format ulaw
Dec 17 20:33:47 DEBUG[2446]: Scheduling timer at 0 sample intervals
Dec 17 20:33:47 DEBUG[2446]: Stopping retransmission on '763e854a3ff4ddc47793a72970b3ee06@172.22.22.
1' of Response 7283245: Found
Dec 17 20:33:47 DEBUG[2446]: Stopping retransmission on '722272257225VhGZ-2020--2020@172.22.22.11' o
f Response 7267348: Found
Dec 17 20:33:47 DEBUG[2446]: Got RTCP report of 44 bytes
Dec 17 20:33:47 DEBUG[2446]: Generator got voice, switching to phase locked mode
Dec 17 20:33:47 DEBUG[2446]: Scheduling timer at 0 sample intervals
Dec 17 20:33:47 DEBUG[2446]: Difference is 35848, ms is 4501
Dec 17 20:33:51 DEBUG[2446]: Got RTCP report of 68 bytes
Dec 17 20:33:52 DEBUG[2446]: Got RTCP report of 68 bytes
Dec 17 20:33:54 DEBUG[2446]: We found a REFER!
Dec 17 20:33:54 DEBUG[2446]: Assigning Replace-Call-ID Info 722272257225VhGZ-2020--2020@172.22.22.11
to REPLACE_CALL_ID
Dec 17 20:33:54 DEBUG[2446]: 202 Accepted (supervised)
Dec 17 20:33:54 DEBUG[2446]: Set channel SIP/2020-2a12 to write format ulaw
Dec 17 20:33:54 DEBUG[2446]: Scheduling timer at 0 sample intervals
Dec 17 20:33:54 DEBUG[2446]: Planning to masquerade SIP/2020-2a12 into the structure of SIP/2020-2a1
2
Dec 17 20:33:54 DEBUG[2446]: Done planning to masquerade SIP/2020-2a12 into the structure of SIP/202
0-2a12
Dec 17 20:33:54 DEBUG[2446]: Actually Masquerading SIP/2020-2a12(6) into the structure of SIP/2020-2
a12(6)
Dec 17 20:33:54 DEBUG[2446]: Got clone lock on 'SIP/2020-2a12' at 0x9d3d7240
Dec 17 20:33:54 DEBUG[2446]: sip_hangup(SIP/2020-2a12<MASQ>)
Dec 17 20:33:54 DEBUG[2446]: update_user_counter(2020) - decrement inUse counter
Dec 17 20:33:54 DEBUG[2446]: Set channel SIP/2020-2a12<ZOMBIE> to write format ulaw
Dec 17 20:33:54 DEBUG[2446]: Set channel SIP/2020-2a12<ZOMBIE> to read format ulaw
Dec 17 20:33:54 DEBUG[2446]: Putting channel SIP/2020-2a12<ZOMBIE> in 4/4 formats

By: Mark Spencer (markster) 2004-12-18 17:55:15.000-0600

I would also like to see the corresponding SIP debug.  Look at the events that are going on:

First, we get a REFER...  But then, before the REFER can even complete, we get a hangup on the masqueraded channel.  It would be great if threre was just a bit more info there.

By: paradise (paradise) 2004-12-18 23:58:15.000-0600

It's almost impossible to generate the crash situation for me.
but i can log sip debug messages too. is it enough?
is there any way to send sip debug messages to a specific file?
thanks

By: paradise (paradise) 2004-12-19 10:19:53.000-0600

another crash!
this time i've brought more information:
CrashDebug5.txt: which contains the * debug log.
SipDebug5.txt: contains sip debug at crash time.
BackTrace5.txt: the backtrace of core dump file.

By: Mark Spencer (markster) 2004-12-19 12:51:37.000-0600

Clearly your line numbers in your backtrace indicate you are not running CVS head or that you have it patched in some way.  I have made a few locking changes in CVS head you should probably upgrade to, but I will be a little surprised if they are related to the problem you're seeing.

If you need to, just totally erase everything you have and check out a clean copy, but whatever you have to do, get a clean version of CVS head running, don't add any patches to it whatsoever, build it with the "make clean ; make valgrind ; make install" and make the problem happen.

By: Mark Spencer (markster) 2004-12-22 22:27:15.000-0600

One more time here, please test this with latest CVS and no patches (make clean ; make valgrind ; make install) and provide us an additional backtrace.  Thanks.

By: paradise (paradise) 2004-12-22 23:56:11.000-0600

I've just got the CVS HEAD and will run my box with it.
thanks

By: Mark Spencer (markster) 2004-12-24 03:55:14.000-0600

Any update here?

By: paradise (paradise) 2004-12-24 08:52:15.000-0600

just waiting for a crash ;-)
i will let you know the result.
thanks

By: Mark Spencer (markster) 2004-12-28 15:53:30.000-0600

No crash yet?  Do you think the problem was a phantom that is already gone?

By: paradise (paradise) 2004-12-30 08:29:15.000-0600

Phantom Back again!
i'm using CVS-HEAD-12/22/04
CrashDebug6.txt, SipDebug6.txt, BackTrace6.txt.

By: Mark Spencer (markster) 2004-12-30 18:57:09.000-0600

If you haven't CVS updated, I'd like to login to your machine and mess around with your core dump.  Please find me on IRC.

By: paradise (paradise) 2004-12-31 00:20:26.000-0600

I've updated to latest CVS-HEAD when(12-22-04 22:27) you said.
is it needed to update my * everyday by each CVS change?
btw, there is no internet link to the running box, but a dialup connection to the building which i run the box there.
i can put the core dump file any where you want. is it enough?

By: paradise (paradise) 2004-12-31 00:29:50.000-0600

BTW, i've also added a bt full of core dump file.

By: Mark Spencer (markster) 2005-01-09 03:37:27.000-0600

Ah ha.  This is happening because somehow we are trying to masquerade a channel into itself.  This behavior should now be fixed in CVS, although I'm still not sure of the root cause.  Look for:

"Supervised transfer attempted to transfer into same call id"

OR

"Can't masquerade channel"

in your log files even if it doesn't crash, after updating to latest CVS.

By: paradise (paradise) 2005-01-09 08:13:23.000-0600

thanks!
will be added to CVS Stable?

edited on: 01-09-05 08:15

By: Mark Spencer (markster) 2005-01-09 12:18:54.000-0600

Well let me know if this fixes it first.

By: paradise (paradise) 2005-01-12 02:11:36.000-0600

from last three days i just got one log regarding this bug:

Jan 12 10:19:43 WARNING[2445]: Can't masquerade channel 'SIP/2218-88a6' into its
elf!
Jan 12 10:19:43 WARNING[2445]: Failed to masquerade SIP/2218-88a6 into SIP/2218-
88a6

at this event my box hasn't crashed. :-)

By: Mark Spencer (markster) 2005-01-16 02:06:49.000-0600

Okay great, this last change should make it not even show the error message.  Find me on IRC or reopen if you still get it after updating to latest CVS head.

By: Russell Bryant (russell) 2005-01-16 18:46:43.000-0600

fixed in 1.0

By: Digium Subversion (svnbot) 2008-01-15 15:20:44.000-0600

Repository: asterisk
Revision: 4726

U   trunk/channel.c
U   trunk/channels/chan_sip.c

------------------------------------------------------------------------
r4726 | markster | 2008-01-15 15:20:43 -0600 (Tue, 15 Jan 2008) | 2 lines

Don't allow masquerading into oneself (bug ASTERISK-2985)

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=4726

By: Digium Subversion (svnbot) 2008-01-15 15:21:38.000-0600

Repository: asterisk
Revision: 4788

U   branches/v1-0/channels/chan_sip.c

------------------------------------------------------------------------
r4788 | russell | 2008-01-15 15:21:38 -0600 (Tue, 15 Jan 2008) | 2 lines

Don't allow masquerading into oneself (bug ASTERISK-2985)

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=4788

By: Digium Subversion (svnbot) 2008-01-15 15:22:02.000-0600

Repository: asterisk
Revision: 4811

U   trunk/channels/chan_sip.c

------------------------------------------------------------------------
r4811 | markster | 2008-01-15 15:22:02 -0600 (Tue, 15 Jan 2008) | 2 lines

Don't even attempt to masquerade a channel into itself (bug ASTERISK-2985)

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=4811

By: Digium Subversion (svnbot) 2008-01-15 15:22:07.000-0600

Repository: asterisk
Revision: 4817

U   branches/v1-0/channels/chan_sip.c

------------------------------------------------------------------------
r4817 | russell | 2008-01-15 15:22:06 -0600 (Tue, 15 Jan 2008) | 2 lines

Don't even attempt to masquerade a channel into itself (bug ASTERISK-2985)

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=4817