Summary: ASTERISK-05662: random Crashs when using Monitor on SIP extensions
Reporter: paradise (paradise)
Labels:
Date Opened: 2005-11-21 01:49:17.000-0600
Date Closed: 2011-06-07 14:00:29
Priority: Critical
Regression?: No
Status: Closed/Complete
Components: Core/General
Versions:
Frequency of Occurrence:
Related Issues:
Environment:
Attachments:
( 0) cancel_destroy.diff.txt
( 1) chan_sip.diff.txt
( 2) chan_sip.diff-v2.txt
( 3) chan_sip.diff-v3.txt
( 4) Crash1.txt
( 5) Crash2.txt
( 6) Crash3.txt
( 7) Crash4.txt
( 8) gdblog
( 9) p_pointer.txt
(10) tripwire_crash.txt
Description: I've had two crashes since upgrading to 1.2.



****** ADDITIONAL INFORMATION ******

#0  0x00a6edf7 in pthread_mutex_trylock () from /lib/tls/libpthread.so.0
No symbol table info available.
#1  0x08060928 in ast_queue_hangup (chan=0x6d080018) at lock.h:589
       f = {frametype = 4, subclass = 1, datalen = 0, samples = 0, mallocd = 0, offset = 0, src = 0x0, data = 0x0,
 delivery = {tv_sec = 0, tv_usec = 0}, prev = 0x0, next = 0x0}
#2  0xb7c70932 in __sip_autodestruct (data=0xb780af60) at chan_sip.c:1318
       __PRETTY_FUNCTION__ = "__sip_autodestruct"
#3  0x080565b8 in ast_sched_runq (con=0x812e9d0) at sched.c:373
       tv = warning: Unhandled dwarf expresion opcode DW_OP_piece
{tv_sec = 0, tv_usec = 135353244}
       x = 0
       res = 893517
#4  0xb7c9c051 in do_monitor (data=0x0) at chan_sip.c:11251
       res = 0
       sip = (struct sip_pvt *) 0x812e9d0
       peer = (struct sip_peer *) 0x0
       t = 1132559064
       fastrestart = 0
       lastpeernum = -1
       curpeernum = 20
       reloading = 135457232
       __PRETTY_FUNCTION__ = "do_monitor"
#5  0x00a6d1d5 in start_thread () from /lib/tls/libpthread.so.0
No symbol table info available.
#6  0x007ca2da in clone () from /lib/tls/libc.so.6
No symbol table info available.
Comments:

By: paradise (paradise) 2005-11-22 00:35:09.000-0600

I still get random crashes even when not using Monitor().
Two backtraces are attached.



By: paradise (paradise) 2005-11-23 08:43:23.000-0600

I upgraded to CVS HEAD,
but my box still crashes with similar backtraces.
Are there any comments on this issue?
Is it confirmed as a bug?

By: Olle Johansson (oej) 2005-11-23 09:26:54.000-0600

Can you give us a SIP debug up to the crash, so we see what happens? Set verbose to 4, debug to 4 and turn on SIP debug.

By: paradise (paradise) 2005-11-23 12:08:50.000-0600

But I can't reproduce the crash.

By: Mark Spencer (markster) 2005-11-25 13:25:37.000-0600

So the bug doesn't happen any longer?

By: Mark Spencer (markster) 2005-11-29 22:18:06.000-0600

Closing since *apparently* this isn't an issue any longer.

By: paradise (paradise) 2005-12-03 08:54:47.000-0600

This is still an issue for me.
I've noticed that at crash time the last log message is:

Dec  2 12:51:24 WARNING[11867] chan_sip.c: Autodestruct on call '' with owner in place
Dec  2 10:21:06 WARNING[11867] chan_sip.c: Autodestruct on call '^S�^O' with owner in place

I was finally able to avoid these crashes with the uploaded patch.

By: paradise (paradise) 2005-12-04 01:02:02.000-0600

Oops! I just messed up the code formatting in my editor.
BTW, the last patch works fine.

;-)



By: Olle Johansson (oej) 2005-12-04 20:02:31.000-0600

Paradise, kpfleming and I do not understand your patches at all. We need more information on the actual problem. For some reason, the patches do not make sense to us, and we would really like to understand what is happening on your system. Can you provide us with more information, like debugging output of a larger session?

By: paradise (paradise) 2005-12-04 22:40:34.000-0600

First, I should mention that the subject of this bug is not correct. That's my fault, of course. ;-)

I don't really know how or when it occurs, but it seems that at crash time sip_destroy() attempts to hang up a corrupted session (pvt pointer). I discovered that most of these corrupted sessions have garbage strings in their "callid" member.
Another problem is that I have seen random disconnects while talking on SIP devices. Looking at the logs, I found that all of these interruptions occur at the same point in the code:

if (p->owner) {
 ast_log(LOG_WARNING, "Autodestruct on call '%s' with owner in place\n", p->callid);
 ast_queue_hangup(p->owner);
} else {
 sip_destroy(p);
}

I have also added another patch that logs the values of some other pvt members at crash time:

Dec  4 15:56:37 WARNING[27715] chan_sip.c: Autodestruct on call '^S�^O' with owner in place.
p->sa: '108.111.99.107'
p->recv: '149.85.10.121'
p->ourip: '115.103.48.48'
p->exten: 'xt'

All of the values are junk!

The funny thing is that when I remove my hints from extensions.conf, my box never crashes at all. (Now I have at least 3 crashes per day.)

I have eyeBeam and snom phones, which are used to monitor extensions.
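
A note on the "addresses" above: an IPv4 address is just four bytes, so if text has been written over the pvt, the log prints that text as dotted quads. The sketch below converts a dotted quad back into characters, assuming each octet is one byte of overwritten memory. The helper name quad_to_ascii is hypothetical and is not part of chan_sip.c.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: render the four octets of a dotted-quad string as
 * ASCII characters. Non-printable octets are shown as '.'.
 * out must hold at least 5 bytes.
 * Returns 0 on success, -1 if the input is not a plain dotted quad. */
int quad_to_ascii(const char *quad, char out[5])
{
    unsigned int o[4];
    char extra;

    assert(quad != NULL && out != NULL);

    /* Exactly four numeric fields and nothing after them. */
    if (sscanf(quad, "%3u.%3u.%3u.%3u%c", &o[0], &o[1], &o[2], &o[3], &extra) != 4)
        return -1;

    for (int i = 0; i < 4; i++) {
        if (o[i] > 255)
            return -1;
        out[i] = isprint((int)o[i]) ? (char)o[i] : '.';
    }
    out[4] = '\0';
    return 0;
}
```

Applied to the values logged here, '108.111.99.107' comes out as "lock" and '115.103.48.48' as "sg00", which suggests the structure was overwritten with string data rather than holding random bytes.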



By: alexb (alexb) 2005-12-05 16:21:49.000-0600

This sounds somewhat like issue ASTERISK-5508.

This morning our Asterisk suddenly crashed, just after an incoming call from ISDN. Nobody answered the call because we were all out for lunch. When we came back, the Asterisk service was down and the software phones were still ringing! The hardware phones, however, weren't ringing.
BTW, we do not use eyeBeam's presence service anymore; however, hints are still in extensions.conf.
I've attached the gdb output; I hope it helps.

Asterisk SVN-trunk-r7230 built by root @ xxx.xxxxx.xx on a i686 running Linux on 2005-11-30 12:17:38 UTC

By: paradise (paradise) 2005-12-06 00:46:52.000-0600

More crashes avoided by my patch. It shows funny strings in callid! They seem to be parts of a presence request from eyeBeam.

Dec  5 13:32:42 WARNING[23136] chan_sip.c: Autodestruct on call '' with owner in place.
p->sa: '111.110.105.120'
p->recv: '45.84.121.112'
p->ourip: '99.97.116.105'
p->exten: 'idf+xml
Subscription-State: active
Content-Length: 520

<?xml version="1.0" encoding="ISO-8859-1"?>
<presence xmlns="urn:ietf:params:xml:ns:pidf"
xmlns:pp="urn:ietf:params:xml:ns:pidf:person"
xmlns:es="urn:ietf:params:xml:ns:pidf:rpid:status:rpid-status"
xmlns:ep="urn:ietf:params:xml:ns:pidf:rpid:rpid-person"
entity="sip:21@192.168.2.1">
<pp:person><status>
<ep:activities><ep:away/></ep:activities>
</status></pp:person>
<note>Not online</note>
<tuple id="22">
<contact priority="1">sip:22@192.168.2.1</contact>
<status><basic>closed</basic></status>
</tuple>
</presence>
'
Dec  5 13:32:42 WARNING[23136] chan_sip.c: Crash Avoided on "".

Dec  5 13:36:51 WARNING[13269] chan_sip.c: Crash Avoided on "f:params:xml:ns:pidf:rpid:status:rpid-status"
xmlns:ep="urn:ietf:params:xml:ns:pidf:rpid:rpid-person"
entity="sip:54@192.168.2.1">
<pp:person><status>
</status></pp:person>
<note>Ready</note>
<tuple id="54">
<contact priority="1">sip:54@192.168.2.1</contact>
<status><basic>open</basic></status>
</tuple>
</presence>
".



By: alexb (alexb) 2005-12-06 02:47:07.000-0600

Reported to Counterpath's support http://support.counterpath.net/viewtopic.php?t=5008 and asked for their help, too.

By: paradise (paradise) 2005-12-07 15:43:59.000-0600

oej: any comments?

By: paradise (paradise) 2005-12-20 02:59:37.000-0600

Another crash!
At crash time, *p points to a garbage pvt.
p_pointer.txt is attached.

By: paradise (paradise) 2005-12-20 03:05:54.000-0600

Just a reminder:
with no hints, there are no crashes.
Thanks.

By: paradise (paradise) 2006-01-04 07:27:44.000-0600

Got another crash with the latest trunk version.

Backtrace attached.

By: Philip Walls (malverian) 2006-01-10 11:14:38.000-0600

I'm experiencing this problem as well.

Here we use SNOM 320 phones as well as an in-house developed softphone that uses device hinting/subscriptions. Some employees have >20 subscriptions on their softphone.

I submitted 0006182, which I believe should be marked as a duplicate of this bug. However, the crash has nothing to do with Monitor(); I do not use this application at all.



By: Philip Walls (malverian) 2006-01-10 11:29:15.000-0600

So all of the information is here in this one bug:

This problem has been an issue for me since at least around September 2005, when I began using Asterisk CVS. The problem has persisted through releases 1.2.0 and 1.2.1.

By: paradise (paradise) 2006-01-24 12:10:58.000-0600

Just upgraded to the latest trunk. My box still crashes.
Do I need to attach new backtraces?

By: Michael Gernoth (mgernoth) 2006-01-24 17:53:15.000-0600

Does it help to add the line
               sip_cancel_destroy(p);
directly above the line
               ast_clear_flag(p, SIP_OUTGOING);
(that is around line 10672 of current svn) to chan_sip.c?
This is just something I thought of when looking at the subscription code (and not knowing much of chan_sip internals).

Proper patch will follow, if it works.
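
The suggestion above amounts to cancelling any pending autodestruct timer before a dialog is reused. Below is a toy model of that idea, assuming a scheduler that stores a raw pointer to its target. All types and functions in it (toy_dialog, toy_timer, toy_sched_destroy, toy_cancel_destroy, toy_pending_for) are hypothetical stand-ins, not the chan_sip.c definitions, and nothing in this thread confirms that a missing cancel was the actual root cause.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only. */
struct toy_dialog {
    int autokill_id;            /* index of the pending destroy timer, or -1 */
};

struct toy_timer {
    int active;                 /* nonzero while the timer is still due to fire */
    struct toy_dialog *target;  /* raw pointer: dangles if target is freed first */
};

#define TOY_MAX_TIMERS 8
static struct toy_timer toy_timers[TOY_MAX_TIMERS];

/* Cancel the pending destroy for d, if there is one. */
void toy_cancel_destroy(struct toy_dialog *d)
{
    assert(d != NULL);
    if (d->autokill_id >= 0) {
        toy_timers[d->autokill_id].active = 0;
        toy_timers[d->autokill_id].target = NULL;
        d->autokill_id = -1;
    }
}

/* Schedule a delayed destroy for d, replacing any earlier one.
 * Returns the timer index, or -1 if the table is full. */
int toy_sched_destroy(struct toy_dialog *d)
{
    assert(d != NULL);
    toy_cancel_destroy(d);
    for (int i = 0; i < TOY_MAX_TIMERS; i++) {
        if (!toy_timers[i].active) {
            toy_timers[i].active = 1;
            toy_timers[i].target = d;
            d->autokill_id = i;
            return i;
        }
    }
    return -1;
}

/* Count the timers that would still fire against d. */
int toy_pending_for(const struct toy_dialog *d)
{
    int n = 0;
    for (int i = 0; i < TOY_MAX_TIMERS; i++)
        if (toy_timers[i].active && toy_timers[i].target == d)
            n++;
    return n;
}
```

In this model, a dialog that is freed or reused without toy_cancel_destroy() leaves an active timer pointing at memory that may since have been overwritten, which would be consistent with the garbage pvt contents reported above.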

Also, I'm not sure about the handling of mailboxsize a few lines down. IMHO it should be mailboxsize=sizeof(mailbox)-1, and *mailbox should be properly zeroed before use.
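
The buffer-size point can be sketched as follows: zero the buffer, reserve one byte for the terminating NUL, and track the remaining space on every append. This is a generic sketch, assuming a fixed-size char array. The function bounded_append is hypothetical and is not the code in chan_sip.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: append src to the NUL-terminated string in dst.
 * *remaining is the number of bytes still free, not counting the byte
 * reserved for the terminating NUL. Input that does not fit is truncated.
 * Returns the number of bytes actually copied. */
size_t bounded_append(char *dst, size_t *remaining, const char *src)
{
    assert(dst != NULL && remaining != NULL && src != NULL);

    size_t used = strlen(dst);
    size_t n = strlen(src);

    if (n > *remaining)
        n = *remaining;          /* truncate instead of overflowing */

    memcpy(dst + used, src, n);
    dst[used + n] = '\0';        /* lands in the reserved byte at the latest */
    *remaining -= n;
    return n;
}
```

A caller would set this up with `char mailbox[8]; memset(mailbox, 0, sizeof(mailbox)); size_t left = sizeof(mailbox) - 1;` and then pass `mailbox` and `&left` to each append, so the final NUL always fits.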

[EDIT]: I have just uploaded the described patch as cancel_destroy.diff.txt



By: Olle Johansson (oej) 2006-01-25 04:44:47.000-0600

Please check if the problem still exists in current head of svn trunk. Thank you.

By: Olle Johansson (oej) 2006-01-30 12:41:49.000-0600

Any updates?

By: Andrew Gough (tripwire) 2006-01-30 14:38:50.000-0600

I am experiencing the same problem. I am currently running 1.2.1, and all extensions are eyeBeam softphones. paradise saved me (via the mailing list) by suggesting I remove hints from extensions.conf, which cured the instability, but now I can't get presence info.

We have only 6 extensions, a two-line chan_capi ISDN trunk and an external IAX ITSP trunk.

BT uploaded for reference. (tripwire_crash.txt)

If I can give any more info, just shout.

By: Serge Vecher (serge-v) 2006-01-30 14:54:53.000-0600

tripwire: a patch from a related bug ASTERISK-6025 has been committed to 1.2 stable (not v1.2.3) and trunk. Please test with either 1.2 stable or trunk and report if there is a problem. Thanks!

By: Olle Johansson (oej) 2006-01-31 00:11:46.000-0600

Please try the latest 1.2 in subversion or svn trunk. Thank you. The fix is not in older versions.

By: X-Files (x-files) 2006-01-31 02:19:05.000-0600

Worked, after 2 days of testing.

By: paradise (paradise) 2006-02-07 01:03:29.000-0600

The problem seems to be fixed. Thanks!

By: X-Files (x-files) 2006-02-07 02:55:16.000-0600

Yep, fixed. Please close.

By: Andrew Gough (tripwire) 2006-02-07 04:13:31.000-0600

Fixed for me also.

By: Olle Johansson (oej) 2006-03-05 04:25:40.000-0600

Closing on reporter's request.