Summary:ASTERISK-07624: [patch] segfault when processing SUBSCRIBE MWI
Reporter:Sergey Tamkovich (sergee)Labels:
Date Opened:2006-08-29 13:59:01Date Closed:2006-11-14 07:24:07.000-0600
Versions:Frequency of
Environment:Attachments:( 0) patch2.diff
Description:Asterisk crashes when processing SUBSCRIBE MWI. This happens only with asterisk-realtime. chan_sip doesn't execute ASTOBJ_REF on the pointer returned by realtime_peer() so this pointer is free()-ed when sip_peer still in use - this is where segfault happenes.

Small patch will fix this problem. Idea of patch is belong to PCadach.

In the "Additional information" you can see a part of log - sip_peer is destroyed in this part. Full log can be found here: http://pastebin.ca/raw/152938


Reliably Transmitting (NAT) to
NOTIFY sip:1199@;uniq=6F29334C96730DAC9D312E7F99C6 SIP/2.0
Via: SIP/2.0/UDP;branch=z9hG4bK6c06534d;rport
From: "asterisk" <sip:asterisk@sip.betamax.de>;tag=as547698df
To: <sip:sip:1199@;uniq=6F29334C96730DAC9D312E7F99C6>;tag=1767697328
Contact: <sip:asterisk@>
Call-ID: 88E93216576A0913AE4B410DA71E@
CSeq: 102 NOTIFY
User-Agent: Asterisk PBX
Max-Forwards: 70
Event: message-summary
Content-Type: application/simple-message-summary
Subscription-State: active
Content-Length: 95

Messages-Waiting: yes
Message-Account: sip:asterisk@sip.betamax.de
Voice-Message: 3/0 (0/0)

[Aug 29 10:31:19] DEBUG[23070]: chan_sip.c:1902 __sip_reliable_xmit: *** SIP TIMER: Initalizing retransmit timer on packet: Id  ASTERISK-1527
[Aug 29 10:31:19] DEBUG[23070]: chan_sip.c:2257 sip_destroy_peer: Destroying SIP peer 1199
[Aug 29 10:31:19] DEBUG[23070]: chan_sip.c:2976 sip_destroy: Destroying SIP dialog
Comments:By: Paul Cadach (pcadach) 2006-08-29 23:35:19

Backtrace is not informative because it shows stack overwrite only. When realtime peer is destroyed it still be bound into the peer list, so when Asterisk handles answer to NOTIFY, memory used for dynamic peer re-allocated for other data and updating pointers to invalid data produces a crash with stack overwrite.

By: Sergey Tamkovich (sergee) 2006-08-30 02:21:52

patch.diff is not good, it will perfom ASTOBJ_REF twice if flag SIP_PAGE2_RTCACHEFRIENDS is set.

patch2.diff is corrected version. It will perfom ASTOBJ_REF only if flag SIP_PAGE2_RTCACHEFRIENDS is not set.

By: Olle Johansson (oej) 2006-08-30 08:55:18

You are changing a generic function that is used in many other cases - is that the proper thing to do, really? Or do we need to find another patch?

By: Sergey Tamkovich (sergee) 2006-08-30 12:23:10

oej, i'm not sure that it is a proper way, however i can't invent anything better. I think that bug should be fixed in realtime_peer() - right where it lives :)

By: Joshua C. Colp (jcolp) 2006-10-04 10:39:11

So the issue is that the subscription structure is getting linked to a peer that is going to go away since it's realtime. What if we increment the reference count in place(s) where it will have to stick around (like the subscription stuff) instead of on a global basis which does seem incorrect?

By: Sergey Tamkovich (sergee) 2006-10-06 06:16:12

File, the problem is not in SUBSCRIPTION, SUBSCRIPTIONS is just an example, where you can see effect of this bug.

IMHO bug in general function - "realtime_peer", here is a piece of code:

       if (ast_test_flag(&global_flags[1], SIP_PAGE2_RTCACHEFRIENDS)) {
               /* Cache peer */
               ast_copy_flags(&peer->flags[1],&global_flags[1], SIP_PAGE2_RTAUTOCLEAR|SIP_PAGE2_RTCACHEFRIENDS);
               if (ast_test_flag(&global_flags[1], SIP_PAGE2_RTAUTOCLEAR)) {
                       if (peer->expire > -1) {
                               ast_sched_del(sched, peer->expire);
                       peer->expire = ast_sched_add(sched, (global_rtautoclear) * 1000, expire_register, (void *)peer);
       } else {
               ast_set_flag(&peer->flags[0], SIP_REALTIME);

So if SIP_PAGE2_RTCACHEFRIENDS flag is set, then reference count for peer incresed (ASTOBJ_CONTAINER_LINK) and doesn't increased in the opposite case.

As far as i understand - this is a bug, and "refcount" should be used for all objects, am i wrong?

so my main point: there can be other places besides subscriptions, where this bug can show itself,

I think that moving of ASTOBJ_REF to the place before peer usage won't help, because peer dies somewhere between "realtime_peer" call and "subscriptions code". If we move it, we'll made "peer" unprotected, with wrong refcount - so situation would be the same as it is now.

Besides, if SIP_PAGE2_RTCACHEFRIENDS flag is set, then refcount increased right from "realtime_peer" - not from other places...

Sorry for bad english, i hope i made myself clear.

By: Sergey Tamkovich (sergee) 2006-10-25 05:18:49

just fyi: after applying this patch i don't have any segfaults...

By: Olle Johansson (oej) 2006-10-29 10:59:03.000-0600

We already add an ASTOBJ_REF to the authpeer. Found a few cases where we do not release the peer at subscription time though.

By: Olle Johansson (oej) 2006-10-29 11:06:27.000-0600

Finding other bugs in MWI subscription while trying to understand this.

By: Olle Johansson (oej) 2006-10-29 11:07:07.000-0600

try with latest svn trunk. Thanks.

By: Olle Johansson (oej) 2006-10-29 11:17:22.000-0600

Added changes to 1.4 too, since the same problem exists there.

By: Sergey Tamkovich (sergee) 2006-10-30 04:23:45.000-0600

Revision: 46328

Still crashing in the same place without my patch...

By: Olle Johansson (oej) 2006-10-30 10:03:46.000-0600

The svn servers was not in synch. Please update and try again. Thanks.

By: Olle Johansson (oej) 2006-11-10 13:09:53.000-0600

Any news? I need feedback, please.

By: Sergey Tamkovich (sergee) 2006-11-13 09:21:32.000-0600

oej, please excuse me for long delay.

I just compiled Revision 46400

314 Realtime peers

Runing for 2 hours now - without segfaults, i will post a message tomorrow (after 24 h :)

By: Sergey Tamkovich (sergee) 2006-11-14 06:54:33.000-0600

24 hours online under load
no segfaults caught.

Oej, thank you very much.
I think we can close this issue.