[Home]

Summary:ASTERISK-16234: [patch] [regression] Segmentation fault in scheduled event
Reporter:anonymouz666 (anonymouz666)Labels:
Date Opened:2010-06-10 13:27:37Date Closed:2010-08-06 16:37:29
Priority:CriticalRegression?Yes
Status:Closed/CompleteComponents:Channels/chan_sip/General
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) chan_sip-litegdb.txt
( 1) keepalive_diff_1.4_v2.diff
( 2) keepalive_diff_1.4.diff
Description:(gdb) bt
#0  0x0071a76d in strncasecmp () from /lib/libc.so.6
#1  0x008df159 in send_provisional_keepalive_full (pvt=0xb54e0af0, with_sdp=1) at chan_sip.c:2340
#2  0x008df2a8 in send_provisional_keepalive_with_sdp (data=0xb54e0af0) at chan_sip.c:2365
#3  0x080f6c5c in ast_sched_runq (con=0x937f9b0) at sched.c:363
#4  0x009244ec in do_monitor (data=0x0) at chan_sip.c:17048
ASTERISK-1  0x08106068 in dummy_start (data=0x9381d48) at utils.c:856
ASTERISK-2  0x007fc832 in start_thread () from /lib/libpthread.so.0
ASTERISK-3  0x0077ae0e in clone () from /lib/libc.so.6
(gdb)



(gdb) bt full
#0  0x0071a76d in strncasecmp () from /lib/libc.so.6
No symbol table info available.
#1  0x008df159 in send_provisional_keepalive_full (pvt=0xb54e0af0, with_sdp=1) at chan_sip.c:2340
       msg = 0x0
#2  0x008df2a8 in send_provisional_keepalive_with_sdp (data=0xb54e0af0) at chan_sip.c:2365
       pvt = 0xb54e0af0
#3  0x080f6c5c in ast_sched_runq (con=0x937f9b0) at sched.c:363
       current = 0xb6101948
       tv = {tv_sec = 1276191797, tv_usec = 160228}
       numevents = 0
       res = -1216462136
#4  0x009244ec in do_monitor (data=0x0) at chan_sip.c:17048
       res = 0
       sip = 0x0
       peer = 0x0
       t = 1276191797
       fastrestart = 0
       lastpeernum = -1
       curpeernum = 1
       reloading = 0
       __PRETTY_FUNCTION__ = "do_monitor"
ASTERISK-1  0x08106068 in dummy_start (data=0x9381d48) at utils.c:856
       __cancel_buf = {__cancel_jmp_buf = {{__cancel_jmp_buf = {8437748, 0, -1216459888, -1216461896, -193775784, 1260196912}, __mask_was_saved = 0}},
         __pad = {0xb77e4470, 0x0, 0x0, 0x0}}
       __cancel_routine = 0x8067d05 <ast_unregister_thread>
       __cancel_arg = 0xb77e4b90
       not_first_call = 0
       ret = 0x0
       a = {start_routine = 0x923e7b <do_monitor>, data = 0x0,
         name = 0x9381d58 "do_monitor", ' ' <repeats 11 times>, "started at [17101] chan_sip.c restart_monitor()"}
ASTERISK-2  0x007fc832 in start_thread () from /lib/libpthread.so.0
No symbol table info available.
ASTERISK-3  0x0077ae0e in clone () from /lib/libc.so.6
No symbol table info available.



****** ADDITIONAL INFORMATION ******

CPU is a Quad Intel(R) Xeon(R) CPU E5504  @ 2.00GHz
load average: 1.53, 1.61, 1.75
OS: CentOS release 5.5 (Final)
162 SIP peers Online, current with 42 active calls.

Apparently the same bug as https://issues.asterisk.org/view.php?id=17272, but on version 1.4.33-rc1.

This is happening 3-6 times a day and at the time of the segfault the asterisk log file isn´t showing anything specific that causes the crash.
Comments:By: Leif Madsen (lmadsen) 2010-06-14 10:22:21

I'm not sure why this reviewboard link is attached to this issue and why this is marked as a NewFeature.

By: anonymouz666 (anonymouz666) 2010-06-14 11:00:13

I initially marked as a NewFeature because send_provisional_keepalive_full() is a function from a new feature (?) from the Reviewboard link (https://reviewboard.asterisk.org/r/315/) incorporated in version 1.4. Just for information, we removed ALL the code regarding the "Re-send non-100 provisional responses" commit and the crash has stopped. We had two boxes running with this problem. The other the crash still occur and it is available to test patches. Thanks.

By: Evandro César Arruda (ecarruda) 2010-06-20 21:47:57

I'm having this problem too, after two sip reloads, asterisk died, i used gdb on running intance and i can see many and many sip request and scheduling, i sending i file with this.

I'm running 1.4.32

After this problem my asterisk stop to answer commands on cli, with lsof i can see all rasterisk socket sessions opened ( but all rasterisk are closed ) and when i try to execute module unload chan_sip.so asterisk returns to me 183 in use, and i have just 32 peers on this server, if i enable debug, screens rolls very fast with sip messages related on the sent txt.



By: anonymouz666 (anonymouz666) 2010-06-28 08:09:04

The box1 is still crashing and need to be fixed. If I remove all the code that causes this issue (like in the box2), I will not be able to test any further patches against this bug. I will wait a few days to see if anyone has time to take a look on this.

By: under (under) 2010-07-18 15:48:21

Hi.
We face same problems (in different chunks of code related to sched_runq) on production servers on 1.6.0.26 under heavy load.
Guys from Digium suggested moving to 1.6.2 (see issue ASTERISK-16040) - they rewrote significantly cleanup stuff there. But we're not ready to do this on production servers because we we will look at 1.6.2 only after it goes behind 1.6.2.25 :-) and in security maintenance mode.
I made a patch which is supposed to fix the problem and currently am testing it under load of 500 short calls by sipp.
Will post this patch shortly if everything goes ok.

By: Leif Madsen (lmadsen) 2010-07-20 10:56:02

under: based on the time frame for 1.6.2.x support, we will never see a version of 1.6.2 in maintenance mode past 1.6.2.15.

Since 1.6.0.x is no longer in maintenance mode, then if you post a patch we can move this forward possibly by at least determining if the same issue is present in 1.6.2.

Beyond that, debugging will be done against Asterisk 1.6.2.x

By: under (under) 2010-07-20 12:15:39

lmadsen, I will put fix on production server with 500 simultaneous calls next monday. If the fix helps - i will post it here.

By: Leif Madsen (lmadsen) 2010-07-21 12:27:48

Are you using res_timing_pthread here? If so, please test after this revision:

http://svn.digium.com/view/asterisk?view=rev&revision=278465

The system will be much more stable if so.

By: David Vossel (dvossel) 2010-07-23 17:25:23

I uploaded a patch. If please test the patch report if your results.

By: under (under) 2010-07-27 02:57:06

dvossel, thanks.
But I don't think this patch addresses this issue, because according to gdb's trace from https://issues.asterisk.org/view.php?id=17272 struct pvt is already freed when scheduled callback is called, Therefore likely Asterisk with your patch will crash in "ast_mutex_lock(&pvt->lock);" (from your patch) in the same situation.



By: David Vossel (dvossel) 2010-08-06 16:24:10

I understand the problem better now and have approached it differently.  I uploaded a new patch that should resolve the issue.

By: Digium Subversion (svnbot) 2010-08-06 16:34:37

Repository: asterisk
Revision: 281185

U   branches/1.4/channels/chan_sip.c

------------------------------------------------------------------------
r281185 | dvossel | 2010-08-06 16:34:36 -0500 (Fri, 06 Aug 2010) | 20 lines

chan_sip: fixes provisional keepalive scheduled item crash

There is a scheduler item in chan_sip that keeps sending the
last provisional message in response to an INVITE Request for
a period of time until a final response to that INVITE is
sent.  Because of the way this scheduler item works, it requires
a reference to a sip_pvt pointer to work properly.  The problem
with this is that it is currently possible (but rare) for the
sip_pvt to get destroyed and that scheduler item to still
exist.  When this occurs, the scheduler event fires and attempts
to access a freed sip_pvt which causes a crash.

(closes issue ASTERISK-16234)
Reported by: anonymouz666
Patches:
     keepalive_diff_1.4_v2.diff uploaded by dvossel (license 671)

Review: https://reviewboard.asterisk.org/r/849/


------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=281185

By: Digium Subversion (svnbot) 2010-08-06 16:37:27

Repository: asterisk
Revision: 281186

_U  branches/1.6.2/

------------------------------------------------------------------------
r281186 | dvossel | 2010-08-06 16:37:27 -0500 (Fri, 06 Aug 2010) | 25 lines

Blocked revisions 281185 via svnmerge

........
 r281185 | dvossel | 2010-08-06 16:34:38 -0500 (Fri, 06 Aug 2010) | 20 lines
 
 chan_sip: fixes provisional keepalive scheduled item crash
 
 There is a scheduler item in chan_sip that keeps sending the
 last provisional message in response to an INVITE Request for
 a period of time until a final response to that INVITE is
 sent.  Because of the way this scheduler item works, it requires
 a reference to a sip_pvt pointer to work properly.  The problem
 with this is that it is currently possible (but rare) for the
 sip_pvt to get destroyed and that scheduler item to still
 exist.  When this occurs, the scheduler event fires and attempts
 to access a freed sip_pvt which causes a crash.
 
 (closes issue ASTERISK-16234)
 Reported by: anonymouz666
 Patches:
       keepalive_diff_1.4_v2.diff uploaded by dvossel (license 671)
 
 Review: https://reviewboard.asterisk.org/r/849/
........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=281186