Summary: | ASTERISK-16234: [patch] [regression] Segmentation fault in scheduled event | ||
Reporter: | anonymouz666 (anonymouz666) | Labels: | |
Date Opened: | 2010-06-10 13:27:37 | Date Closed: | 2010-08-06 16:37:29 |
Priority: | Critical | Regression? | Yes |
Status: | Closed/Complete | Components: | Channels/chan_sip/General |
Versions: | Frequency of Occurrence | ||
Related Issues: | |||
Environment: | Attachments: | ( 0) chan_sip-litegdb.txt ( 1) keepalive_diff_1.4_v2.diff ( 2) keepalive_diff_1.4.diff | |
Description: | (gdb) bt #0 0x0071a76d in strncasecmp () from /lib/libc.so.6 #1 0x008df159 in send_provisional_keepalive_full (pvt=0xb54e0af0, with_sdp=1) at chan_sip.c:2340 #2 0x008df2a8 in send_provisional_keepalive_with_sdp (data=0xb54e0af0) at chan_sip.c:2365 #3 0x080f6c5c in ast_sched_runq (con=0x937f9b0) at sched.c:363 #4 0x009244ec in do_monitor (data=0x0) at chan_sip.c:17048 ASTERISK-1 0x08106068 in dummy_start (data=0x9381d48) at utils.c:856 ASTERISK-2 0x007fc832 in start_thread () from /lib/libpthread.so.0 ASTERISK-3 0x0077ae0e in clone () from /lib/libc.so.6 (gdb) (gdb) bt full #0 0x0071a76d in strncasecmp () from /lib/libc.so.6 No symbol table info available. #1 0x008df159 in send_provisional_keepalive_full (pvt=0xb54e0af0, with_sdp=1) at chan_sip.c:2340 msg = 0x0 #2 0x008df2a8 in send_provisional_keepalive_with_sdp (data=0xb54e0af0) at chan_sip.c:2365 pvt = 0xb54e0af0 #3 0x080f6c5c in ast_sched_runq (con=0x937f9b0) at sched.c:363 current = 0xb6101948 tv = {tv_sec = 1276191797, tv_usec = 160228} numevents = 0 res = -1216462136 #4 0x009244ec in do_monitor (data=0x0) at chan_sip.c:17048 res = 0 sip = 0x0 peer = 0x0 t = 1276191797 fastrestart = 0 lastpeernum = -1 curpeernum = 1 reloading = 0 __PRETTY_FUNCTION__ = "do_monitor" ASTERISK-1 0x08106068 in dummy_start (data=0x9381d48) at utils.c:856 __cancel_buf = {__cancel_jmp_buf = {{__cancel_jmp_buf = {8437748, 0, -1216459888, -1216461896, -193775784, 1260196912}, __mask_was_saved = 0}}, __pad = {0xb77e4470, 0x0, 0x0, 0x0}} __cancel_routine = 0x8067d05 <ast_unregister_thread> __cancel_arg = 0xb77e4b90 not_first_call = 0 ret = 0x0 a = {start_routine = 0x923e7b <do_monitor>, data = 0x0, name = 0x9381d58 "do_monitor", ' ' <repeats 11 times>, "started at [17101] chan_sip.c restart_monitor()"} ASTERISK-2 0x007fc832 in start_thread () from /lib/libpthread.so.0 No symbol table info available. ASTERISK-3 0x0077ae0e in clone () from /lib/libc.so.6 No symbol table info available. ****** ADDITIONAL INFORMATION ****** CPU is a Quad Intel(R) Xeon(R) CPU E5504 @ 2.00GHz load average: 1.53, 1.61, 1.75 OS: CentOS release 5.5 (Final) 162 SIP peers Online, current with 42 active calls. Apparently the same bug as https://issues.asterisk.org/view.php?id=17272, but on version 1.4.33-rc1. This is happening 3-6 times a day and at the time of the segfault the asterisk log file isn´t showing anything specific that causes the crash. | ||
Comments: | By: Leif Madsen (lmadsen) 2010-06-14 10:22:21 I'm not sure why this reviewboard link is attached to this issue and why this is marked as a NewFeature. By: anonymouz666 (anonymouz666) 2010-06-14 11:00:13 I initially marked as a NewFeature because send_provisional_keepalive_full() is a function from a new feature (?) from the Reviewboard link (https://reviewboard.asterisk.org/r/315/) incorporated in version 1.4. Just for information, we removed ALL the code regarding the "Re-send non-100 provisional responses" commit and the crash has stopped. We had two boxes running with this problem. The other the crash still occur and it is available to test patches. Thanks. By: Evandro César Arruda (ecarruda) 2010-06-20 21:47:57 I'm having this problem too, after two sip reloads, asterisk died, i used gdb on running intance and i can see many and many sip request and scheduling, i sending i file with this. I'm running 1.4.32 After this problem my asterisk stop to answer commands on cli, with lsof i can see all rasterisk socket sessions opened ( but all rasterisk are closed ) and when i try to execute module unload chan_sip.so asterisk returns to me 183 in use, and i have just 32 peers on this server, if i enable debug, screens rolls very fast with sip messages related on the sent txt. By: anonymouz666 (anonymouz666) 2010-06-28 08:09:04 The box1 is still crashing and need to be fixed. If I remove all the code that causes this issue (like in the box2), I will not be able to test any further patches against this bug. I will wait a few days to see if anyone has time to take a look on this. By: under (under) 2010-07-18 15:48:21 Hi. We face same problems (in different chunks of code related to sched_runq) on production servers on 1.6.0.26 under heavy load. Guys from Digium suggested moving to 1.6.2 (see issue ASTERISK-16040) - they rewrote significantly cleanup stuff there. But we're not ready to do this on production servers because we we will look at 1.6.2 only after it goes behind 1.6.2.25 :-) and in security maintenance mode. I made a patch which is supposed to fix the problem and currently am testing it under load of 500 short calls by sipp. Will post this patch shortly if everything goes ok. By: Leif Madsen (lmadsen) 2010-07-20 10:56:02 under: based on the time frame for 1.6.2.x support, we will never see a version of 1.6.2 in maintenance mode past 1.6.2.15. Since 1.6.0.x is no longer in maintenance mode, then if you post a patch we can move this forward possibly by at least determining if the same issue is present in 1.6.2. Beyond that, debugging will be done against Asterisk 1.6.2.x By: under (under) 2010-07-20 12:15:39 lmadsen, I will put fix on production server with 500 simultaneous calls next monday. If the fix helps - i will post it here. By: Leif Madsen (lmadsen) 2010-07-21 12:27:48 Are you using res_timing_pthread here? If so, please test after this revision: http://svn.digium.com/view/asterisk?view=rev&revision=278465 The system will be much more stable if so. By: David Vossel (dvossel) 2010-07-23 17:25:23 I uploaded a patch. If please test the patch report if your results. By: under (under) 2010-07-27 02:57:06 dvossel, thanks. But I don't think this patch addresses this issue, because according to gdb's trace from https://issues.asterisk.org/view.php?id=17272 struct pvt is already freed when scheduled callback is called, Therefore likely Asterisk with your patch will crash in "ast_mutex_lock(&pvt->lock);" (from your patch) in the same situation. By: David Vossel (dvossel) 2010-08-06 16:24:10 I understand the problem better now and have approached it differently. I uploaded a new patch that should resolve the issue. By: Digium Subversion (svnbot) 2010-08-06 16:34:37 Repository: asterisk Revision: 281185 U branches/1.4/channels/chan_sip.c ------------------------------------------------------------------------ r281185 | dvossel | 2010-08-06 16:34:36 -0500 (Fri, 06 Aug 2010) | 20 lines chan_sip: fixes provisional keepalive scheduled item crash There is a scheduler item in chan_sip that keeps sending the last provisional message in response to an INVITE Request for a period of time until a final response to that INVITE is sent. Because of the way this scheduler item works, it requires a reference to a sip_pvt pointer to work properly. The problem with this is that it is currently possible (but rare) for the sip_pvt to get destroyed and that scheduler item to still exist. When this occurs, the scheduler event fires and attempts to access a freed sip_pvt which causes a crash. (closes issue ASTERISK-16234) Reported by: anonymouz666 Patches: keepalive_diff_1.4_v2.diff uploaded by dvossel (license 671) Review: https://reviewboard.asterisk.org/r/849/ ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=281185 By: Digium Subversion (svnbot) 2010-08-06 16:37:27 Repository: asterisk Revision: 281186 _U branches/1.6.2/ ------------------------------------------------------------------------ r281186 | dvossel | 2010-08-06 16:37:27 -0500 (Fri, 06 Aug 2010) | 25 lines Blocked revisions 281185 via svnmerge ........ r281185 | dvossel | 2010-08-06 16:34:38 -0500 (Fri, 06 Aug 2010) | 20 lines chan_sip: fixes provisional keepalive scheduled item crash There is a scheduler item in chan_sip that keeps sending the last provisional message in response to an INVITE Request for a period of time until a final response to that INVITE is sent. Because of the way this scheduler item works, it requires a reference to a sip_pvt pointer to work properly. The problem with this is that it is currently possible (but rare) for the sip_pvt to get destroyed and that scheduler item to still exist. When this occurs, the scheduler event fires and attempts to access a freed sip_pvt which causes a crash. (closes issue ASTERISK-16234) Reported by: anonymouz666 Patches: keepalive_diff_1.4_v2.diff uploaded by dvossel (license 671) Review: https://reviewboard.asterisk.org/r/849/ ........ ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=281186 |