Summary: ASTERISK-16040: [patch] Segmentation fault in scheduled event
Reporter: under (under)
Labels:
Date Opened: 2010-05-01 14:48:40
Date Closed: 2011-06-07 14:04:58
Versions:
Frequency of
Environment:
Attachments: ( 0) sched.diff
Description:
(gdb) bt
#0  0x2b1e066c in send_provisional_keepalive_full (pvt=0x3d8d9268, with_sdp=1) at asterisk/channels/chan_sip.c:3324
#1  0x2b1e080f in send_provisional_keepalive_with_sdp (data=0x3d8d9268) at asterisk/channels/chan_sip.c:3349
#2  0x0815e013 in ast_sched_runq (con=0x2af08da8) at asterisk/main/sched.c:369
#3  0x2b234341 in do_monitor (data=0x0) at asterisk/channels/chan_sip.c:20498
#4  0x0816e112 in dummy_start (data=0x294e96e8) at asterisk/main/utils.c:861
#5  0x282176ff in pthread_getprio () from /lib/libthr.so.3
#6  0x00000000 in ?? ()
(gdb) fr 0
#0  0x2b1e066c in send_provisional_keepalive_full (pvt=0x3d8d9268, with_sdp=1) at asterisk/channels/chan_sip.c:3324
3324            if (!pvt->last_provisional || !strncasecmp(pvt->last_provisional, "100", 3)) {
(gdb) p pvt->last_provisional
Cannot access memory at address 0x3d8da9b8


The problem appeared while testing Asterisk with the SIPp call generator, running 100 simultaneous short calls (5 seconds of connection each).
Comments:
By: under (under) 2010-05-01 14:55:24

As far as I can see, the issue happens because a scheduled callback is invoked on an object that has already been destroyed.

AST_SCHED_DEL(sched, p->provisional_keepalive_sched_id) in __sip_destroy() is expected to cancel the callback invocation, but there is actually no guarantee that the callback won't be invoked (or isn't already running) after AST_SCHED_DEL() returns and, accordingly, after the object has been deleted.
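
One common way to close this kind of window is to let each pending scheduler entry own a reference to the object, so the object cannot be freed while a callback may still run. The following is a minimal, single-threaded sketch of that ownership rule only. It is not Asterisk code and not the contents of sched.diff: struct dialog, obj_ref(), obj_unref(), and the one-slot toy scheduler (toy_sched_add, toy_sched_del, toy_sched_run) are all hypothetical names invented for illustration, and a real fix would also need proper locking.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a per-call object; NOT the real sip_pvt. */
struct dialog {
    int refcount;         /* owners, including any pending scheduler entry */
    int sched_id;         /* -1 when no keepalive is scheduled */
    int keepalives_sent;
};

static int live_dialogs = 0;  /* allocated-but-not-freed count, demo only */

static struct dialog *dialog_alloc(void)
{
    struct dialog *d = calloc(1, sizeof(*d));
    if (!d) {
        return NULL;
    }
    d->refcount = 1;          /* the caller's reference */
    d->sched_id = -1;
    live_dialogs++;
    return d;
}

static struct dialog *obj_ref(struct dialog *d)
{
    d->refcount++;
    return d;
}

static void obj_unref(struct dialog *d)
{
    if (--d->refcount == 0) {
        live_dialogs--;
        free(d);
    }
}

/* Toy scheduler with a single slot (an assumption to keep the sketch short). */
typedef int (*toy_sched_cb)(void *data);
static struct { toy_sched_cb cb; void *data; int pending; } slot;

static int toy_sched_add(toy_sched_cb cb, void *data)
{
    slot.cb = cb;
    slot.data = data;
    slot.pending = 1;
    return 1;                 /* the only id this toy scheduler hands out */
}

/* Returns 0 if a pending entry was cancelled, -1 if nothing was pending. */
static int toy_sched_del(int id)
{
    (void) id;
    if (!slot.pending) {
        return -1;
    }
    slot.pending = 0;
    return 0;
}

static void toy_sched_run(void)
{
    if (slot.pending) {
        slot.pending = 0;
        slot.cb(slot.data);
    }
}

static int keepalive_cb(void *data)
{
    struct dialog *d = data;
    d->keepalives_sent++;
    d->sched_id = -1;
    obj_unref(d);             /* drop the reference taken when scheduling */
    return 0;
}

static void dialog_schedule_keepalive(struct dialog *d)
{
    /* The scheduler entry gets its own reference to the object. */
    d->sched_id = toy_sched_add(keepalive_cb, obj_ref(d));
}

static void dialog_cancel_keepalive(struct dialog *d)
{
    int id = d->sched_id;
    d->sched_id = -1;
    /* Release the scheduler's reference only if the entry really was
     * cancelled; otherwise the callback has released it (or will). */
    if (id != -1 && toy_sched_del(id) == 0) {
        obj_unref(d);
    }
}
```

With this arrangement the owner can drop its own reference while a keepalive is still pending: the object stays valid until the callback has run, and a cancel that arrives too late simply leaves the reference for the callback to release.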

By: under (under) 2010-05-01 15:00:42

Patch attached. I don't know whether it helps, because this issue has happened only once so far and I currently cannot reproduce it.

By: Paul Belanger (pabelanger) 2010-05-03 09:46:56

Thanks for the patch.

By: under (under) 2010-05-04 03:15:19

I tested the system under load with this patch, and it deadlocks (on ast_channel->lock) after several hours. This seems to happen because the patch changes the timing slightly.

By: Leif Madsen (lmadsen) 2010-05-07 10:31:14

Do you still have the core dump? I'm wondering if you can run some more of the commands in the backtrace.txt file?

Have you been able to reproduce this yet? I'm not sure if there is enough information to move this forward as is. Thanks!

By: under (under) 2010-05-11 02:37:02

No, this happened only once, under heavy load.

By: Russell Bryant (russell) 2010-05-11 10:33:35

1.6.0 is about to go into security maintenance only. Much of the chan_sip code related to problems like this has changed drastically since 1.6.0. I think that I'm only willing to pursue this if it can be reproduced on 1.6.2.

By: Leif Madsen (lmadsen) 2010-05-17 10:36:06

I'm suspending this issue for now. If the reporter can reproduce this on the 1.6.2 branch, then please open a new issue with debugging information, including the backtrace with DONT_OPTIMIZE enabled in menuselect (per the doc/backtrace.txt file in your Asterisk source).