ASTERISK-20226: Segfault in chan_sip while performing connected line update

Summary: ASTERISK-20226: Segfault in chan_sip while performing connected line update

Reporter: Jared Smith (jsmith) Labels:

Date Opened: 2012-08-13 17:20:35 Date Closed: 2012-11-30 10:22:45.000-0600

Priority: Major Regression? No

Status: Closed/Complete Components: Channels/chan_sip/General General

Versions: 1.8.15.0 Frequency of Occurrence: Occasional

Related Issues: is related to ASTERISK-20227 Segfault (possible memory corruption?)

Environment: Linux

Attachments:
( 0) asterisk_backtrace_20121029_8002.txt
( 1) ASTERISK-20226.patch
( 2) ASTERISK-20226.txt
( 3) backtrace.29064
( 4) backtrace.controlframes.txt

Description: Seeing a strange segfault on a new install of Asterisk 1.8.15.0. Pasting the backtrace below at mjordan's request.

[mjordan]

Removed backtrace and attached as file to this issue.

Appears to occur during a connected line update initiated from local_attended_transfer in chan_sip.

Comments: By: Jared Smith (jsmith) 2012-08-13 17:48:56.275-0500

An updated copy of the backtrace, this time with more debugging symbols attached for glibc.
By: Rusty Newton (rnewton) 2012-08-16 18:29:00.753-0500

Jared, as on the other issue, many values in this backtrace are optimized out. Please see https://wiki.asterisk.org/wiki/display/AST/Getting+a+Backtrace

Thanks!
By: Jared Smith (jsmith) 2012-08-25 14:56:18.315-0500

This is another crash while trying to queue control frames for connected line updates, this time with DONT_OPTIMIZE and BETTER_BACKTRACES enabled.
By: Rusty Newton (rnewton) 2012-09-24 09:35:53.551-0500

Didn't see this one, due to the Enter Feedback button not being hit. We see it now, adding to queue.
By: Matt Jordan (mjordan) 2012-11-05 10:38:33.903-0600

Another crash related to this issue
By: Matt Jordan (mjordan) 2012-11-07 12:59:56.014-0600

Jared - what timing source are you using on the machine that has these crashes?
By: Jared Smith (jsmith) 2012-11-07 13:12:14.299-0600

We're using DAHDI timing:

12:11:31 # asterisk -rx 'timing test'
Attempting to test a timer with 50 ticks per second.
Using the 'DAHDI' timing module for this test.
It has been 1017 milliseconds, and we got 51 timer ticks

By: Mark Michelson (mmichelson) 2012-11-15 10:41:26.660-0600

I'm uploading ASTERISK-20226.patch to the issue. I provided this to Jared in another medium yesterday.

The patch is based on my observation that the channel onto which the frame is being queued is the target.chan1 channel of local_attended_transfer(). This corresponds to the transferer channel that is bridged to the transfer target. Jared told me that the agents in his call center are using blind transfers, so this means that the transferer channel has hung up by the time the connected line update is queued. What we have to do is ensure that we grab a reference to the channel so that the channel cannot disappear out from under us.

It may be that we should grab this reference even sooner (i.e. before sending a NOTIFY with sipfrag) but this should be fine since we have the channel locked by the time we get to this point in the code.
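
[Illustration] The reference-holding idea described above can be sketched in plain C. This is a minimal, single-threaded toy model, not the contents of ASTERISK-20226.patch and not Asterisk's channel API: every name in it (toy_channel, toy_channel_ref, toy_channel_unref, toy_hangup, toy_transfer_and_update, run_transfer_demo) is hypothetical, and the locking mentioned in the comment is left out. It only shows why taking a reference before the hangup can occur keeps the later "queue an update" step from touching freed memory.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a reference-counted channel object.
 * This is NOT Asterisk's struct ast_channel. */
struct toy_channel {
    int refcount;        /* number of holders keeping the object alive */
    int queued_updates;  /* stands in for queued connected line frames */
};

static int live_channels = 0; /* lets the demo check that nothing leaks */

static struct toy_channel *toy_channel_alloc(void)
{
    struct toy_channel *chan = calloc(1, sizeof(*chan));
    if (!chan) {
        return NULL;
    }
    chan->refcount = 1; /* the creator owns the first reference */
    live_channels++;
    return chan;
}

/* Take an extra reference so the object cannot be freed under us. */
static struct toy_channel *toy_channel_ref(struct toy_channel *chan)
{
    chan->refcount++;
    return chan;
}

/* Drop a reference; the last holder frees the object.
 * Always returns NULL so callers can clear their pointer. */
static struct toy_channel *toy_channel_unref(struct toy_channel *chan)
{
    if (--chan->refcount == 0) {
        live_channels--;
        free(chan);
    }
    return NULL;
}

/* Simulated hangup: the owner gives up its reference. */
static void toy_hangup(struct toy_channel **owner)
{
    *owner = toy_channel_unref(*owner);
}

/* Transfer step: hold our own reference across the window in which the
 * transferer may hang up, then queue the update, then release. */
static int toy_transfer_and_update(struct toy_channel **owner)
{
    struct toy_channel *held = toy_channel_ref(*owner);
    int queued;

    toy_hangup(owner);       /* transferer hangs up mid-transfer */
    held->queued_updates++;  /* still valid: our reference keeps it alive */
    queued = held->queued_updates;

    toy_channel_unref(held); /* last reference: object is freed here */
    return queued;
}

/* Returns 0 when the update was queued safely and nothing leaked. */
int run_transfer_demo(void)
{
    struct toy_channel *owner = toy_channel_alloc();
    int queued;

    if (!owner) {
        return -1;
    }
    queued = toy_transfer_and_update(&owner);
    return (queued == 1 && owner == NULL && live_channels == 0) ? 0 : 1;
}
```

Calling run_transfer_demo() from a main() returns 0 when the update is queued on a still-valid object. Without the toy_channel_ref() call, the hangup would free the object and the increment of queued_updates would be a use-after-free, which is the kind of failure the comment above attributes to the crash.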