|Summary:||ASTERISK-20226: Segfault in chan_sip while performing connected line update|
|Reporter:||Jared Smith (jsmith)||Labels:|
|Date Opened:||2012-08-13 17:20:35||Date Closed:||2012-11-30 10:22:45.000-0600|
|Environment:||Linux||Attachments:||( 0) asterisk_backtrace_20121029_8002.txt|
( 1) ASTERISK-20226.patch
( 2) ASTERISK-20226.txt
( 3) backtrace.29064
( 4) backtrace.controlframes.txt
|Description:||Seeing a strange segfault on a new install of Asterisk 184.108.40.206. Pasting the backtrace below at mjordan's request.|
Removed backtrace and attached as file to this issue.
Appears to occur during a connected line update initiated from local_attended_transfer in chan_sip.
|Comments:||By: Jared Smith (jsmith) 2012-08-13 17:48:56.275-0500|
An updated copy of the backtrace, this time with more debugging symbols attached for glibc.
By: Rusty Newton (rnewton) 2012-08-16 18:29:00.753-0500
Jared, as on the other issue, many values are optimized out: https://wiki.asterisk.org/wiki/display/AST/Getting+a+Backtrace
By: Jared Smith (jsmith) 2012-08-25 14:56:18.315-0500
This is another crash while trying to queue control frames for connected line updates, this time with DONT_OPTIMIZE and BETTER_BACKTRACES enabled.
By: Rusty Newton (rnewton) 2012-09-24 09:35:53.551-0500
Didn't see this one because the Enter Feedback button wasn't hit. We see it now and are adding it to the queue.
By: Matt Jordan (mjordan) 2012-11-05 10:38:33.903-0600
Another crash related to this issue.
By: Matt Jordan (mjordan) 2012-11-07 12:59:56.014-0600
Jared - what timing source are you using on the machine that has these crashes?
By: Jared Smith (jsmith) 2012-11-07 13:12:14.299-0600
We're using DAHDI timing:
12:11:31 # asterisk -rx 'timing test'
Attempting to test a timer with 50 ticks per second.
Using the 'DAHDI' timing module for this test.
It has been 1017 milliseconds, and we got 51 timer ticks
By: Mark Michelson (mmichelson) 2012-11-15 10:41:26.660-0600
I'm uploading ASTERISK-20226.patch to the issue. I provided this to Jared in another medium yesterday.
The patch is based on my observation that the channel onto which the frame is being queued is the target.chan1 channel of local_attended_transfer(). This corresponds to the transferer channel that is bridged to the transfer target. Jared told me that the agents in his call center are using blind transfers, which means that the transferer channel has hung up by the time the connected line update is queued. The fix is to grab a reference to the channel so that it cannot disappear out from under us.
It may be that we should grab this reference even sooner (i.e., before sending a NOTIFY with sipfrag), but this should be fine since we have the channel locked by the time we get to this point in the code.
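For anyone following along, here is a minimal, standalone sketch of the general idea of holding a reference so an object cannot be freed while we still need it. It is not the patch and it does not use Asterisk's API: struct toy_channel and every toy_* function are hypothetical names made up for illustration, and the sketch is single-threaded, so it does not model the channel locking mentioned above.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a reference-counted channel.
 * This is NOT Asterisk's struct ast_channel. */
struct toy_channel {
    int refcount;         /* number of holders keeping the object alive */
    int queued_updates;   /* count of "connected line updates" queued */
    int *destroyed_flag;  /* set to 1 on free so the demo can observe it */
};

/* Allocate a channel with one reference owned by the caller. */
static struct toy_channel *toy_channel_alloc(int *destroyed_flag)
{
    struct toy_channel *chan = calloc(1, sizeof(*chan));
    if (!chan) {
        return NULL;
    }
    chan->refcount = 1;
    chan->destroyed_flag = destroyed_flag;
    *destroyed_flag = 0;
    return chan;
}

/* Take an additional reference. */
static struct toy_channel *toy_channel_ref(struct toy_channel *chan)
{
    assert(chan->refcount > 0);
    chan->refcount++;
    return chan;
}

/* Drop a reference; free the channel when the last one goes away.
 * Always returns NULL so callers can clear their pointer. */
static struct toy_channel *toy_channel_unref(struct toy_channel *chan)
{
    assert(chan->refcount > 0);
    if (--chan->refcount == 0) {
        *chan->destroyed_flag = 1;
        free(chan);
    }
    return NULL;
}

/* Hypothetical stand-in for queuing a control frame on the channel. */
static void toy_queue_connected_line_update(struct toy_channel *chan)
{
    chan->queued_updates++;
}

/* Simulates: transfer code takes a ref, the channel "hangs up",
 * then the update is queued through the held ref.
 * Returns the number of updates queued, or a negative value on error. */
int toy_demo(void)
{
    int destroyed = 0;
    struct toy_channel *chan = toy_channel_alloc(&destroyed);
    if (!chan) {
        return -1;
    }

    /* Transfer code grabs its own reference before doing any work. */
    struct toy_channel *held = toy_channel_ref(chan);

    /* "Hangup": the original owner drops its reference. */
    chan = toy_channel_unref(chan);
    if (destroyed) {
        return -2; /* would mean the extra reference did not protect us */
    }

    /* Safe: the object is still alive because we hold a reference. */
    toy_queue_connected_line_update(held);
    int queued = held->queued_updates;

    /* Done with the channel; this drops the last reference and frees it. */
    held = toy_channel_unref(held);
    return destroyed ? queued : -3;
}
```

Without the toy_channel_ref() call, the "hangup" unref would free the object and the later queue call would touch freed memory, which is the kind of failure described in this issue.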