|Summary:||ASTERISK-08748: chan_skinny randomly crashing server|
|Date Opened:||2007-02-07 10:24:33.000-0600||Date Closed:||2007-06-30 09:20:07|
|Environment:||Attachments:||( 0) bt_full.txt|
( 1) bt-full.txt
( 2) debug.txt
( 3) transmit.patch
|Description:||With the release version of Asterisk, chan_skinny is randomly crashing Asterisk.
Unoptimized backtrace is attached.
|Comments:||By: dea (dea) 2007-02-07 11:36:04.000-0600|
Your backtrace shows why, without really showing why. Can you upload
a log with both verbose and debug set to at least 3?
Somehow this call managed to get started without a session (bad). The
verbose/debug log should confirm this, and hopefully point to where
additional checks for the existence of the session should be performed.
By: sbisker (sbisker) 2007-02-07 12:12:22.000-0600
I have set the options for asterisk to -vvvvvgdddddnp
When it crashes again, I will attach /var/log/asterisk/messages.
By: Serge Vecher (serge-v) 2007-02-07 16:24:02.000-0600
It is best to set debug output for the console in logger.conf and upload a console log instead.
By: Anthony LaMantia-2 (anthonyl) 2007-02-08 11:54:41.000-0600
It would seem the line s = d->session; is broken. I assume the problem comes from the fact that l->session is never checked after calling line_by_deviceid in handle_stimulus_message, or the sessions may simply be getting destroyed. Either way, the session should be checked for before calling transmit_response, or inside transmit_response for safety, considering how this code is laid out.
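
The missing check described in this comment can be sketched roughly as follows. This is a minimal illustration and not the chan_skinny source: the struct layouts, the `_sketch` type names, and the helper `session_for_line` are all assumptions made up for the example. Only the idea of checking the session before it is dereferenced comes from the comment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the chan_skinny structures. */
struct session_sketch { int fd; };
struct device_sketch  { struct session_sketch *session; };
struct line_sketch    { struct device_sketch *parent; };

/*
 * Return the session behind a line, or NULL if any link in the chain is
 * missing (for example, because the session was destroyed after the lookup).
 */
static struct session_sketch *session_for_line(const struct line_sketch *l)
{
    if (l == NULL || l->parent == NULL) {
        return NULL;
    }
    return l->parent->session; /* may itself be NULL; the caller must check */
}

/* Usage: the caller tests the result instead of dereferencing it blindly. */
static void demo_session_for_line(void)
{
    struct session_sketch s = { 3 };
    struct device_sketch d = { &s };
    struct line_sketch l = { &d };

    assert(session_for_line(&l) == &s);   /* normal case */
    d.session = NULL;                     /* session dropped mid-call */
    assert(session_for_line(&l) == NULL); /* caller can now bail out safely */
}
```

A caller in the style of handle_stimulus_message would then skip the transmit step whenever the helper returns NULL.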
By: Damien Wedhorn (wedhorn) 2007-02-08 18:22:58.000-0600
I agree, I think it is fairly easy to add a check in transmit_response. There are many functions calling transmit_response so it would make sense to do the check in there.
We may want to pass back an error so the calling function (skinny_new) is at least aware that the session has been dropped.
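
Passing the error back could look something like the sketch below. It assumes a simplified transmit function that returns 0 on success and -1 when there is no session. The names `transmit_response_sketch` and `start_call_sketch`, and the `sent` counter, are hypothetical and do not reflect the real chan_skinny signatures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical session: just counts how many messages were "sent". */
struct session_sketch { int sent; };

/* Returns 0 on success, -1 if there is no session to send on. */
static int transmit_response_sketch(struct session_sketch *s)
{
    if (s == NULL) {
        return -1;
    }
    s->sent++;
    return 0;
}

/* Hypothetical caller: abandons call setup when the session is gone. */
static int start_call_sketch(struct session_sketch *s)
{
    if (transmit_response_sketch(s) != 0) {
        return -1; /* propagate the failure instead of ignoring it */
    }
    /* ... further call setup would go here ... */
    return 0;
}

/* Usage: a dropped session surfaces as an error in the caller. */
static void demo_start_call(void)
{
    struct session_sketch s = { 0 };

    assert(start_call_sketch(NULL) == -1);
    assert(start_call_sketch(&s) == 0);
    assert(s.sent == 1);
}
```

The point of the design is that the check lives in one place, while each caller still decides what a dropped session means for it.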
By: sbisker (sbisker) 2007-02-23 09:20:13.000-0600
I uploaded the debug trace prior to the system crashing. Any luck on providing the fix in transmit_response?
By: sbisker (sbisker) 2007-03-05 12:12:12.000-0600
Just checking status on this one.
By: Damien Wedhorn (wedhorn) 2007-03-05 14:17:38.000-0600
Added small patch. If there is no session it should log it ("transmit response: no session") and continue without transmitting the message. Compiled OK, not tested. Seeing as this is so intermittent, can you test and monitor your logs for the above message?
Just a general observation, we tend to have errors thrown that are not utilised. In transmit_response there are now a couple of situations that will return an error and nothing is done with the error.
By: sbisker (sbisker) 2007-03-05 14:32:38.000-0600
Patched against 1.4.0. I will monitor the logs for the message.
By: sbisker (sbisker) 2007-03-06 08:45:28.000-0600
Same problem. The patch didn't stop Asterisk from core dumping. Further, there were no messages in the logs containing "no session".
By: dea (dea) 2007-03-06 11:30:04.000-0600
The test for (!s) and return needs to be above ast_mutex_lock(&s->lock); (line 1393) in transmit_response(), otherwise we attempt to lock a nonexistent
session, which is what the bt shows.
sbisker, if you can move the five new lines in wedhorn's patch up above the
ast_mutex_lock, it should work.
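
Why the order matters can be shown with a small standalone sketch. It uses a plain pthread mutex in place of ast_mutex_lock, and the struct and function names are hypothetical. The log text is the one quoted for the patch earlier in this thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical session guarded by its own lock. */
struct locked_session_sketch {
    pthread_mutex_t lock;
    int sent;
};

static int transmit_locked_sketch(struct locked_session_sketch *s)
{
    /*
     * Test first: &s->lock is computed from s, so taking the lock on a
     * NULL session would crash before any later check could run.
     */
    if (s == NULL) {
        fprintf(stderr, "transmit response: no session\n");
        return -1;
    }

    pthread_mutex_lock(&s->lock);
    s->sent++; /* stand-in for writing the message to the socket */
    pthread_mutex_unlock(&s->lock);
    return 0;
}

/* Usage: the NULL case returns an error instead of touching the lock. */
static void demo_check_before_lock(void)
{
    struct locked_session_sketch s = { .lock = PTHREAD_MUTEX_INITIALIZER, .sent = 0 };

    assert(transmit_locked_sketch(NULL) == -1);
    assert(transmit_locked_sketch(&s) == 0);
    assert(s.sent == 1);
}
```

With the check placed after the lock call instead, the NULL case would never reach the log statement, which matches the report that no "no session" message appeared before the crash.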
By: Russell Bryant (russell) 2007-03-06 12:03:58.000-0600
This crash should not happen anymore as of rev 58023 and 58025. However, this problem is surely indicative of a deeper problem. If you can isolate any situation where this occurs and you get a WARNING message, please let us know.