Summary: ASTERISK-08748: chan_skinny randomly crashing server
Reporter: sbisker (sbisker)
Labels:
Date Opened: 2007-02-07 10:24:33.000-0600
Date Closed: 2007-06-30 09:20:07
Versions:
Frequency of
Environment:
Attachments: ( 0) bt_full.txt
( 1) bt-full.txt
( 2) debug.txt
( 3) transmit.patch
Description: With the release version of Asterisk, chan_skinny is randomly crashing Asterisk.

Unoptimized backtrace is attached.
Comments:

By: dea (dea) 2007-02-07 11:36:04.000-0600

 Your backtrace shows why, without really showing why.  Can you upload
a log with both verbose and debug set to at least 3?

Somehow this call managed to get started without a session (bad).  The
verbose/debug log should confirm this, and hopefully point to where
additional checks for the existence of the session should be performed.

By: sbisker (sbisker) 2007-02-07 12:12:22.000-0600

I have set the options for asterisk to -vvvvvgdddddnp

When it crashes again, I will attach /var/log/asterisk/messages .


By: Serge Vecher (serge-v) 2007-02-07 16:24:02.000-0600

it is best to set debug output for console in logger.conf and upload a console log instead.

By: Anthony LaMantia-2 (anthonyl) 2007-02-08 11:54:41.000-0600

It would seem the line s = d->session; is borked. I assume the problem comes from the fact that l->session is never checked after calling line_by_deviceid in handle_stimulus_message, or the sessions may just be getting destroyed. Either way, the session should be checked before calling transmit_response, or inside transmit_response for safety, considering how this code is laid out.
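
The caller-side guard described above could look roughly like the sketch below. The struct layouts, find_line() and handle_stimulus() are simplified stand-ins invented for illustration; they are not the actual chan_skinny definitions, and the real handler does considerably more.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for chan_skinny's structures. */
struct session { int fd; };
struct device  { struct session *session; };
struct line    { struct device *parent; };

/* Stand-in for a lookup such as line_by_deviceid: may return NULL. */
static struct line *find_line(struct line *lines, size_t count, size_t index)
{
    return (index < count) ? &lines[index] : NULL;
}

/* Returns 0 when it is safe to transmit, -1 when the line, its device,
 * or the device's session is missing. */
static int handle_stimulus(struct line *lines, size_t count, size_t index)
{
    struct line *l = find_line(lines, count, index);

    if (!l || !l->parent || !l->parent->session) {
        return -1;
    }
    /* A real handler would call transmit_response() at this point. */
    return 0;
}
```

Checking every pointer in the chain means the handler bails out early whether the lookup failed or the session was torn down after the device registered.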

By: Damien Wedhorn (wedhorn) 2007-02-08 18:22:58.000-0600

I agree, I think it is fairly easy to add a check in transmit_response. There are many functions calling transmit_response so it would make sense to do the check in there.

We may want to pass back an error so the calling function (skinny_new) is at least aware that the session has been dropped.
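
As a rough illustration of passing the error back, the sketch below has the transmit function return -1 when there is no session and has the caller stop instead of carrying on. The names transmit() and start_call() and the struct layout are assumptions made for this example; start_call() merely stands in for a caller such as skinny_new.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical session: only counts how many messages were "sent". */
struct session { int sent; };

/* Reports failure instead of dereferencing a missing session. */
static int transmit(struct session *s)
{
    if (!s) {
        fprintf(stderr, "transmit: no session\n");
        return -1;
    }
    s->sent++;
    return 0;
}

/* Stand-in for a caller that sends two messages while setting up a call.
 * It stops at the first failure so no further work is done on a dead session. */
static int start_call(struct session *s)
{
    if (transmit(s) != 0) {
        return -1;
    }
    if (transmit(s) != 0) {
        return -1;
    }
    return 0;
}
```

The point of the design is that the return value is actually inspected: a check that only logs inside the transmit function would still let the caller continue building a call that has nowhere to go.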

By: sbisker (sbisker) 2007-02-23 09:20:13.000-0600

I uploaded the debug trace prior to the system crashing.  Any luck on providing the fix in transmit_response?

By: sbisker (sbisker) 2007-03-05 12:12:12.000-0600

Just checking status on this one.

By: Damien Wedhorn (wedhorn) 2007-03-05 14:17:38.000-0600

Added small patch. If there is no session it should log it ("transmit response: no session") and continue without transmitting the message. Compiled ok, not tested. Seeing as this is so intermittent, can you test and monitor your logs for the above message?

Just a general observation, we tend to have errors thrown that are not utilised. In transmit_response there are now a couple of situations that will return an error and nothing is done with the error.

By: sbisker (sbisker) 2007-03-05 14:32:38.000-0600

Patched against 1.4.0.  I will monitor the logs for the message.

By: sbisker (sbisker) 2007-03-06 08:45:28.000-0600

Same problem.  The patch didn't stop asterisk from core dumping.  Further, there were no messages in the logs that had "no session" in the entry.

By: dea (dea) 2007-03-06 11:30:04.000-0600

The test for (!s) and return needs to be above ast_mutex_lock(&s->lock); (line 1393) in transmit_response(), otherwise we attempt to lock a nonexistent
session, which is what the bt shows.

sbisker, if you can move the five new lines in wedhorn's patch up above the
ast_mutex_lock, it should work.
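
The ordering issue can be sketched as follows. This is not the chan_skinny source: it uses a plain pthread mutex in place of ast_mutex_lock, and the struct and the function name transmit_checked() are made up for the example.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical session holding its own lock, as in the code under discussion. */
struct session {
    pthread_mutex_t lock;
    int sent;
};

static int transmit_checked(struct session *s)
{
    /* The NULL test has to come first: taking &s->lock on a NULL session
     * is itself the invalid access, so a check placed after the lock
     * never gets a chance to run. */
    if (!s) {
        return -1;
    }

    pthread_mutex_lock(&s->lock);
    s->sent++;                      /* stands in for writing the message */
    pthread_mutex_unlock(&s->lock);
    return 0;
}
```

This would also explain why the earlier test run showed no "no session" log entries: with the check below the lock, the crash happens before the logging code is reached.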

By: Russell Bryant (russell) 2007-03-06 12:03:58.000-0600

This crash should not happen anymore as of rev 58023 and 58025.  However, this problem is surely indicative of a deeper problem.  If you can isolate any situation where this occurs and you get a WARNING message, please let us know.