Summary:ASTERISK-15226: [patch] Asterisk crashes randomly on mISDN RELEASE_COMPLETE
Reporter: Alexander Zielke (azielke)
Labels:
Date Opened: 2009-11-27 07:35:46.000-0600
Date Closed: 2011-07-27 12:55:39
Versions:
Frequency of
Environment:
Attachments: ( 0) backtrace.txt
( 1) queue_hangup-
( 2) queue_hangup-trunk.patch
Description: I experienced random Asterisk crashes when a call over mISDN was hung up.

The backtrace for the crash always looked similar (see attachment), so I checked the source.
I noticed that there is a check for ch->need_queue_hangup, but when no hangup needs to be queued it just prints a message and continues executing the rest of the code.
The fix seems trivial, but in my tests Asterisk no longer crashed.
Comments:By: Sven Hirschmueller (sodom) 2009-12-07 04:37:08.000-0600

On my side, that patch didn't remove the crash; at most it may reduce how often it occurs.

I have been hunting this hangup crash for quite some time now, and I may have found the problem. Sometimes mISDN tries to set the state of ast to DOWN at the end of the channel release process, even if the appropriate functions to hang up/release ast were called previously. The code segment that sets the DOWN state is at the end of the release_chan function and seems to be some kind of safety net that puts ast into the DOWN state in case something else went wrong earlier. I think ast is no longer valid at that point, so the simple state change crashes Asterisk.
I tried to avoid the crash by checking for the DOWN state first, but that didn't work. So I removed the complete code segment, and since then the problem seems to be gone.
This is partly guesswork, based on the fact that in the crash cases the last line in my mISDN log seems to be the " --> Setting AST State to down" statement.

I don't know whether the problem is really fixed, and to be honest I don't understand why this safety code segment is there in its current form. I would offer a patch, but my chan_misdn.c differs from every released version, so I cannot provide a proper one. Instead, here is the code segment to remove. (I added multiple patches from here to my 1.4.26 version to counter that problem, and I am also using the transfer patch, so my sources are rather unique.)

if (ast && MISDN_ASTERISK_TECH_PVT(ast)) {
    chan_misdn_log(1, bc->port, "* RELEASING CHANNEL pid:%d ctx:%s dad:%s oad:%s state: %s\n", bc ? bc->pid : -1, ast->context, ast->exten, ast->cid.cid_num, misdn_get_ch_state(ch));
    chan_misdn_log(3, bc->port, " --> * State Down\n");

    if (ast->_state != AST_STATE_RESERVED) {
        chan_misdn_log(3, bc->port, " --> Setting AST State to down\n");
        ast_setstate(ast, AST_STATE_DOWN);
    }
}

By: Sven Hirschmueller (sodom) 2009-12-07 04:38:50.000-0600

I just realised that you use Asterisk 1.6.x, not 1.4.x like me, so you may have had a completely different problem.

By: Alexander Zielke (azielke) 2009-12-07 05:51:42.000-0600

The last lines in my log were never " --> Setting AST State to down".

Just before the crash, my mISDN log usually showed something very similar to:
Wed Nov 25 12:10:49 2009: P[ 8]  I IND :RELEASE_COMPLETE oad: dad: pid:538 state:CONNECTED
Wed Nov 25 12:10:49 2009: P[ 8]   --> channel:0 mode:NT cause:16 ocause:16 rad: cad:
Wed Nov 25 12:10:49 2009: P[ 8]   --> info_dad: onumplan:0 dnumplan:0 rnumplan:0 cpnnumplan:0
Wed Nov 25 12:10:49 2009: P[ 8]   --> caps:Speech pi:0 keypad: sending_complete:0
Wed Nov 25 12:10:49 2009: P[ 8]   --> No need to queue hangup
Wed Nov 25 12:10:49 2009: P[ 8]  * IND : HANGUP pid:538 ctx:hangup dad:006... oad: State:CONNECTED
Wed Nov 25 12:10:49 2009: P[ 8]   --> l3id:7c0040
Wed Nov 25 12:10:49 2009: P[ 8]   --> cause:16
Wed Nov 25 12:10:49 2009: P[ 8]   --> out_cause:16
Wed Nov 25 12:10:49 2009: P[ 8]   --> Channel: mISDN/15-u744 hungup new state:CLEANING

And when looking further up in the logs, the interesting lines for this particular call are:
Wed Nov 25 12:10:49 2009: P[ 8]  I IND :DISCONNECT oad: dad:006... pid:538 state:CONNECTED
Wed Nov 25 12:10:49 2009: P[ 8]   --> channel:1 mode:NT cause:16 ocause:16 rad: cad:006...
Wed Nov 25 12:10:49 2009: P[ 8]   --> info_dad:9 onumplan:  dnumplan:  rnumplan:  cpnnumplan:0
Wed Nov 25 12:10:49 2009: P[ 8]   --> caps:Audio 3.1k pi:0 keypad: sending_complete:0
Wed Nov 25 12:10:49 2009: P[ 8]   --> org:2 nt:1, inbandavail:0 state:11
Wed Nov 25 12:10:49 2009: P[ 8]   --> queue_hangup

So it had already called ast_queue_hangup_with_cause for this call. The log lines before the crash even say "No need to queue hangup".

When mISDN tried to call ast_queue_hangup_with_cause again, Asterisk crashed: during that call it tried to enable a timer whose handle was actually NULL.
So, unfortunately, our problems seem to be different.

Since I installed the patch, Asterisk has not crashed for me (before, it crashed at least once a week, sometimes even 5 or 6 times a day).

By: Sven Hirschmueller (sodom) 2009-12-07 06:19:52.000-0600

Hmm, as I said, I realised too late that you are using a 1.6.x version. I guess chan_misdn differs quite a lot between 1.4.x and 1.6.x.

So others should treat my comment as a hint if they still have crashes even with your patch, and otherwise ignore it.

Maybe I only ran into the problem because I added tons of patches and optional code to get transfer and bridging to MeetMe under control.

By: Maciej Krajewski (jamicque) 2010-01-25 04:23:43.000-0600

I suspect that tickets 15952 and 15795 concern the same issue...

By: Maciej Krajewski (jamicque) 2010-01-26 10:26:35.000-0600

After applying this patch, Asterisk does not compile...

By: Maciej Krajewski (jamicque) 2010-01-26 10:46:52.000-0600

OK, it compiles; it was my previous modifications :)

By: Maciej Krajewski (jamicque) 2010-01-27 05:18:59.000-0600

The patch works great and solves the problem in 1.6.1.x.

By: Michael Keuter (mkeuter) 2011-01-31 12:16:07.000-0600

I tested the patch in Asterisk 1.4.40-rc2, but it didn't help.

I have had the segfaults since 1.4.37-rc1, which is why I suspect this change to chan_misdn.c: http://svnview.digium.com/svn/asterisk?view=revision&sortby=file&revision=284478

By: Russell Bryant (russell) 2011-07-27 12:55:34.860-0500

Per the Asterisk maintenance timeline page at http://www.asterisk.org/asterisk-versions maintenance (bug) support for the 1.4 and 1.6.x branches has ended. For continued maintenance support please move to the 1.8 branch which is a long term support (LTS) branch. For more information about branch support, please see https://wiki.asterisk.org/wiki/display/AST/Asterisk+Versions

If this is still an issue, please open a new issue so it can be re-triaged appropriately. Thanks!