[Home]

Summary:ASTERISK-06602: Asterisk crashes when res_features receives a bogus frame
Reporter:Perry Couprie (perry)Labels:
Date Opened:2006-03-23 09:38:21.000-0600Date Closed:2006-05-24 10:52:10
Priority:CriticalRegression?No
Status:Closed/CompleteComponents:Resources/res_features
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) configuration.txt
( 1) crash-13.log
( 2) crash-14.log
( 3) crash-15.log
( 4) crash-16.log
( 5) crash-18.log
( 6) res_features.diff
Description:Hi,

I haven no idea why asterisk crashed. But i have atached a debug trace.
If you need more information let me know.

Greeting from Amsterdam (Netherlands).


****** ADDITIONAL INFORMATION ******

old backtrace snipped
Comments:By: Tilghman Lesher (tilghman) 2006-03-23 12:24:52.000-0600

Please upgrade to 1.2.5 and see if this issue still occurs.

By: Andrey S Pankov (casper) 2006-03-23 13:22:43.000-0600

You need to look at what is called at channel.c:1980 and then to use at least gdb 'print' to examine variables involved etc...

By: Perry Couprie (perry) 2006-03-24 05:28:35.000-0600

Corydon76: we also ran 1.2.4 on this server, and it crashed daily.
We had to go back to 1.2.1 to get it more stable, now it crashes
once every two weeks with the above crashdump.

The customer will not allow us to upgrade to 1.2.5 because
they are afraid the daily crashes will return.
This system is running in production in a 150 seat
inbound callcenter.

Perry Couprie

By: Andrey S Pankov (casper) 2006-03-29 08:52:54.000-0600

Any possibility to avoid using app_macro?

By: Serge Vecher (serge-v) 2006-05-02 10:22:02

perry: if you can't upgrade to the latest stable release v.1.2.7.1, I'm afraid not can be done. Can you model this at least in a test environment?

By: Vladimir S. Blazhkun (vovan) 2006-05-05 07:42:19

Seems i have the same issue.
asterisk-1.2.7.1 with patch from bug ASTERISK-6882, RH AS 4, HP DL380, hw is fully tested.
See crash-13.log for backtraces.

May  5 13:49:24 DEBUG[1393] channel.c: Got clone lock for masquerade on 'SIP/10.0.0.61-b7c604e8' at 0xb7c54b3c
May  5 13:49:24 DEBUG[1393] channel.c: Set channel SIP/10.0.0.61-b7c604e8 to write format alaw
May  5 13:49:24 DEBUG[1393] channel.c: Set channel SIP/10.0.0.61-b7c604e8 to read format alaw
May  5 13:49:24 DEBUG[1393] channel.c: Putting channel SIP/10.0.0.61-b7c604e8 in 8/8 formats
May  5 13:49:24 DEBUG[1393] channel.c: Released clone lock on 'Transfered/SIP/10.0.0.61-b7c604e8<ZOMBIE>'
May  5 13:49:24 DEBUG[1393] channel.c: Done Masquerading SIP/10.0.0.61-b7c604e8 (6)
May  5 13:49:24 DEBUG[1393] channel.c: Set channel Local/1601@pbx-08a3,1 to write format gsm
May  5 13:49:24 VERBOSE[1393] logger.c:     -- Playing 'beep' (language 'en')
May  5 13:49:24 VERBOSE[6904] logger.c:     -- Remote UNIX connection
May  5 13:49:24 DEBUG[1393] channel.c: Set channel Local/1601@pbx-08a3,1 to write format alaw

By: Serge Vecher (serge-v) 2006-05-05 09:26:15

vovan: is still is a backtrace from non-optimized build (it needs to be)?

By: Vladimir S. Blazhkun (vovan) 2006-05-05 09:39:54

Yes, it is. Builded with make dont-optimize.

By: Andrey S Pankov (casper) 2006-05-05 12:57:21

Or the backtrace is NOT from non-optimized build, or you are NOT using 1.2.7.1.

#make clean
#make dont-optimize (in 1.2.7.1 this will do 'make install' as well)

ASTERISK-2  0x006de535 in ast_bridge_call (chan=0xb7c54a70, peer=0x9a95440, config=0x99ab7b8) at res_features.c:1442
Should be free() - as in bt, but it is ast_frfree().

Is it a reproducible crash? How often does it happen and under what conditions?
It seems like it crashes inside DTMF parsing routine. Are you using app_macro
as the original reporter did? I don't see how this is related.

You may try:

} else {
- ast_frfree(f);
+ if (f) ast_frfree(f);
break

Does it help?
Thanks!


vechers: please do not delete files or backtraces. It does not facilitate
things, it make them even more difficult.
It seems like vovan should had filed a separate report. I can't see how
the both crashes are related.

By: BJ Weschke (bweschke) 2006-05-05 13:27:06

vovan: I remember your bt from the other bug where we fixed 1/2 of your problems. Seems you're still having issues with the 2nd problem from the bt attached. I'm sorry. :(
       I've attached a res_features.diff patch to this bug that I've put together against the 1.2 branch. Please apply it and recompile 'dont-optimize' and let me know what you come up with, making sure that DEBUG and VERBOSE are both turned up > 3 and you're logging that to a file. This patch isn't going to fix your problem, but it should hopefully:
a) provide a little bit of humor to a situation that probably isn't humorous at all right now because you're crashing and that's not fun.
b) give us some good insight as to what is going on. from all the bt's I've seen from you with regard to these crashes, you are dying out always at the same spot. To that extent, this is good because it will be a little easier for us to track down where the rogue frame is coming from that's causing your issue.

Finally - if you could give us some background on your dial plan and queue configuration and the exact steps you're taking to reproduce this (eg. 1. call in 2. press key '*', etc. etc) we'll get a couple more folks to try and reproduce as well.

casper: thanks for your input on this bug and trying to assist with this bug, but it's not 100% right. the patch you've provided to vovan isn't going to work for him because res_features.c is already validating that f is a valid data structure before entering that code block up at 1412.

By: BJ Weschke (bweschke) 2006-05-05 13:35:23

vovan: additionally, this bad frame, I'm assuming is coming from ast_generic_bridge which, with debug turned up should be producing a message like "Got DTMF on channel ", in the DEBUG logs right before things go real bad. It'd be good for us to see that message specifically, because then we'd also be able to begin to track it back to a specific channel driver/technology that is feeding the bad frame.

By: Andrey S Pankov (casper) 2006-05-05 13:37:11

There are lots of places in the code where we have race conditions.
With a check just before the call it is more likely it will not crash.
The other option would be to add an else with ast_log(LOG_ERROR, "NULL!!!\n");

By: Vladimir S. Blazhkun (vovan) 2006-05-05 14:54:23

bweschke:

I've applied your latest patch and recompiled 1.2.7.1 with make dont-optimize, waiting for a new crash.

See attached configuration.txt for some parts of dialplan and agents configs.

Logging and debugging are turned on (/usr/sbin/asterisk -ddddddgvvvvvv).

This crash is reproduced usually on heavy load, for example with 15-20 incoming calls (i don't know yet exact way to reproduce it).

I got this 2 messages about DTMF in a few minutes before latest crash (crash-13.log):

May  5 13:44:22 DEBUG[955] channel.c: Got DTMF on channel (Agent/1413)
May  5 13:46:23 DEBUG[1072] channel.c: Got DTMF on channel (Agent/1410)

By: Vladimir S. Blazhkun (vovan) 2006-05-05 14:57:51

Also i have zillions of:

May  5 09:55:54 WARNING[11937] chan_sip.c: Asked to transmit frame type 64, while native formats is 8 (read/write = 8/8)
May  5 09:55:54 WARNING[11937] chan_sip.c: Asked to transmit frame type 64, while native formats is 8 (read/write = 8/8)
May  5 09:55:54 WARNING[11937] chan_sip.c: Asked to transmit frame type 64, while native formats is 8 (read/write = 8/8)
May  5 09:55:54 WARNING[11937] chan_sip.c: Asked to transmit frame type 64, while native formats is 8 (read/write = 8/8)
May  5 09:55:54 WARNING[11937] chan_sip.c: Asked to transmit frame type 64, while native formats is 8 (read/write = 8/8)
May  5 09:55:54 WARNING[11937] chan_sip.c: Asked to transmit frame type 64, while native formats is 8 (read/write = 8/8)
May  5 09:55:54 WARNING[11937] chan_sip.c: Asked to transmit frame type 64, while native formats is 8 (read/write = 8/8)
May  5 09:55:55 WARNING[11937] chan_sip.c: Asked to transmit frame type 64, while native formats is 8 (read/write = 8/8)

in my logs. :(

By: Andrey S Pankov (casper) 2006-05-06 11:52:16

exten => 12/_1[146]XX,n(logout),Dial(Local/99@apps/n,,D(#))

Maybe the problem is here?

By: Vladimir S. Blazhkun (vovan) 2006-05-06 14:58:19

I dont't think so. Just tested it about 50 times, no crashes. Why do you suspect this extension?

By: Vladimir S. Blazhkun (vovan) 2006-05-10 08:45:32

Latest asterisk log messages (for crash-14.log):

May 10 16:08:45 DEBUG[26013] channel.c: Putting channel SIP/10.0.0.61-b7400018 in 8/8 formats
May 10 16:08:45 DEBUG[26013] channel.c: Released clone lock on 'Transfered/SIP/10.0.0.61-b7400018<ZOMBIE>'
May 10 16:08:45 DEBUG[26013] channel.c: Done Masquerading SIP/10.0.0.61-b7400018 (6)
May 10 16:08:45 DEBUG[26013] channel.c: Set channel Local/1303@pbx-252d,1 to write format gsm
May 10 16:08:45 VERBOSE[26013] logger.c:     -- Playing 'beep' (language 'en')
May 10 16:08:45 DEBUG[26161] rtp.c: Difference is 720, ms is 110
May 10 16:08:46 DEBUG[26013] channel.c: Set channel Local/1303@pbx-252d,1 to write format alaw
May 10 16:08:46 DEBUG[26013] res_features.c: ----BEGIN---- WE'RE WATCHING YOU NOW! YOU, WHO CALL YOURSELF A FRAME!
May 10 16:08:46 VERBOSE[26013] logger.c: >> [ TYPE: Unknown Frametype '160' (160) SUBCLASS: Unknown Subclass (160) ]
[BIGBROTHER]
May 10 16:08:46 DEBUG[26013] res_features.c: ----END------ WE'RE WATCHING YOU NOW! YOU, WHO CALL YOURSELF A FRAME!

By: BJ Weschke (bweschke) 2006-05-10 09:13:28

vovan: where is this "Playing 'beep'" coming from? Do you have the dialplan for 99 and 1303?

By: Vladimir S. Blazhkun (vovan) 2006-05-10 09:22:01

I think that this 'beep' sound is playing during attended transfer via features.conf.

Confirmed, just checked. This sound is playing to 3-rd phone (to which we do the attended transfer) just before bridging with calling phone.



By: Vladimir S. Blazhkun (vovan) 2006-05-11 06:22:19

I've found how to reproduce this crash. Enable presense at Xten eyeBeam softphone and do an attended transfer to it from queue with agents channels.

Latest asterisk log messages (for crash-15.log):
May 11 09:10:43 DEBUG[24353] channel.c: Done Masquerading SIP/10.0.0.61-b753bd10 (6)
May 11 09:10:43 DEBUG[24353] channel.c: Set channel Local/1303@pbx-fbf7,1 to write format gsm
May 11 09:10:43 VERBOSE[24353] logger.c:     -- Playing 'beep' (language 'en')
May 11 09:10:43 DEBUG[24451] rtp.c: Difference is 704, ms is 108
May 11 09:10:43 DEBUG[24439] app_queue.c: It's not our turn (SIP/10.0.0.61-b7521938).
May 11 09:10:44 DEBUG[24419] app_queue.c: It's not our turn (SIP/10.0.0.61-b7508400).
May 11 09:10:44 DEBUG[24353] channel.c: Set channel Local/1303@pbx-fbf7,1 to write format alaw
May 11 09:10:44 DEBUG[24353] res_features.c: ----BEGIN---- WE'RE WATCHING YOU NOW! YOU, WHO CALL YOURSELF A FRAME!
May 11 09:10:44 VERBOSE[24353] logger.c: >> [ TYPE: Unknown Frametype '160' (160) SUBCLASS: Unknown Subclass (160) ]
[BIGBROTHER]
May 11 09:10:44 DEBUG[24353] res_features.c: ----END------ WE'RE WATCHING YOU NOW! YOU, WHO CALL YOURSELF A FRAME!

By: BJ Weschke (bweschke) 2006-05-11 07:58:06

vovan: I'm marking this related to ASTERISK-6904, not because it's the same issue, but because I think the cause is the same. You're coming back from the attended transfer, and the original bridge is deciding to continue executing which is proving problematic because we're getting bogus frames now off a channel that we really don't own anymore. That's my assumption as to what's going on. As soon as we get a valid patch for condition #1, we'll be able to try it here too.

By: Vladimir S. Blazhkun (vovan) 2006-05-15 07:47:13

Latest asterisk log messages (for crash-16.log):
May 15 16:17:45 DEBUG[5744] channel.c: Set channel Local/1205@pbx-8f7d,1 to write format alaw
May 15 16:17:45 DEBUG[5744] res_features.c: ----BEGIN---- WE'RE WATCHING YOU NOW! YOU, WHO CALL YOURSELF A FRAME!
May 15 16:17:45 VERBOSE[5744] logger.c: >> [ TYPE: Unknown Frametype '160' (160) SUBCLASS: Unknown Subclass (160) ] [BIGBROTHER]
May 15 16:17:45 DEBUG[5744] res_features.c: ----END------ WE'RE WATCHING YOU NOW! YOU, WHO CALL YOURSELF A FRAME!
May 15 16:17:45 DEBUG[5744] channel.c: Hanging up channel 'Agent/1407'
May 15 16:17:45 DEBUG[5744] chan_agent.c: Hangup called for state Up
May 15 16:17:45 DEBUG[5744] pbx.c: Spawn extension (default,t,4) exited non-zero on 'Transfered/SIP/10.0.0.61-b7d1ed28<ZOMBIE>'
May 15 16:17:45 DEBUG[22588] devicestate.c: Changing state for Agent/1407 - state 1 (Not in use)
May 15 16:17:45 DEBUG[22588] devicestate.c: Changing state for Agent/1407 - state 1 (Not in use)
May 15 16:17:45 DEBUG[5952] app_queue.c: Device 'Agent/1407' changed to state '1' (Not in use)

By: Vladimir S. Blazhkun (vovan) 2006-05-18 08:03:07

Are there any news concerning this issue? I have 1-2 crashes per day at the same point.

Latest asterisk log messages (for crash-18.log):
May 18 13:00:35 DEBUG[19282] channel.c: Set channel Agent/1413 to write format slin
May 18 13:00:35 DEBUG[19693] channel.c: Set channel Local/1302@pbx-cf40,1 to write format alaw
May 18 13:00:35 DEBUG[19693] res_features.c: ----BEGIN---- WE'RE WATCHING YOU NOW! YOU, WHO CALL YOURSELF A FRAME!
May 18 13:00:35 VERBOSE[19693] logger.c: >> [ TYPE: Unknown Frametype '160' (160) SUBCLASS: Unknown Subclass (160) ] [BIGBROTHER]
May 18 13:00:35 DEBUG[19693] res_features.c: ----END------ WE'RE WATCHING YOU NOW! YOU, WHO CALL YOURSELF A FRAME!
May 18 13:00:35 DEBUG[19693] channel.c: Hanging up channel 'Agent/1403'
May 18 13:00:35 DEBUG[19693] chan_agent.c: Hangup called for state Up
May 18 13:00:35 DEBUG[19693] pbx.c: Spawn extension (default,t,4) exited non-zero on 'Transfered/SIP/10.0.0.61-b7c588e8<ZOMBIE>'
May 18 13:00:35 DEBUG[21467] devicestate.c: Changing state for Agent/1403 - state 1 (Not in use)
May 18 13:00:35 DEBUG[21467] devicestate.c: Changing state for Agent/1403 - state 1 (Not in use)

By: Serge Vecher (serge-v) 2006-05-22 16:09:51

vovan: can you please try the latest 1.2 branch (rev >29196) to see  if the the fix for 7090 fixes your issue as well?

By: Serge Vecher (serge-v) 2006-05-23 11:56:57

actually, make that rev. 29696; bweschke has put in a fix earlier this morning that hopefully will resolve this issue. Please test and report back your findings!

By: Vladimir S. Blazhkun (vovan) 2006-05-24 03:11:36

vechers: Unfortunately, it is not possible for me to compile whole svn release of Asterisk. Can i just do diff to res_features.c from 1.2.7.1 and svn rel. 29696 than apply this diff as a patch to 1.2.7.1?

By: Serge Vecher (serge-v) 2006-05-24 08:48:14

vovan: if you want to isolate just this change, here is a reference for you http://lists.digium.com/pipermail/svn-commits/2006-May/013898.html

By: Kevin P. Fleming (kpfleming) 2006-05-24 10:52:08

The fix committed to branch-1.2 and trunk will take care of this issue. If it does not, please request this bug to be reopened and provide details for reproducing it if possible.