Summary: ASTERISK-05873: Asterisk crashes when it attempts to free a bogus frame
Reporter: paradise (paradise)
Labels:
Date Opened: 2005-12-20 03:21:34.000-0600
Date Closed: 2006-05-10 11:00:39
Priority: Critical
Regression? No
Status: Closed/Complete
Components: Applications/app_dial
Versions:
Frequency of Occurrence:
Related Issues:
Environment:
Attachments:
( 0) 20051224__bug6032__debug2.diff.txt
( 1) ast_frfree.c.txt
( 2) bt-adomjan.txt
( 3) extensions.conf
( 4) New_Crash_2.txt
( 5) new_crash_3.txt
( 6) new_crash_more.txt
( 7) New_Crash_more2.txt
( 8) new_crash.txt

Description:
no clue about this crash.
it occurs 3-4 times per week.

Comments:

By: paradise (paradise) 2005-12-20 03:23:02.000-0600

... and i can not reproduce it.

By: Matt Riddell (zx81) 2005-12-20 04:21:55.000-0600

Are you using OpenH.323?

By: paradise (paradise) 2005-12-20 06:44:29.000-0600

no, never

By: Tilghman Lesher (tilghman) 2005-12-20 06:58:08.000-0600

Additional backtrace information needed:

(gdb) frame 1
(gdb) p *fr
(gdb) frame 2
(gdb) p *chan
(gdb) p *(char *)data

By: paradise (paradise) 2005-12-20 07:29:34.000-0600

Corydon76: Attached!

By: Russell Bryant (russell) 2005-12-22 13:54:42.000-0600

Is this version of Asterisk patched in any way?  What channel drivers are you using?

By: Tilghman Lesher (tilghman) 2005-12-22 13:59:22.000-0600

Sorry... that last one should have been:

(gdb) p (char *)data
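
The difference between the two gdb expressions can be sketched in plain C. This is only an illustration, assuming `data` points to a NUL-terminated string (which the corrected cast suggests); the helper names and the sample string are made up for the example and do not come from Asterisk.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Mirrors `p *(char *)data`: the cast is dereferenced, so only the
 * first byte behind the pointer is produced. */
char first_byte(const void *data)
{
    assert(data != NULL);
    return *(const char *)data;
}

/* Mirrors `p (char *)data`: the pointer is only reinterpreted, so the
 * whole NUL-terminated string behind it stays visible. */
const char *as_string(const void *data)
{
    assert(data != NULL);
    return (const char *)data;
}
```

With a hypothetical argument string such as "SIP/100|20", `first_byte()` yields just 'S', while `as_string()` yields the full text, which is why the second form is the useful one in a backtrace.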

By: Tilghman Lesher (tilghman) 2005-12-22 14:46:15.000-0600

Basically, what we have here is that you're receiving a completely bogus frame, and when we try to free it, it's not in allocated memory, so we're failing with a memory error.  We need to try to figure out exactly where this bogus frame is coming from, then we can figure out why and how to fix it.



By: Tilghman Lesher (tilghman) 2005-12-22 14:56:37.000-0600

What SIP codec are you using?

By: paradise (paradise) 2005-12-22 22:38:34.000-0600

Corydon76:
- p (char *)data output uploaded.
- just ULAW is used.

drumkilla:
- no. its not patched.
- i'm just using sip and zap (TE405P) with TA750... and chan_local is also used.



By: paradise (paradise) 2005-12-24 07:57:51.000-0600

> you're receiving a completely bogus frame, and when we try to free it,
> it's not in allocated memory

So as a quick fix, it seems that we need to patch ast_frfree() to validate a frame before freeing it, until the main problem is fixed.

ie: datalen of the frame in my backtrace is "0". so the frame seems to be bogus.

$1 = {frametype = 12348732, subclass = 11242390, datalen = 0, samples = 12193264, mallocd = 11563360, offset = 11562608,
 src = 0xb07610 "U\211ã\203ñ4\211]?æS\026??\201ÃÖ?\v", data = 0xb07440, delivery = {tv_sec = 11114272, tv_usec = 11555888},
 prev = 0x0, next = 0x0}



By: paradise (paradise) 2005-12-24 08:07:17.000-0600

i uploaded a sample patch.

By: Tilghman Lesher (tilghman) 2005-12-24 08:32:16.000-0600

I'm afraid we can't do it that way.  We have to find out where the bogus frame is coming from and stop it from being queued in the first place.

By: paradise (paradise) 2005-12-24 08:47:06.000-0600

I've seen many crashes which occurred when freeing bogus pointers, e.g. bug ASTERISK-5755813

So as a feature of Asterisk, isn't it necessary to detect bogus things, prevent crashes, and just log them as a reference for fixing the main bugs?



By: Tilghman Lesher (tilghman) 2005-12-24 09:38:47.000-0600

Your patch doesn't detect all bogus frames, and indeed will even detect frames that AREN'T bogus.  It's not that simple to detect bogus frames.

Even if you could know what pointers were bogus for free'ing, that would hide problems in Asterisk, creating a hacked mess of memory leaks, instead of allowing us to fix the real problems.  We have to find the real source of the problem NOW; we cannot postpone it by hacking around a _symptom_ of the issue, instead of coding a fix for the _cause_ of the issue.
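
The point about false positives and false negatives can be illustrated with a small standalone sketch. Everything here is an assumption made for illustration: `struct demo_frame` is a cut-down stand-in for the fields visible in the gdb dumps in this report (not the real `struct ast_frame`), and `naive_looks_bogus()` is a hypothetical "datalen is 0" check, not the contents of the attached patch. The first two sets of values are copied from dumps posted in this report; the third is made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Cut-down stand-in for the fields shown in the gdb dumps;
 * NOT the real struct ast_frame. */
struct demo_frame {
    int frametype;
    int subclass;
    int datalen;
    int samples;
};

/* Hypothetical heuristic: treat any frame without payload as bogus. */
bool naive_looks_bogus(const struct demo_frame *f)
{
    assert(f != NULL);
    return f->datalen == 0;
}

/* Values copied from the corrupted frame dump posted on 2005-12-24. */
const struct demo_frame corrupted_frame = { 12348732, 11242390, 0, 12193264 };

/* Values copied from the null_frame local printed in the __ast_read
 * backtrace in this report (frametype = 5, everything else zero). */
const struct demo_frame null_frame_copy = { 5, 0, 0, 0 };

/* Made-up garbage that happens to carry a non-zero datalen. */
const struct demo_frame garbage_with_len = { 12348732, 11242390, 160, 12193264 };
```

Under this heuristic the corrupted frame is flagged, but so is the copy of `null_frame`, which appears to be a frame Asterisk sets up itself, while the made-up garbage with a non-zero `datalen` passes unnoticed. A check on one field therefore gives both kinds of wrong answer.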

By: paradise (paradise) 2005-12-24 12:10:47.000-0600

OK. ;-)

another crash with all required backtrace info attached.

By: Tilghman Lesher (tilghman) 2005-12-24 13:54:42.000-0600

Patch to assist in debugging.  What this will help us to do is to figure out which routine is generating the bogus frame.  Be warned, this will create a LOT of extra logs.

By: Tilghman Lesher (tilghman) 2005-12-24 14:01:17.000-0600

Eh, this one's a little better.  Will only create a log entry when it finds a frame that it thinks is bogus.

By: paradise (paradise) 2005-12-31 01:07:46.000-0600

it seems that your patch crashes *. i couldn't use it.

#0  0x08065469 in __ast_read (chan=0x812a708, dropaudio=1038) at channel.c:1994
       f = (struct ast_frame *) 0x0
       blah = 135441152
       prestate = 6
       fromwhere = 0x80f568f "chan->tech->read"
       func = (int (*)(void *)) 0x40e
       data = (void *) 0x0
       res = 1039
       null_frame = {frametype = 5, subclass = 0, datalen = 0, samples = 0, mallocd = 0, offset = 0, src = 0x0, data = 0x0,
 delivery = {tv_sec = 0, tv_usec = 0}, prev = 0x0, next = 0x0}
       __PRETTY_FUNCTION__ = "__ast_read"

By: Olle Johansson (oej) 2006-01-04 12:26:48.000-0600

Corydon: Any ideas on how to catch this frame?


/Housekeeping

By: Niles Ingalls (atheos) 2006-01-10 20:16:15.000-0600

Just had the same problem on an unpatched, unmodified SVN 7921.
The crash occurred the moment I pressed digit 2, which exits the conference
in my config and runs a macro.

   -- Executing MeetMe("IAX2/kkai13-21", "1701|AMX") in new stack
 == Parsing '/etc/asterisk/meetme.conf': Found
   -- Created MeetMe conference 1023 for conference '1701'
   -- Playing 'conf-onlyperson' (language 'en')
   -- Started music on hold, class 'soundscapes', on IAX2/kkai13-21
   -- Hungup 'Zap/pseudo-1407569397'
   -- Hungup 'Zap/pseudo-63284361'
   -- Executing Set("IAX2/kkai13-21", "CONF=1701") in new stack
   -- Executing GotoIf("IAX2/kkai13-21", "1?3:4") in new stack
   -- Goto (conf,2,3)
   -- Executing Goto("IAX2/kkai13-21", "confescape|escape|1") in new stack
   -- Goto (confescape,escape,1)
   -- Executing Set("IAX2/kkai13-21", "TIMEOUT(digit)=2") in new stack
   -- Digit timeout set to 2
   -- Executing BackGround("IAX2/kkai13-21", "script25a") in new stack
*** glibc detected *** double free or corruption (!prev): 0x08157088 ***
Aborted


<*added*>
Just discovered that if I change my musiconhold.conf configuration from (all .ul files):
[soundscapes]
mode=files
directory=/var/lib/asterisk/mohmp3/soundscapes_ul
random=yes

TO (all mp3 files):
[soundscapes]
mode=quietmp3
directory=/var/lib/asterisk/mohmp3/soundscapes
random=yes

Then, I don't crash at all.



By: Tilghman Lesher (tilghman) 2006-01-10 22:05:06.000-0600

Ooops.  Let's try that patch again.

By: Russell Bryant (russell) 2006-01-22 11:18:25.000-0600

A fix just went in to fix a crash in the MixMonitor/ChanSpy related code.  This could possibly be related.  Please update if you are having this problem.

By: adomjan (adomjan) 2006-02-08 10:52:22.000-0600

I still have a crash with ChanSpy. I found many closed bugs, but this bug still exists and this bug note is still open.

version:
SVN-branch-1.2-r9156

Backtrace and a simple extensions.conf uploaded.
All calls are SIP. * crashes every time I pick up the triggered outgoing call first and the called phone afterwards.

By: paradise (paradise) 2006-02-10 01:35:00.000-0600

russell: I also had crashes with an updated box. This was with trunk-r8961.
BT is attached.

By: Justin R. Tunney (jtunney) 2006-02-10 16:36:06.000-0600

I don't have time to reproduce this right now, but if changing MOH from native to mp3 mode fixes the problem, then this bug might be related to ASTERISK-6229 (http://bugs.digium.com/view.php?id=6391). Sometimes MOH-native will set the channel write format incorrectly and end up writing mu-law data when * expects SLIN.
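
Why a mu-law/SLIN mix-up could matter can be sketched with simple size arithmetic. This is a rough illustration only: mu-law uses one byte per sample and signed linear (SLIN) uses two, but the enum and function names below are invented for the example, and nothing in this report confirms that such a mismatch is what corrupts the frame.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical format tags for this example only. */
enum demo_format { DEMO_ULAW, DEMO_SLIN };

/* mu-law: 1 byte per sample; 16-bit signed linear: 2 bytes per sample. */
size_t demo_bytes_per_sample(enum demo_format fmt)
{
    return fmt == DEMO_SLIN ? 2 : 1;
}

/* Payload size in bytes for a given number of samples. */
size_t demo_payload_bytes(enum demo_format fmt, size_t samples)
{
    assert(samples <= (size_t)-1 / 2); /* guard against overflow */
    return samples * demo_bytes_per_sample(fmt);
}

/* Bytes a consumer expecting `expected` would touch beyond a buffer
 * that actually holds `actual` data for the same sample count. */
size_t demo_overread_bytes(enum demo_format actual,
                           enum demo_format expected,
                           size_t samples)
{
    size_t have = demo_payload_bytes(actual, samples);
    size_t want = demo_payload_bytes(expected, samples);
    return want > have ? want - have : 0;
}
```

For a 160-sample frame, the mu-law payload is 160 bytes while code treating it as SLIN would expect 320, so it would run 160 bytes past the end of the buffer.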

By: paradise (paradise) 2006-02-20 10:37:13.000-0600

upgraded to latest trunk but still my box crashes.
1-2 times per day. :-(

By: Olle Johansson (oej) 2006-04-04 08:35:18

Any more ideas, Corydon? Russell?

By: Serge Vecher (serge-v) 2006-05-02 10:33:48

paradise: is this still an issue?

By: Serge Vecher (serge-v) 2006-05-10 11:00:39

No response from any of the reporters who've had trouble with this bug. If this is still an issue in the latest 1.2, please open a new report with a backtrace attached from a non-optimized build. Thank you.