Summary: ASTERISK-17688: [patch] segfault res_musiconhold.so when called party puts call on hold
Reporter: Michael Rack (rcrack2k)
Labels:
Date Opened: 2011-04-13 03:46:55
Date Closed: 2012-05-29 10:23:26
Priority: Critical
Regression?: No
Status: Closed/Complete
Components: Resources/res_musiconhold
Versions:
Frequency of Occurrence:
Related Issues:
is duplicated by ASTERISK-19636 Asterisk crashes during attended transfer due to bad data pointer passed in HOLD frame from chan_iax2
is duplicated by ASTERISK-18756 Asterisk crashed randomly - moh
is duplicated by ASTERISK-19597 Failure to pass NULL data pointer with AST_CONTROL_HOLD frame causes crash when MOH is started
Environment:
Attachments:
( 0) asterisk_crash_havlasm.txt
( 1) backtrace.20110509.txt
( 2) backtrace.txt
( 3) call_map.png
( 4) check_asterisk
( 5) haurein.log
( 6) haurein.striped.log
( 7) havlasm_backtrace_2011-11-02.txt
( 8) res_musiconhold.asterisk-r314015.segfault.ast_strlen_zero.patch
Description:
Dear Digium,

We used Asterisk 1.6.0 for a long time. Now we have switched to Asterisk 1.8.3. The existing configuration worked without any modifications.

The problem:
When a called party puts our line on hold to transfer the call, Asterisk quits with a segfault in res_musiconhold.so. We hear our own MOH class for a random time (2 to 20 seconds, possibly the time the transfer takes), but we should hear the MOH from the other party, not the MOH from our Asterisk server!

What makes this bug hard to track down is that it is not always reproducible. Sometimes Asterisk crashes, sometimes not. But Asterisk always crashes when we hear our own MOH, and it does not crash when we hear the MOH from the other party.

Our peer is an IAX2 peer (pbx-network.de).
We observed this problem on our backup IAX2 peer (xlink.at) as well.

Currently we run a trunk version of Asterisk at revision 306540, but the problem is still not fixed.

This is the syslog message when Asterisk quits:

Apr  1 09:34:37 voip-01 kernel: [10366699.967654] asterisk[11071]: segfault at 6c003630 ip 00007f598962ba6f sp 00007f59700bfec0 error 4 in res_musiconhold.so[7f5989625000+a000]
Apr  6 11:22:35 voip-01 kernel: [10805178.329682] asterisk[26102]: segfault at 68003a00 ip 00007f758594da6f sp 00007f756c2c2ec0 error 4 in res_musiconhold.so[7f7585947000+a000]
Apr  7 10:13:52 voip-01 kernel: [10887455.598158] asterisk[26676]: segfault at c4004ba0 ip 00007f94e41aaa6f sp 00007f94caf26ea0 error 4 in res_musiconhold.so[7f94e41a4000+a000]
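
Each kernel line above gives the faulting instruction pointer (ip) and, in the trailing brackets, the load base and size of the res_musiconhold.so mapping. Subtracting the base from ip yields the offset of the faulting instruction inside the module. The following is a minimal sketch of that arithmetic; the function name segv_offset is hypothetical, and the parser assumes exactly the message layout shown in these log lines.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: extract "ip <hex>" and the trailing "[<base>+<size>]"
 * from a kernel segfault line and compute ip - base.
 * Returns 0 on success and stores the offset, or -1 if the line does not
 * match the assumed layout or ip lies outside the mapping.
 */
int segv_offset(const char *line, uint64_t *offset)
{
    const char *ip_field = strstr(line, " ip ");
    const char *mapping = strrchr(line, '[');   /* last '[' starts base+size */
    unsigned long long ip, base, size;

    if (!ip_field || !mapping)
        return -1;
    if (sscanf(ip_field, " ip %llx", &ip) != 1)
        return -1;
    if (sscanf(mapping, "[%llx+%llx]", &base, &size) != 2)
        return -1;
    if (ip < base || ip >= base + size)
        return -1;

    *offset = (uint64_t)(ip - base);
    return 0;
}
```

For the three lines above the offset works out to 0x6a6f each time, which suggests that all three crashes hit the same instruction. With a matching, unstripped build of the module, such an offset can usually be resolved to a source line with a tool like addr2line.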

This Asterisk instance is currently in live production, so we could not run it in a debug configuration.

****** ADDITIONAL INFORMATION ******

Asterisk Trunk rev306540

Gentoo Base System release 1.12.13

Linux voip-01 2.6.35.7-amd64-xen-2.6.35.7 #1 SMP Tue Oct 12 10:35:31 Local time zone must be set--see zic  x86_64 AMD Athlon(tm) 64 X2 Dual Core Processor 5200+ AuthenticAMD GNU/Linux

Running on Xen DomU with 768 MB RAM, 1 CPU core
Comments:
By: Leif Madsen (lmadsen) 2011-04-13 08:54:10

The only way to move this issue forward is to provide debugging information. This would include DEBUG level logging from the console leading up to the crash, along with a backtrace as described here:

* https://wiki.asterisk.org/wiki/display/AST/Collecting+Debug+Information
* https://wiki.asterisk.org/wiki/display/AST/Getting+a+Backtrace

By: Leif Madsen (lmadsen) 2011-04-14 09:11:08

OK, so now you just need to provide the backtrace.

Thanks!

By: Michael Rack (rcrack2k) 2011-04-18 03:49:04

I know. Now I've got a core dump. The problem is not always reproducible, so I had to wait for a crash.

Apr 18 10:37:34 voip-01 kernel: [11839276.928530] asterisk[14579]: segfault at 9c000c30 ip 00007f20bf9c14b1 sp 00007f20a3294260 error 4 in res_musiconhold.so[7f20bf9ba000+c000]

This was the problem reported by the kernel.
The core dump file is attached.

By: Michael Rack (rcrack2k) 2011-04-18 03:53:58

The file could not be uploaded via the "UPLOAD FILE" method, because the file size of 2.15 MB is too big.

The file is uploaded on my server:
http://www.michaelrack.de/public/download/core-dump.asterisk.bz2

By: Alec Davis (alecdavis) 2011-04-18 03:59:22

Try ASTERISK-17378; I think I've seen this before.

See the patch bug18781.diff3.txt.

If it's the same, this was fixed in trunk at r310288.

By: Michael Rack (rcrack2k) 2011-04-18 05:02:10

Sorry, I currently run trunk rev311466. The crash occurred on 311466.

I think that my problem is not the same as the one fixed in r310288.

For now I will try the latest trunk version.

My segfault is generated in res_musiconhold, so the problem does not seem to be the same as in issue report 0018781.

By: Alec Davis (alecdavis) 2011-04-18 05:11:23

RcRaCk2k: with your core-dump, you first need to follow the backtrace info in ~133704 then upload the gdb.txt file that is created from that.

By: Michael Rack (rcrack2k) 2011-04-18 06:26:59

OK, so gdb requires the original Asterisk executable?
Damn. I installed the latest trunk before realizing that doing so makes the core dump useless.

Now I have to wait for a new crash and core dump.
Sorry, guys.

By: Michael Rack (rcrack2k) 2011-04-20 08:55:46

So, now I've got the right backtrace.

return (!s || (*s == '\0'));
throws a segmentation fault at include/asterisk/strings.h:65,
called from res_musiconhold.c:1311 in function local_ast_moh_start.

I hope the problem can be located and fixed as soon as possible.

By: Michael Rack (rcrack2k) 2011-04-28 04:22:07

So guys, sorry for interrupting, but is anyone looking into this problem?

Is there a workaround, so that I can run Asterisk without crashing? The system is a production server in a production environment. Currently we have a crontab entry installed that restarts Asterisk after it has died.

But this state is not optimal, because on a segfault all current calls are disconnected.

By: Michael Rack (rcrack2k) 2011-04-28 05:26:04

Hi,

I patched the line in "res_musiconhold.c" that passes "mclass" (out of bounds / null) to the static function ast_strlen_zero in "include/asterisk/strings.h".

I hope that this patch will save my Asterisk from future crashes.

I am not familiar with C / C++ and hope that my patch does not create other problems.

I am a Java/PHP programmer, and in my view the line "return (!s || (*s == '\0'));" in include/asterisk/strings.h should not end in a segmentation fault. The !s check should keep Asterisk from touching the address behind s, but it did not.

So I hope you can track the problem down further than I can.
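
A short illustration of why the quoted expression can still fault may help here. In C, `!s` is true only when `s` is exactly NULL; a stale or garbage pointer is non-NULL, passes that test, and the following `*s` then reads from an address that may not be mapped. The sketch below copies the quoted expression into a stand-alone function whose name, strlen_zero_sketch, is hypothetical; it is not the Asterisk header itself.

```c
#include <stddef.h>

/*
 * Same expression as the one quoted from include/asterisk/strings.h:65,
 * wrapped in a hypothetical stand-alone function for illustration.
 *
 *   !s          is true only for a NULL pointer.
 *   *s == '\0'  READS the first byte behind s and compares it with NUL;
 *               it does not write anything.
 *
 * If s is non-NULL but points at unmapped memory (for example a dangling
 * or uninitialised pointer), the first test is false and the read in the
 * second test is what raises the segmentation fault.
 */
int strlen_zero_sketch(const char *s)
{
    return (!s || (*s == '\0'));
}
```

With valid input the function returns 1 for NULL and for "", and 0 for a string such as "default". The "error 4" in the kernel lines is the x86 page-fault code for a user-mode read of a page that is not present, which is consistent with the read in `*s` rather than a write.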

By: mickeyratt (mickeyratt) 2011-05-04 07:18:43

Dear All

I have encountered this problem with the latest 1.6 release of Asterisk (1.6.2.18) too.
I have been using it on a production server for one month, so it is very frustrating. From kern.log:

May  4 13:44:08 digium kernel: [3472685.931049] asterisk[26715]: segfault at 880074a0 ip 00007f05acfec5db sp 00007f05945b4740 error 4 in res_musiconhold.so[7f05acfe6000+a000]

Dear RcRaCk2k, can you share your "crontab" solution? Or could I use safe_asterisk to work around this crash?

By: Michael Rack (rcrack2k) 2011-05-04 08:47:44

I've attached the script that checks whether Asterisk is alive.

PS: My patch did not work. I had a segfault again today, although it took a little longer to crash than before. Asterisk was not running with option -g, so I have to wait for another crash, sorry.

By: mickeyratt (mickeyratt) 2011-05-04 09:16:19

RcRaCk2k
Thank you for your check-script!

By: Michael Rack (rcrack2k) 2011-05-09 07:06:09

So... my patch did not work. I could not catch the out-of-bounds pointer. I hope someone else can make a patch.

By: Martin Havlas (havlasm) 2011-10-27 02:26:00.979-0500

asterisk_crash_havlasm.txt = core dump from Linux (debian 6.2.1 x64), attached.

Regarding issue ASTERISK-18756:

The problem described here still occurs. The situation that causes the crash is shown in the attached call_map.png.



By: Martin Havlas (havlasm) 2011-11-02 05:39:32.683-0500

Well, I unloaded res_musiconhold.so and hoped that would solve the problem temporarily. Unfortunately Asterisk crashed again, even though MOH was not loaded. I have another core dump for you.

Partial info: it crashes only when there is an IAX2 connection between Asterisk 1.8.x and 1.2.x
(core dump attached: havlasm_backtrace_2011-11-02.txt)

By: Martin Havlas (havlasm) 2011-11-03 05:49:47.175-0500

I'm desperate. Not a day goes by without a crash. Again the same problem: again an IAX2 trunk between 1.8.x and 1.2.x, again MOH...
I have to switch the peers that run an old Asterisk to a SIP trunk.

Any idea whether the compile flag "IAX_OLD_FIND" can affect functionality?

By: Martin Havlas (havlasm) 2012-01-03 06:55:15.120-0600

Why are the developers ignoring this issue? At least some response would be great.

By: Paul Belanger (pabelanger) 2012-01-03 11:00:38.637-0600

Not ignoring; there are over 750 open issues on the tracker, and it takes time to triage them all.

By: Misha Slyusarev (misha.slyusarev) 2012-04-04 10:56:29.732-0500

Hi everyone!
Is there any update on this issue? I've got the same problem: ASTERISK-19636

By: Misha Slyusarev (misha.slyusarev) 2012-04-05 10:50:35.307-0500

OK, I've got the same issue, and it looks like the problem is related to running on a 64-bit system. In my case I switched to 32-bit and everything works fine. I hope that is helpful.

By: Michael L. Young (elguero) 2012-05-29 10:24:46.413-0500

ASTERISK-19597 should fix this issue.
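
The titles of the duplicate issues describe a HOLD control frame that reaches the music-on-hold code with a bad, non-NULL data pointer instead of NULL. The actual change made under ASTERISK-19597 is not shown in this thread. As a purely illustrative sketch of the general defensive idea, a receiver can ignore the data pointer whenever the frame says it carries no payload. Everything below (struct hold_frame, its fields, and hold_frame_moh_class) is hypothetical and is not the real Asterisk frame API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a control frame; NOT the real Asterisk struct. */
struct hold_frame {
    const void *data;   /* optional payload, e.g. a suggested MOH class name */
    size_t datalen;     /* number of valid bytes behind data (assumed field) */
};

/*
 * Return the suggested MOH class carried by the frame, or NULL when the
 * frame carries none. The pointer is only trusted when datalen is non-zero
 * and a NUL terminator lies within the declared length.
 */
const char *hold_frame_moh_class(const struct hold_frame *f)
{
    if (!f || !f->data || f->datalen == 0)
        return NULL;                     /* no payload: never touch data */
    if (memchr(f->data, '\0', f->datalen) == NULL)
        return NULL;                     /* unterminated payload: reject */
    return (const char *)f->data;
}
```

In this sketch a frame with a leftover pointer but a length of zero yields NULL, so a caller falls back to its default class instead of reading through the pointer.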