
Summary: ASTERISK-11867: My asterisk crashes randomly with very low volume
Reporter: Private Name (falves11)
Labels:
Date Opened: 2008-04-16 17:22:05
Date Closed: 2011-06-07 14:01:03
Priority: Critical
Regression?: No
Status: Closed/Complete
Components: CDR/General
Versions:
Frequency of Occurrence:
Related Issues:
Environment:
Attachments:
( 0) valgrind_3.txt
( 1) valgrind11629.txt
( 2) valgrind4036.txt
( 3) valgrind7462.txt
( 4) valgrind9379.txt
( 5) valgrind9711.txt
( 6) valgrindfull_with_gdb1.txt
( 7) valgrindfull_with_gdb2.zip
( 8) valgrindfull_with_gdb3.zip
Description:
#0  0x0000000007bd1d16 in ?? ()
(gdb) bt full
#0  0x0000000007bd1d16 in ?? ()
No symbol table info available.
#1  0x00000000000927c0 in ?? ()
No symbol table info available.
#2  0x000000001572de30 in ?? ()
No symbol table info available.
#3  0x0000000000000008 in ?? ()
No symbol table info available.
#4  0x0000000000441a0d in ast_cdr_merge (to=Cannot access memory at address 0xfffffffffffffa80
) at cdr.c:572
       zcdr = (struct ast_cdr *) Cannot access memory at address 0xffffffffffffff30
(gdb)


****** ADDITIONAL INFORMATION ******

I was running Asterisk compiled with optimizations, to see if it made any difference, but under valgrind. When it blew up for the 10th time today, I ran gdb asterisk valgrind..core.xxx and got the data above.
I also see this in the valgrind.txt file:
==16016== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==16016==  Access not within mapped region at address 0x38
==16016==    at 0x4C1FCC2: strlen (mc_replace_strmem.c:242)
==16016==    by 0x7B5205A: vfprintf (in /lib64/libc-2.5.so)
==16016==    by 0x7B73289: vsnprintf (in /lib64/libc-2.5.so)
==16016==    by 0x4D9867: __ast_str_helper (utils.c:1489)
==16016==    by 0x482482: ast_log (logger.c:1015)
==16016==    by 0xC25343D: ??? (chan_sip.c:18663)
==16016==    by 0x4C9E3F: ast_sched_runq (sched.c:365)
==16016==    by 0xC241087: ??? (chan_sip.c:18566)
==16016==    by 0x4D856B: dummy_start (utils.c:870)
==16016==    by 0x55BE2F6: start_thread (in /lib64/libpthread-2.5.so)
==16016==    by 0x7BDA85C: clone (in /lib64/libc-2.5.so)
==16016==
==16016== ERROR SUMMARY: 2461 errors from 14 contexts (suppressed: 5 from 1)
==16016== malloc/free: in use at exit: 35,323,776 bytes in 15,037 blocks.
==16016== malloc/free: 1,859,036 allocs, 1,843,999 frees, 1,381,631,570 bytes allocated.
==16016== For counts of detected errors, rerun with: -v
==16016== searching for pointers to 15,037 not-freed blocks.
==16016== checked 34,179,920 bytes.
==16016==
==16016== LEAK SUMMARY:
==16016==    definitely lost: 2,118,032 bytes in 2,019 blocks.
==16016==      possibly lost: 31,255 bytes in 104 blocks.
==16016==    still reachable: 33,174,489 bytes in 12,914 blocks.
==16016==         suppressed: 0 bytes in 0 blocks.
==16016== Rerun with --leak-check=full to see details of leaked memory.
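
The valgrind trace above ends in strlen, called from vfprintf underneath ast_log at chan_sip.c:18663. That pattern is what one would typically see when a `%s` argument is not a valid string pointer; a tiny address such as 0x38 is consistent with (though it does not prove) an array member reached through a NULL structure pointer. The sketch below is not Asterisk code. `safe_str`, `log_line`, `log_peer` and `struct peer` are hypothetical names, used only to show one way of guarding string arguments before they reach a printf-style logger.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: substitute a placeholder when a string pointer is NULL. */
const char *safe_str(const char *s, const char *fallback)
{
    return s ? s : fallback;
}

/* Hypothetical stand-in for a printf-style logger (not the real ast_log). */
int log_line(char *buf, size_t len, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(buf, len, fmt, ap);
    va_end(ap);
    return n;
}

/* Hypothetical structure whose string field may be unset. */
struct peer {
    const char *name;
};

/* Check the structure pointer and the field before formatting with %s. */
int log_peer(char *buf, size_t len, const struct peer *p)
{
    const char *name = p ? safe_str(p->name, "<unnamed>") : "<no peer>";

    return log_line(buf, len, "Peer '%s' is unreachable", name);
}
```

Calling `log_peer(buf, sizeof(buf), NULL)` writes a line containing `<no peer>` instead of passing an invalid pointer to vsnprintf. A guard like this only hides the symptom; the underlying question of why the pointer was invalid at that point still needs answering.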
Comments:
By: Private Name (falves11) 2008-04-17 11:13:09

I submitted the trace to the engineers at Virtuozzo and this is what they have to say:

"Basically "Signal 11" - Segmentation Fault - means that the process tries to access the memory address space which was not allocated for this process by the system.
From the call trace provided it could be suggested that there was wrong pointer used to string function from Asterisk thread.

Since the issue arise rarely, it may mean that in some period of time the memory allocation function return the result which is not checked by the application and which is treated by this application as a proper result.
So that you could ask Asterisk developers to check if such issue could happen."
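
The engineers' suggestion above is a hypothesis that the traces do not confirm: that an allocation result is used without being checked. As a minimal sketch of what checking would look like, assuming nothing about the actual Asterisk source, the hypothetical helpers `dup_or_null` and `set_label` below report allocation failure to the caller and leave the existing value untouched when a copy cannot be made.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: duplicate a string, returning NULL on any failure. */
char *dup_or_null(const char *src)
{
    size_t len;
    char *copy;

    if (!src) {
        return NULL;
    }
    len = strlen(src) + 1;
    copy = malloc(len);
    if (!copy) {
        return NULL;            /* allocation failed; caller must handle it */
    }
    memcpy(copy, src, len);
    return copy;
}

/* Replace *slot with a copy of value.
 * Returns 0 on success, -1 if no copy could be made (*slot is unchanged). */
int set_label(char **slot, const char *value)
{
    char *copy = dup_or_null(value);

    if (!copy) {
        return -1;
    }
    free(*slot);                /* free(NULL) is a no-op */
    *slot = copy;
    return 0;
}
```

The point of the design is that a failed allocation never overwrites a good pointer with an unusable one, so later string functions are not handed a bad address.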


> ==25821== Process terminating with default action of signal 11 (SIGSEGV): dumping core
> ==25821==  Access not within mapped region at address 0x1
> ==25821==    at 0x4C1FCC2: strlen (mc_replace_strmem.c:242)
> ==25821==    by 0x7B5205A: vfprintf (in /lib64/libc-2.5.so)
> ==25821==    by 0x7B73289: vsnprintf (in /lib64/libc-2.5.so)
> ==25821==    by 0x4FF629: __ast_str_helper (utils.c:1537)
> ==25821==    by 0x4996EE: ast_log (logger.c:1015)
> ==25821==    by 0xC25E4FC: ??? (chan_sip.c:18663)
> ==25821==    by 0x4EAB1C: ast_sched_runq (sched.c:476)
> ==25821==    by 0xC25DEE9: ??? (chan_sip.c:18566)
> ==25821==    by 0x4FE018: dummy_start (utils.c:918)
> ==25821==    by 0x55BE2F6: start_thread (in /lib64/libpthread-2.5.so)
> ==25821==    by 0x7BDA85C: clone (in /lib64/libc-2.5.so)
> ==25821==


---
Thanks,

Sergey Zenkov
Senior Support Engineer
Parallels




> This is what is happening every hour. Digium blames VZ.
> I need to restart my node in CentOS; please help migrate the VEs out.
>

By: Mark Michelson (mmichelson) 2008-04-18 14:29:34

chan_sip underwent a major change in trunk a couple of days ago. As a result, this problem may be fixed. If it is not, then please upload a new valgrind trace because the ones here do not apply to chan_sip as it is in trunk now.

Thanks.

By: Private Name (falves11) 2008-04-18 17:04:53

It took an hour and 20 mins to crash again.
I am uploading the valgrind all-in-one file.

By: Private Name (falves11) 2008-04-18 17:10:06

If a developer wishes to run my Asterisk in GDB (if that would solve the issue), I volunteer access to my box. At this point, either I fix this or I go out of business.

By: Abhay Gupta (agupta) 2008-04-18 22:14:23

After removing VZ, the server has been up for the past 24 hours. In case there is a crash, we can upload all bt, bt full and valgrind traces.

We will keep you posted on whether the problem is resolved.

By: Jason Parker (jparker) 2008-05-06 14:20:41

So, it sounds like this is a problem with VZ?  Should we close?

By: Private Name (falves11) 2008-05-06 16:27:17

Yes, please close it. But if you have time, look into 12566.

By: Joshua C. Colp (jcolp) 2008-05-07 08:44:40

Closed per reporter.