Summary: ASTERISK-10937: Random seg faults...
Reporter: akron (akron)
Labels:
Date Opened: 2007-11-30 06:22:41.000-0600
Date Closed: 2007-12-26 14:46:11.000-0600
Priority: Minor
Regression?: No
Status: Closed/Complete
Components: Core/ManagerInterface
Versions:
Frequency of Occurrence:
Related Issues:
Environment:
Attachments: ( 0) ast_1415_valgrind.txt
Description: We sometimes get seg faults on one of our configurations. We changed the hardware and it didn't solve our problems.

****** ADDITIONAL INFORMATION ******

From Valgrind:
==5000== Thread 1:
==5000== Invalid read of size 4
==5000==    at 0x810B2CA: el_gets (read.c:254)
==5000==    by 0x806EEB7: main (asterisk.c:2982)
==5000==  Address 0x4226614 is 68 bytes inside a block of size 776 free'd
==5000==    at 0x401BF6C: free (in /usr/lib/valgrind/x86-linux/vgpreload_memcheck.so)
==5000==    by 0x806BF1E: quit_handler (asterisk.c:1276)
==5000==    by 0x806CD1C: monitor_sig_flags (asterisk.c:2531)
==5000==    by 0x80FB954: dummy_start (utils.c:843)
==5000==    by 0x402EF5A: pthread_start_thread (in /lib/libpthread-0.10.so)
==5000==    by 0x41BDBE9: clone (in /lib/libc-2.3.6.so)
==5000==
==5000== Invalid read of size 4
==5000==    at 0x810B2E9: el_gets (read.c:258)
==5000==    by 0x806EEB7: main (asterisk.c:2982)
==5000==  Address 0x4226870 is 672 bytes inside a block of size 776 free'd
==5000==    at 0x401BF6C: free (in /usr/lib/valgrind/x86-linux/vgpreload_memcheck.so)
==5000==    by 0x806BF1E: quit_handler (asterisk.c:1276)
==5000==    by 0x806CD1C: monitor_sig_flags (asterisk.c:2531)
==5000==    by 0x80FB954: dummy_start (utils.c:843)
==5000==    by 0x402EF5A: pthread_start_thread (in /lib/libpthread-0.10.so)
==5000==    by 0x41BDBE9: clone (in /lib/libc-2.3.6.so)
==5000==
==5000== Invalid read of size 1
==5000==    at 0x810B2F2: el_gets (read.c:258)
==5000==    by 0x806EEB7: main (asterisk.c:2982)
==5000==  Address 0x6C is not stack'd, malloc'd or (recently) free'd
==5000==
==5000== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==5000==  Access not within mapped region at address 0x6C
==5000==    at 0x810B2F2: el_gets (read.c:258)
==5000==    by 0x806EEB7: main (asterisk.c:2982)
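
For readers of the trace above: both invalid reads in `el_gets` touch a 776-byte block that `quit_handler` had already freed on a different thread (started via `monitor_sig_flags`). The sketch below shows that general pattern and one possible way to guard it. Every name in it (`console_state`, `console_init`, `console_read`, `console_shutdown`) is hypothetical; it is neither Asterisk code nor the fix committed for this issue.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for shared console state; not an Asterisk type. */
struct console_state {
    char line[64];
};

static pthread_mutex_t console_lock = PTHREAD_MUTEX_INITIALIZER;
static struct console_state *console; /* NULL once shutdown has freed it */

/* Allocate the shared state. Returns 0 on success, -1 on allocation failure. */
int console_init(void)
{
    struct console_state *c = calloc(1, sizeof(*c));
    if (!c)
        return -1;
    strcpy(c->line, "core show channels");
    pthread_mutex_lock(&console_lock);
    console = c;
    pthread_mutex_unlock(&console_lock);
    return 0;
}

/* Shutdown path, written as a thread entry point: free and clear under the lock. */
void *console_shutdown(void *unused)
{
    (void)unused;
    pthread_mutex_lock(&console_lock);
    free(console);
    console = NULL; /* readers now see "gone" instead of a dangling pointer */
    pthread_mutex_unlock(&console_lock);
    return NULL;
}

/* Reader path: copy the line out while holding the lock.
 * Returns 0 on success, -1 if the state has already been freed. */
int console_read(char *out, size_t outlen)
{
    int res = -1;
    assert(out != NULL && outlen > 0);
    pthread_mutex_lock(&console_lock);
    if (console) {
        strncpy(out, console->line, outlen - 1);
        out[outlen - 1] = '\0';
        res = 0;
    }
    pthread_mutex_unlock(&console_lock);
    return res;
}
```

With this arrangement a reader that runs after `console_shutdown` gets `-1` back rather than dereferencing freed memory, which is the access valgrind flags in the report.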


From gdb:
(gdb) bt full
#0  0x041b451a in poll () from /lib/libc.so.6
No symbol table info available.
#1  0x08085053 in ast_waitfor_nandfds (c=0xbcdff690, n=0, fds=0x0, nfds=0, exception=0x0,
   outfd=0x0, ms=0xbcdff68c) at channel.c:1981
       kbrms = 500
       start = {tv_sec = 1196421845, tv_usec = 249662}
       res = -1126173232
       rms = 500
       x = 0
       y = 0
       max = 0
       sz = 168
       now = 0
       whentohangup = 0
       diff = 500
       winner = (struct ast_channel *) 0x0
       __PRETTY_FUNCTION__ = "ast_waitfor_nandfds"
#2  0x08085ea7 in ast_waitfor_n (c=0xa8, n=168, ms=0xa8) at channel.c:2043
No locals.
#3  0x08070972 in autoservice_run (ign=0x0) at autoservice.c:84
       chan = (struct ast_channel *) 0xa8
       as = (struct asent *) 0xbcdff690
       ms = 500
       mons = {0x47f9338, 0x0 <repeats 222 times>, 0x4034700, 0x0 <repeats 12 times>, 0xbcdffbe0,
 0x0, 0x0, 0x4038ff4, 0xbcdffbe0, 0x814d9e8, 0xbcdffa78, 0x4030355, 0x4033cd1, 0x0, 0x0,
 0x4e9d920, 0x4e9d888, 0x0, 0xbcdffaa8, 0x4038ff4, 0x814d9e8, 0x0, 0xbcdffaa8, 0x4030742}
       x = 0
       __PRETTY_FUNCTION__ = "autoservice_run"
#4  0x080fb955 in dummy_start (data=0x1f4) at utils.c:843
       _buffer = {__routine = 0x8067cd0 <ast_unregister_thread>, __arg = 0x91000f,
 __canceltype = -1126171928, __prev = 0x0}
       ret = (void *) 0x4e9d888
       a = {start_routine = 0x80708e0 <autoservice_run>, data = 0x0,
 name = 0x4e9d888 "autoservice_run      started at [  114] autoservice.c ast_autoservice_start()"}
#5  0x0402ef5b in pthread_start_thread () from /lib/libpthread.so.0
No symbol table info available.
#6  0x041bdbea in clone () from /lib/libc.so.6
No symbol table info available.
Comments:
By: Joshua C. Colp (jcolp) 2007-11-30 08:34:19.000-0600

Can you please give 1.4.15 a try? The autoservice stuff was improved a bit. Also, what are the call flows like? Are calls getting parked?

By: Joshua C. Colp (jcolp) 2007-12-17 11:02:35.000-0600

Suspended due to lack of response.

By: akron (akron) 2007-12-17 11:17:20.000-0600

We switched to 1.4.15 and it crashed after 2 days of running. We now have Asterisk 1.4.15 running under the valgrind debugger.

As for 1.4.14: about 1.5 weeks ago we changed our network configuration so that the server is not behind NAT, and it has not crashed since.

Our simplified configuration for the crashing setup:
NetworkSwitch with ports to:
--[eth1-192.168.1.1]NatLinuxRouter -> internet
--[eth0-192.168.1.2]Asterisk
--[IpPhone1-192.168.1.5]
--[IpPhone2-192.168.1.6]

Maybe that kind of configuration is problematic.

Good configuration [more than a week without a crash]:
NetworkSwitch with ports to:
|--[eth1-192.168.1.2]Asterisk-[eth0-222.222.222.222]-Internet
|--[IpPhone1-192.168.1.5]
|--[IpPhone2-192.168.1.6]


By: akron (akron) 2007-12-18 09:00:21.000-0600

Next segfault:

Core was generated by `/usr/sbin/asterisk -vvvvvv -dddddd -g -c'.
Program terminated with signal 6, Aborted.
#0  0x40117c81 in kill () from /lib/libc.so.6
(gdb) full bt
Undefined command: "full".  Try "help".
(gdb) bt full
#0  0x40117c81 in kill () from /lib/libc.so.6
No symbol table info available.
#1  0x4002a4a1 in pthread_kill () from /lib/libpthread.so.0
No symbol table info available.
#2  0x4002a87b in raise () from /lib/libpthread.so.0
No symbol table info available.
#3  0x401178f8 in raise () from /lib/libc.so.6
No symbol table info available.
#4  0x40118f00 in abort () from /lib/libc.so.6
No symbol table info available.
#5  0x4014b6ce in __libc_message () from /lib/libc.so.6
No symbol table info available.
#6  0x40151518 in malloc_consolidate () from /lib/libc.so.6
No symbol table info available.
#7  0x40152422 in _int_malloc () from /lib/libc.so.6
No symbol table info available.
#8  0x40153fa2 in calloc () from /lib/libc.so.6
No symbol table info available.
#9  0x40605eb0 in sip_alloc (callid=0x0, sin=0x0, useglobal_nat=0, intended_method=3)
   at /home/jancio/asterisk-1.4.15/include/asterisk/utils.h:359
       __PRETTY_FUNCTION__ = "sip_alloc"
#10 0x40628daa in sip_poke_peer (peer=0x81aa870) at chan_sip.c:15601
       xmitres = 0
       __PRETTY_FUNCTION__ = "sip_poke_peer"
#11 0x40629a79 in sip_poke_peer_s (data=0x0) at chan_sip.c:7813
No locals.
#12 0x080ef2c8 in ast_sched_runq (con=0x81991a0) at sched.c:359
       cur = (struct sched *) 0x81ef680
       tv = {tv_sec = 135934256, tv_usec = -1098909392}
       numevents = 0
       res = 135934256
#13 0x4064eb9e in do_monitor (data=0x0) at chan_sip.c:15492
       res = 0
       sip = (struct sip_pvt *) 0x0
       peer = (struct sip_peer *) 0x0
       t = 135672448
       fastrestart = 0
       lastpeernum = -1
---Type <return> to continue, or q <return> to quit---
       curpeernum = 11
       reloading = 0
       __PRETTY_FUNCTION__ = "do_monitor"
#14 0x080fc2c5 in dummy_start (data=0x40030ff4) at utils.c:843
       _buffer = {__routine = 0x8067d20 <ast_unregister_thread>, __arg = 0x2800a,
 __canceltype = -1098908952, __prev = 0x0}
       ret = (void *) 0x81a38b8
       a = {start_routine = 0x4064e630 <do_monitor>, data = 0x0,
 name = 0x81a38b8 "do_monitor", ' ' <repeats 11 times>, "started at [15546] chan_sip.c restart_monitor()"}
#15 0x40026f5b in pthread_start_thread () from /lib/libpthread.so.0
No symbol table info available.
#16 0x401b6bea in clone () from /lib/libc.so.6
No symbol table info available.

The valgrind debug output is in the attachment.

By: akron (akron) 2007-12-18 09:05:49.000-0600

Last lines from the log:
 == Connect attempt from '192.168.100.2' unable to authenticate
*** glibc detected *** corrupted double-linked list: 0x08215080 ***
Aborted
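
For context on the message above: glibc prints "corrupted double-linked list" when it is about to unlink a free chunk and finds that the chunk's neighbours no longer point back at it. That usually means heap metadata was overwritten at some earlier point, not necessarily by the code in the aborting backtrace. A minimal sketch of this kind of consistency check follows; `struct node`, `links_consistent` and `checked_unlink` are hypothetical names for illustration and do not reflect glibc's real chunk layout or internals.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node standing in for a free chunk on a doubly linked list. */
struct node {
    struct node *fd; /* forward link */
    struct node *bk; /* backward link */
};

/* Returns 1 if both neighbours of p still point back at p, 0 otherwise. */
int links_consistent(const struct node *p)
{
    assert(p != NULL);
    return p->fd != NULL && p->bk != NULL &&
           p->fd->bk == p && p->bk->fd == p;
}

/* Unlink p only if the invariant holds.
 * Returns 0 on success, -1 if the list looks corrupted
 * (the point at which an allocator would abort instead). */
int checked_unlink(struct node *p)
{
    if (!links_consistent(p))
        return -1;
    p->fd->bk = p->bk;
    p->bk->fd = p->fd;
    p->fd = p->bk = NULL;
    return 0;
}
```

A check like this only detects the damage when the list is next walked, which is why the abort can surface in an unrelated allocation such as the `calloc` call inside `sip_alloc` in the backtrace.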

By: Tilghman Lesher (tilghman) 2007-12-24 10:45:49.000-0600

Please try the patch uploaded to ASTERISK-11083, as it appears to be the same issue.

By: Digium Subversion (svnbot) 2007-12-26 14:40:19.000-0600

Repository: asterisk
Revision: 94808

U   branches/1.4/main/manager.c

------------------------------------------------------------------------
r94808 | tilghman | 2007-12-26 14:40:18 -0600 (Wed, 26 Dec 2007) | 6 lines

Workaround for what is probably a glibc bug (but we'll see this crop up again
and again, if we don't add the workaround).
Reported by: rolek
Patch by: tilghman
(Closes issue ASTERISK-11083, closes issue ASTERISK-10937)

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=94808

By: Digium Subversion (svnbot) 2007-12-26 14:46:11.000-0600

Repository: asterisk
Revision: 94809

_U  trunk/
U   trunk/main/manager.c

------------------------------------------------------------------------
r94809 | tilghman | 2007-12-26 14:46:10 -0600 (Wed, 26 Dec 2007) | 14 lines

Merged revisions 94808 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r94808 | tilghman | 2007-12-26 14:43:38 -0600 (Wed, 26 Dec 2007) | 6 lines

Workaround for what is probably a glibc bug (but we'll see this crop up again
and again, if we don't add the workaround).
Reported by: rolek
Patch by: tilghman
(Closes issue ASTERISK-11083, closes issue ASTERISK-10937)

........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=94809