|Summary:||ASTERISK-17966: SEGMENTATION FAULTS/CRASH|
|Reporter:||Kenneth Van Velthoven (kvveltho)||Labels:|
|Date Opened:||2011-06-06 03:08:58||Date Closed:||2011-11-02 20:09:39|
|Environment:||CentOS 5.6||Attachments:||( 0) gdb1.txt|
( 1) gdb2.txt
( 2) gdb3.txt
( 3) jira_ast_668_v1.8.patch
( 4) jira_asterisk_17955_v1.8.patch
( 5) jira_asterisk_17966_libss7_v1.0_init.patch
( 6) jira_asterisk_17966_v1.8_glare.patch
( 7) valgrindKVV.txt
|Description:||Crashes randomly 4-5 times a day, regardless of whether the load is high (180cc calls) or low.|
|Comments:||By: Kenneth Van Velthoven (kvveltho) 2011-06-06 03:09:50.084-0500|
This concerns version 126.96.36.199. I cannot select the version while creating the issue?
By: Kenneth Van Velthoven (kvveltho) 2011-06-06 03:11:41.010-0500
I cannot attach files:
Unknown error occurred uploading file.
By: Kenneth Van Velthoven (kvveltho) 2011-06-08 03:10:04.554-0500
Can anyone see what is causing those crashes? Our system sometimes crashes 5 times a day.
By: David Woolley (davidw) 2011-06-08 05:44:52.566-0500
The first two, at least, indicate memory management problems, so you need to run under valgrind, and enable thread debugging.
By: Kenneth Van Velthoven (kvveltho) 2011-06-08 12:30:59.612-0500
David, how do I run under valgrind? Do I need to do "make valgrind"? I've enabled the necessary debug options but I cannot find an explanation of how to run valgrind -- .... I don't have that executable.
By: David Woolley (davidw) 2011-06-08 12:53:08.540-0500
You will probably need to install valgrind first: http://valgrind.org/
Caution, it runs extremely slowly. It runs the machine code interpretively.
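For readers wondering what such an invocation might look like, here is a minimal sketch that only assembles a command line and prints it. The binary path, the log file location, and the particular choice of options are assumptions for illustration and were not given in this thread. The valgrind options shown (--leak-check, --track-origins, --log-file) and the Asterisk options (-c console, -g core dump, -v verbosity) do exist, but --track-origins needs a reasonably recent valgrind.

```python
import shlex
import shutil


def build_valgrind_command(asterisk_bin="/usr/sbin/asterisk",
                           log_file="/tmp/valgrind-asterisk.txt"):
    """Assemble an argv list for running Asterisk under valgrind.

    Both default paths are assumptions; adjust them to the local install.
    """
    return [
        "valgrind",
        "--leak-check=full",       # report each leaked block with a backtrace
        "--track-origins=yes",     # show where uninitialized values came from
        f"--log-file={log_file}",  # keep valgrind output out of the console
        asterisk_bin,
        "-c",                      # stay in the foreground with a console
        "-g",                      # allow a core dump on crash
        "-vvvv",                   # verbose console output
    ]


if __name__ == "__main__":
    if shutil.which("valgrind") is None:
        print("valgrind is not installed or not on PATH")
    # Print the command instead of running it, so it can be reviewed first.
    print(" ".join(shlex.quote(arg) for arg in build_valgrind_command()))
```

The printed line can then be pasted into a root shell. Expect the process to run far slower than normal, as noted above.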
By: Kenneth Van Velthoven (kvveltho) 2011-06-08 13:59:35.902-0500
I've run Asterisk for a couple of minutes using valgrind. It is indeed very slow: so slow that I couldn't get a call established. I hope the bit of valgrind capture helps. File attached.
By: Kenneth Van Velthoven (kvveltho) 2011-06-16 04:04:10.957-0500
Also saw this when crashing; after the last message Asterisk dies:
[Jun 16 10:51:39] WARNING: chan_sip.c:2927 dialog_unlink_all: Unable to cancel schedule ID 16. This is probably a bug (chan_sip.c: dialog_unlink_all, line 2927).
[Jun 16 10:51:39] WARNING: chan_sip.c:2930 dialog_unlink_all: Unable to cancel schedule ID 0. This is probably a bug (chan_sip.c: dialog_unlink_all, line 2930).
[Jun 16 10:51:39] ERROR: astobj2.c:258 internal_ao2_ref: refcount -1 on object 0x2aaae4168358
[Jun 16 10:51:39] ERROR: astobj2.c:258 internal_ao2_ref: refcount -1 on object 0x2aaae42f2018
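When a log contains many such lines, a small script can pull out the objects whose reference count went negative. The following is a rough sketch, not an Asterisk tool: the regular expressions are inferred only from the four lines quoted above, and the function names are made up for illustration.

```python
import re

# Pattern inferred from the log lines quoted in this thread (an assumption,
# not an official format description). A "[pid]" after the level is tolerated.
LOG_LINE = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\]\s+"
    r"(?P<level>[A-Z]+)(?:\[\d+\])?:\s+"
    r"(?P<file>[\w.]+):(?P<line>\d+)\s+"
    r"(?P<function>\w+):\s+"
    r"(?P<message>.*)$"
)
REFCOUNT = re.compile(
    r"refcount (?P<count>-?\d+) on object (?P<address>0x[0-9a-fA-F]+)"
)

SAMPLE = """\
[Jun 16 10:51:39] WARNING: chan_sip.c:2927 dialog_unlink_all: Unable to cancel schedule ID 16. This is probably a bug (chan_sip.c: dialog_unlink_all, line 2927).
[Jun 16 10:51:39] ERROR: astobj2.c:258 internal_ao2_ref: refcount -1 on object 0x2aaae4168358
[Jun 16 10:51:39] ERROR: astobj2.c:258 internal_ao2_ref: refcount -1 on object 0x2aaae42f2018
"""


def parse_log_line(line):
    """Split one log line into its fields, or return None if it does not match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


def negative_refcounts(lines):
    """Return (timestamp, address, count) for every negative refcount report."""
    hits = []
    for line in lines:
        entry = parse_log_line(line)
        if entry is None:
            continue
        match = REFCOUNT.search(entry["message"])
        if match and int(match.group("count")) < 0:
            hits.append((entry["timestamp"],
                         match.group("address"),
                         int(match.group("count"))))
    return hits


if __name__ == "__main__":
    for timestamp, address, count in negative_refcounts(SAMPLE.splitlines()):
        print(timestamp, address, count)
```

The addresses it reports are the ones to look for in a reference count log when tracking down the extra dereference.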
By: Kinsey Moore (kmoore) 2011-08-16 08:47:40.070-0500
Make sure Asterisk is compiled with MALLOC_DEBUG and get a refcount log from /tmp/refs of this error occurring. This should help us figure out where the extra deref is located.
By: Richard Mudgett (rmudgett) 2011-09-16 12:32:29.856-0500
[^jira_asterisk_17966_libss7_v1.0_init.patch] Fixes the use of uninitialized value in sls_to_link() reported by valgrind.
I cannot see why valgrind is reporting write(buf) pointing to uninitialized bytes. It may be a problem within the system's libc write function. It is probably benign.
The remaining invalid read in ast_channel_set_caller_event() was already fixed and I think is in v1.8.5.
By: Richard Mudgett (rmudgett) 2011-09-16 13:42:09.986-0500
[^jira_asterisk_17955_v1.8.patch] Fixes one crash backtrace reported by ASTERISK-17955 and also adds some missing libss7 access lock protection.
By: Richard Mudgett (rmudgett) 2011-09-16 13:51:25.339-0500
The [^jira_ast_668_v1.8.patch] fixes a deadlock because the ss7 linkset lock was never released and another potential deadlock when creating a new channel for an incoming call.
By: Kenneth Van Velthoven (kvveltho) 2011-09-16 13:54:44.127-0500
Great news there are some patches.
Meanwhile I have 1.8 on the server, still with the same problem.
Can the two 1.8 patches be applied to any 1.8 version of Asterisk?
Should the libss7 patch be applied to libss7?
For the 3 patches can you tell me how to apply them? Which commands should I execute?
By: Richard Mudgett (rmudgett) 2011-09-16 14:50:49.359-0500
The v1.8 patches are against the current Asterisk 1.8 SVN branch. They should apply to v1.8.5. The deadlock patch definitely will not apply to versions earlier than v1.8.5.
The libss7 patch is against the 1.0 SVN branch. I have already committed the libss7 patch to the SVN 1.0 branch and there may be other fixes in SVN that are not in the 1.0 version.
To apply a patch, run
patch -p0 -i <patch_file>
in the root source directory of Asterisk and libss7 for the respective patches.
Have you seen an ERROR dahdi_ss7_error Event queue full! message in your logs?
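The -p0 in the patch command above tells patch to use the file names in the patch unmodified, which is why it has to be run from the root of the source tree. As an illustration only, the sketch below mimics how -pN strips leading path components. It is a simplified model rather than patch's actual code, and the function name is hypothetical.

```python
import re


def strip_components(path, num):
    """Mimic `patch -pNUM`: drop the prefix containing NUM leading slashes.

    A run of adjacent slashes counts as one. Returning None when the path
    has fewer than NUM slashes is a simplification made for this sketch.
    """
    rest = path
    for _ in range(num):
        match = re.search(r"/+", rest)
        if match is None:
            return None
        rest = rest[match.end():]
    return rest


if __name__ == "__main__":
    # With -p0 the name from the patch header is used unmodified, so the
    # working directory must contain channels/.
    print(strip_components("channels/sig_ss7.c", 0))
    # With -p1 the first directory component would be dropped instead.
    print(strip_components("channels/sig_ss7.c", 1))
```

Since the patches in this thread name channels/sig_ss7.c relative to the Asterisk root, -p0 from that directory resolves to the right file.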
By: Kenneth Van Velthoven (kvveltho) 2011-09-16 18:04:32.808-0500
What I did:
make clean of the current ss7 install
svn co http://svn.digium.com/svn/libss7/branches/1.0/
(didn't apply the ss7 patch as you told me it was already committed)
make/make install of the 1.0 branch
svn checkout http://svn.digium.com/svn/asterisk/trunk
applied both 1.8 patches to this version
When starting Asterisk I got:
[Sep 17 00:51:47] NOTICE: codec_g729a.c:760 load_module: G.729A transcoding module version 1.8.4_3.1.5, Copyright (C) 1999-2009 Digium, Inc.
[Sep 17 00:51:47] NOTICE: codec_g729a.c:771 load_module: for use in the OpenSSL Toolkit. (http://www.openssl.org/)
[Sep 17 00:51:47] NOTICE: codec_g729a.c:772 load_module: Copyright (C) 1998-2006 The OpenSSL Project
== Manager registered action G729LicenseStatus
== Manager registered action G729LicenseList
== Host-ID: 72:06:8c:71:82:a6:f8:5c:32:58:ef:ce:ea:90:38:93:77:87:e0:76
== Found license 'G729-ZXAQRWJLHM8R' providing 10 channels
== Found license 'G729-FMJTDFLJDAD3' providing 30 channels
== Found license 'G729-FTBA4BN2E3TK' providing 20 channels
== Found license 'G729-75FK8PWYQ9HN' providing 135 channels
== Found license 'G729-L2W2EZNH3L5X' providing 20 channels
== Found total of 215 G.729 licenses
[Sep 17 00:51:47] WARNING: translate.c:1060 __ast_register_translator: empty buf size, you need to supply one
Seems that the current g729 module doesn't work with the SVN version of Asterisk? Or is the trunk version Asterisk 10?
Then I've downloaded latest 1.8.7 RC1
When applying the patches I got:
[root@linux7 asterisk-188.8.131.52-rc1]# patch -p0 -i jira_ast_668_v1.8.patch
patching file channels/sig_ss7.c
[root@linux7 asterisk-184.108.40.206-rc1]# patch -p0 -i jira_asterisk_17955_v1.8.patch
patching file channels/sig_ss7.c
Hunk #2 succeeded at 611 (offset 13 lines).
Hunk #4 succeeded at 1549 (offset 8 lines).
First patch didn't do anything?
Now 1.8.7rc1 is running and I'll get back to you the coming days with feedback.
By: Richard Mudgett (rmudgett) 2011-09-16 18:25:48.719-0500
http://svn.digium.com/svn/asterisk/trunk is Asterisk trunk where new development/features go. It will eventually become the next version of Asterisk after 10 since 10 is in beta and already branched.
http://svn.digium.com/svn/asterisk/branches/10 is Asterisk 10 where the latest patches and fixes for Asterisk 10 go.
http://svn.digium.com/svn/asterisk/branches/1.8 is Asterisk 1.8 where the latest patches and fixes for Asterisk 1.8 go.
The first patch you applied [^jira_ast_668_v1.8.patch] to the 220.127.116.11-rc1 source applied cleanly since everything was as expected for that patch. (Patch said it patched channels/sig_ss7.c)
The second patch you applied [^jira_asterisk_17955_v1.8.patch] found some patch locations moved from the expected location since you had already applied the first patch. That is expected in this case since both patches modify the same file and don't conflict with each other.
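To make the "offset N lines" messages less mysterious, here is a simplified sketch of the idea: when a hunk's context is not found at the line number recorded in the patch, the tool searches nearby lines for the same context and reports how far it had to move. This is an illustrative model under that assumption, not GNU patch's actual algorithm (which also handles fuzz and carries offsets between hunks), and locate_hunk is a made-up name.

```python
def locate_hunk(file_lines, context, expected, max_offset=100):
    """Return the 0-based index where `context` matches, or None.

    The expected position is tried first, then positions at growing
    distance after and before it.
    """
    def matches(start):
        return start >= 0 and file_lines[start:start + len(context)] == context

    for distance in range(max_offset + 1):
        candidates = (expected,) if distance == 0 else (expected + distance,
                                                        expected - distance)
        for start in candidates:
            if matches(start):
                return start
    return None


if __name__ == "__main__":
    original = ["a", "b", "c", "d", "e"]
    context = ["c", "d"]       # lines the hunk expects to find
    expected = 2               # where they sit in the unmodified file

    # An earlier patch added three lines above, shifting everything down.
    patched = ["x", "y", "z"] + original
    found = locate_hunk(patched, context, expected)
    print(f"Hunk succeeded at {found + 1} (offset {found - expected} lines).")
```

In this toy example the context is found three lines later than recorded, which corresponds to the harmless offset messages seen after the first patch had already changed channels/sig_ss7.c.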
By: Kenneth Van Velthoven (kvveltho) 2011-09-19 04:51:45.139-0500
Bad news. Server already crashed this morning.
I'll recompile this evening with the compiler flags enabled and get back to you.
By: Richard Mudgett (rmudgett) 2011-09-19 15:52:02.732-0500
Can you please attach a SS7 debug trace so we can get an idea of your traffic flow? Thanks.
By: Kenneth Van Velthoven (kvveltho) 2011-09-20 05:13:47.328-0500
I've reverted to 1.8.5 without the patches. With 1.8.7rc1 + patches the server crashed 6-8 times a day. I put everything back to SIPLinks and now the server crashes at most once a day.
By: Richard Mudgett (rmudgett) 2011-09-30 17:49:11.911-0500
[^jira_asterisk_17966_v1.8_glare.patch] Adds protection for channel allocation and better glare handling. I also added a "ss7 show channels" CLI command that might prove useful for future debugging.
The [^jira_ast_668_v1.8.patch] and [^jira_asterisk_17955_v1.8.patch] have already been committed to the Asterisk v1.8 and later SVN branches.
By: Richard Mudgett (rmudgett) 2011-10-11 16:09:32.079-0500
I committed [^jira_asterisk_17966_v1.8_glare.patch] as well. This is as far as I can go with this issue without more information pointing to why Asterisk is crashing since the crash seems to be caused by memory corruption.
By: Richard Mudgett (rmudgett) 2011-10-27 10:56:40.309-0500
Have you tried the latest patch or latest v1.8 SVN since all related patches have been committed?
By: Richard Mudgett (rmudgett) 2011-11-02 20:09:39.419-0500
Suspending since the crash could not be reproduced and all patches created for the problems found have been committed.