Summary: ASTERISK-19387: Seg Fault upon Asterisk Startup
Reporter: Vladimir Mikhelson (vmikhelson)
Labels:
Date Opened: 2012-02-18 18:26:13.000-0600
Date Closed: 2012-02-23 16:42:20.000-0600
Priority: Major
Regression?: No
Status: Closed/Complete
Components: Core/General
Versions: 1.8.9.2, 1.8.9.3
Frequency of Occurrence: Frequent
Related Issues:
Environment: CentOS 5.7, FreePBX 2.9.0.9
Attachments:
( 0) 2012-02-22--backtrace.txt
( 1) 2012-02-22--gdb.txt
( 2) 2012-02-24--backtrace.txt
( 3) 2012-02-24--gdb.txt
( 4) backtrace.txt
( 5) gdb.txt
Description: Asterisk crashes with a segmentation fault upon startup. It often does so several times in a row before it eventually loads. The problem started after the upgrade from 1.6.2.x to 1.8.x.
Comments:

By: Vladimir Mikhelson (vmikhelson) 2012-02-18 18:33:04.703-0600

I do not know why I still see <value optimized out>, as I recompiled with DONT_OPTIMIZE and BETTER_BACKTRACES.

By: Walter Doekes (wdoekes) 2012-02-20 02:11:55.360-0600

-You did do 'make install' afterwards? Are you running a different copy of asterisk next to it? (E.g. one in /usr and one in /usr/local)-
(Never mind, those values optimized out are in libresample. You haven't recompiled that, so you won't see the benefits of DONT_OPTIMIZE there.)

So.. you're looking at a crash in the libresample libs, called from codec_resample.c:load_module. What version of libresample are you using?

By: Vladimir Mikhelson (vmikhelson) 2012-02-20 18:34:49.282-0600

Installed Packages
libresample.i386                       0.1.3-1_centos5                 installed
libresample-devel.i386                 0.1.3-1_centos5                 installed


Yum tells me no updates are available.

By: Vladimir Mikhelson (vmikhelson) 2012-02-22 01:17:12.309-0600

Installed resample (trunk) with all dependencies in the hope of getting a better backtrace.

After restart Asterisk seg faulted three (3) times in a row.  Attached are the files generated based on the last core dump.

By: Walter Doekes (wdoekes) 2012-02-22 03:01:33.998-0600

Are you sure the right libs are used? Did you uninstall the CentOS/yum-supplied libresample and recompile?

It says v += t is causing the segfault, but that is really not possible (especially not if they're 0 and 0).

By: Vladimir Mikhelson (vmikhelson) 2012-02-22 09:49:09.119-0600

I did recompile Asterisk, and I saw ./configure confirm that it found resample. But I did not uninstall the yum-supplied resample, as I was not sure about other possible dependencies.

Is there a way to verify which library Asterisk used?

By: Walter Doekes (wdoekes) 2012-02-22 10:22:11.043-0600

$ ldd /usr/lib/asterisk/modules/codec_resample.so | grep resample
libresample.so.1 => /usr/lib/libresample.so.1 (0x00007f7f18203000)

Your handcompiled resample is probably installed in /usr/local/lib?
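
The path that ldd prints does not say whether the file behind it came from a package or from a manual "make install", which matters when both install to the same location. The following is a minimal shell sketch of one way to check, assuming an RPM-based system such as the CentOS box in this report. The helper name resolved_libs is hypothetical and not part of Asterisk or libresample.

```shell
# Hypothetical helper: read `ldd` output on stdin and print the resolved
# path of every libresample entry (the field after "=>"). Entries that
# did not resolve to an absolute path (e.g. "not found") are skipped.
resolved_libs() {
    awk '/libresample/ && $2 == "=>" && $3 ~ /^\// { print $3 }'
}

# Usage sketch (module path as quoted in this thread):
#   ldd /usr/lib/asterisk/modules/codec_resample.so | resolved_libs |
#   while read -r lib; do
#       rpm -qf "$lib"   # names the owning package, or says the file is not owned
#       ls -l "$lib"     # timestamp shows when the file was last replaced
#   done
```

A file that rpm reports as not owned by any package was most likely installed by hand.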

By: Vladimir Mikhelson (vmikhelson) 2012-02-22 10:45:53.098-0600

The compiled libresample.so.1 is installed in /usr/lib.

I went ahead and ran "yum erase libresample*".  Then I ran "make install" for resample.

Then I recompiled and installed Asterisk.

Here are the "ldd /usr/lib/asterisk/modules/codec_resample.so | grep resample" results before and after Asterisk re-install:

Before:
[root@pbx ~]# ldd /usr/lib/asterisk/modules/codec_resample.so | grep resample
       libresample.so.1.0 => /usr/lib/libresample.so.1.0 (0x0079a000)

After:
[root@pbx ~]# ldd /usr/lib/asterisk/modules/codec_resample.so | grep resample
       libresample.so.1.0 => /usr/lib/libresample.so.1.0 (0x00e6b000)

Date and time on the libresample.so.1.0 were changed as expected.

This time no seg fault.



By: Matt Jordan (mjordan) 2012-02-23 16:42:20.564-0600

Closing this out as not a bug. Thanks for helping Vladimir hunt this down, Walter.

By: Vladimir Mikhelson (vmikhelson) 2012-02-23 22:46:23.260-0600

Here is the first casualty.

Cannot run "make menuselect" any more.  It chokes on a bunch of messages like the following:

menuselect_gtk.c:349: error: expected expression before 'GtkContainer'
menuselect_gtk.c:349: warning: implicit declaration of function 'gtk_widget_get_type'
menuselect_gtk.c:349: error: expected expression before 'GtkWidget'
menuselect_gtk.c:349: warning: passing argument 1 of 'gtk_container_add' makes pointer from integer without a cast
menuselect_gtk.c:349: warning: passing argument 2 of 'gtk_container_add' makes pointer from integer without a cast
menuselect_gtk.c:351: error: expected expression before 'GtkBox'
menuselect_gtk.c:351: warning: passing argument 1 of 'gtk_box_pack_end' makes pointer from integer without a cast
make[1]: *** [menuselect_gtk.o] Error 1
make[1]: Leaving directory `/usr/src/1.8.9.3/menuselect'
make: *** [menuselect/gmenuselect] Error 2

It looks like there is a version conflict between GTK and GLIB.  I needed to compile GLIB in order to compile LIBRESAMPLE.

I uninstalled GLIB by YUM (glib.i386 1.2.10-20.el5), ran "./configure, make, make install" for GLIB 2.31.18 to no avail.

The following may be relevant for analysis:

[root@pbx 1.8.9.3]# rpm -qa |grep gtk
libgtk-java-2.8.7-3.el5
pygtk2-devel-2.10.1-12.el5
gtk-vnc-python-0.3.8-3.el5
gtkhtml2-devel-2.11.0-3
gtkspell-2.0.11-2.1
gtk-sharp2-devel-2.10.0-6.el5.centos
gtkspell-devel-2.0.11-2.1
pygtk2-2.10.1-12.el5
gtk2-devel-2.10.4-21.el5_7.7
ghostscript-gtk-8.70-6.el5_7.6
pygtk2-codegen-2.10.1-12.el5
gtksourceview-devel-1.8.0-1.fc6
gtk-doc-1.7-1.fc6
gtk2-2.10.4-21.el5_7.7
gtk-vnc-0.3.8-3.el5
gtk-sharp2-2.10.0-6.el5.centos
gtk-vnc-devel-0.3.8-3.el5
pygtk2-libglade-2.10.1-12.el5
libgtk-java-devel-2.8.7-3.el5
authconfig-gtk-5.3.21-7.el5
gtksourceview-1.8.0-1.fc6
gtk-xfce-engine-2.4.2-1.el5.centos
gtk-sharp2-gapi-2.10.0-6.el5.centos
gtkhtml3-devel-3.16.3-1.el5
usermode-gtk-1.88-3.el5.2
gtk2-engines-2.8.0-3.el5
gtkhtml2-2.11.0-3
gtkhtml3-3.16.3-1.el5
gtk-sharp2-doc-2.10.0-6.el5.centos
[root@pbx 1.8.9.3]# mc

[root@pbx src]# rpm -qa |grep glib
avahi-glib-0.6.16-10.el5_6
glib-java-0.2.6-3.fc6
glibc-2.5-65.el5_7.3
NetworkManager-glib-0.7.0-13.el5
glib2-2.12.3-4.el5_3.1
glib-java-devel-0.2.6-3.fc6
dbus-glib-0.73-10.el5_5
glibc-devel-2.5-65.el5_7.3
dbus-glib-devel-0.73-10.el5_5
glib2-devel-2.12.3-4.el5_3.1
glibc-common-2.5-65.el5_7.3
glibc-headers-2.5-65.el5_7.3
[root@pbx src]# pkg-config --cflags gtk+-2.0
-pthread -I/usr/local/include/glib-2.0 -I/usr/local/lib/glib-2.0/include -I/usr/include/gtk-2.0 -I/usr/lib/gtk-2.0/include -I/usr/include/atk-1.0 -I/usr/include/cairo -I/usr/include/pango-1.0 -I/usr/include/freetype2 -I/usr/include/libpng12
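
The pkg-config output above mixes include directories under /usr/local (where a hand-built glib would normally land) with the distribution's GTK headers under /usr/include. Whether that mix is what breaks menuselect here is an assumption, not something the thread confirms. The sketch below only makes the mix easy to spot; local_includes is a hypothetical helper name.

```shell
# Hypothetical helper: read compiler flags on stdin and print only the
# include directories that live under /usr/local, one per line.
local_includes() {
    tr -s ' ' '\n' | grep '^-I/usr/local/'
}

# Usage sketch:
#   pkg-config --cflags gtk+-2.0 | local_includes
```

Any line printed points at headers that did not come from the distribution packages.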


By: Vladimir Mikhelson (vmikhelson) 2012-02-24 00:32:13.469-0600

Matt, please reopen.  There is no progress so far.

By: Walter Doekes (wdoekes) 2012-02-24 00:38:24.514-0600

Vladimir, you're busy breaking your system by installing glib by hand.

Step 1 for you is to take a clean (different) system and try to reproduce this.

By: Vladimir Mikhelson (vmikhelson) 2012-02-24 00:47:41.596-0600

Walter,

I truly appreciate your help so far. As I interpret your answer, you recommend undoing the glib install.  Can you advise on how to do it cleanly?

Unfortunately, this is the only system I have.

By: Vladimir Mikhelson (vmikhelson) 2012-02-24 00:54:44.853-0600

Meantime 1.8.9.3 seg faulted three (3) times upon initial load.  This tells me no change in behavior occurred as a result of all my efforts to install libresample.

To clarify. The system was originated as AsteriskNOW and is primarily maintained by yum updates.

I am compiling Asterisk because the repositories still do not include the gtalk-related modules. As soon as this is resolved I hope to be back on the yum track.

This manual compilation effort was the biggest so far. The problem I am facing with seg fault is very old.  I opened at least three cases with Asterisk to no avail.

By: Walter Doekes (wdoekes) 2012-02-24 02:01:44.884-0600

> Unfortunately, this is the only system I have.

That's too bad. There really isn't much for us to go on. That backtrace says it crashes where it is impossible to crash. That means that either (a) your gdb output is bad or (b) there is something else nasty going on (corrupt hardware, broken libraries/dependencies).

If you did clean all old traces of libresample and asterisk before recompiling/reinstalling then (a) shouldn't occur.

That leaves us with (b): if you take a different system, that would rule out hardware corruption and a "clean" system would rule out broken libraries.

> Unfortunately, this is the only system I have.

You surely must have a friend with a machine with linux on it somewhere? Or, if you can take the current system down and completely re-install CentOS (or whatever it is that it runs now), you can at least rule out broken library issues.

Further tips:

- run asterisk from gdb directly: gdb `which asterisk`  [enter] run -f -U asterisk -G asterisk -vvvg -c [enter]
 maybe it shows something other than two valid locals being added together. you may want to google a bit about
 how to use gdb.

- there is a tiny tiny chance that your gcc is causing this, you could attempt to install a different version (older or newer) and see if that helps.

Good luck.
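
The first tip above (running Asterisk under gdb) can also be done non-interactively, so that a backtrace is captured as soon as the process stops. This is a rough sketch, assuming a gdb that supports the -batch, -ex and --args options. The wrapper name backtrace_on_crash and the GDB override variable are hypothetical.

```shell
# Hypothetical wrapper: start a program under gdb in batch mode. When the
# program stops (for example on SIGSEGV), print a full backtrace and the
# list of loaded shared libraries, then exit.
# The GDB variable can be overridden, which also allows a dry run.
backtrace_on_crash() {
    "${GDB:-gdb}" -batch -ex run -ex 'bt full' -ex 'info sharedlibrary' --args "$@"
}

# Usage sketch, with the flags from the tip above minus -c (no interactive console):
#   backtrace_on_crash "$(command -v asterisk)" -f -U asterisk -G asterisk -vvvg
```

The shared-library listing shows which libresample file was actually mapped into the crashing process, which complements the ldd check earlier in the thread.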

By: Vladimir Mikhelson (vmikhelson) 2012-02-24 13:11:51.327-0600

Walter,

I will probably abstain from playing with GCC. I will concentrate on reverting back my GLIB/LIBRESAMPLE self-compiled installation.

In fact, I have already force-reinstalled all the packages and possible dependencies with YUM.  In the course of doing that I discovered that glibc.i686 is installed, whereas all other GLIBC components show ".i386":

Installed Packages
glibc.i686                           2.5-65.el5_7.3                    installed
glibc-common.i386                    2.5-65.el5_7.3                    installed
glibc-devel.i386                     2.5-65.el5_7.3                    installed
glibc-headers.i386                   2.5-65.el5_7.3                    installed
glibc-utils.i386                     2.5-65.el5_7.3                    installed
Available Packages
glibc.i386                           2.5-65.el5_7.3                    updates

It smells like a potential problem to me.  Do you think it makes sense to try to force the .i386 version? The .i686 most likely came as part of the CentOS 5.7 update.

By: Walter Doekes (wdoekes) 2012-02-24 13:30:25.379-0600

I don't think mixing 386 and 686 packages should be a problem. In fact, if you notice that there are upgrades, I suggest you do all the upgrading you can. (And then reboot, just in case.)

By: Vladimir Mikhelson (vmikhelson) 2012-02-24 15:55:48.679-0600

I found the crook which broke make menuselect.  It was GTK+.  We are back on track with the seg fault investigation. BTW, my system is up to date with all available updates.

OK. I found this ancient case https://issues.asterisk.org/view.php?id=18151  It sounds like Asterisk is known to not work properly with LIBRESAMPLE. Also it looks like it will compile with no LIBRESAMPLE.

This is what I will play with.  Especially since my seg fault seems to stem from SLIN16 to SLIN8 cost calculation.

By: Walter Doekes (wdoekes) 2012-02-24 16:31:49.661-0600

You're right, you don't need libresample, unless you need codec_resample. You can simply disable the loading of codec_resample.so using noload=> in modules.conf. Or you could recompile without libresample.
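
The noload approach mentioned above amounts to a one-line change in modules.conf. The sketch below assumes the file contains a [modules] section header and lives at the conventional /etc/asterisk/modules.conf path; the helper name add_noload is hypothetical.

```shell
# Hypothetical helper: copy a modules.conf from stdin to stdout, adding a
# "noload" line for the given module right after the [modules] header.
add_noload() {
    awk -v mod="$1" '
        { print }
        /^\[modules\]/ { print "noload => " mod }
    '
}

# Usage sketch (path is an assumption; keep a backup of the original):
#   add_noload codec_resample.so < /etc/asterisk/modules.conf > modules.conf.new
```

Writing to a new file instead of editing in place leaves the original configuration untouched until the result has been reviewed.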

By: Vladimir Mikhelson (vmikhelson) 2012-02-26 01:28:22.300-0600

For troubleshooting purposes I removed libresample and libresample-devel by YUM. I made sure codec_resample became XXXed in the menuselect. I ran make, make menuselect.

I then restarted Asterisk three times with no seg. fault.  I will continue watching the behavior and will report here.

It sounds like there is a memory leak somewhere around libresample.

BTW, codec translation paths did not change.  I really do not see the benefit of having the codec_resample module compiled.

By: Vladimir Mikhelson (vmikhelson) 2012-02-26 01:51:38.243-0600

Any idea how to re-open the case?

By: Vladimir Mikhelson (vmikhelson) 2012-02-26 01:53:25.508-0600

Matt, can you re-open this active case please.

Added the above from Matt's profile, hoping he will be notified.  JIRA is set up in a way that makes communication with administrators virtually impossible :(

By: Vladimir Mikhelson (vmikhelson) 2012-02-26 02:00:45.932-0600

Adding this in Transitions. Trying to find out how to re-open the case.

Apparently created yet another comment.

By: Walter Doekes (wdoekes) 2012-02-26 05:47:19.549-0600

Vladimir,
that codec_resample called libresample which "caused" the segfault was already known. You haven't provided any new clues since this bug was closed, so I don't see any reason to reopen it either.

I can't reproduce your crash, and you can't prove it isn't your install that is broken (since you're unable to reproduce it on a different machine).

Regards,
Walter

By: Vladimir Mikhelson (vmikhelson) 2012-02-26 14:46:44.218-0600

Walter,

I do not know your relationship with Digium.  So let me comment.

If you are not on the development team, then thank you for helping me out with at least one hint, specifically the ldd command, which I had never used before.

But if you are (and the suggestion that you could re-open the case but do not think it is reasonable makes me think that you have some privileges on JIRA which I do not have), then it is a different story.

What Digium and possibly you are doing here is blaming the user instead of looking into the issue.  This approach is inappropriate from at least two perspectives.  One, you discourage bug reporting.  Two, you are leaving bugs in the system.  Closing the case where the probability of catching a bug is at least 20% is an act of irresponsibility towards the product Digium is supposedly developing and supporting.

I could comment more, but as I have experienced a similar attitude on multiple occasions with bug reporting here, and as I have read similar testimonials regarding Digium's attitude from other people on mailing lists I participate in, I will stop here.

Walter, I am not a C programmer and I do not know the architecture of Asterisk well enough to be considered a contributing developer.  What I can do is troubleshoot and point out the potential issue. In this role I can help somebody with the knowledge to look into the issue, which I can easily reproduce, granted not 100% but still reliably.

Codec_resample along with libresample ARE causing the seg fault on my system.  The system is well maintained and is pretty typical.  There are no mystery hardware or library issues on it of the kind you tried to use as an excuse.

In your analysis on 02-24, 02:01am, you looked into hardware, library, compiler and debugger theories.  What you overlooked was a memory leak or another kind of resource issue which could have caused "v += t", with both supposedly equal to 0, to fail.

If anybody from Digium wants to debug I am here to help. If they want to follow the M$ example and collect as many bugs as possible to the point of near collapse experience then it is their choice.

-Vladimir


By: Walter Doekes (wdoekes) 2012-02-26 15:11:52.241-0600

Vladimir,
I will not be drawn into a discussion about Digium's or my habits of blaming bugs on the user.
If you want visibility for your issue, there are a couple of paths you can pursue: #asterisk-dev and #asterisk-bugs IRC channels on freenode or the asterisk-users or asterisk-dev mailing lists.

Regards,
Walter

By: Matt Jordan (mjordan) 2012-02-27 08:36:58.852-0600

I won't get drawn into a discussion of Digium-related practices, other than to say that they pay my paycheck and I do spend a good amount of my day trying to make sure that issues that are issues are reported and dealt with :-)

On to your issue specifically:

In codec_resample, Russell has the following note:

/*!
* \file
*
* \brief Resample slinear audio
*
* \note To install libresample, check it out of the following repository:
* <code>$ svn co http://svn.digium.com/svn/thirdparty/libresample/trunk</code>
*
* \ingroup codecs
*/

In other words, libresample - installed from any other source but the SVN thirdparty repo - is not recommended and not supported.  When I tested libresample using the one checked out from our addons on a CentOS 6 box, I did not have any problems loading the codec.  Since you have stated that you installed libresample using yum, you're using a version of libresample that is not supported by Asterisk.

Granted, this is only going to show up in the C source file (or doxygen-generated comments), so it's a bit tough to know that's the only way it's supported.  Hence why taking these types of issues to the mailing lists is usually a good idea.

And now off of your issue and on to a related tangent, that may help to explain why I closed your issue.

The issue tracker is meant for reporting bugs.  Sometimes we (and by we, I mean me) don't think that an issue you're having is a bug with Asterisk, which is why we close the issue.  That doesn't mean you aren't having an issue of course - and it could be that we (in this case, meaning only myself - as I'm the one who closed the issue) - got it wrong.  That's why, as Walter suggested, you take your issue to the mailing lists - either asterisk-dev or asterisk-users.  There are hundreds of developers and thousands of users who most likely have had your problem - and can either help to clarify why your issue is a bug, or help you resolve a potential configuration problem.

When you only post your problem on the issue tracker, it is potentially a much less visible place - not everyone checks every new issue that gets created.

SO: in conclusion, if you feel that this is still a bug with Asterisk, then I encourage you strongly to take it to the mailing list or the IRC channels.  Someone else may be able to provide more information to us (meaning me) as to why it's a bug, in which case we'll be happy to reopen this issue.  Or, someone else may be able to tell you how to resolve the configuration issue you're having.  In either case, posting to the mailing list in these types of situations is the correct next step to take.

Thanks

Matt

By: Vladimir Mikhelson (vmikhelson) 2012-02-27 11:50:16.135-0600

Matt,

Thank you for getting back with me and explaining your logic.

Back to the issue.  I did install the trunk version per Russell's notes and it did not change a thing. See my notes dated 22/Feb/12 1:17 AM and 22/Feb/12 10:45 AM.

I am not completely sure whether LIBRESAMPLE is considered to be a part of Asterisk.  What I am sure about is that there is a problem with either Asterisk codec_resample or libresample.

My experience with the Asterisk users' mailing list was not overly reassuring in terms of response consistency.  Specifically, several messages I posted yielded either no response or a barely relevant one.

The bottom line: there is a bug.  I can reproduce it on my system.  If anybody is interested, I can spend more of my time working on it.  If not, my troubleshooting has succeeded by identifying the offending module, which I excluded from compilation, thus resolving the immediate issue.

Thank you,
Vladimir