Summary: ASTERISK-20361: XMPP segfaults
Reporter: Noah Engelberth (mlnoah)
Labels:
Date Opened: 2012-09-04 16:05:40
Date Closed: 2012-09-12 13:23:07
Versions:SVN Frequency of
Environment: CentOS 6.3. XMPP clients including Pidgin 2.10.6 and Trillian. OpenFire server, tried both versions 3.6.4 and 3.7.1
Attachments:
( 0) extensions.conf
( 1) xmpp.conf
( 2) xmpp-backtrace.txt
( 3) xmpp-message.txt
( 4) xmpp-segfault-backtrace.txt
Description: Getting fairly frequent segfaults on my test box running Asterisk 11 SVN. Trying to get res_xmpp working as an "interactive XMPP menu" for my internal users. Both XMPP clients I've tested seem to send a message with a body of "(null)" when the XMPP client is starting a new message (but before the new message is actually sent). I have my dialplan set up to filter these out and not fully receive them, but Asterisk is segfaulting somewhere in this filtering process. Not 100% reproducible, but happens in long spurts when it does happen.
Comments:
By: Noah Engelberth (mlnoah) 2012-09-04 16:08:36.921-0500

When it crashes, it crashes before/during the GotoIf at the bottom of the xmpp-incoming,s extension.

By: Noah Engelberth (mlnoah) 2012-09-10 10:26:55.491-0500

I also tried updating my OpenFire server from 3.6.4 to 3.7.1, without any change in the frequency of the crashes.  In addition, I tried changing my OpenFire server to not automatically add buddies to the Asterisk user's contact list, and instead manually added a buddy in xmpp.conf, again without any change in the frequency of the crashes.

By: Jonathan Rose (jrose) 2012-09-10 16:09:11.486-0500

Hey Noah, if you still have the core dump, I would appreciate figuring out what one of the struct pointers contained from that backtrace. It looks like what is causing our crash is probably an attempt to run ast_strdupa on a NULL string for message->message, but I'm not quite sure since the value of message->message isn't visible.

Here are the steps to get what I need:

gdb asterisk <nameOfCoreDump>
frame 1
print *message

It's important to include the * in front of message since it will just give me the value of the pointer otherwise.  Including the * should display the full contents of the struct including the names and values of all its fields.

When you use this core dump, make sure you are using the version of Asterisk that you generated it with.

By: Noah Engelberth (mlnoah) 2012-09-11 08:01:06.671-0500

I don't have the original core dump any more, and I updated SVN versions late last week so I could verify the current version before I put it on a production server.  I do have core dumps from the current version that were generated under similar circumstances, so I ran a new backtrace and then performed the steps you requested.

Also, when I updated SVN versions, I turned on better backtraces, so the new backtraces should have that.

By: Leif Madsen (lmadsen) 2012-09-11 08:08:42.894-0500

I haven't seen any crashes yet, but I can confirm I did see some (null) messages when connecting through the Google XMPP servers. Client is Pidgin.

For some reason, after a period of time I did stop getting the (null) messages, but am unsure what I did to cause that. Will be continuing to test on Thursday for documentation purposes.

By: Noah Engelberth (mlnoah) 2012-09-11 08:27:00.620-0500

I think the (null) messages are sent by the chat client as part of what enables the "you're about to receive a message..." functionality that Pidgin and other clients have available as an option to turn on.  The messages don't evaluate true for ISNULL, and "${MESSAGE(body)}" does match the string "(null)" within the dialplan.
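As a rough illustration of the distinction described above, here is a minimal C sketch of a check that treats a missing body, an empty body, and the literal text "(null)" as placeholder messages. The helper name demo_is_placeholder_body is hypothetical and is not part of Asterisk or res_xmpp; the actual filtering in this report is done in the dialplan.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not an Asterisk API): returns true when a message
 * body carries no real text. A NULL pointer and the literal string "(null)"
 * are different things, so both cases are checked explicitly.
 */
static bool demo_is_placeholder_body(const char *body)
{
    if (body == NULL || body[0] == '\0') {
        return true;  /* no body at all, or an empty one */
    }
    /* Exact match only: "(null) text" is treated as a real message. */
    return strcmp(body, "(null)") == 0;
}
```

The NULL test comes first so that strcmp is never called with a NULL argument.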

By: Jonathan Rose (jrose) 2012-09-11 09:42:44.655-0500

Thanks Noah, that confirmed it. This segfault is caused by an attempt to call ast_strdupa with a NULL pointer argument. It might be a CentOS-specific crash.
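For readers unfamiliar with this class of crash, the following is a minimal sketch of guarding a string copy against a NULL source. It is not the patch that was committed. The struct demo_message and both helper functions are hypothetical stand-ins, and the sketch copies to the heap with malloc, whereas ast_strdupa copies onto the stack.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for the message struct seen in the backtrace.
 * The real Asterisk struct has different fields.
 */
struct demo_message {
    const char *from;
    const char *message;  /* assumed to be NULL for some notifications */
};

/* Substitute an empty string when the pointer is NULL. */
static const char *demo_or_empty(const char *s)
{
    return s ? s : "";
}

/*
 * Duplicate the message body. Calling strlen() on a NULL pointer is
 * undefined behavior and typically segfaults, so the source is guarded
 * before its length is measured. The caller frees the result.
 * Returns NULL only if the allocation fails.
 */
static char *demo_copy_body(const struct demo_message *msg)
{
    const char *src = demo_or_empty(msg ? msg->message : NULL);
    size_t len = strlen(src) + 1;  /* include the terminating NUL */
    char *copy = malloc(len);

    if (copy != NULL) {
        memcpy(copy, src, len);
    }
    return copy;
}
```

With this guard, a message whose body pointer is NULL yields an empty copy instead of a crash. Whether an empty string is the right substitute depends on how the caller uses the body.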

By: Jonathan Rose (jrose) 2012-09-12 13:35:33.426-0500

I've finished updating 11 and trunk with changes similar to those in the last patch.  Thanks for all your help with this one, Noah.