Summary: ASTERISK-17492: Crash in res_phoneprov on reload
Reporter: Paul Dugas (pdugas)
Labels:
Date Opened: 2011-03-01 20:10:09.000-0600
Date Closed: 2011-08-29 12:51:34
Priority: Critical
Regression?: No
Status: Closed/Complete
Components: Resources/res_phoneprov
Versions: 1.8.2
Frequency of Occurrence:
Related Issues:
Environment:
Attachments: (0) backtrace.txt
(1) phoneprov.conf
Description: I've been working with a new 1.8.2.2 setup using the RPMs served up in the YUM repository.  I'm using res_phoneprov and the built-in HTTP server to configure a batch of Polycom phones.  All's usually working well.

Occasionally, when I "reload" from the CLI to apply some dialplan tweaks or add some users.conf entries, I'm crashing Asterisk and getting core files in /tmp.  I managed to get stack traces from a few of them tonight and they're all the same.  See "Additional Information" below.

*** NOTE: The "Asterisk Version" field doesn't include 1.8.2.2 so I chose 1.8.2.3.  I hope this doesn't lead someone astray.

****** ADDITIONAL INFORMATION ******

core show version
Asterisk 1.8.2.2 built by root @ localhost.localdomain on a x86_64 running Linux on 2011-01-20 21:32:01 UTC

(gdb) where
#0  0x0000003870430265 in raise () from /lib64/libc.so.6
#1  0x0000003870431d10 in abort () from /lib64/libc.so.6
#2  0x000000387046a84b in __libc_message () from /lib64/libc.so.6
#3  0x000000387047230f in _int_free () from /lib64/libc.so.6
#4  0x000000387047276b in free () from /lib64/libc.so.6
#5  0x00002aaabdfa24a6 in delete_extension (obj=<value optimized out>)
   at res_phoneprov.c:684
#6  user_destructor (obj=<value optimized out>) at res_phoneprov.c:792
#7  0x000000000043e389 in internal_ao2_ref ()
#8  0x00002aaabdfa4b9c in unref_user () at res_phoneprov.c:756
#9  delete_users () at res_phoneprov.c:811
#10 reload () at res_phoneprov.c:1322
#11 0x00000000004c29a8 in ast_module_reload ()
#12 0x000000000047c5ac in handle_reload ()
#13 0x0000000000479110 in ast_cli_command_full ()
#14 0x00002aaab93335c3 in cli_alias_passthrough (e=0x2aaad246ad18, cmd=-4,
   a=0x41aa3b10) at res_clialiases.c:128
#15 0x0000000000479110 in ast_cli_command_full ()
#16 0x0000000000479329 in ast_cli_command_multiple_full ()
#17 0x0000000000437e7d in netconsole ()
#18 0x0000000000530cbc in dummy_start ()
#19 0x0000003870c0673d in start_thread () from /lib64/libpthread.so.0
#20 0x00000038704d3f6d in clone () from /lib64/libc.so.6
(gdb)


# rpm -qi asterisk18

Name        : asterisk18                   Relocations: (not relocatable)
Version     : 1.8.2.2                           Vendor: Digium, Inc.
Release     : 1_centos5                     Build Date: Thu 20 Jan 2011 04:38:06 PM EST
Install Date: Tue 15 Feb 2011 02:10:10 PM EST      Build Host: localhost.localdomain
Group       : Utilities/System              Source RPM: asterisk18-1.8.2.2-1_centos5.src.rpm
Size        : 0                                License: GPL
...

# uname -a
Linux pbx.internal.lan 2.6.18-194.32.1.el5 #1 SMP Wed Jan 5 17:52:25 EST 2011 x86_64 x86_64 x86_64 GNU/Linux

Comments:

By: Andrew Latham (lathama) 2011-03-02 05:50:53.000-0600

We are using res_phoneprov for hundreds (200-900) of phones at a time across many installs and have not hit any issues since the 1.6.2 issues with http/manager (1 year ago).  Can you share your phoneprov.conf file?  In trunk and in the 1.8 and 1.6.2 branches, I added a debug setting to watch the HTTP request URLs, which may help.

By: Paul Dugas (pdugas) 2011-03-02 07:20:08.000-0600

My phoneprov.conf is attached.  I can provide other configs including my users.conf file as well as the phoneprov templates if needed.



By: Andrew Latham (lathama) 2011-03-02 08:11:22.000-0600

http://svn.asterisk.org/svn/asterisk/trunk/configs/phoneprov.conf.sample

To my knowledge, res_phoneprov does not support "profile templates". In res_phoneprov, "templates" are the files that are dynamically served. Remove your
[polycom](polycom-template)
[polycom-remote](polycom-template)
lines and replace
[polycom-template](!)
with
[polycom]


You can read the details here http://svn.asterisk.org/svn/asterisk/trunk/res/res_phoneprov.c

By: Paul Dugas (pdugas) 2011-03-02 08:52:58.000-0600

The profile templates are working for me though I'll not argue that they're not the cause of this crash.  I put all my common settings in the [polycom-template] section then inherit them in the [polycom] and [polycom-remote] sections where I set the REMOTE variable to 0 or 1.  In my phone1_316-reg.xml file (which is used with a call to PP_EACH_EXTENSION() in phone1_316.cfg) I can do things like so:

 reg.${LINE}.server.1.address="${IF($[${REMOTE}=1]?${REMOTE_SERVER}:${SERVER})}"
 reg.${LINE}.server.1.port="${IF($[${REMOTE}=1]?${REMOTE_SERVER_PORT}:${SERVER_PORT})}"

Entries in my users.conf file have autoprov=yes, macaddress is set, and profile set to either "polycom" or "polycom-remote".  Stations that I give to users who work from home use the latter.  Local stations use the former.

This works fine.

By: Leif Madsen (lmadsen) 2011-03-03 14:39:40.000-0600

If you're getting a crash you need to provide a backtrace:  https://wiki.asterisk.org/wiki/display/AST/Getting+a+Backtrace

By: Paul Dugas (pdugas) 2011-03-03 14:56:21.000-0600

Full backtrace now attached.  Guess the partial one I provided wasn't good enough.  This is a production system running the binaries from the YUM repository so I am not able to avoid the <value optimized out> as far as I know.



By: Andrew Latham (lathama) 2011-03-03 15:06:55.000-0600

pdugas,  use

./configure
make menuselect.makeopts
menuselect/menuselect --enable DONT_OPTIMIZE menuselect.makeopts
menuselect/menuselect --enable DEBUG_THREADS menuselect.makeopts
make install

to get the backtraces that are more helpful.

By: Paul Dugas (pdugas) 2011-03-03 15:36:54.000-0600

Sorry but this is a production machine that uses the binary packages provided by Digium in their YUM repository.  The machine does not have the source code or a development environment.  If that means nobody can look any deeper into this then let's just close it.

There's enough info in there to know that free() is abort()'ing in res_phoneprov.c's delete_extension() routine.  Seems useful to me.
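
For anyone reading along: when glibc's free() calls abort() by way of __libc_message(), as in frames #2 through #4 of the backtrace above, it has usually detected a double free, a pointer that did not come from malloc(), or heap corruption. The following is only a minimal sketch in C of the usual defensive pattern (free once, then clear the pointer). The demo_extension struct and both functions are hypothetical names made up for illustration; this is not the actual res_phoneprov code and not a confirmed diagnosis of this crash.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a provisioning extension.
 * This is NOT the real res_phoneprov structure. */
struct demo_extension {
    char *name; /* heap-allocated string owned by the extension */
};

/* Allocate an extension that owns a private copy of name.
 * Returns NULL if any allocation fails. */
struct demo_extension *demo_extension_new(const char *name)
{
    struct demo_extension *ext = calloc(1, sizeof(*ext));
    if (!ext) {
        return NULL;
    }
    ext->name = malloc(strlen(name) + 1);
    if (!ext->name) {
        free(ext);
        return NULL;
    }
    strcpy(ext->name, name);
    return ext;
}

/* Release the owned string exactly once.
 * Returns 1 if memory was released, 0 if it had already been released.
 * Clearing the pointer makes a repeated call harmless, because
 * free(NULL) is defined to do nothing. */
int demo_extension_release(struct demo_extension *ext)
{
    int released;

    assert(ext != NULL);
    released = (ext->name != NULL);
    free(ext->name);
    ext->name = NULL;
    return released;
}
```

Calling demo_extension_release() twice on the same object returns 1 and then 0 instead of handing the same pointer to free() a second time. The caller still frees the struct itself when done with it.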

By: Paul Dugas (pdugas) 2011-04-12 10:18:27

I have been unable to reproduce this fault since reporting it.  I suspect I had a configuration fault that was causing it.  I would suggest closing this issue unless someone wants to look into the crash report in more detail.

By: Andrew Latham (lathama) 2011-04-12 10:21:02

Does your configuration revision control show which setting was causing the issue?  If it is a configuration issue, maybe some updated documentation would help other users.

By: Paul Dugas (pdugas) 2011-04-12 11:42:31

Sorry but the crashes were happening as I was doing the initial setup of a new system prior to getting the configs into revision control.

By: Andrew Latham (lathama) 2011-04-12 13:16:09

May be related: http://svnview.digium.com/svn/asterisk?view=rev&rev=313432 or ASTERISK-16197

By: Paul Belanger (pabelanger) 2011-04-25 21:25:31

Thank you for your bug report. In order to move your issue forward, we require a backtrace[1] from the core file produced after the crash.

Also, be sure you have DONT_OPTIMIZE enabled in menuselect within the Compiler Flags section, then:

make install

after enabling, reproduce the crash, and then execute the backtrace[1] instructions.

When complete, attach that file to this issue report.

[1] https://wiki.asterisk.org/wiki/display/AST/Getting+a+Backtrace

By: Paul Dugas (pdugas) 2011-04-26 09:14:08

pabelanger:

As I wrote before, the fault occurred on a machine running the binary RPM packages from the Asterisk/Digium YUM repository.  It was being built as a production machine and was not built from source so enabling debug dumps was not possible.  Perhaps Digium should look into adjusting their RPM build system to produce the necessary .debug packages that could be installed in cases like this.

The machine is in service and no longer exhibits the fault.  I suspect that some inconsistency in the configs was triggering the crash but I have been unable to reproduce it and the configs were not under revision control at the time.  I completely understand the desire for the level of detail you are requesting and apologize for not being able to provide it.  

All of that said, I was able to provide a stack trace that indicated a call to free() within the delete_extension() routine in res_phoneprov.c triggered an abort().  I am not yet familiar enough with the memory management within Asterisk to track this down any further.  Seems like a pretty good lead for someone more familiar than I am.

If nobody is willing/able to look into this any further without the level of detail you are asking for, I suggest again closing this issue.

By: Leif Madsen (lmadsen) 2011-08-29 12:51:34.785-0500

Closing per reporter. Thanks!