Summary: | ASTERISK-17492: Crash in res_phoneprov on reload | ||
Reporter: | Paul Dugas (pdugas) | Labels: | |
Date Opened: | 2011-03-01 20:10:09.000-0600 | Date Closed: | 2011-08-29 12:51:34 |
Priority: | Critical | Regression? | No |
Status: | Closed/Complete | Components: | Resources/res_phoneprov |
Versions: | 1.8.2 | Frequency of Occurrence | |
Related Issues: | |||
Environment: | Attachments: | ( 0) backtrace.txt ( 1) phoneprov.conf | |
Description: | I've been working with a new 1.8.2.2 setup using the RPMs served up in the YUM repository. I'm using res_phoneprov and the built-in HTTP server to configure a batch of Polycom phones. All's usually working well. Occasionally, when I "reload" from the CLI to apply some dialplan tweaks or add some users.conf entries, I'm crashing Asterisk and getting core files in /tmp. I managed to get stack traces from a few of them tonight and they're all the same. See "Additional Information" below.

*** NOTE: The "Asterisk Version" field doesn't include 1.8.2.2 so I chose 1.8.2.3. I hope this doesn't lead someone astray.

****** ADDITIONAL INFORMATION ******

core show version
Asterisk 1.8.2.2 built by root @ localhost.localdomain on a x86_64 running Linux on 2011-01-20 21:32:01 UTC

(gdb) where
#0 0x0000003870430265 in raise () from /lib64/libc.so.6
#1 0x0000003870431d10 in abort () from /lib64/libc.so.6
#2 0x000000387046a84b in __libc_message () from /lib64/libc.so.6
#3 0x000000387047230f in _int_free () from /lib64/libc.so.6
#4 0x000000387047276b in free () from /lib64/libc.so.6
#5 0x00002aaabdfa24a6 in delete_extension (obj=<value optimized out>) at res_phoneprov.c:684
#6 user_destructor (obj=<value optimized out>) at res_phoneprov.c:792
#7 0x000000000043e389 in internal_ao2_ref ()
#8 0x00002aaabdfa4b9c in unref_user () at res_phoneprov.c:756
#9 delete_users () at res_phoneprov.c:811
#10 reload () at res_phoneprov.c:1322
#11 0x00000000004c29a8 in ast_module_reload ()
#12 0x000000000047c5ac in handle_reload ()
#13 0x0000000000479110 in ast_cli_command_full ()
#14 0x00002aaab93335c3 in cli_alias_passthrough (e=0x2aaad246ad18, cmd=-4, a=0x41aa3b10) at res_clialiases.c:128
#15 0x0000000000479110 in ast_cli_command_full ()
#16 0x0000000000479329 in ast_cli_command_multiple_full ()
#17 0x0000000000437e7d in netconsole ()
#18 0x0000000000530cbc in dummy_start ()
#19 0x0000003870c0673d in start_thread () from /lib64/libpthread.so.0
#20 0x00000038704d3f6d in clone () from /lib64/libc.so.6
(gdb)

# rpm -qi asterisk18
Name : asterisk18 Relocations: (not relocatable)
Version : 1.8.2.2 Vendor: Digium, Inc.
Release : 1_centos5 Build Date: Thu 20 Jan 2011 04:38:06 PM EST
Install Date: Tue 15 Feb 2011 02:10:10 PM EST Build Host: localhost.localdomain
Group : Utilities/System Source RPM: asterisk18-1.8.2.2-1_centos5.src.rpm
Size : 0 License: GPL
...

# uname -a
Linux pbx.internal.lan 2.6.18-194.32.1.el5 #1 SMP Wed Jan 5 17:52:25 EST 2011 x86_64 x86_64 x86_64 GNU/Linux | ||
Comments: | By: Andrew Latham (lathama) 2011-03-02 05:50:53.000-0600
We are using res_phoneprov for hundreds (200-900) of phones at a time across many installs and have not hit any issues since the 1.6.2 issues with http/manager (1 year ago). Can you share your phoneprov.conf file? In Trunk 1.8, 1.6.2 branch I added a debug setting to watch the HTTP request URLs which may help.

By: Paul Dugas (pdugas) 2011-03-02 07:20:08.000-0600
My phoneprov.conf is attached. I can provide other configs including my users.conf file as well as the phoneprov templates if needed.

By: Andrew Latham (lathama) 2011-03-02 08:11:22.000-0600
http://svn.asterisk.org/svn/asterisk/trunk/configs/phoneprov.conf.sample
res_phoneprov, to my knowledge, does not support "profile templates". In res_phoneprov, templates = the files that are dynamically served. Remove your
[polycom](polycom-template)
[polycom-remote](polycom-template)
lines and replace [polycom-template](!) with [polycom].
You can read the details here: http://svn.asterisk.org/svn/asterisk/trunk/res/res_phoneprov.c

By: Paul Dugas (pdugas) 2011-03-02 08:52:58.000-0600
The profile templates are working for me, though I'll not argue that they're not the cause of this crash. I put all my common settings in the [polycom-template] section, then inherit them in the [polycom] and [polycom-remote] sections, where I set the REMOTE variable to 0 or 1. In my phone1_316-reg.xml file (which is used with a call to PP_EACH_EXTENSION() in phone1_316.cfg) I can do things like so:
reg.${LINE}.server.1.address="${IF($[${REMOTE}=1]?${REMOTE_SERVER}:${SERVER})}"
reg.${LINE}.server.1.port="${IF($[${REMOTE}=1]?${REMOTE_SERVER_PORT}:${SERVER_PORT})}"
Entries in my users.conf file have autoprov=yes, macaddress set, and profile set to either "polycom" or "polycom-remote". Stations that I give to users who work from home use the latter. Local stations use the former. This works fine.
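For readers without access to the attached phoneprov.conf, the layout described in the comment above can be sketched roughly as follows. This is an illustration only, not the reporter's actual file: the host name, port, extension number, and MAC address are hypothetical, and the thread itself disagrees on whether res_phoneprov supports template inheritance for profiles, so treat the `(!)` and `(polycom-template)` syntax as the reporter's usage rather than documented behavior.

```ini
; phoneprov.conf (sketch; all values are hypothetical)
[polycom-template](!)                 ; template section holding common settings
staticdir => configs/
mime_type => text/xml
setvar => REMOTE_SERVER=pbx.example.com   ; assumed value, not from the ticket
setvar => REMOTE_SERVER_PORT=5060         ; assumed value, not from the ticket

[polycom](polycom-template)           ; profile for local stations
setvar => REMOTE=0

[polycom-remote](polycom-template)    ; profile for stations used from home
setvar => REMOTE=1

; users.conf (sketch; extension and MAC address are hypothetical)
[6001]
autoprov = yes
macaddress = 0004f2000000
profile = polycom-remote
```

With this arrangement, the `${IF($[${REMOTE}=1]?...)}` expressions in the served template pick the remote or local server depending on which profile the user entry names.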
By: Leif Madsen (lmadsen) 2011-03-03 14:39:40.000-0600
If you're getting a crash you need to provide a backtrace: https://wiki.asterisk.org/wiki/display/AST/Getting+a+Backtrace

By: Paul Dugas (pdugas) 2011-03-03 14:56:21.000-0600
Full backtrace now attached. Guess the partial one I provided wasn't good enough. This is a production system running the binaries from the YUM repository, so I am not able to avoid the <value optimized out> as far as I know.

By: Andrew Latham (lathama) 2011-03-03 15:06:55.000-0600
pdugas, use
    ./configure
    make menuselect.makeopts
    menuselect/menuselect --enable DONT_OPTIMIZE menuselect.makeopts
    menuselect/menuselect --enable DEBUG_THREADS menuselect.makeopts
    make install
to get the backtraces that are more helpful.

By: Paul Dugas (pdugas) 2011-03-03 15:36:54.000-0600
Sorry, but this is a production machine that uses the binary packages provided by Digium in their YUM repository. The machine does not have the source code or a development environment. If that means nobody can look any deeper into this, then let's just close it. There's enough info in there to know that free() is abort()'ing in res_phoneprov.c's delete_extension() routine. Seems useful to me.

By: Paul Dugas (pdugas) 2011-04-12 10:18:27
I have been unable to reproduce this fault since reporting it. I suspect I had a configuration fault that was causing it. I would suggest closing this issue unless someone wants to look into the crash report in more detail.

By: Andrew Latham (lathama) 2011-04-12 10:21:02
Does your configuration revision control show which setting was causing the issue? If it is a configuration issue, maybe some updated documentation would help other users.

By: Paul Dugas (pdugas) 2011-04-12 11:42:31
Sorry, but the crashes were happening as I was doing the initial setup of a new system, prior to getting the configs into revision control.
By: Andrew Latham (lathama) 2011-04-12 13:16:09
May be related: http://svnview.digium.com/svn/asterisk?view=rev&rev=313432 or ASTERISK-16197

By: Paul Belanger (pabelanger) 2011-04-25 21:25:31
Thank you for your bug report. In order to move your issue forward, we require a backtrace[1] from the core file produced after the crash. Also, be sure you have DONT_OPTIMIZE enabled in menuselect within the Compiler Flags section, then:
    make install
After enabling, reproduce the crash, and then execute the backtrace[1] instructions. When complete, attach that file to this issue report.
[1] https://wiki.asterisk.org/wiki/display/AST/Getting+a+Backtrace

By: Paul Dugas (pdugas) 2011-04-26 09:14:08
pabelanger: As I wrote before, the fault occurred on a machine running the binary RPM packages from the Asterisk/Digium YUM repository. It was being built as a production machine and was not built from source, so enabling debug dumps was not possible. Perhaps Digium should look into adjusting their RPM build system to produce the necessary .debug packages that could be installed in cases like this.

The machine is in service and no longer exhibits the fault. I suspect that some inconsistency in the configs was triggering the crash, but I have been unable to reproduce it and the configs were not under revision control at the time. I completely understand the desire for the level of detail you are requesting and apologize for not being able to provide it.

All of that said, I was able to provide a stack trace that indicated a call to free() within the delete_extension() routine in res_phoneprov.c triggered a SEGV. I am not yet familiar enough with the memory management within Asterisk to track this down any further. Seems like a pretty good lead for someone more familiar than I am. If nobody is willing/able to look into this any further without the level of detail you are asking for, I suggest again closing this issue.
By: Leif Madsen (lmadsen) 2011-08-29 12:51:34.785-0500 Closing per reporter. Thanks! |