[Home]

Summary:ASTERISK-09292: Reload is heavily delayed and CLI gets unresponsive if res_snmp is enabled
Reporter:hristo (hristo)Labels:
Date Opened:2007-04-20 14:44:17Date Closed:2007-07-11 19:59:16
Priority:BlockerRegression?No
Status:Closed/CompleteComponents:Core/General
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) snmpd-full-debug.txt
Description:I have several servers, which have res_snmp enabled and I have noticed that issuing a reload on the console takes several seconds to several minutes to complete. Once it actually starts reloading the configuration, everything runs as fast as usual, but the time it takes to *begin* reloading is the real problem.

Usually if asterisk doesn't start reloading in 5-10 seconds it will never reload and the only way to regain control is to restart its process.

Issuing another reload prints:
"*CLI> reload
The previous reload command didn't finish yet"

show uptime prints:
"System uptime: 2 minutes, 36 seconds
Last reload: 1 minute, 36 seconds"

but, note that "Last reload" hasn't even started to reload (regardless of what it says).

Some commands ("restart now", etc.) have no effect while asterisk is in this state. The only solution is to restart it.

I have tried enabling debug/verbose, but nothing really showed up in the logs/console.

The moment I disable res_snmp (noload => res_snmp.so) and restart asterisk everything gets back to normal - configuration reloads happen instantaneously.

I'm not really sure how to debug this or how to provide other valuable information...any help will be appreciated. It shouldn't be that hard to duplicate - I have tried it on 6 different servers (single/dual CPUs, 32 and 64 bit) and all behave in exactly the same way once res_snmp is enabled.

I remember noticing this problem long ago...probably even before the first 1.4 betas.

****** ADDITIONAL INFORMATION ******

2.6.19-1.2288.2.1.fc5 #1 SMP Thu Feb 22 11:32:04 EST 2007 x86_64 x86_64 x86_64 GNU/Linux
net-snmp-libs-5.3-4.2
net-snmp-devel-5.3-4.2
net-snmp-5.3-4.2
Comments:By: Jason Parker (jparker) 2007-04-20 15:11:50

It looks like it might be hanging or waiting for a long while inside of the net-snmp snmp_shutdown() function.  Is there a way to turn up debugging of net-snmp?

By: hristo (hristo) 2007-04-21 04:52:27

You'll find attached debug from snmpd, however, all attached net-snmp debug messages start to appear only after the reload process actually *starts*, and not at the time the reload command itself is issued (as described there is a delay of several seconds to minutes between the issuing of the command and the actual reload happening).

There is no output whatsoever in asterisk with debug/verbose turned on (level 5).

I have also tried to reload only res_snmp.so (*CLI> module reload res_snmp.so) as opposed to full reload and this reproduced the problem, which at least proves that it is only related to the res_snmp.so module.

forgot to mention in the initial report that res_snmp is run as subagent (subagent = yes in res_snmp.conf)

By: hterag (hterag) 2007-05-07 22:15:31

I am seeing this also with these details
libsnmp9-dev 5.2.3-7
libsnmp9 5.2.3-7
Linux 2.6.18-4-amd64 #1 SMP x86_64 GNU/Linux

Asterisk 1.4 revision 62299

Its quite regular, only rarely does a reload not cause this behaviour for me.

running as sub agent also

By: IƱaki Baz Castillo (ibc) 2007-05-16 09:55:18

I confirm this bug.

With res_snmp loaded and enabled the second "CLI>reload" never finished and makes Asterisk unusable at all.

This is my "messages" log after the second "CLI>reload":

---------
[May 16 10:26:54] NOTICE[6357] cdr.c: CDR simple logging enabled.
[May 16 10:26:54] NOTICE[6357] indications.c: Removed default indication ?
country 'us'
---------

And there is no more. CLI leaves working and "/etc/init.d/asterisk restart" doesn't work. The only way is killing the process and restart Asterisk.

- Debian Etch
- Asterisk 1.4.4
- libsnmp9-dev 5.2.3-7

By: Martin Vit (festr) 2007-05-28 15:20:48

I also confirm this bug.

debian sarge netsnmp5, asterisk 1.4.4, subagent. first reload does nothing. Unload module too. I've to restart asterisk.

update: after unloading module, asterisk process eats all cpu until restart.

i've tested 1.4.4 on another debian sarge system. unload was succesfull, but load fails:

*CLI> unload res_snmp.so
Unloading [Sub]Agent Module
 == Terminating SubAgent
*CLI> load res_snmp.so
 == Parsing '/etc/asterisk/res_snmp.conf': Found
Loading [Sub]Agent Module
Loaded res_snmp.so => (SNMP [Sub]Agent for Asterisk)
The 'load' command is deprecated and will be removed in a future release. Please use 'module load' instead.
*CLI>   == Starting SubAgent
No support for requested transport domain "tcp"
Error: Failed to connect to the agentx master agent: Unknown host
free(): invalid pointer 0x81ba290!
free(): invalid pointer 0x81a7560!
Segmentation fault (core dumped)

#0  0xb7aa524a in netsnmp_register_old_api () from /usr/lib/libnetsnmphelpers.so.5
#1  0xb7bcd353 in register_mib_context () from /usr/lib/libnetsnmpagent.so.5
#2  0xb7bcd3d7 in register_mib_range () from /usr/lib/libnetsnmpagent.so.5
#3  0xb7bcd43b in register_mib_priority () from /usr/lib/libnetsnmpagent.so.5
#4  0xb7bcd49b in register_mib () from /usr/lib/libnetsnmpagent.so.5
ASTERISK-1  0xb7be64d7 in agent_thread (arg=0x0) at snmp/agent.c:809
ASTERISK-2  0x080f79e5 in dummy_start (data=0x4) at utils.c:545
ASTERISK-3  0xb7fb0b63 in start_thread () from /lib/tls/libpthread.so.0
ASTERISK-4  0xb7e0318a in clone () from /lib/tls/libc.so.6

here is strace when unloading module

ioctl(0, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig -icanon -echo ...}) = 0
ioctl(0, SNDCTL_TMR_STOP or TCSETSW, {B38400 opost isig icanon echo ...}) = 0
ioctl(0, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
write(1, "\33[0;37;40m", 10)            = 10
time(NULL)                              = 1180385172
time(NULL)                              = 1180385172
time(NULL)                              = 1180385172
time(NULL)                              = 1180385172
time(NULL)                              = 1180385172
write(1, "\33[1;30;40m \33[0;37;40mUnloading ["..., 49 Unloading [Sub]Agent Module
) = 49
tgkill(9666, 9666, SIGURG)              = 0
--- SIGURG (Urgent I/O condition) @ 0 (0) ---
rt_sigaction(SIGURG, {0x806c117, [URG], SA_RESTART}, {0x806c117, [URG], SA_RESTART}, 8) = 0
sigreturn()                             = ? (mask now [])
futex(0xb780fbf8, FUTEX_WAIT, 9670, NULL

gdb asterisk -p 9670

(gdb) bt
#0  0xb7d42a27 in select () from /lib/tls/libc.so.6
#1  0xb7b07228 in agent_check_and_process () from /usr/lib/libnetsnmpagent.so.5
#2  0xb7b25e7e in agent_thread (arg=0x0) at snmp/agent.c:162
#3  0x08106b74 in dummy_start (data=0x81a36f0) at utils.c:545
#4  0xb7ef6b63 in start_thread () from /lib/tls/libpthread.so.0
ASTERISK-1  0xb7d4918a in clone () from /lib/tls/libc.so.6



By: Martin Vit (festr) 2007-05-28 16:59:11

i've upgraded to libsnmp 5.2.3 with this results:

unload  module works but after timeout that is caused by agent_check_and_process(1) which uses select (5 seconds timeout).

*CLI> unload res_snmp.so
Unloading [Sub]Agent Module
 == test2
 == Terminating SubAgent


Lets load res_snmp.so again

*CLI> load res_snmp.so
 == Parsing '/etc/asterisk/res_snmp.conf': Found
Loading [Sub]Agent Module
Loaded res_snmp.so => (SNMP [Sub]Agent for Asterisk)
The 'load' command is deprecated and will be removed in a future release. Please use 'module load' instead.
*CLI>   == Starting SubAgent
ignoring extra call to init_agent (2)


But agent wont connect anymore. Unloading it again cause deadlock and only restart asterisk helps.

reload simly does unload and load. So this is same behaviour described by ibc.

By: Martin Vit (festr) 2007-05-28 18:08:39

i've tested the latest netsnmp 5.4 with the same results (second reload cause deadlock). Second reload ends in res/snmp/agent.c:agent_check_and_process(1) which never stops. It seems, that agentx does not like complete reinitializing of init_snmp (as found on some forums).

update:

http://archives.devshed.com/forums/networking-100/calling-init-snmp-more-than-one-won-t-works-2208672.html

update2:

snmp_init can be initialized only once per program. So after unloading res_snmp.so there is no chance to connect subagent again.



By: Jason Parker (jparker) 2007-05-29 13:14:50

outside of disabling unload/reload of res_snmp, is there anything we can really do on the Asterisk side?  What should we do at this point?

By: Russell Bryant (russell) 2007-06-06 17:10:19

Due to the information pointed out by festr, there is no way to support reload() in res_snmp without completely unloading and loading the module.  So, I have removed reload() support completely in 1.4 and trunk in revisions 67872 and 67875.  Thanks!