Summary: ASTERISK-20170: res_odbc crash after freetds dsn reconnects
Reporter: Noah Engelberth (mlnoah)
Labels:
Date Opened: 2012-07-25 08:56:43
Date Closed: 2012-08-24 10:06:22
Versions: 10.4.2 Frequency of
Environment: CentOS 6.2 VM on a CentOS 6.3 KVM host cluster
Attachments: ( 0) backtrace.txt
( 1) config_files.txt
Description: I have an Asterisk Open Source 10 system set up that is using res_odbc to connect to a MSSQL database so that our users can clock in/out on our timeclock system from their phones.  I've been having a consistent issue with Asterisk crashing (completely restarting and dropping active calls) when there is a network disruption that severs the connection between Asterisk and the MSSQL server while someone is trying to punch the timeclock.

The setup is as follows:
Asterisk 10.4 (also had same issues on 10.2) running on CentOS 6.2 (VM on a CentOS 6.3 KVM host cluster) - connected to Voice VLAN
- freetds installed from epel yum repository, 0.91-2.el6 (most current version available on epel)
- unixODBC & unixODBC-devel 2.2.14-11.el6 installed
- Asterisk also has an ODBC connection to a local MySQL server configured and in use for a separate purpose
MSSQL 2008 R2 running on Server 2008 R2 (VM on a CentOS 6.3 KVM host cluster) - connected to Data VLAN

[Edit by Rusty Newton - config file contents attached as config_files.txt]

The steps to replicate the crash are:
1) Network disruption that prevents the Asterisk server from communicating with the MSSQL server occurs.
2) While the network disruption is ongoing, a user dials into the Asterisk server's timeclock extension and inputs their employee ID, which causes Asterisk to perform a lookup on the MSSQL server.
3) Asterisk "hangs" for 3-5 minutes while it waits for the ODBC connection to the MSSQL server.
4) I get made aware of the problem and log in to Asterisk.
5) I execute "module reload res_odbc.so" and Asterisk reconnects successfully to the ODBC connection and can process new calls to the timeclock.
6) The "hung" calls continue to show in "core show channels" even after the user hangs up and tries again (for what it's worth, users typically create 3-4 hung calls each before one or more of them let me know.  I've seen anywhere from 5-20 hung calls at the times I've logged in to try to reconnect the ODBC connection).
7) Asterisk crashes during or shortly after the module reload.  Sometimes I've sent one or more "channel request hangup" commands from the Asterisk CLI for the hung calls.  Sometimes it crashes immediately on the module reload, sometimes it runs for a few minutes after the reload.  I don't think it's ever run more than 5 minutes after I reload the ODBC connections.
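
Steps 1 and 4 depend on someone noticing that the MSSQL server has become unreachable. As an illustration only (this is not part of the original report and is not how res_odbc checks its connections), a minimal Python sketch of a TCP reachability probe could look like the following. The function name `tcp_reachable` is hypothetical, and the sketch assumes the server listens on a plain TCP port (1433 is the SQL Server default).

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) opens within `timeout` seconds.

    This only checks that the port accepts connections; it does not
    prove that the ODBC DSN can log in or run queries.
    """
    try:
        # create_connection resolves the host and applies the timeout to the connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

Called as `tcp_reachable("mssql.example.internal", 1433)` (the host name is a placeholder), it returns False during a disruption like the one in step 1, which a monitoring job could use to raise an alert before callers pile up hung channels.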

Backtrace is attached (it was generated by a version of Asterisk without DONT_OPTIMIZE -- I've recompiled and will restart my system with DONT_OPTIMIZE as soon as call volume permits, but don't know when the requisite network disruption will occur to cause another crash).
Comments:
By: Noah Engelberth (mlnoah) 2012-07-25 09:49:24.118-0500

Clarification to point 3) of my reproduction steps:  The "hang" is on the specific channel that was trying to access the ODBC connection.  No other channels are affected/hung (until Asterisk crashes and drops all active calls).

By: Rusty Newton (rnewton) 2012-07-26 17:42:29.929-0500

Thank you for the great details and report. The developers will really need the non-optimized backtrace.  Also, a full log with DEBUG level 5 running up to the crash (only need to provide the few minutes before the crash) would be very helpful.
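
One way to cut a full log down to the few minutes before a crash is sketched below in Python. This is an illustration under stated assumptions, not a tool from the Asterisk project: the function name `tail_before` is hypothetical, and the sketch assumes each log entry starts with a bracketed timestamp of the form `[Mon DD HH:MM:SS]` with no year, so a different `dateformat` setting in logger.conf would need a different pattern.

```python
import re
from datetime import datetime, timedelta

# Assumed entry prefix, e.g. "[Aug 15 11:13:54]". Adjust if the log uses another format.
STAMP = re.compile(r"^\[([A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2})\]")


def tail_before(lines, crash_time, minutes=5):
    """Return the lines stamped within `minutes` before `crash_time` (inclusive).

    Lines without a timestamp are treated as continuations and follow the
    decision made for the most recent stamped line. The year is taken from
    `crash_time`, so a window that spans New Year is not handled.
    """
    start = crash_time - timedelta(minutes=minutes)
    keep = False
    kept = []
    for line in lines:
        match = STAMP.match(line)
        if match:
            stamp = datetime.strptime(
                f"{crash_time.year} {match.group(1)}", "%Y %b %d %H:%M:%S"
            )
            keep = start <= stamp <= crash_time
        if keep:
            kept.append(line)
    return kept
```

Passing the lines of the log file together with the approximate crash time returns only the entries in that window, which keeps the attachment small.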

By: Noah Engelberth (mlnoah) 2012-08-15 11:13:54.130-0500

After recompiling Asterisk with DONT_OPTIMIZE, I have been unable to replicate the crash issue.  I've had at least half a dozen "controlled circumstances" disconnects from the database, and at least one "uncontrolled circumstance" disconnect that I know of over the past 2 weeks, without any crashes at all in Asterisk.

All I did to recompile was navigate back to the original source directory I had used to install Asterisk to begin with, and then "make menuselect", turn on DONT_OPTIMIZE, and then "make", "make install" and restart Asterisk.  I did not change in any other way what version or what copy of the source code I was using.

Would it be useful information for me to try turning DONT_OPTIMIZE back off and seeing if the crashes come back?

By: Rusty Newton (rnewton) 2012-08-20 19:30:00.924-0500

It would be very interesting if the issue only reoccurred with the DONT_OPTIMIZE flag disabled.  If you have the time, test that out and let us know. In the meantime I'll see if someone can take a look at the current backtrace and get anything from it.

By: Noah Engelberth (mlnoah) 2012-08-21 21:31:04.434-0500

Ugh, I guess you can close this as not replicable.  Even after recompiling with DONT_OPTIMIZE off, it's still not reproducing for me now.  Computers...

By: Rusty Newton (rnewton) 2012-08-24 10:06:22.529-0500

No worries - if you can reproduce it at some point, provide the compiler version.  Closing it out.