ASTERISK-14216: [patch] Asterisk 1.6.1.0 crashes with a core dump at random occasions

Summary: ASTERISK-14216: [patch] Asterisk 1.6.1.0 crashes with a core dump at random occasions

Reporter: m0bius (m0bius) Labels:

Date Opened: 2009-05-28 04:00:04 Date Closed: 2011-06-07 14:00:30

Priority: Critical Regression? No

Status: Closed/Complete Components: General

Versions: Frequency of Occurrence:

Related Issues:

Environment: Attachments: (0) trace

Description: Our Asterisk installation (1.6.1.0 with a realtime MySQL configuration on a 64-bit Debian server) crashes with a core dump at random. We have noticed that before this happens, all the peers in the system are marked as unreachable and there is an error about 'No More UDPTL ports remaining'; however, I am not confident that this is what causes the problem.

Comments: By: m0bius (m0bius) 2009-05-28 04:00:41

I am attaching a 'thread apply all bt full' trace from gdb, but due to the sensitive nature of the information I have marked this ticket as private.
By: Tilghman Lesher (tilghman) 2009-05-29 12:40:59

Please see doc/valgrind.txt.
By: m0bius (m0bius) 2009-06-01 04:36:07

I am afraid it is currently impossible to run Asterisk under valgrind, since this Asterisk instance is in production and we cannot replicate the issue (it happens at random).

Is it possible to assist you in any other way?
By: Tilghman Lesher (tilghman) 2009-06-01 10:38:20

The problem is that your stack trace is corrupt, which makes it very difficult to track down where a problem is occurring. I can see the command which is crashing Asterisk, but I cannot see where in the command it is crashing. Since this isn't easily reproducible, I can't easily replicate the situation which is causing this to crash, in order to set breakpoints and watch. The only thing I can think of is memory corruption, which is where valgrind would come in handy. I have a possible patch for memory debugging, which may work where valgrind would otherwise help. However, it will make your memory usage go up significantly.
By: Tilghman Lesher (tilghman) 2009-06-01 10:40:46

To use it, apply the patch, run 'make menuselect', and under Compiler Options, enable "MALLOC_HOLD".
By: Leif Madsen (lmadsen) 2009-07-13 10:10:11

This issue has been in feedback for a bit. Just thought I'd ping it and see where we should go with it. Thanks!
By: m0bius (m0bius) 2009-07-13 10:58:37

I've been digging into this for a while now, and it looks like it is connected to ticket 0015345. Today we noticed that if we set rtupdate=no and stop the updates to the realtime database, the issue stops (or, from what we have seen so far, at least becomes much less frequent).

Does this provide any help?
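
For anyone following along, the change amounts to a single line in sip.conf. A minimal sketch, assuming the peers are served through chan_sip realtime and that the option sits in the [general] section as in the stock sample file (the file path is the usual default, not something confirmed in this ticket):

```
; /etc/asterisk/sip.conf
[general]
rtupdate=no    ; do not send registry updates to the realtime database
```

A 'sip reload' from the Asterisk CLI should be enough to pick up the change.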
By: Tilghman Lesher (tilghman) 2009-07-13 14:28:44

If the problem stops, then that might provide a clue. But if the problem only minimizes, then it's not the cause, just an extra factor, and so it's not helpful at all. If you're using res_config_mysql, does it help if you switch to res_config_odbc (or vice versa)?
By: m0bius (m0bius) 2009-07-14 05:47:51

We are using res_config_odbc. We have seen various problems arising using res_config_mysql and we wanted to avoid this.

The fact is that the problem appears on every Asterisk version we have deployed. So far we have tried 1.4.25, 1.4.25.1, 1.4.26-rc5, 1.6.0.9, 1.6.1.0, and now finally 1.6.1.1.

It showed the same symptoms even with 1.6.1.1. The only thing all those versions had in common was the Realtime configuration with this MySQL database.

So far, the service appears to be running without any problems (2 days now). Because the problem is intermittent, all I can do is wait a few days to see whether it appears again. However, I am getting more and more convinced that this was the cause. Yesterday we forced load on MySQL and Asterisk with rtupdate=yes, and it seemed to replicate the effect.

I am guessing that while Asterisk is trying to update the lastms column in the database on Qualify, the table is getting locked. All other requests to the database are then put on hold (since Asterisk uses a single connection to the database). If the database responds slowly, it might be locking chan_sip.

By: m0bius (m0bius) 2009-08-27 05:26:08

The problem seems to have been resolved. I think that the implementation of the rtupdate option should be reconsidered.
By: Tilghman Lesher (tilghman) 2009-08-27 08:11:56

If you believe locking per connection is the issue, you could try using "Threading=2" in your /etc/odbcinst.ini and "share_connections => no" in your /etc/asterisk/res_odbc.conf. These two options should prevent the ODBC layer from serializing your updates.
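
A rough sketch of where those two options would go. The driver section name [MySQL], the driver library path, the class name [asterisk], and the DSN name are placeholders for illustration; substitute whatever your existing files already use:

```
; /etc/odbcinst.ini  (driver definition; section name and path are placeholders)
[MySQL]
Description = MySQL ODBC driver
Driver      = /usr/lib/odbc/libmyodbc.so
Threading   = 2

; /etc/asterisk/res_odbc.conf  (class and DSN names are placeholders)
[asterisk]
enabled => yes
dsn => asterisk
pre-connect => yes
share_connections => no
```

Asterisk has to be restarted, or res_odbc reloaded, for the new connection settings to take effect.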
By: Tilghman Lesher (tilghman) 2009-09-16 14:47:52

Reporter resolved problem with configuration change.