
Summary: ASTERISK-13826: [patch] SQL Error makes res_odbc reconnect to odbc dsn
Reporter: Ove Aursand (aurs)
Labels:
Date Opened: 2009-03-25 08:09:54
Date Closed: 2011-06-07 14:00:31
Priority: Critical
Regression?: No
Status: Closed/Complete
Components: Resources/res_odbc
Versions:
Frequency of Occurrence:
Related Issues:
Environment:
Attachments:
( 0) bt_bt_full2.txt
( 1) bt_full.txt
( 2) bt.txt
( 3) extensions.conf
( 4) func_odbc.conf
( 5) res_odbc_segfaultfix.patch
( 6) sample.call
( 7) thread_apply_all_bt.txt
Description: In res_odbc.c, lines 98-133, there is a section that disconnects from the database and then reconnects. This *might* have caused a crash... More details are in the additional information below. Not sure how to reproduce/debug this.

****** ADDITIONAL INFORMATION ******

Asterisk 1.4.23.2 crashed (optimized build, so the bt is probably not too useful). We were logging verbose output to a file, and at the point of the crash these lines are in the logs:
[Mar 24 15:29:32] WARNING[21479] res_odbc.c: SQL Execute returned an error -1: 22P02: Error while executing the query;
ERROR:  invalid input syntax for integer: "" (77)
[Mar 24 15:29:32] WARNING[21479] res_odbc.c: SQL Execute error -1! Attempting a reconnect...
This was caused by an INSERT statement that tried to insert '' into an INTEGER field in a table.
func_odbc is used a lot on each call, so several SQL statements were being sent at "the same time" here. The last SQL statement we see in the backtrace (for what it's worth) is supposed to return an empty result (no rows found).
The question is: could that SQL statement have run against a db handle that was disconnected before it finished, and could that have caused the crash? The SQL in the backtrace is not the one that tries to insert '' as an integer.

I don't see any reason for reconnecting to the database after this SQL error. Is it possible to create a config that does the following?
connect to db (handle X)
prepare statement A with SQL that will cause an error
prepare statement B that is OK
run statement A (B will now "point" to a non-existing db handle, since A has disconnected and reconnected)
run statement B

Any thoughts on how to proceed debugging this issue? Or am I way off track here? :D
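
The sequence hypothesized above can be modeled without ODBC at all. The following is a toy sketch, not code from res_odbc.c: `shared_conn`, `prepared_stmt`, `prepare`, `reconnect`, and `stmt_is_usable` are all hypothetical names, and a generation counter stands in for the driver's real connection handle. It only illustrates why a statement prepared before a reconnect could end up referring to a connection that no longer exists.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a shared database connection (not the res_odbc struct). */
struct shared_conn {
    unsigned generation; /* incremented every time the connection is torn down and reopened */
    bool connected;
};

/* Hypothetical stand-in for a prepared statement bound to a connection. */
struct prepared_stmt {
    const struct shared_conn *conn;
    unsigned generation; /* connection generation the statement was prepared on */
};

/* Bind a new statement to the connection as it exists right now. */
struct prepared_stmt prepare(const struct shared_conn *conn)
{
    assert(conn != NULL);
    struct prepared_stmt stmt = { conn, conn->generation };
    return stmt;
}

/* Model "disconnect, then reconnect": statements prepared earlier become stale. */
void reconnect(struct shared_conn *conn)
{
    assert(conn != NULL);
    conn->connected = false;
    conn->generation++;
    conn->connected = true;
}

/* A statement is only safe to run on the connection generation it was prepared on. */
bool stmt_is_usable(const struct prepared_stmt *stmt)
{
    assert(stmt != NULL);
    return stmt->conn->connected && stmt->generation == stmt->conn->generation;
}
```

In this model, preparing A and B on the same connection and then calling `reconnect()` because A failed leaves `stmt_is_usable(&b)` false, which corresponds to the "run statement B" step in the question. Whether the real driver handles behave this way is exactly what the report is asking.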
Comments:

By: Tilghman Lesher (tilghman) 2009-03-25 11:56:10

1.2 is no longer supported.  Please upgrade to 1.4.24 or 1.6.0.6.

By: Mark Michelson (mmichelson) 2009-03-25 15:51:45

Reporter came on IRC and said that version reported was incorrect. I'm updating the issue to reflect the proper Asterisk version.

By: Ove Aursand (aurs) 2009-03-25 16:50:20

The reported version was correct, but I have a typo in the additional info here. The version used was 1.4.23.2. There is no diff in res_odbc.c between 1.4.23.2 and 1.4.24. I could upgrade to 1.4.24, but I don't know if I'm able to reproduce the crash anyway. But I can give it a try.

By: Tilghman Lesher (tilghman) 2009-03-25 17:55:35

If you're having an issue with using shared connections in 1.4, you can turn on the pooling feature in res_odbc.conf, with pooling => yes and limit => 25 (or whatever limit you need).

By: Tilghman Lesher (tilghman) 2009-03-25 17:57:02

Also, clearly, you should fix your SQL to avoid running invalid queries.

By: Ove Aursand (aurs) 2009-03-25 17:58:33

Steps used to reproduce:
1. create a dialplan (example in extensions.conf uploaded here) with an SQL statement that fails (inserts a string into an integer field), and several SQL statements that are OK (just a SELECT on a row that exists)
2. copy 10-20 sample call files (like the one attached) to /var/spool/asterisk/outbound
3. hammer the server with asterisk -rx "dial s@bugfinder"
4. Got 2 core files after a while

By: Ove Aursand (aurs) 2009-03-25 18:07:29

Of course I should fix my SQL (already done). This was deliberately done to reproduce a crash.
I can upload the other backtrace I got. That one is on a valid SQL (which runs shortly after the invalid one).

By: Tilghman Lesher (tilghman) 2009-03-25 18:28:43

I still would like you to try this with the suggestion in note ASTERISK-13826@102195



By: Ove Aursand (aurs) 2009-03-25 18:55:35

Tried for a while with pooling => yes and limit => 25 without crashes

By: Tilghman Lesher (tilghman) 2009-03-26 11:28:52

Issue resolved with configuration change.

By: Michiel van Baak (mvanbaak) 2009-05-15 04:39:59

re-opened because grEvenX requested it so they can post a patch

By: Even Andre Fiskvik (grevenx) 2009-05-15 06:16:50

Added patch for this issue. Though using pooling resolves the segfault issue, there is a _reason_ why pooling is an option. Pooling does not scale well in our environment, and I feel that this very easy fix would be beneficial for certain types of Asterisk users (large-scale usage).

By: Tilghman Lesher (tilghman) 2009-05-15 10:37:30

Unfortunately, this patch fails with MySQL, a very popular choice, because MySQL actually DOES return SQL_ERROR when a reconnect is needed:
[May 8 11:11:04] WARNING[13130]: res_odbc.c:616 ast_odbc_prepare_and_execute: SQL Execute returned an error -1: HY000: [MySQL][ODBC 3.51 Driver][mysqld-5.0.51a-3ubuntu5.4]MySQL server has gone away (78)

So this patch won't work.

By: Even Andre Fiskvik (grevenx) 2009-05-18 01:44:49

That's a shame. Is there any way you know of to work around this problem with MySQL?

Would a valid option be to include this as a setting in res_odbc.conf?

By: Tilghman Lesher (tilghman) 2009-05-18 06:52:15

Possibly.  If you want to create that patch, we can take a look at that approach, as long as the existing behavior does not change (as a default) in any released version.
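
One way the configurable approach discussed above might look is sketched here. This is an illustration only, not the attached patch and not an existing res_odbc option: the `reconnect_policy` enum and `should_reconnect()` are hypothetical names. The sketch assumes the caller has already obtained the SQLSTATE and diagnostic text for the failed statement, keeps "reconnect on any error" as the default, and in the opt-in mode reconnects only for SQLSTATE class 08 (connection exception) or for the HY000 "server has gone away" message quoted in the MySQL log above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical policy values; the thread only proposes "a setting in res_odbc.conf". */
enum reconnect_policy {
    RECONNECT_ON_ANY_ERROR,       /* existing behavior, kept as the default */
    RECONNECT_ON_CONNECTION_LOSS  /* opt-in: ignore plain query errors such as 22P02 */
};

/*
 * Decide whether a failed execute should trigger a reconnect.
 * sqlstate and diag_text are the five-character SQLSTATE and the driver's
 * message text for the failed statement; either may be NULL.
 */
bool should_reconnect(enum reconnect_policy policy,
                      const char *sqlstate, const char *diag_text)
{
    if (policy == RECONNECT_ON_ANY_ERROR) {
        return true;
    }
    if (sqlstate == NULL) {
        return false;
    }
    /* SQLSTATE class 08 is the standard "connection exception" class. */
    if (strncmp(sqlstate, "08", 2) == 0) {
        return true;
    }
    /* The MySQL log in this thread reports a lost connection under the generic
     * HY000 state, so fall back to matching the message text. */
    if (strcmp(sqlstate, "HY000") == 0 && diag_text != NULL
        && strstr(diag_text, "server has gone away") != NULL) {
        return true;
    }
    return false;
}
```

With the opt-in policy, the PostgreSQL error from the original report (22P02, invalid input syntax) would not trigger a reconnect, while the MySQL "server has gone away" case still would. Matching on message text is fragile and driver-specific, so a real patch would need to be checked against each supported driver.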

By: Tilghman Lesher (tilghman) 2009-05-29 15:15:04

No response in over 5 days.  Reopen if/when you're able to provide the suggested patch.