[Home]

Summary:ASTERISK-15864: [patch] Asterisk crash in func_odbc
Reporter:Private Name (falves11)Labels:
Date Opened:2010-03-23 12:46:33Date Closed:2010-06-03 14:51:08
Priority:CriticalRegression?No
Status:Closed/CompleteComponents:Channels/General
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) 20100330__issue17084.diff.txt
( 1) asteriks_crash_odbc2.rtf
( 2) asterisk_crash_odbc.rtf
( 3) asterisk_crash_odbc3.txt
( 4) asterisk-1.4.31-gdb.txt
( 5) crash_161.txt
( 6) crash_2.txt
( 7) crash_3_odbc.txt
( 8) crash_4_odbc.txt
( 9) crash_5_odbc.txt
(10) crash_odbc_3-29_2.txt
(11) crash_6.txt
(12) crassh161_2.txt
(13) fix_thread.diff
(14) issue17084_162_A.diff
(15) issue17084_162.diff
(16) odbc_crash3-29_1.txt
Description:I am attaching the GDB backtrace
Comments:By: Leif Madsen (lmadsen) 2010-03-23 14:30:17

The first item under "Opening An Issue In The Issue Tracker - A Checklist" states:

Naming your report
It is important that you name your report appropriately. Giving it a topic of "Asterisk crashes my server" or "My mother in law annoys me" doesn't help. A detailed and good short description will help the bug marshals and developers to browse through the bug tracker and find your issue quickly. It is common use on the Asterisk bug tracker to put the text (minus quotes, but including brackets) of "[patch]" as the first part of a bug short description when the particular bug includes a sourcecode patch. This helps the developers sort things appropriately (we'll let you guess which bugs get worked on first.)


With additional information such as console output, description of what you were doing, system configuration, load average at the time of the crash, memory information, etc..., debugging the issue can be more difficult for the developers, and thus a longer lead time to resolve the issues you have reported.

Also, without appropriate subjects, it is harder for other people to search the issue tracker for issues that have already been reported, thus leading to duplicate submissions, which causes backlogs and a longer time to resolve issues.

Everyone doing their part to file better issues can help make the process smoother which allows us to resolve issues faster for everyone. Thanks!

By: Private Name (falves11) 2010-03-23 14:36:02

I could not find a better title because I had no idea why it was crashing. Please feel free to look at the backtrace and change the name accordingly. It could be a number of things.

By: Private Name (falves11) 2010-03-23 17:03:56

It blew up several times this afternoon. I am using the very latest version of freetds. Should I upload more traces of these two have enough information to indicate the issue?

By: Tilghman Lesher (tilghman) 2010-03-23 21:23:45

I'd say that you need to install the version of unixodbc with debugging symbols enabled, because this is crashing in unixodbc code, and it's impossible to tell where.  Again, it's probably not an Asterisk issue.

By: Private Name (falves11) 2010-03-23 21:26:33

I will be happy to do that if you send me detailed instructions via email falves1@hotmail.com Even if this is not in Asterisk code, it affects our product directly. I am not an engineer, so I need careful instructions.
this is what I have running:

unixODBC-devel-2.2.11-7.1
unixODBC-2.2.11-7.1
unixODBC-devel-2.2.11-7.1
unixODBC-2.2.11-7.1



By: Leif Madsen (lmadsen) 2010-03-23 21:32:57

I'd suggest asking for support in the unixODBC forums. Supporting unixODBC is not the responsibility of the Asterisk development team.

By: Private Name (falves11) 2010-03-23 21:36:56

I dispute that ODBC is the culprit, since it never, never, happens in version 1.4. It only started to happen since I moved to version 1.6.1, last week.
But I will install ODBC with symbols. I wonder why the GDB trace does not contain any useful information to see what is making it unstable.

By: Tilghman Lesher (tilghman) 2010-03-23 21:44:42

My suggestion would be:  CFLAGS="-g3 -O0" ./configure --prefix=/usr --sysconfdir=/etc --localstatedir=/var --enable-gui=no

By: Private Name (falves11) 2010-03-24 03:42:51

I use the RPMs that come from the distribution, how do I go from there to what we need? I have no idea.

By: Tilghman Lesher (tilghman) 2010-03-24 09:50:53

Download the source here:
http://www.unixodbc.org/unixODBC-2.2.14-p2.tar.gz

By: Private Name (falves11) 2010-03-27 06:07:53

I have a question. Do I uninstall first my unixodbc rpms that come with the distribution? If so, what happens with the dependencies? If I compile the source code, do I have to use some parameters to make sure that the old unixodbc is overwritten?
Can somebody help on this? Days ago I tried to upgrade mysql with a source version and although it compiled and installed fine, apparently, the old version from RPMĀ“s kept popping up, so it is not so simple.

By: Tilghman Lesher (tilghman) 2010-03-27 13:28:38

If you use the parameters above, the rpm-provided libraries should be overwritten.

By: Private Name (falves11) 2010-03-29 11:48:12

It is running now with the compiled version of unixODBC.

By: Private Name (falves11) 2010-03-29 14:46:58

Now with unixODBC 2.1.14, compiled from sources, it crashed 760 times today. I am uploading a GDB trace with the hope that somebody can figure this out.

By: Tilghman Lesher (tilghman) 2010-03-30 10:54:24

These aren't the same crashes as before, but a fix has been uploaded anyway.

By: Private Name (falves11) 2010-04-25 05:25:22

I got a new ODBC crash today. I am using the latest unixodbc and the latest freetds. Please let me know if this is related to the same issue.



By: Tilghman Lesher (tilghman) 2010-04-25 12:35:26

No, it looks like a completely different crash.

By: Private Name (falves11) 2010-04-25 12:37:19

Should I create a new bug or can you modify the fix uploaded to include this problem? I think that we should keep all ODBC issues together.

By: Tilghman Lesher (tilghman) 2010-04-25 13:16:24

I don't see how that crash is possibly related to a bug in Asterisk.

By: Private Name (falves11) 2010-04-25 14:02:48

Can you indicate what the rationale behind your conclusion? I will recompile unixdODBC and freetds with debug information, if that helps

By: Private Name (falves11) 2010-04-26 06:26:33

I just uploaded another crash. Maybe somebody can look into this and point me into the right direction. Asterisk has the patch applied.

By: Private Name (falves11) 2010-04-26 14:16:45

I just uploaded another crash. This time all the libraries are compiled with trace information, unixODBC, freetds and Asterisk.
Can somebody tell me what is the offending code? I will ask for support from the company who wrote unixODBC if they are responsible, or submit the trace to the freetds list if it makes sense.

One important piece of information is that Asterisk 1.4 never crashes in function_ODBC.

By: Leif Madsen (lmadsen) 2010-04-26 14:22:29

If it is a separate crash/issue, then it should be filed separately, otherwise the issue could potentially never be resolved. Related issues which are not the same should be filed separately and then we can link them back as related.

By: Private Name (falves11) 2010-04-26 14:35:04

I have no idea if this  is a separate issue or not. Can somebody with the right knowledge evaluate the traces and the patch proposed before?
In my opinion, the problem is the same for all crashes.

By: Tilghman Lesher (tilghman) 2010-04-26 14:44:11

In my opinion, the issue is in an external library, not in Asterisk.

By: Frediano Ziglio (freddy77) 2010-05-05 02:40:38

I narrow down the problem. Cores happen mostly on unixODBC code but problem is related to asterisk. It happens only with multiprocessor machines with high traffic and odbc drivers which does not support multiple statements on a single connection (like FreeTDS).

We enabled unixODBC logging and after some days log demonstrate this situation (I can send a 100K log).
1- thread1: allocate statement1 on connection1
2- thread2: allocate statement2 on connection1
3- thread2: do some processing on statement2
4- thread1: try to do some processing on statement1, it fails cause connection1 support only a single active statement
5- thread1: disconnect connection1 and try another connection
6- thread2: core cause statement2 has been freed by thread1 disconnection

The problem is a race condition in res_odbc.c (1 and 2 should not happen together).

Here you are the fix.

<inline patch removed by lmadsen>



By: Private Name (falves11) 2010-05-05 06:29:12

I wonder if this issue affects not only version 1.6 but also 1.4. Please a a Bug Marshall investigate?

By: Frediano Ziglio (freddy77) 2010-05-05 06:47:34

No, 1.4 is not affected by this problem cause the entire list is locked.

By: Private Name (falves11) 2010-05-05 07:47:47

That's why version 1.4 never, ever crashes.

By: Private Name (falves11) 2010-05-06 18:57:18

I confirm that these 2 patches together fix the issues of instability. Please feel free to add them the code. It has been running fine since yesterday with no crashes and several hundred simultaneous calls.

By: Frediano Ziglio (freddy77) 2010-05-08 14:44:09

Well, this fix works only for pooled connection (required by FreeTDS), but core still remains for non pooled connections. In fact in this case a single connection is returned for multiple statements but if any statement fails can cause a SQLDisconnect/SQLConnect which cause all statements free, even statements in other threads.
To sum up core happen on:
- multiple active queries on same connection
- errors on a query
So only on rarely conditions.

By: Paul Belanger (pabelanger) 2010-05-08 19:54:46

Please upload patches to the issue, not as comments.

By: Walter Doekes (wdoekes) 2010-05-11 05:50:38

Are you sure this does not happen on 1.4?

On database errors (can happen when when DROPping/reCREATing a view or function for example), it is possible that one thread is calling SQLFreeHandle while another calls e.g. SQLDescribeCol:

(gdb) thread 4
[Switching to thread 4 (Thread 0x420c4950 (LWP 5615))]#0  0x00007f7db60e8ced in SQLDescribeCol () from /usr/lib/libodbc.so.1
(gdb) bt
#0  0x00007f7db60e8ced in SQLDescribeCol () from /usr/lib/libodbc.so.1
#1  0x00007f7db44d2535 in realtime_odbc (database=<value optimized out>, table=<value optimized out>, ap=<value optimized out>) at res_config_odbc.c:181
#2  0x000000000044d304 in ast_load_realtime (family=<value optimized out>) at config.c:1435
#3  0x00007f7da14f37a0 in realtime_peer (newpeername=0x420c3b80 "040055001", sin=0x0, devstate_only=1) at chan_sip.c:2660
#4  0x00007f7da14f9527 in sip_devicestate (data=<value optimized out>) at chan_sip.c:17270
ASTERISK-1  0x000000000045291a in ast_device_state (device=0x9dc3f8 "SIP/040055001") at devicestate.c:170
ASTERISK-2  0x0000000000452d7e in do_devstate_changes (data=<value optimized out>) at devicestate.c:285
ASTERISK-3  0x00000000004b6edc in dummy_start (data=<value optimized out>) at utils.c:856
ASTERISK-4  0x00007f7dbb105fc7 in start_thread () from /lib/libpthread.so.0
ASTERISK-5  0x00007f7db81485ad in clone () from /lib/libc.so.6
ASTERISK-6 0x0000000000000000 in ?? ()
(gdb) thread 11
[Switching to thread 11 (Thread 0x407a0950 (LWP 5628))]#0  0x00007f7dbb10c384 in __lll_lock_wait () from /lib/libpthread.so.0
(gdb) bt
#0  0x00007f7dbb10c384 in __lll_lock_wait () from /lib/libpthread.so.0
#1  0x00007f7dbb107bf0 in _L_lock_102 () from /lib/libpthread.so.0
#2  0x00007f7dbb1074fe in pthread_mutex_lock () from /lib/libpthread.so.0
#3  0x00007f7db60ee76f in ?? () from /usr/lib/libodbc.so.1
#4  0x00007f7db60eedec in SQLFreeHandle () from /usr/lib/libodbc.so.1
ASTERISK-1  0x00007f7db63355a2 in odbc_obj_disconnect (obj=0x9a2820) at res_odbc.c:509
ASTERISK-2  0x00007f7db63371e2 in ast_odbc_prepare_and_execute (obj=0x9a2820, prepare_cb=0x7f7daaf53400 <generic_prepare>, data=0x40797c20) at res_odbc.c:124
ASTERISK-3  0x00007f7daaf5368f in acf_odbc_read (chan=0x9d95b0, cmd=<value optimized out>, s=0x40798430 "", buf=0x407988d0 "", len=4096) at func_odbc.c:278
ASTERISK-4  0x0000000000485578 in pbx_substitute_variables_helper_full (c=0x9d95b0, headp=0x9d99e8, cp1=<value optimized out>, cp2=0x4079da3b "", count=8132) at pbx.c:1691
ASTERISK-5  0x00000000004875cf in pbx_extension_helper (c=0x9d95b0, con=<value optimized out>, context=0x9d9800 "pi_phone_account_cont", exten=0x9d9850 "333", priority=2, label=0x0, callerid=0x9dc530 "040055001", action=E_SPAWN) at pbx.c:1786
ASTERISK-6 0x000000000048856a in __ast_pbx_run (c=0x9d95b0) at pbx.c:2300
ASTERISK-7 0x0000000000489ea9 in pbx_thread (data=0x7f7db6333d60) at pbx.c:2621
ASTERISK-8 0x00000000004b6edc in dummy_start (data=<value optimized out>) at utils.c:856
ASTERISK-9 0x00007f7dbb105fc7 in start_thread () from /lib/libpthread.so.0
ASTERISK-10 0x00007f7db81485ad in clone () from /lib/libc.so.6
ASTERISK-11 0x0000000000000000 in ?? ()


See the attached asterisk-1.4.31-gdb.txt.

I'm using unixodbc 2.2.11-16 with libmyodbc 3.51.15r409-4.

Using pooling=>yes fixes the issue.

I guess the comment in res_odbc.c is right and there is a race:
/*
* While this isn't the best way to try to correct an error, this won't automatically
* fail when the statement handle invalidates.
*/
/* XXX Actually, it might, if we're using a non-pooled connection. Possible race here. XXX */

Only odbc_obj_connect and odbc_obj_disconnect lock obj->lock. So any other user of the odbc_obj (func_odbc, realtime_odbc) can use it while it's being deallocated.

By: Private Name (falves11) 2010-05-25 08:00:44

I just uploaded the patch for version 1.6.2. It has bee running for a couple of days without crashing.

By: Digium Subversion (svnbot) 2010-06-03 14:46:42

Repository: asterisk
Revision: 267669

U   trunk/funcs/func_odbc.c

------------------------------------------------------------------------
r267669 | tilghman | 2010-06-03 14:46:41 -0500 (Thu, 03 Jun 2010) | 8 lines

Handle OOM errors more gracefully.

(closes issue ASTERISK-15864)
Reported by: falves11
Patches:
      issue17084_162_A.diff uploaded by falves11 (license 374)
Tested by: falves11

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=267669

By: Digium Subversion (svnbot) 2010-06-03 14:51:06

Repository: asterisk
Revision: 267670

_U  branches/1.6.2/
U   branches/1.6.2/funcs/func_odbc.c

------------------------------------------------------------------------
r267670 | tilghman | 2010-06-03 14:51:05 -0500 (Thu, 03 Jun 2010) | 15 lines

Merged revisions 267669 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

........
 r267669 | tilghman | 2010-06-03 14:46:42 -0500 (Thu, 03 Jun 2010) | 8 lines
 
 Handle OOM errors more gracefully.
 
 (closes issue ASTERISK-15864)
  Reported by: falves11
  Patches:
        issue17084_162_A.diff uploaded by falves11 (license 374)
  Tested by: falves11
........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=267670