[Home]

Summary:ASTERISK-18391: Asterisk crashing in SQLAllocHandle when using ODBC
Reporter:Nikola Ciprich (nikola.ciprich)Labels:
Date Opened:2011-08-31 06:53:22Date Closed:2011-10-03 12:33:07
Priority:MajorRegression?No
Status:Closed/CompleteComponents:Resources/res_odbc
Versions:1.8.5.0 1.8.6.0 Frequency of
Occurrence
Frequent
Related
Issues:
Environment:Centos5 x86_64, Asterisk 1.8.5.0, unixODBC 2.2.14 (compiled newer version), asterisk connected through ODBC to both sqlite and postgresql databases.Attachments:( 0) valgrind.txt
( 1) valgrind.txt
Description:asterisk often (at least once a week) crashes with following backtrace:


#0  0x00007fca96433265 in raise () from /lib64/libc.so.6
#1  0x00007fca96434d10 in abort () from /lib64/libc.so.6
#2  0x00007fca9646d99b in __libc_message () from /lib64/libc.so.6
#3  0x00007fca9647545f in _int_free () from /lib64/libc.so.6
#4  0x00007fca964758bb in free () from /lib64/libc.so.6
#5  0x00007fca92b8fc4e in ?? () from /usr/lib64/libodbc.so.2
#6  0x00007fca92b5d252 in ?? () from /usr/lib64/libodbc.so.2
#7  0x00007fca92b5deee in SQLAllocHandle () from /usr/lib64/libodbc.so.2
#8  0x00007fca85cfe58e in ast_unregister_file_version () from /usr/lib64/asterisk/modules/func_odbc.so
#9  0x00007fca92dbe621 in ast_odbc_direct_execute () from /usr/lib64/asterisk/modules/res_odbc.so
#10 0x00007fca85d01425 in ast_unregister_file_version () from /usr/lib64/asterisk/modules/func_odbc.so
#11 0x00000000004d9768 in ast_func_read ()
#12 0x00000000004dc238 in pbx_substitute_variables_helper_full ()
#13 0x00000000004dc59d in pbx_substitute_variables_helper_full ()
#14 0x00000000004dc8aa in pbx_substitute_variables_helper ()
#15 0x00000000004deea7 in ?? ()
#16 0x00000000004df4c0 in ast_spawn_extension ()
#17 0x00000000004e18e3 in ?? ()
#18 0x00000000004e353b in ?? ()
#19 0x000000000051918e in ?? ()
#20 0x00007fca9561273d in start_thread () from /lib64/libpthread.so.0
#21 0x00007fca964d74bd in clone () from /lib64/libc.so.6

it's heavily loaded production machine, on which there's no chance to run valgrind :(

Comments:By: Walter Doekes (wdoekes) 2011-08-31 08:55:02.908-0500

{quote}it's heavily loaded production machine, on which there's no chance to run valgrind{quote}

But you can compile with DONT_OPTIMIZE which should give you more info. The slowdown -- if any -- should be negligible.

By: Nikola Ciprich (nikola.ciprich) 2011-08-31 09:10:32.613-0500

OK, I'll try and report...

By: Nikola Ciprich (nikola.ciprich) 2011-09-05 03:23:23.932-0500

hello, so here's unoptimized asterisk backtrace:

#0  0x00007f4e0fe5d265 in raise () from /lib64/libc.so.6
#1  0x00007f4e0fe5ed10 in abort () from /lib64/libc.so.6
#2  0x00007f4e0fe9799b in __libc_message () from /lib64/libc.so.6
#3  0x00007f4e0fe9f45f in _int_free () from /lib64/libc.so.6
#4  0x00007f4e0fe9f8bb in free () from /lib64/libc.so.6
#5  0x00007f4e0c5b9cab in ?? () from /usr/lib64/libodbc.so.2
#6  0x00007f4e0c5a139e in SQLSetConnectAttr () from /usr/lib64/libodbc.so.2
#7  0x00007f4e0c7e95c8 in _ast_odbc_request_obj2 (name=0x7f4de80259f8 "sqlite-asterisk", flags=..., file=<value optimized out>, function=<value optimized out>,
   lineno=<value optimized out>) at res_odbc.c:1336
#8  0x00007f4dff6b03fe in acf_odbc_read (chan=0x2dafd28, cmd=0x44e13690 "ODBC_RECORD", s=0x1 <Address 0x1 out of bounds>, buf=0x44e13720 "", len=4096) at func_odbc.c:517
#9  0x00000000004d9768 in ast_func_read (chan=0x2dafd28, function=<value optimized out>, workspace=0x44e13720 "", len=4096) at pbx.c:3515
#10 0x00000000004dc238 in pbx_substitute_variables_helper_full (c=0x2dafd28, headp=0x2db00e0, cp1=<value optimized out>, cp2=0x44e16827 "", count=4088, used=0x44e188a8) at pbx.c:3896
#11 0x00000000004dc59d in pbx_substitute_variables_helper_full (c=0x2dafd28, headp=0x2db00e0, cp1=<value optimized out>, cp2=0x44e1ade0 "", count=8191, used=0x44e18920) at pbx.c:3970
#12 0x00000000004dc8aa in pbx_substitute_variables_helper (c=0x3767, cp1=0x8 <Address 0x8 out of bounds>, cp2=0xffffffffffffffff <Address 0xffffffffffffffff out of bounds>, count=-1)
   at pbx.c:3992
#13 0x00000000004deea7 in pbx_substitute_variables (c=0x2dafd28, con=<value optimized out>, context=0x2db0280 "StartRecording", exten=0x2db02d0 "~~s~~", priority=2, label=0x0,
   callerid=0x12ae620 "00420739471203", action=E_SPAWN, found=0x44e1cf5c, combined_find_spawn=1) at pbx.c:4017
#14 pbx_extension_helper (c=0x2dafd28, con=<value optimized out>, context=0x2db0280 "StartRecording", exten=0x2db02d0 "~~s~~", priority=2, label=0x0,
   callerid=0x12ae620 "00420739471203", action=E_SPAWN, found=0x44e1cf5c, combined_find_spawn=1) at pbx.c:4079
#15 0x00000000004df4c0 in ast_spawn_extension (c=0x3767, context=0x6 <Address 0x6 out of bounds>, exten=0xffffffffffffffff <Address 0xffffffffffffffff out of bounds>, priority=-1,
   callerid=<value optimized out>, found=<value optimized out>, combined_find_spawn=1) at pbx.c:4625
#16 0x00000000004e18e3 in __ast_pbx_run (c=0x2dafd28, args=0x0) at pbx.c:4723
#17 0x00000000004e353b in pbx_thread (data=0x3767) at pbx.c:5058
#18 0x000000000051918e in dummy_start (data=<value optimized out>) at utils.c:1004
#19 0x00007f4e0f03c73d in start_thread () from /lib64/libpthread.so.0
#20 0x00007f4e0ff014bd in clone () from /lib64/libc.so.6


there seems to be more information, although I still see some values optimized out.. weirde... will this help?

By: Nikola Ciprich (nikola.ciprich) 2011-09-09 03:20:10.581-0500

Hello Walter,

do You see any obvious error from the backtrace?

n.


By: Walter Doekes (wdoekes) 2011-09-09 03:23:52.071-0500

I see that it doesn't look unoptimized. The should not be any <value optimized out> in there. Are you sure you're running the DONT_OPTIMIZE version?

By: Nikola Ciprich (nikola.ciprich) 2011-09-09 04:09:50.201-0500

Hello Walter,
I've checked it again now, and DONT_OPTIMIZE is on. Does it have to turn other debugging options on?
ie DEBUG_THREADS?  the pbx just crashed again, here's new backtrace. At least some parameters in  pbx_substitute_variables don't look optimized out, so it's strange... Maybe I should check some other compilation flags?

here's the new backtrace:

#0  0x00007f3d610c3265 in raise () from /lib64/libc.so.6
#1  0x00007f3d610c4d10 in abort () from /lib64/libc.so.6
#2  0x00007f3d610fd99b in __libc_message () from /lib64/libc.so.6
#3  0x00007f3d6110545f in _int_free () from /lib64/libc.so.6
#4  0x00007f3d611058bb in free () from /lib64/libc.so.6
#5  0x00007f3d5d81fc9c in function_entry (handle=<value optimized out>) at __info.c:4996
#6  0x00007f3d5d7ed252 in __SQLAllocHandle (handle_type=<value optimized out>, input_handle=0x7f3d584edb10, output_handle=0x4086af98, requested_version=0) at SQLAllocHandle.c:519
#7  0x00007f3d5d7edeee in SQLAllocHandle (handle_type=6763, input_handle=0x7ded, output_handle=0x6) at SQLAllocHandle.c:1276
#8  0x00007f3d5080a58e in generic_execute (obj=<value optimized out>, data=0x1a74728) at func_odbc.c:168
#9  0x00007f3d5da4e621 in ast_odbc_direct_execute (obj=0x7f3d584ed678, exec_cb=0x7f3d5080a540 <generic_execute>, data=0x1a74728) at res_odbc.c:592
#10 0x00007f3d5080d425 in acf_odbc_read (chan=0x1e86298, cmd=0x4086b6a0 "ODBC_RECORD", s=0x1 <Address 0x1 out of bounds>, buf=0x4086b720 "", len=4096) at func_odbc.c:519
#11 0x00000000004d9768 in ast_func_read (chan=0x1e86298, function=<value optimized out>, workspace=0x4086b720 "", len=4096) at pbx.c:3515
#12 0x00000000004dc238 in pbx_substitute_variables_helper_full (c=0x1e86298, headp=0x1e86650, cp1=<value optimized out>, cp2=0x4086e827 "", count=4088, used=0x408708a8) at pbx.c:3896
#13 0x00000000004dc59d in pbx_substitute_variables_helper_full (c=0x1e86298, headp=0x1e86650, cp1=<value optimized out>, cp2=0x40872de0 "", count=8191, used=0x40870920) at pbx.c:3970
#14 0x00000000004dc8aa in pbx_substitute_variables_helper (c=0x1a6b, cp1=0x8 <Address 0x8 out of bounds>, cp2=0xffffffffffffffff <Address 0xffffffffffffffff out of bounds>, count=-1) at pbx.c:3992
#15 0x00000000004deea7 in pbx_substitute_variables (c=0x1e86298, con=<value optimized out>, context=0x1e867f0 "StartRecording", exten=0x1e86840 "~~s~~", priority=2, label=0x0, callerid=0x1d291c0 "232",
   action=E_SPAWN, found=0x40874f5c, combined_find_spawn=1) at pbx.c:4017
#16 pbx_extension_helper (c=0x1e86298, con=<value optimized out>, context=0x1e867f0 "StartRecording", exten=0x1e86840 "~~s~~", priority=2, label=0x0, callerid=0x1d291c0 "232", action=E_SPAWN, found=0x40874f5c,
   combined_find_spawn=1) at pbx.c:4079
#17 0x00000000004df4c0 in ast_spawn_extension (c=0x1a6b, context=0x6 <Address 0x6 out of bounds>, exten=0xffffffffffffffff <Address 0xffffffffffffffff out of bounds>, priority=-1,
   callerid=<value optimized out>, found=<value optimized out>, combined_find_spawn=1) at pbx.c:4625
#18 0x00000000004e18e3 in __ast_pbx_run (c=0x1e86298, args=0x0) at pbx.c:4723
#19 0x00000000004e353b in pbx_thread (data=0x1a6b) at pbx.c:5058
#20 0x000000000051918e in dummy_start (data=<value optimized out>) at utils.c:1004
#21 0x00007f3d602a273d in start_thread () from /lib64/libpthread.so.0
#22 0x00007f3d611674bd in clone () from /lib64/libc.so.6


By: Walter Doekes (wdoekes) 2011-09-09 04:32:04.425-0500

My best guess is that recompilation failed after you switched the flag to ON.

$ make clean && make install

By: Nikola Ciprich (nikola.ciprich) 2011-09-09 04:51:31.033-0500

I'm building RPM package, so it's always fully recompiled. the menuselect.makeopts contains following: MENUSELECT_CFLAGS=DONT_OPTIMIZE LOADABLE_MODULES
so I'm sure I'm compiling it with those flagw. then I'm installing both asterisk and asterisk-debuginfo packages..

By: Leif Madsen (lmadsen) 2011-09-14 13:51:04.166-0500

Sorry, if we see <value optimized out> then the binary is definitely not running with DONT_OPTIMIZE

By: Leif Madsen (lmadsen) 2011-09-14 13:52:43.509-0500

Also what version of UnixODBC are you using? If you're building RPM packages, I'm going to guess you're using what is shipped by RedHat, which probably means you're using a version from about 2005 (that's what ships with CentOS).

You need to use the latest UnixODBC from www.unixodbc.org, which is version 2.3.0 right now. Outside of that you're going to run into problems with UnixODBC that causes Asterisk to crash.

By: Leif Madsen (lmadsen) 2011-09-14 13:53:53.779-0500

Also talked to Tilghman on IRC and he would require valgrind output in order to determine what else to do with this issue. If you can't reproduce this in the lab (and thus run valgrind) we'll need to close out this issue since there would be nothing we can do with it.

By: Leif Madsen (lmadsen) 2011-09-14 13:59:43.958-0500

Per Tilghman as well, your last note points out exactly why you're not compiling with DONT_OPTIMIZE:  the MENUSELECT_CFLAGS that are listed are what are *NOT* selected. Menuselect works with a list of things that should not disabled, not what should be enabled.

By: Nikola Ciprich (nikola.ciprich) 2011-09-15 08:40:38.050-0500

the ODBC version is stated at the top of this bug, it's not from shipped package, but newer. It's not 2.3.0 though, to I'll try with 2.3.0 and report. If valgrind output is really needed, I'll have to get this as well...

By: Nikola Ciprich (nikola.ciprich) 2011-09-21 06:37:54.493-0500

OK, so we've just had another crash, asterisk-1.8.6.0, unixODBC-2.3.0.

So the problem still persists. We'll reproduce this on our testing system today, and provide valgrind output.. I'll report further then...

By: Nikola Ciprich (nikola.ciprich) 2011-09-24 03:44:06.287-0500

OK, finally I have a way to quickly reproduce :) Also mystery of DONT_OPTIMIZE seems to be solved - rpm apparently strips some of the debugging information, and even though -debuginfo package is installed, some values are still optimized out.. So I've reproduced this from sources installed by make install. Here comes the backtrace:


#0  0x000000301fe0d4c4 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x000000301fe08e1a in _L_lock_1034 () from /lib64/libpthread.so.0
#2  0x000000301fe08cdc in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00000039cea2264e in SQLSetConnectAttr (connection_handle=0x83f86f0, attribute=102, value=0x1, string_length=0) at SQLSetConnectAttr.c:402
#4  0x0000000006f0b536 in _ast_odbc_request_obj2 (name=0x2706ecd0 "sqlite-asterisk", flags=..., file=0x9634eb4 "func_odbc.c", function=0x96350e9 "acf_odbc_read", lineno=518) at res_odbc.c:1337
#5  0x0000000006f0b91d in _ast_odbc_request_obj (name=0x2706ecd0 "sqlite-asterisk", check=0, file=0x9634eb4 "func_odbc.c", function=0x96350e9 "acf_odbc_read", lineno=518) at res_odbc.c:1392
#6  0x000000000962fb70 in acf_odbc_read (chan=0x6774ce0, cmd=0x27c31210 "ODBC_DUMMY2", s=0x27c3121c "3", buf=0x27c312d0 "", len=4096) at func_odbc.c:518
#7  0x000000000050b7e7 in ast_func_read (chan=0x6774ce0, function=0x27c322e0 "ODBC_DUMMY2(3)", workspace=0x27c312d0 "", len=4096) at pbx.c:3530
#8  0x000000000050cbb3 in pbx_substitute_variables_helper_full (c=0x6774ce0, headp=0x67750b0, cp1=0x5f79f78 "${ODBC_DUMMY2(3)}", cp2=0x27c35690 "", count=8191, used=0x27c33498) at pbx.c:3914
#9  0x000000000050d1ab in pbx_substitute_variables_helper (c=0x6774ce0, cp1=0x5f79f78 "${ODBC_DUMMY2(3)}", cp2=0x27c35690 "", count=8191) at pbx.c:4010
#10 0x000000000050d29f in pbx_substitute_variables (passdata=0x27c35690 "", datalen=8192, c=0x6774ce0, e=0x5f70cd8) at pbx.c:4035
#11 0x000000000050d523 in pbx_extension_helper (c=0x6774ce0, con=0x0, context=0x6775250 "outgoing", exten=0x67752a0 "0136", priority=9, label=0x0, callerid=0x63e1098 "0036", action=E_SPAWN, found=0x27c37e70,
   combined_find_spawn=1) at pbx.c:4097
#12 0x000000000050f453 in ast_spawn_extension (c=0x6774ce0, context=0x6775250 "outgoing", exten=0x67752a0 "0136", priority=9, callerid=0x63e1098 "0036", found=0x27c37e70, combined_find_spawn=1) at pbx.c:4745
#13 0x000000000050fefb in __ast_pbx_run (c=0x6774ce0, args=0x0) at pbx.c:4843
#14 0x0000000000511cbe in pbx_thread (data=0x6774ce0) at pbx.c:5178
#15 0x0000000000567795 in dummy_start (data=0x572a3a8) at utils.c:1004
#16 0x000000301fe0673d in start_thread () from /lib64/libpthread.so.0
#17 0x000000301f6d44bd in clone () from /lib64/libc.so.6

I'm also attaching valgrind output.. please let me know when some other information is needed...

By: Nikola Ciprich (nikola.ciprich) 2011-09-24 03:46:05.488-0500

valgrind was started according to asterisk-xxx/Documentation/valgrind.txt


By: Walter Doekes (wdoekes) 2011-09-24 09:36:09.692-0500

Do you have this in your res_odbc.conf? If not, you could try adding it.

share_connections => no

By: Nikola Ciprich (nikola.ciprich) 2011-09-24 13:41:16.049-0500

I didn't! it looks like it might solve my problems... I'll give it a test run and report! thanks!


By: Nikola Ciprich (nikola.ciprich) 2011-09-24 16:05:37.958-0500

so it didn't help... asterisk now slowly dies, no core, no message, anything... here's another valgrind output if it might help...

By: Nikola Ciprich (nikola.ciprich) 2011-09-25 14:52:11.712-0500

gave it another run, and it crashed again.. althoug I think it seems still much more stable, now it was after 350 000 test calls, meaning about 1.4mil DB queries...  what I still don't understand, why there's no core now...

By: Walter Doekes (wdoekes) 2011-09-26 01:54:38.460-0500

That is odd indeed.

My next suggestion would be to compile hand compile unixodbc with -g (debug info) and -O0 (no optimization), recompile asterisk against the newly installed libodbc (might be necessary) and confirm that it's using that one:

$ ldd /usr/local/lib/asterisk/modules/res_odbc.so  | grep odbc
libodbc.so.1 => /usr/local/lib/libodbc.so.1 (0x00007f706fd33000)

Then you'd get a bit more info from libodbc. But first you would have to fix that it's not creating cores anymore ;)

(Mind you, I haven't looked at any valgrind output. I delegate that to someone else.)


By: Walter Doekes (wdoekes) 2011-09-26 01:57:52.622-0500

Additionally:
(1) You'd want to hand compile the libsqlite-odbc as well, because that is where the actual crash seems to occur, right?
(2) Perhaps it is an option to switch databases? Try MySQL (libmyodbc) and see if it crashes as well.

By: Walter Doekes (wdoekes) 2011-09-26 02:01:22.459-0500

Were you running it under valgrind when you didn't get any cores?

Look at this:

vg_alloc_ThreadState: no free slots available
Increase VG_N_THREADS, rebuild and try again.

valgrind: the 'impossible' happened:
  VG_N_THREADS is too low


Valgrind is killing asterisk because it ran out of threads.

By: Nikola Ciprich (nikola.ciprich) 2011-09-26 04:20:32.240-0500

I see! I'll retry the test without valgrind (which I've indeed been using!). Thanks for the tip!

By: Nikola Ciprich (nikola.ciprich) 2011-09-27 06:19:00.104-0500

I just checked the test, and it did ~2.3mil calls, meaning almost 10mil DB queries! so it's rock solid now, thanks for all the help! Maybe it could be noted somewhere in the documentation that it's not safe to use shared connections (at least) for sqlite?

By: Nikola Ciprich (nikola.ciprich) 2011-09-27 06:19:41.232-0500

anyways, You can surely close this issue now. thanks once again! n.