
Summary: ASTERISK-18709: lua socket.http crashes asterisk
Reporter: Dave Cabot (dcabot)
Labels:
Date Opened: 2011-10-12 10:32:40
Date Closed: 2011-10-24 09:43:58
Priority: Blocker
Regression?
Status: Closed/Complete
Components: PBX/pbx_lua
Versions: 1.8.5.0
Frequency of Occurrence: Constant
Related Issues:
Environment: CentOS 5.2
Attachments: (0) backtrace.txt
Description: I'm writing an IVR using pbx_lua and load testing it with SIPp at 100 concurrent calls. The Lua script makes a couple of HTTP POST requests with http.request(), each surrounded by autoservice_start()/autoservice_stop(). Within a few minutes Asterisk crashes. Here's the backtrace:

Program terminated with signal 11, Segmentation fault.
#0  0x004ca402 in __kernel_vsyscall ()
(gdb) bt
#0  0x004ca402 in __kernel_vsyscall ()
#1  0x0087dd20 in raise () from /lib/libc.so.6
#2  0xb75d716f in skgesigOSCrash () from /usr/lib/oracle/11.2/client/lib/libclntsh.so.11.1
#3  0xb784618d in kpeDbgSignalHandler () from /usr/lib/oracle/11.2/client/lib/libclntsh.so.11.1
#4  0xb75d742f in skgesig_sigactionHandler () from /usr/lib/oracle/11.2/client/lib/libclntsh.so.11.1
#5  <signal handler called>
#6  0x09079124 in recv@plt () from /usr/lib/lua/5.1/socket/core.so
#7  0x0907e9b5 in socket_recv (ps=0xac5739c, data=0xac573d0 "", count=8192, got=0x92973838, tm=0xac593d0) at /usr/include/bits/socket2.h:35
#8  0x09079f74 in buffer_get (buf=0xac573b0, data=0x9297389c, count=0x929738a0) at buffer.c:261
#9  0x0907a189 in buffer_meth_receive (L=0xb4f0188, buf=0xac573b0) at buffer.c:187
#10 0x0907c373 in meth_receive (L=0xb4f0188) at tcp.c:112
#11 0x0076f4e3 in luaD_precall (L=0xb4f0188, func=0xa912b48, nresults=-1) at ldo.c:319
#12 0x00779cc0 in luaV_execute (L=0xb4f0188, nexeccalls=3) at lvm.c:587
#13 0x0076f960 in luaD_call (L=0xb4f0188, func=0xa912aa0, nResults=-1) at ldo.c:377
#14 0x0076afd1 in f_call (L=0xb4f0188, ud=0x92975b94) at lapi.c:800
#15 0x0076f023 in luaD_rawrunprotected (L=0xb4f0188, f=0x76afb0 <f_call>, ud=0x92975b94) at ldo.c:116
#16 0x0076f088 in luaD_pcall (L=0xb4f0188, func=0x76afb0 <f_call>, u=0x92975b94, old_top=144, ef=0) at ldo.c:463
#17 0x0076ae24 in lua_pcall (L=0xb4f0188, nargs=2, nresults=-1, errfunc=0) at lapi.c:821
#18 0x0907d7ad in protected_ (L=0xb4f0188) at except.c:81
#19 0x0076f4e3 in luaD_precall (L=0xb4f0188, func=0xa912a94, nresults=3) at ldo.c:319
#20 0x00779cc0 in luaV_execute (L=0xb4f0188, nexeccalls=3) at lvm.c:587
#21 0x0076f960 in luaD_call (L=0xb4f0188, func=0xa912a28, nResults=0) at ldo.c:377
#22 0x0076afd1 in f_call (L=0xb4f0188, ud=0x92975e74) at lapi.c:800
#23 0x0076f023 in luaD_rawrunprotected (L=0xb4f0188, f=0x76afb0 <f_call>, ud=0x92975e74) at ldo.c:116
#24 0x0076f088 in luaD_pcall (L=0xb4f0188, func=0x76afb0 <f_call>, u=0x92975e74, old_top=24, ef=12) at ldo.c:463
#25 0x0076ae24 in lua_pcall (L=0xb4f0188, nargs=2, nresults=0, errfunc=1) at lapi.c:821
#26 0x005657b3 in exec (chan=0x93dc35d8, context=0x93dc3944 "incoming", exten=0x93dc3994 "400", priority=1, callerid=0x93d5d1b8 "9070000995", data=0x958af98 "") at pbx_lua.c:1307
#27 0x081339d9 in pbx_extension_helper (c=0x93dc35d8, con=0x0, context=0x2 <Address 0x2 out of bounds>, exten=0x93dc3994 "400", priority=1, label=0x0, callerid=0x93d5d1b8 "9070000995",
   action=E_SPAWN, found=0x9297a264, combined_find_spawn=1) at pbx.c:4115
#28 0x081364ac in __ast_pbx_run (c=0x93dc35d8, args=0x0) at pbx.c:4723
#29 0x08138420 in pbx_thread (data=0x93dc35d8) at pbx.c:5058
#30 0x0817455a in dummy_start (data=0x91f093c8) at utils.c:1004
#31 0x00a3046b in start_thread () from /lib/libpthread.so.0
#32 0x00925dbe in clone () from /lib/libc.so.6
(gdb)

Comments:

By: Matthew Nicholson (mnicholson) 2011-10-12 13:35:55.819-0500

Would it be possible to test using func_curl instead of the Lua socket library?

Also, this backtrace does not supply enough information to determine the cause of the crash. Please follow the instructions in backtrace.txt and upload another one.

Also, if possible, post a minimal extensions.lua (or your full extensions.lua) so that I can try to reproduce this here.

By: Matthew Nicholson (mnicholson) 2011-10-12 13:46:48.634-0500

Hmm, it looks like you already tried func_curl and found a problem with it (ASTERISK-18708), so ignore that request. Please do follow up on the other things I mentioned, though.

By: Dave Cabot (dcabot) 2011-10-12 15:43:36.079-0500

As requested.

By: Dave Cabot (dcabot) 2011-10-12 15:47:03.617-0500

Some sample code:

extensions.lua:

{code}
require "bugtest"

extensions = {

 incoming = {
   ["500"] = function ()
                 bugtest.run()
             end;
 };

}
{code}

bugtest.lua:
{code}
module ("bugtest", package.seeall)

function run()

 channel.CHANNEL("language"):set("turkey")
 channel.CHANNEL("musicclass"):set("turkey")

 app.answer()
 app.wait(2)

 app.startmusiconhold("turkey");

 local m_authkey = "ODod923i5kLk9s01Lkd0hnczii"
 local m_csr = "IVR"
 local trn = channel.CALLERID("number"):get()
 local request = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><OTA><AuthenticationToken>"..m_authkey.."</AuthenticationToken><Action method=\"AttachRequest\"><TRN>"..trn.."</TRN><CSR>"..m_csr.."</CSR></Action></OTA>"

--  local result = channel.CURL(ota_config.ota_url,request):get() or ""

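 local http = require("socket.http")
 -- ota_config (with its ota_url field) is assumed to be loaded elsewhere; it is not shown here.
 autoservice_start()
 -- With a body argument, http.request() sends a POST and returns the
 -- response body, status code, headers and status line.
 local result, statuscode, headers = http.request(ota_config.ota_url, request)
 autoservice_stop()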

 if nil ~= result then
   app.verbose("result "..result)
 end

 app.hangup()
end
{code}

By: Matthew Nicholson (mnicholson) 2011-10-12 16:12:38.624-0500

What version of the Lua socket library are you using? I can't work on this right away, but I plan to try to reproduce it here eventually.

By: Dave Cabot (dcabot) 2011-10-12 16:17:27.246-0500

Ok, thanks for looking into it.

From yum:
{noformat}
lua.i386          5.1.4-4.iot5.1    installed
lua-socket.i386   2.0.2-4.iot5.1    installed
{noformat}

By: Matthew Nicholson (mnicholson) 2011-10-21 12:42:12.615-0500

Ok, I think I can reproduce this. I am investigating it now.

By: Dave Cabot (dcabot) 2011-10-21 13:11:16.708-0500

Awesome.  Thanks for looking into this.  The func_curl option is really CPU intensive.  I'm hoping that this solution will work better.

By: Matthew Nicholson (mnicholson) 2011-10-21 14:06:33.909-0500

It seems like there is some sort of memory corruption going on here, either in Asterisk or in luasocket. I am not sure which yet, and so far running inside valgrind has not given any useful results.

By: Matthew Nicholson (mnicholson) 2011-10-21 15:29:09.365-0500

I'll continue testing with valgrind here, but it would be helpful if you could run Asterisk inside valgrind in your environment and post the results.

{noformat}
valgrind asterisk -c 2> valgrind.txt
{noformat}

The output will be stored in valgrind.txt. Under valgrind, Asterisk will run much slower than it normally does.

By: Matthew Nicholson (mnicholson) 2011-10-24 09:42:43.240-0500

I think I found the problem. It is a bug in luasocket's use of select(), detailed [here|http://wiki.voiceworks.pl/display/~pawel/Luasocket+core+dumps+in+socket_waitfd]. As that page states, it can be fixed by building luasocket with {{-DSOCKET_POLL}}. I tested this by adding {{-DSOCKET_POLL}} to the {{DEF=}} line in the {{config}} file of the luasocket source tree and rebuilding, and that seems to fix the issue.
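
The comment does not spell out how select() leads to the crash. A commonly cited mechanism for this class of luasocket crash (an assumption here, not quoted from the linked page) is that select() and FD_SET() only handle descriptors numbered below FD_SETSIZE, usually 1024. Under load, Asterisk can have many descriptors open (each call brings its own RTP sockets, for example), so luasocket may eventually be handed a socket numbered at or above that limit, and FD_SET() then writes past the end of the fixed-size fd_set on the stack. As the flag's name suggests, {{-DSOCKET_POLL}} makes luasocket wait with poll() instead, which takes an array of {{struct pollfd}} and has no such ceiling. The following is a minimal, hypothetical sketch of a poll()-based wait in that spirit; {{wait_fd()}}, {{fd_usable_with_select()}} and the WAIT_* codes are illustrative names, not luasocket's actual code.

{code:c}
#include <errno.h>
#include <poll.h>
#include <sys/select.h>

/* Result codes for the hypothetical wait_fd() helper. */
enum { WAIT_ERROR = -1, WAIT_READY = 0, WAIT_TIMEOUT = 1 };

/* select()/FD_SET() only work for descriptors below FD_SETSIZE
 * (1024 on glibc). Passing a larger fd to FD_SET() writes past the
 * end of the fixed-size fd_set, corrupting adjacent memory. */
int fd_usable_with_select(int fd)
{
    return fd >= 0 && fd < FD_SETSIZE;
}

/* Wait until fd reports the requested events (POLLIN to read, POLLOUT
 * to write) or until timeout_ms milliseconds pass (-1 waits forever).
 * poll() takes an array of struct pollfd, so any fd number works. */
int wait_fd(int fd, short events, int timeout_ms)
{
    struct pollfd pfd;
    int ret;

    pfd.fd = fd;
    pfd.events = events;
    pfd.revents = 0;

    /* Retry when a signal interrupts the call. For brevity the full
     * timeout is reused on each retry instead of the time remaining. */
    do {
        ret = poll(&pfd, 1, timeout_ms);
    } while (ret == -1 && errno == EINTR);

    if (ret == 0)
        return WAIT_TIMEOUT;
    /* POLLNVAL: fd is not open; POLLERR: error condition on fd. */
    if (ret < 0 || (pfd.revents & (POLLERR | POLLNVAL)))
        return WAIT_ERROR;
    return WAIT_READY;
}
{code}

A caller would check {{wait_fd(fd, POLLIN, timeout_ms) == WAIT_READY}} before calling recv() on the socket, with no dependence on how high the descriptor number is.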

By: Dave Cabot (dcabot) 2011-10-24 09:49:47.573-0500

Awesome work.  Thanks much!