Summary: | ASTERISK-09408: Chan local with /n cause core dump | ||
Reporter: | Tristan Mahe (tristan_mahe) | Labels: | |
Date Opened: | 2007-05-09 05:53:23 | Date Closed: | 2011-06-07 14:00:32 |
Priority: | Critical | Regression? | No |
Status: | Closed/Complete | Components: | Channels/chan_local |
Versions: | Frequency of Occurrence | ||
Related Issues: | |||
Environment: | Attachments: | ||
Description: | Using chan_local with /n suffix and queues to call back dynamic members cause core dumps. Removing /n resolve core dumps. ****** ADDITIONAL INFORMATION ****** The setup is the following: 1) Incoming callers through PRI 2) Incoming callers are routed to queue with the following part of dialplan: exten => s,n,Dial(Local/enqueue@queue-service,,g) exten => s,n,Verbose(Now processing with Quality of service) (...) [queue-service] exten => enqueue,2,Answer exten => enqueue,3,Queue(incoming) (...) 3) Agents in the 'incoming' queue are dynamic, using chan_local. ( Exemple: Local/1234@agent_call ) with a context like [agent_call] exten => _XXXX,1,Verbose(Dialing ${EXTEN}) exten => _XXXX,n,Dial(${DialChannel}/${AGENTNUM}|${DIALTIMEOUT}|rM(agentanswer)) 4) when using /n Asterisk crash randomly but very often, and always produce the same core file: Program terminated with signal 11, Segmentation fault. #0 ast_queue_frame (chan=0x2aaac80140c0, fin=0x885460) at channel.c:924 924 AST_LIST_INSERT_TAIL(&chan->readq, f, frame_list); (gdb) bt full #0 ast_queue_frame (chan=0x2aaac80140c0, fin=0x885460) at channel.c:924 f = (struct ast_frame *) 0xa30190 cur = <value optimized out> blah = 1 qlen = 22 __PRETTY_FUNCTION__ = "ast_queue_frame" #1 0x00002aaab065bef5 in local_queue_frame (p=0x948f00, isoutbound=<value optimized out>, f=0x885460, us=0x959c60) from /var/lib/asterisk/modules/chan_local.so other = (struct ast_channel *) 0x2aaac80140c0 __PRETTY_FUNCTION__ = "local_queue_frame" #2 0x00002aaab065cec1 in local_write (ast=0x959c60, f=0x885460) from /var/lib/asterisk/modules/chan_local.so p = (struct local_pvt *) 0xffffffff res = <value optimized out> isoutbound = 1 __PRETTY_FUNCTION__ = "local_write" #3 0x000000000043cd29 in ast_write (chan=0x959c60, fr=0x885460) at channel.c:2863 jump = <value optimized out> res = -1 f = (struct ast_frame *) 0x885460 __PRETTY_FUNCTION__ = "ast_write" #4 0x000000000043e92b in ast_channel_bridge (c0=0x959c60, c1=0x959c60, config=0x41150660, fo=0x4114f598, rc=0x4114f590) at channel.c:3951 monitored_source = 0 to = <value optimized out> res = <value optimized out> nativefailed = 0 firstpass = <value optimized out> o0nativeformats = 8 o1nativeformats = 72 time_left_ms = <value optimized out> nexteventts = {tv_sec = 0, tv_usec = 0} caller_warning = 0 '\0' callee_warning = dwarf2_read_address: Corrupted DWARF expression. (gdb) bt #0 ast_queue_frame (chan=0x2aaac80140c0, fin=0x885460) at channel.c:924 #1 0x00002aaab065bef5 in local_queue_frame (p=0x948f00, isoutbound=<value optimized out>, f=0x885460, us=0x959c60) from /var/lib/asterisk/modules/chan_local.so #2 0x00002aaab065cec1 in local_write (ast=0x959c60, f=0x885460) from /var/lib/asterisk/modules/chan_local.so #3 0x000000000043cd29 in ast_write (chan=0x959c60, fr=0x885460) at channel.c:2863 #4 0x000000000043e92b in ast_channel_bridge (c0=0x959c60, c1=0x959c60, config=0x41150660, fo=0x4114f598, rc=0x4114f590) at channel.c:3951 ASTERISK-1 0x00002aaaab703add in ast_bridge_call (chan=0x959c60, peer=0x9eccf0, config=0x41150660) at res_features.c:1378 ASTERISK-2 0x00002aaac0d64bc0 in dial_exec_full (chan=0x959c60, data=<value optimized out>, peerflags=0x41150910, continue_exec=0x0) from /var/lib/asterisk/modules/app_dial.so ASTERISK-3 0x00002aaac0d65a55 in dial_exec (chan=0xffffffff, data=0x2aaabb4c10e0) from /var/lib/asterisk/modules/app_dial.so ASTERISK-4 0x000000000047abed in pbx_extension_helper (c=0x959c60, con=<value optimized out>, context=0x959eb0 "call_agents", exten=0x959f00 "call", priority=9, label=<value optimized out>, callerid=0x2aaacc00a330 "561331230", action=E_SPAWN) at pbx.c:532 ASTERISK-5 0x000000000047c52d in __ast_pbx_run (c=0x959c60) at pbx.c:2288 ASTERISK-6 0x000000000047d139 in pbx_thread (data=0xffffffff) at pbx.c:2601 ASTERISK-7 0x00000000004a3a2c in dummy_start (data=<value optimized out>) at utils.c:545 ASTERISK-8 0x00000034ea806305 in start_thread () from /lib64/libpthread.so.0 ASTERISK-9 0x00000034e9ccd50d in clone () from /lib64/libc.so.6 ASTERISK-10 0x0000000000000000 in ?? () I'm sorry this is not a debug bt, but the core were happening on production system, and I did not had time to produce correct core files as we need production to go on. | ||
Comments: | By: gnudialer gnudialer (gnudialer) 2007-05-11 17:07:16 I'm having this happen too. For me, removing the /n helps reduce the crashing but does not completely eliminate it. I'm calling DIAL from FastAGI. I think it happens when the calling channel has been hungup. By: Tristan Mahe (tristan_mahe) 2007-05-14 16:03:50 I had today the same crash without /n too. I first thought this was a consequence of using res_snmp to get chan_local usage, but after removing all snmp tracks,it seems like this is not related to something else than chan_local. May you post more details on the way you are using chan_local ? Are you using the 'g' option in Dial ? Are you using queues or MixMonitor ? Do you have zap channels involved ? By: gnudialer gnudialer (gnudialer) 2007-05-14 22:18:30 I'm dialing a local channel w/o /n. The context that I dial then runs ParkAndAnnounce. No zap channels and MixMonitor is not happening at this point. I get this every single time... #0 0x0807c55b in ast_queue_frame (chan=0x82538d8, fin=0xb6609e8c) at channel.c:890 890 AST_LIST_INSERT_TAIL(&chan->readq, f, frame_list); (gdb) bt #0 0x0807c55b in ast_queue_frame (chan=0x82538d8, fin=0xb6609e8c) at channel.c:890 #1 0xb6e2082c in local_queue_frame (p=0xb6623c20, isoutbound=0, f=0xb6609e8c, us=0xb66126c0) at chan_local.c:194 #2 0xb6e21786 in local_write (ast=0xb66126c0, f=0xb6609e8c) at chan_local.c:291 #3 0x0808340e in ast_write (chan=0xb66126c0, fr=0xb6609e8c) at channel.c:2766 #4 0xb7994a95 in wait_for_answer (in=0xb66143b8, outgoing=0xb6601ae0, to=0xb63a06e0, peerflags=0xb63a08b4, sentringing=0xb63a0534, status=0xb63a0538 "NOANSWER", statussize=256, busystart=0, nochanstart=0, congestionstart=0, priority_jump=0, result=0xb63a0530) at app_dial.c:733 ASTERISK-1 0xb7997903 in dial_exec_full (chan=0xb66143b8, data=0xb63a0bb2, peerflags=0xb63a08b4, continue_exec=0x0) at app_dial.c:1267 ASTERISK-2 0xb7999507 in dial_exec (chan=0xb66143b8, data=0xb63a0bb2) at app_dial.c:1686 ASTERISK-3 0x080c0d47 in pbx_exec (c=0xb66143b8, app=0x81a9e68, data=0xb63a0bb2) at pbx.c:532 ASTERISK-4 0xb6964829 in handle_exec (chan=0xb66143b8, agi=0xb63a1420, argc=3, argv=0xb63a095c) at res_agi.c:1116 ASTERISK-5 0xb696599d in agi_handle_command (chan=0xb66143b8, agi=0xb63a1420, buf=0xb63a0ba8 "EXEC") at res_agi.c:1812 ASTERISK-6 0xb6965d81 in run_agi (chan=0xb66143b8, request=0xb63a1440 "agi://127.0.0.1/?mode=autologin&num=900912340008", agi=0xb63a1420, pid=-1, status=0xb63a141c, dead=0) at res_agi.c:1894 ---Type <return> to continue, or q <return> to quit--- ASTERISK-7 0xb6966466 in agi_exec_full (chan=0xb66143b8, data=0xb63a4008, enhanced=0, dead=0) at res_agi.c:2038 ASTERISK-8 0xb69665ef in agi_exec (chan=0xb66143b8, data=0xb63a4008) at res_agi.c:2069 ASTERISK-9 0x080c0d47 in pbx_exec (c=0xb66143b8, app=0x81fe640, data=0xb63a4008) at pbx.c:532 ASTERISK-10 0x080c48dc in pbx_extension_helper (c=0xb66143b8, con=0x0, context=0xb66145dc "agentlogin", exten=0xb661462c "900912340008", priority=1, label=0x0, callerid=0xb6600bb8 "agent", action=E_SPAWN) at pbx.c:1819 ASTERISK-11 0x080c5cbf in ast_spawn_extension (c=0xb66143b8, context=0xb66145dc "agentlogin", exten=0xb661462c "900912340008", priority=1, callerid=0xb6600bb8 "agent") at pbx.c:2274 ASTERISK-12 0x080c61aa in __ast_pbx_run (c=0xb66143b8) at pbx.c:2374 ASTERISK-13 0x080c6f86 in pbx_thread (data=0xb66143b8) at pbx.c:2587 ASTERISK-14 0x08105cf4 in dummy_start (data=0xb6602060) at utils.c:545 ASTERISK-15 0xb7f7d0bd in start_thread () from /lib/tls/libpthread.so.0 ASTERISK-16 0xb7daf9ee in clone () from /lib/tls/libc.so.6 (gdb) thread apply all bt Thread 43 (process 7006): #0 0x00200293 in ?? () Cannot access memory at address 0x1742fe94 0x0807c55b 890 AST_LIST_INSERT_TAIL(&chan->readq, f, frame_list); By: gnudialer gnudialer (gnudialer) 2007-05-14 22:25:02 Mods: should we change the title to "Chan local causes core dump" ? By: gnudialer gnudialer (gnudialer) 2007-05-15 12:14:47 Last night we tried a ghetto rigged setup where we were using iax2 for Local/ emulation and were still able to crash it at high volume. We were doing: Agi EXEC Dial -> iax2 to 127.0.0.1 -> ParkAndAnnounce The backtrace still shows it stopping on AST_LIST_INSERT_TAIL(&chan->readq, f, frame_list); Would it be helpful for me to expand AST_LIST_INSERT_TAIL(...) inline in ast_queue_frame when we are not in production? By: Tristan Mahe (tristan_mahe) 2007-05-15 13:56:28 It would be very helpful if you can upload a core dump produced with asterisk compiled with all debug options ( in make menuselect choice 10, you check the choices 1 2 3 4 5 7 10 13). If you can also upload the output of running with a debug/verbose level of 10 at least ( core set debug 10 and core set debug 10), this will make things a lot easier. I cannot do this on my productions servers and don't have a lab available at the moment to reproduce this bug. Btw: I agree with you, this is not /n related so we should change the title. By: gnudialer gnudialer (gnudialer) 2007-05-16 15:46:25 We found a workaround. Switching from ParkAndAnnounce to Park caused the crashes to cease. By: Tristan Mahe (tristan_mahe) 2007-05-21 07:58:20 strange, because I don't use park nor parkandannounce at all... By: gnudialer gnudialer (gnudialer) 2007-05-21 12:21:03 That's why I think it is really serious. I've been able to reproduce it without any chan_local involvement at all. If I make a whole bunch of iax2 calls and loop them back and then call ParkAndAnnounce, I get a core that looks similar to both yours and my old one. It ends on that AST_LIST_INSERT_TAIL() macro. I bet if you inundated it with calls, then call Dial() from the dp like you are to a local chan, you could reproduce it. Oddly, it would take a lot longer to crash if I gave it a really heavy load or printed debug info inside of ast_queue_frame() which suggests a synchronization issue. By: gnudialer gnudialer (gnudialer) 2007-06-04 13:47:24 Got another one today, this time just using Park() ... Thread 36 (process 5672): #0 0x00200293 in ?? () Cannot access memory at address 0x68430a0d 0x0807c55b 900 AST_LIST_INSERT_TAIL(&chan->readq, f, frame_list); (gdb) bt #0 0x0807c55b in ast_queue_frame (chan=0xa2e1818, fin=0x8169760) at channel.c:900 #1 0x08084c66 in ast_channel_masquerade (original=0xa2c9ee8, clone=0xa2e1818) at channel.c:3257 #2 0xb6e1f5df in check_bridge (p=0xa3462a8, isoutbound=1) at chan_local.c:240 #3 0xb6e1f743 in local_write (ast=0xa327b70, f=0xa308a1c) at chan_local.c:289 #4 0x0808340e in ast_write (chan=0xa327b70, fr=0xa308a1c) at channel.c:2777 ASTERISK-1 0x08086dca in ast_generic_bridge (c0=0xa327b70, c1=0xa2e1818, config=0xb65ce33c, fo=0xb65cdff8, rc=0xb65cdff4, bridge_end= {tv_sec = 0, tv_usec = 0}) at channel.c:3864 ASTERISK-2 0x08087d88 in ast_channel_bridge (c0=0xa327b70, c1=0xa2e1818, config=0xb65ce33c, fo=0xb65cdff8, rc=0xb65cdff4) at channel.c:4108 ASTERISK-3 0xb7a3e80a in ast_bridge_call (chan=0xa327b70, peer=0xa2e1818, config=0xb65ce33c) at res_features.c:1353 ASTERISK-4 0xb7999289 in dial_exec_full (chan=0xa327b70, data=0xb65d1008, peerflags=0xb65cee74, continue_exec=0x0) at app_dial.c:1635 ASTERISK-5 0xb7999507 in dial_exec (chan=0xa327b70, data=0xb65d1008) at app_dial.c:1686 ASTERISK-6 0x080c0d47 in pbx_exec (c=0xa327b70, app=0x81a9878, data=0xb65d1008) at pbx.c:532 ASTERISK-7 0x080c48dc in pbx_extension_helper (c=0xa327b70, con=0x0, context=0xa327d94 "park-dial", exten=0xa327de4 "Local/2@onwait", priority=1, label=0x0, callerid=0x9e2a8e8 "555", action=E_SPAWN) at pbx.c:1819 ---Type <return> to continue, or q <return> to quit--- ASTERISK-8 0x080c5cbf in ast_spawn_extension (c=0xa327b70, context=0xa327d94 "park-dial", exten=0xa327de4 "Local/2@onwait", priority=1, callerid=0x9e2a8e8 "555") at pbx.c:2274 ASTERISK-9 0x080c61aa in __ast_pbx_run (c=0xa327b70) at pbx.c:2374 ASTERISK-10 0x080c6f86 in pbx_thread (data=0xa327b70) at pbx.c:2587 ASTERISK-11 0x08105cf4 in dummy_start (data=0xb67245d8) at utils.c:545 ASTERISK-12 0xb7f780bd in start_thread () from /lib/tls/libpthread.so.0 ASTERISK-13 0xb7daa9ee in clone () from /lib/tls/libc.so.6 By: Joshua C. Colp (jcolp) 2007-06-04 14:13:41 Would it be possible to log on to the machine where this backtrace is located so I can poke at it? Email jcolp@digium.com if possible. Thanks! By: Tristan Mahe (tristan_mahe) 2007-06-08 09:53:49 I did not had any crash since I enabled the DONT_OPTIMIZE flag in menuselecT. Strange... File: If it happens again, I will open you an access on my production box. By: Joshua C. Colp (jcolp) 2007-08-08 08:48:27 I'm suspending this since it's been two months with no news. If this is still an issue feel free to reopen with the latest info. Peace! |