[Home]

Summary:ASTERISK-09408: Chan local with /n cause core dump
Reporter:Tristan Mahe (tristan_mahe)Labels:
Date Opened:2007-05-09 05:53:23Date Closed:2011-06-07 14:00:32
Priority:CriticalRegression?No
Status:Closed/CompleteComponents:Channels/chan_local
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:
Description:Using chan_local with /n suffix and queues to call back dynamic members cause core dumps.

Removing /n resolve core dumps.

****** ADDITIONAL INFORMATION ******

The setup is the following:

1) Incoming callers through PRI
2) Incoming callers are routed to queue with the following part of dialplan:

exten => s,n,Dial(Local/enqueue@queue-service,,g)
exten => s,n,Verbose(Now processing with Quality of service)
(...)

[queue-service]
exten => enqueue,2,Answer
exten => enqueue,3,Queue(incoming)
(...)

3) Agents in the 'incoming' queue are dynamic, using chan_local.
( Exemple: Local/1234@agent_call ) with a context like

[agent_call]
exten => _XXXX,1,Verbose(Dialing ${EXTEN})
exten => _XXXX,n,Dial(${DialChannel}/${AGENTNUM}|${DIALTIMEOUT}|rM(agentanswer))



4) when using /n Asterisk crash randomly but very often, and always produce the same core file:

Program terminated with signal 11, Segmentation fault.
#0  ast_queue_frame (chan=0x2aaac80140c0, fin=0x885460) at channel.c:924
924             AST_LIST_INSERT_TAIL(&chan->readq, f, frame_list);
(gdb) bt full
#0  ast_queue_frame (chan=0x2aaac80140c0, fin=0x885460) at channel.c:924
       f = (struct ast_frame *) 0xa30190
       cur = <value optimized out>
       blah = 1
       qlen = 22
       __PRETTY_FUNCTION__ = "ast_queue_frame"
#1  0x00002aaab065bef5 in local_queue_frame (p=0x948f00, isoutbound=<value optimized out>, f=0x885460, us=0x959c60)
  from /var/lib/asterisk/modules/chan_local.so
       other = (struct ast_channel *) 0x2aaac80140c0
       __PRETTY_FUNCTION__ = "local_queue_frame"
#2  0x00002aaab065cec1 in local_write (ast=0x959c60, f=0x885460) from /var/lib/asterisk/modules/chan_local.so
       p = (struct local_pvt *) 0xffffffff
       res = <value optimized out>
       isoutbound = 1
       __PRETTY_FUNCTION__ = "local_write"
#3  0x000000000043cd29 in ast_write (chan=0x959c60, fr=0x885460) at channel.c:2863
       jump = <value optimized out>
       res = -1
       f = (struct ast_frame *) 0x885460
       __PRETTY_FUNCTION__ = "ast_write"
#4  0x000000000043e92b in ast_channel_bridge (c0=0x959c60, c1=0x959c60, config=0x41150660, fo=0x4114f598, rc=0x4114f590) at channel.c:3951
       monitored_source = 0
       to = <value optimized out>
       res = <value optimized out>
       nativefailed = 0
       firstpass = <value optimized out>
       o0nativeformats = 8
       o1nativeformats = 72
       time_left_ms = <value optimized out>
       nexteventts = {tv_sec = 0, tv_usec = 0}
       caller_warning = 0 '\0'
       callee_warning = dwarf2_read_address: Corrupted DWARF expression.
(gdb) bt
#0  ast_queue_frame (chan=0x2aaac80140c0, fin=0x885460) at channel.c:924
#1  0x00002aaab065bef5 in local_queue_frame (p=0x948f00, isoutbound=<value optimized out>, f=0x885460, us=0x959c60)
  from /var/lib/asterisk/modules/chan_local.so
#2  0x00002aaab065cec1 in local_write (ast=0x959c60, f=0x885460) from /var/lib/asterisk/modules/chan_local.so
#3  0x000000000043cd29 in ast_write (chan=0x959c60, fr=0x885460) at channel.c:2863
#4  0x000000000043e92b in ast_channel_bridge (c0=0x959c60, c1=0x959c60, config=0x41150660, fo=0x4114f598, rc=0x4114f590) at channel.c:3951
ASTERISK-1  0x00002aaaab703add in ast_bridge_call (chan=0x959c60, peer=0x9eccf0, config=0x41150660) at res_features.c:1378
ASTERISK-2  0x00002aaac0d64bc0 in dial_exec_full (chan=0x959c60, data=<value optimized out>, peerflags=0x41150910, continue_exec=0x0)
  from /var/lib/asterisk/modules/app_dial.so
ASTERISK-3  0x00002aaac0d65a55 in dial_exec (chan=0xffffffff, data=0x2aaabb4c10e0) from /var/lib/asterisk/modules/app_dial.so
ASTERISK-4  0x000000000047abed in pbx_extension_helper (c=0x959c60, con=<value optimized out>, context=0x959eb0 "call_agents", exten=0x959f00 "call", priority=9,
   label=<value optimized out>, callerid=0x2aaacc00a330 "561331230", action=E_SPAWN) at pbx.c:532
ASTERISK-5  0x000000000047c52d in __ast_pbx_run (c=0x959c60) at pbx.c:2288
ASTERISK-6 0x000000000047d139 in pbx_thread (data=0xffffffff) at pbx.c:2601
ASTERISK-7 0x00000000004a3a2c in dummy_start (data=<value optimized out>) at utils.c:545
ASTERISK-8 0x00000034ea806305 in start_thread () from /lib64/libpthread.so.0
ASTERISK-9 0x00000034e9ccd50d in clone () from /lib64/libc.so.6
ASTERISK-10 0x0000000000000000 in ?? ()


I'm sorry this is not a debug bt, but the core were happening on production system, and I did not had time to produce correct core files as we need production to go on.
Comments:By: gnudialer gnudialer (gnudialer) 2007-05-11 17:07:16

I'm having this happen too.  For me, removing the /n helps reduce the crashing but does not completely eliminate it.

I'm calling DIAL from FastAGI.  I think it happens when the calling channel has been hungup.

By: Tristan Mahe (tristan_mahe) 2007-05-14 16:03:50

I had today the same crash without /n too.

I first thought this was a consequence of using res_snmp to get chan_local usage, but after removing all snmp tracks,it seems like this is not related to something else than chan_local.

May you post more details on the way you are using chan_local ?

Are you using the 'g' option in Dial ?

Are you using queues or MixMonitor ?

Do you have zap channels involved ?



By: gnudialer gnudialer (gnudialer) 2007-05-14 22:18:30

I'm dialing a local channel w/o /n. The context that I dial then runs ParkAndAnnounce.  No zap channels and MixMonitor is not happening at this point.

I get this every single time...

#0  0x0807c55b in ast_queue_frame (chan=0x82538d8, fin=0xb6609e8c)
   at channel.c:890
890             AST_LIST_INSERT_TAIL(&chan->readq, f, frame_list);
(gdb) bt
#0  0x0807c55b in ast_queue_frame (chan=0x82538d8, fin=0xb6609e8c)
   at channel.c:890
#1  0xb6e2082c in local_queue_frame (p=0xb6623c20, isoutbound=0, f=0xb6609e8c,
   us=0xb66126c0) at chan_local.c:194
#2  0xb6e21786 in local_write (ast=0xb66126c0, f=0xb6609e8c)
   at chan_local.c:291
#3  0x0808340e in ast_write (chan=0xb66126c0, fr=0xb6609e8c) at channel.c:2766
#4  0xb7994a95 in wait_for_answer (in=0xb66143b8, outgoing=0xb6601ae0,
   to=0xb63a06e0, peerflags=0xb63a08b4, sentringing=0xb63a0534,
   status=0xb63a0538 "NOANSWER", statussize=256, busystart=0, nochanstart=0,
   congestionstart=0, priority_jump=0, result=0xb63a0530) at app_dial.c:733
ASTERISK-1  0xb7997903 in dial_exec_full (chan=0xb66143b8, data=0xb63a0bb2,
   peerflags=0xb63a08b4, continue_exec=0x0) at app_dial.c:1267
ASTERISK-2  0xb7999507 in dial_exec (chan=0xb66143b8, data=0xb63a0bb2)
   at app_dial.c:1686
ASTERISK-3  0x080c0d47 in pbx_exec (c=0xb66143b8, app=0x81a9e68, data=0xb63a0bb2)
   at pbx.c:532
ASTERISK-4  0xb6964829 in handle_exec (chan=0xb66143b8, agi=0xb63a1420, argc=3,
   argv=0xb63a095c) at res_agi.c:1116
ASTERISK-5  0xb696599d in agi_handle_command (chan=0xb66143b8, agi=0xb63a1420,
   buf=0xb63a0ba8 "EXEC") at res_agi.c:1812
ASTERISK-6 0xb6965d81 in run_agi (chan=0xb66143b8,
   request=0xb63a1440 "agi://127.0.0.1/?mode=autologin&num=900912340008",
   agi=0xb63a1420, pid=-1, status=0xb63a141c, dead=0) at res_agi.c:1894
---Type <return> to continue, or q <return> to quit---
ASTERISK-7 0xb6966466 in agi_exec_full (chan=0xb66143b8, data=0xb63a4008, enhanced=0,
   dead=0) at res_agi.c:2038
ASTERISK-8 0xb69665ef in agi_exec (chan=0xb66143b8, data=0xb63a4008) at res_agi.c:2069
ASTERISK-9 0x080c0d47 in pbx_exec (c=0xb66143b8, app=0x81fe640, data=0xb63a4008)
   at pbx.c:532
ASTERISK-10 0x080c48dc in pbx_extension_helper (c=0xb66143b8, con=0x0,
   context=0xb66145dc "agentlogin", exten=0xb661462c "900912340008",
   priority=1, label=0x0, callerid=0xb6600bb8 "agent", action=E_SPAWN)
   at pbx.c:1819
ASTERISK-11 0x080c5cbf in ast_spawn_extension (c=0xb66143b8,
   context=0xb66145dc "agentlogin", exten=0xb661462c "900912340008",
   priority=1, callerid=0xb6600bb8 "agent") at pbx.c:2274
ASTERISK-12 0x080c61aa in __ast_pbx_run (c=0xb66143b8) at pbx.c:2374
ASTERISK-13 0x080c6f86 in pbx_thread (data=0xb66143b8) at pbx.c:2587
ASTERISK-14 0x08105cf4 in dummy_start (data=0xb6602060) at utils.c:545
ASTERISK-15 0xb7f7d0bd in start_thread () from /lib/tls/libpthread.so.0
ASTERISK-16 0xb7daf9ee in clone () from /lib/tls/libc.so.6
(gdb) thread apply all bt

Thread 43 (process 7006):
#0  0x00200293 in ?? ()
Cannot access memory at address 0x1742fe94
0x0807c55b      890             AST_LIST_INSERT_TAIL(&chan->readq, f, frame_list);

By: gnudialer gnudialer (gnudialer) 2007-05-14 22:25:02

Mods: should we change the title to "Chan local causes core dump" ?

By: gnudialer gnudialer (gnudialer) 2007-05-15 12:14:47

Last night we tried a ghetto rigged setup where we were using iax2 for Local/ emulation and were still able to crash it at high volume.  We were doing:

Agi EXEC Dial -> iax2 to 127.0.0.1 -> ParkAndAnnounce

The backtrace still shows it stopping on AST_LIST_INSERT_TAIL(&chan->readq, f, frame_list);

Would it be helpful for me to expand AST_LIST_INSERT_TAIL(...) inline in ast_queue_frame when we are not in production?



By: Tristan Mahe (tristan_mahe) 2007-05-15 13:56:28

It would be very helpful if you can upload a core dump produced with asterisk compiled with all debug options ( in make menuselect choice 10, you check the choices 1 2 3 4 5 7 10 13).

If you can also upload the output of running with a debug/verbose level of 10 at least ( core set debug 10 and core set debug 10), this will make things a lot easier.

I cannot do this on my productions servers and don't have a lab available at the moment to reproduce this bug.


Btw: I agree with you, this is not /n related so we should change the title.



By: gnudialer gnudialer (gnudialer) 2007-05-16 15:46:25

We found a workaround.  Switching from ParkAndAnnounce to Park caused the crashes to cease.

By: Tristan Mahe (tristan_mahe) 2007-05-21 07:58:20

strange, because I don't use park nor parkandannounce at all...

By: gnudialer gnudialer (gnudialer) 2007-05-21 12:21:03

That's why I think it is really serious.  I've been able to reproduce it without any chan_local involvement at all.  If I make a whole bunch of iax2 calls and loop them back and then call ParkAndAnnounce, I get a core that looks similar to both yours and my old one.  It ends on that AST_LIST_INSERT_TAIL() macro.

I bet if you inundated it with calls, then call Dial() from the dp like you are to a local chan, you could reproduce it.

Oddly, it would take a lot longer to crash if I gave it a really heavy load or printed debug info inside of ast_queue_frame() which suggests a synchronization issue.



By: gnudialer gnudialer (gnudialer) 2007-06-04 13:47:24

Got another one today, this time just using Park() ...

Thread 36 (process 5672):
#0  0x00200293 in ?? ()
Cannot access memory at address 0x68430a0d
0x0807c55b      900             AST_LIST_INSERT_TAIL(&chan->readq, f, frame_list);
(gdb) bt
#0  0x0807c55b in ast_queue_frame (chan=0xa2e1818, fin=0x8169760)
   at channel.c:900
#1  0x08084c66 in ast_channel_masquerade (original=0xa2c9ee8, clone=0xa2e1818)
   at channel.c:3257
#2  0xb6e1f5df in check_bridge (p=0xa3462a8, isoutbound=1) at chan_local.c:240
#3  0xb6e1f743 in local_write (ast=0xa327b70, f=0xa308a1c) at chan_local.c:289
#4  0x0808340e in ast_write (chan=0xa327b70, fr=0xa308a1c) at channel.c:2777
ASTERISK-1  0x08086dca in ast_generic_bridge (c0=0xa327b70, c1=0xa2e1818,
   config=0xb65ce33c, fo=0xb65cdff8, rc=0xb65cdff4, bridge_end=
     {tv_sec = 0, tv_usec = 0}) at channel.c:3864
ASTERISK-2  0x08087d88 in ast_channel_bridge (c0=0xa327b70, c1=0xa2e1818,
   config=0xb65ce33c, fo=0xb65cdff8, rc=0xb65cdff4) at channel.c:4108
ASTERISK-3  0xb7a3e80a in ast_bridge_call (chan=0xa327b70, peer=0xa2e1818,
   config=0xb65ce33c) at res_features.c:1353
ASTERISK-4  0xb7999289 in dial_exec_full (chan=0xa327b70, data=0xb65d1008,
   peerflags=0xb65cee74, continue_exec=0x0) at app_dial.c:1635
ASTERISK-5  0xb7999507 in dial_exec (chan=0xa327b70, data=0xb65d1008)
   at app_dial.c:1686
ASTERISK-6 0x080c0d47 in pbx_exec (c=0xa327b70, app=0x81a9878, data=0xb65d1008)
   at pbx.c:532
ASTERISK-7 0x080c48dc in pbx_extension_helper (c=0xa327b70, con=0x0,
   context=0xa327d94 "park-dial", exten=0xa327de4 "Local/2@onwait",
   priority=1, label=0x0, callerid=0x9e2a8e8 "555", action=E_SPAWN)
   at pbx.c:1819
---Type <return> to continue, or q <return> to quit---
ASTERISK-8 0x080c5cbf in ast_spawn_extension (c=0xa327b70,
   context=0xa327d94 "park-dial", exten=0xa327de4 "Local/2@onwait",
   priority=1, callerid=0x9e2a8e8 "555") at pbx.c:2274
ASTERISK-9 0x080c61aa in __ast_pbx_run (c=0xa327b70) at pbx.c:2374
ASTERISK-10 0x080c6f86 in pbx_thread (data=0xa327b70) at pbx.c:2587
ASTERISK-11 0x08105cf4 in dummy_start (data=0xb67245d8) at utils.c:545
ASTERISK-12 0xb7f780bd in start_thread () from /lib/tls/libpthread.so.0
ASTERISK-13 0xb7daa9ee in clone () from /lib/tls/libc.so.6

By: Joshua C. Colp (jcolp) 2007-06-04 14:13:41

Would it be possible to log on to the machine where this backtrace is located so I can poke at it? Email jcolp@digium.com if possible. Thanks!

By: Tristan Mahe (tristan_mahe) 2007-06-08 09:53:49

I did not had any crash since I enabled the DONT_OPTIMIZE flag in menuselecT.

Strange...

File: If it happens again, I will open you an access on my production box.

By: Joshua C. Colp (jcolp) 2007-08-08 08:48:27

I'm suspending this since it's been two months with no news. If this is still an issue feel free to reopen with the latest info. Peace!