Summary: ASTERISK-08019: Segmentation fault on ast_channel_spy_remove
Reporter: Adolfo R. Brandes (arbrandes)
Labels:
Date Opened: 2006-10-27 06:01:53
Date Closed: 2007-02-15 19:09:25.000-0600
Priority: Critical
Regression?: No
Status: Closed/Complete
Components: Applications/app_chanspy
Versions:
Frequency of Occurrence:
Related Issues:
Environment:
Attachments:
( 0) btfull1.txt
( 1) btfull2.txt
( 2) btfull3.txt
( 3) btfull-dont-optimize1.txt
( 4) btfull-dont-optimize2.txt
Description: Segmentation fault on ChanSpy(), running on a heavily loaded asterisk-1.2.13 server, with 100+ SIP calls to a single Queue() (strategy = rrmemory) with 60+ agents on AgentCallbackLogin().  Asterisk will segfault only occasionally, which probably indicates a race condition.

All SIP friends are disallow = 'all', allow = 'ulaw'.

****** ADDITIONAL INFORMATION ******

Backtrace:

Core was generated by `/usr/sbin/asterisk -vfg'.
Program terminated with signal 11, Segmentation fault.
#0  ast_translator_free_path (p=0x1) at translate.c:99
99                      if (pl->state && pl->step->destroy)
(gdb) bt
#0  ast_translator_free_path (p=0x1) at translate.c:99
#1  0x080663e5 in ast_channel_spy_remove (chan=0x858de20, spy=0xb50f6988) at channel.c:1028
#2  0xb70e2a94 in channel_spy (chan=0xb4a683e8, spyee=0x858de20, volfactor=0xb50f6f14, fd=0) at app_chanspy.c:336
#3  0xb70e33c1 in chanspy_exec (chan=0xb4a683e8, data=0xb50fafe8) at app_chanspy.c:511
#4  0x08091118 in pbx_extension_helper (c=0xb4a683e8, con=<value optimized out>, context=<value optimized out>, exten=0xb4a6862c "*1016", priority=1,
   label=0x0, callerid=0xb4ffd7f0 "1091", action=1) at pbx.c:553
#5  0x08092e7e in __ast_pbx_run (c=0xb4a683e8) at pbx.c:2230
#6  0x08093aac in pbx_thread (data=0xb4a683e8) at pbx.c:2517
#7  0xb7f750ed in start_thread () from /lib/tls/libpthread.so.0
#8  0xb7e4f8fe in clone () from /lib/tls/libc.so.6
Comments:By: Adolfo R. Brandes (arbrandes) 2006-10-27 11:36:56

A few hours later, a different dump:

#0  ast_channel_spy_add (chan=0xa026c30, spy=0xb52b8988) at channel.c:1005
1005                    AST_LIST_INSERT_TAIL(&chan->spies->list, spy, list);

(gdb) bt
#0  ast_channel_spy_add (chan=0xa026c30, spy=0xb52b8988) at channel.c:1005
#1  0xb71e59a2 in channel_spy (chan=0xacc5ff10, spyee=0xa026c30, volfactor=0xb52b8f14, fd=0) at app_chanspy.c:200
#2  0xb71e63c1 in chanspy_exec (chan=0xacc5ff10, data=0xb52bcfe8) at app_chanspy.c:511
#3  0x08091118 in pbx_extension_helper (c=0xacc5ff10, con=<value optimized out>, context=<value optimized out>, exten=0xacc60154 "*1000", priority=1,
   label=0x0, callerid=0xaef21e10 "1090", action=1) at pbx.c:553
#4  0x08092e7e in __ast_pbx_run (c=0xacc5ff10) at pbx.c:2230
#5  0x08093aac in pbx_thread (data=0xacc5ff10) at pbx.c:2517
#6  0xb7f760ed in start_thread () from /lib/tls/libpthread.so.0
#7  0xb7e508fe in clone () from /lib/tls/libc.so.6
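
Both dumps crash inside the spy add/remove paths for the spied-on channel, which would be consistent with the spy list being modified by more than one thread without consistent locking. For illustration only, here is a minimal C sketch of the general pattern of guarding such a list with a single mutex, plus a small stress helper. This is not the channel.c code: `struct spy`, `struct spy_list`, and the `spy_list_*` functions are hypothetical names invented for this example, and it assumes POSIX threads.

```c
/* Hypothetical sketch: none of these types or names come from Asterisk. */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct spy {
    struct spy *next;
};

struct spy_list {
    pthread_mutex_t lock;   /* guards head, count, and every next pointer */
    struct spy *head;
    int count;
};

void spy_list_init(struct spy_list *l)
{
    pthread_mutex_init(&l->lock, NULL);
    l->head = NULL;
    l->count = 0;
}

/* Append at the tail while holding the lock, so a concurrent remove
 * never observes a half-linked node. */
void spy_list_add(struct spy_list *l, struct spy *s)
{
    assert(l != NULL && s != NULL);
    pthread_mutex_lock(&l->lock);
    s->next = NULL;
    struct spy **pp = &l->head;
    while (*pp)
        pp = &(*pp)->next;
    *pp = s;
    l->count++;
    pthread_mutex_unlock(&l->lock);
}

/* Unlink under the same lock. Returns 1 if the spy was found, else 0. */
int spy_list_remove(struct spy_list *l, struct spy *s)
{
    int found = 0;
    pthread_mutex_lock(&l->lock);
    for (struct spy **pp = &l->head; *pp; pp = &(*pp)->next) {
        if (*pp == s) {
            *pp = s->next;
            s->next = NULL;
            l->count--;
            found = 1;
            break;
        }
    }
    pthread_mutex_unlock(&l->lock);
    return found;
}

struct worker_arg {
    struct spy_list *list;
    int iterations;
};

/* Each worker repeatedly attaches and detaches its own spy. */
static void *worker(void *p)
{
    struct worker_arg *a = p;
    struct spy me;
    for (int i = 0; i < a->iterations; i++) {
        spy_list_add(a->list, &me);
        if (!spy_list_remove(a->list, &me))
            return p;           /* non-NULL result signals a failure */
    }
    return NULL;
}

/* Returns 0 if every remove succeeded and the list ends up empty,
 * 1 on any failure, -1 if nthreads is out of range. */
int spy_list_stress(int nthreads, int iterations)
{
    enum { MAX_THREADS = 64 };
    pthread_t tids[MAX_THREADS];
    struct spy_list list;
    struct worker_arg arg;
    int started = 0, failures = 0;

    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1;
    spy_list_init(&list);
    arg.list = &list;
    arg.iterations = iterations;
    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&tids[i], NULL, worker, &arg) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++) {
        void *res = NULL;
        pthread_join(tids[i], &res);
        if (res != NULL)
            failures++;
    }
    if (started != nthreads)
        failures++;
    pthread_mutex_destroy(&list.lock);
    return (failures == 0 && list.head == NULL && list.count == 0) ? 0 : 1;
}
```

Calling `spy_list_stress(8, 100000)` from a small `main()` (built with `-pthread`) exercises concurrent add/remove on one list. Dropping the lock/unlock calls from the sketch is one way to see how an unguarded list can end up holding a garbage pointer such as the `p=0x1` in the first backtrace, although that alone would not prove this is what happens inside Asterisk.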

By: Joshua C. Colp (jcolp) 2006-11-16 12:54:02.000-0600

Can you get a bt full and, if possible, give me access to examine the core dump? Thanks

By: Adolfo R. Brandes (arbrandes) 2006-11-16 13:54:30.000-0600

Bt full from 3 different dumps attached.  Would you like a "thread apply all bt" too?

SSH access for debugging can be arranged.  How can this be done privately?

By: Adolfo R. Brandes (arbrandes) 2006-11-17 11:56:11.000-0600

I have a few "probable cause" scenarios, all of which actually happen in the call center where Asterisk is segfaulting.  What would happen if:

1) One SIP user tried to ChanSpy() multiple Agents simultaneously (multiple lines in a SIP softphone)?

2) One SIP user tried to ChanSpy() the same Agent more than once simultaneously (again, multiple lines)?

3) Multiple SIP users tried to ChanSpy() the same Agent simultaneously?

4) Multiple SIP users tried to ChanSpy() multiple Agents simultaneously (as in item 1 and 2) and some of these Agents were already being monitored by other users (as in 3)?

By: Adolfo R. Brandes (arbrandes) 2006-11-22 09:18:40.000-0600

It seems my previous hypothesis in the above comment was bogus.  By way of GROUP() and GROUP_COUNT(), I limited ChanSpy() to one spyer per spyee and one spyee per spyer, all to no avail.  I still get segfaults about twice a day.

In contrast to the previous backtraces I attached, "btfull-dont-optimize1.txt" is a backtrace from a "make dont-optimize" installation.

By: Adolfo R. Brandes (arbrandes) 2006-11-23 06:37:29.000-0600

For the sake of completeness, another bt full from a different segfault: "btfull-dont-optimize2.txt".

By: Joshua C. Colp (jcolp) 2006-12-06 10:11:07.000-0600

Would it be possible to gain access to the box where this core dump is so I could examine the data structures in it? Thanks.

By: Serge Vecher (serge-v) 2007-01-09 13:17:05.000-0600

arbrandes: please reply.

By: Adolfo R. Brandes (arbrandes) 2007-01-09 14:21:49.000-0600

I'm sorry, but my client does not permit third-party access to the machine at the moment.  I realize this makes it quite difficult (if not impossible) for any serious debugging effort.  Furthermore, I have not been able to reproduce the problem anywhere else.

However, we (as in my company) are devoting developer time to the resolution of this problem.  If and when I have news on the status or nature of the segfault, I'll post here.

Meanwhile, I'm at the disposal of any prospective bug-fixers, within the limits established by my client.  But seeing as I'm apparently the only one with this specific problem (which, I might add, is not even necessarily a problem with Asterisk itself), and on only one machine, feel free to mark this bug as "worksforme".  As I said, I'll get back to it if new information surfaces.

By: Larry McConnell (lmcconnell) 2007-01-18 18:07:50.000-0600

I am having an issue with ChanSpy. If I am spying on a channel and then hang up, Asterisk stops responding and I have to shut it down using "stop now".

By: Serge Vecher (serge-v) 2007-01-25 12:44:47.000-0600

Are you using Asterisk 1.2.14? If not, do you see the issue there?

By: Joshua C. Colp (jcolp) 2007-02-15 19:09:24.000-0600

Okay, since the issue is only happening on one box and you are taking care of it, I'll close this out for now. If this does turn out to be an issue we can solve, or you have a patch, feel free to reopen this.