[Home]

Summary:ASTERISK-11336: Crash in chanspy
Reporter:Atis Lezdins (atis)Labels:
Date Opened:2008-01-30 04:32:12.000-0600Date Closed:2008-02-25 17:45:51.000-0600
Priority:CriticalRegression?No
Status:Closed/CompleteComponents:Applications/app_chanspy
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) 11877.diff
( 1) 2008-01-31-a-bt.txt
( 2) 2008-01-31-a-cli.txt
( 3) 2008-01-31-b-bt.txt
( 4) 2008-01-31-b-debug.txt
( 5) 2008-01-31-c-bt.txt
( 6) 2008-01-31-c-debug.txt
( 7) 2008-01-31-d-bt.txt
( 8) 2008-01-31-d-debug.txt.gz
( 9) 2008-01-31-d-valgrind.txt
(10) 2008-01-31-e-bt.txt
(11) 2008-01-31-e-debug.txt
(12) 2008-01-31-e-malloc.txt
(13) 2008-01-31-e-valgrind.txt
(14) 2008-01-31-f-bt.txt
(15) 2008-01-31-f-debug.txt
(16) 2008-01-31-g-bt.txt
(17) 2008-01-31-g-debug.txt
(18) 2008-01-31-i-bt.txt
(19) 2008-01-31-i-debug.txt
(20) 2008-01-31-i-valgrind.txt
(21) 2008-01-31-j-bt.txt
(22) 2008-01-31-j-debug.txt
(23) 2008-01-31-j-malloc.txt
(24) 2008-01-31-j-valgrind.txt
(25) 2008-01-31-k-bt.txt
(26) 2008-01-31-k-debug.txt
(27) 2008-01-31-k-valgrind.txt
(28) bt_rc1.txt
(29) bt_rc2.txt
(30) chanspy.diff
(31) chanspy.diff2
(32) chanspy.diff3
(33) valgrind_1.txt
(34) valgrind_2.txt
(35) valgrind_3.txt
Description:I've tested 1.4.18 rc2 with automated call-center tests, and got crashes in app_chanspy for several times (after 1 hour of running).

Attached full backtraces for two crashes (rc1 and rc2) and valgrind logs for rc2 running for 10 hours (without crash).



****** ADDITIONAL INFORMATION ******

#0  0x47ee3f9b in strlen () from /lib/libc.so.6^
#1  0xb61b5a04 in channel_spy (chan=0x88c95b0, spyee=0x878a440, volfactor=0xb5446b78, fd=0, flags=0xb5446df8) at app_chanspy.c:228^
#2  0xb61b6617 in common_exec (chan=0x88c95b0, flags=0xb5446df8, volfactor=0, fd=0, mygroup=0x0, spec=0xb5446b90 "SIP/22255", exten=0x0, context=0x0) at app_chanspy.c:474^
#3  0xb61b6b7a in chanspy_exec (chan=0x88c95b0, data=0xb5446b90) at app_chanspy.c:571^
#4  0x080bde76 in pbx_exec (c=0x88c95b0, app=0x8234498, data=0xb5448f50) at pbx.c:532^
ASTERISK-1  0x080c106f in pbx_extension_helper (c=0x88c95b0, con=0x0, context=0x88c9730 "static_ivr", exten=0x88c9780 "007", priority=31, label=0x0, callerid=0x89338e8 "21167", ^
   action=E_SPAWN) at pbx.c:1851^
ASTERISK-2  0x080c215d in ast_spawn_extension (c=0x88c95b0, context=0x88c9730 "static_ivr", exten=0x88c9780 "007", priority=31, callerid=0x89338e8 "21167") at pbx.c:2306^
ASTERISK-3  0x080c269b in __ast_pbx_run (c=0x88c95b0) at pbx.c:2408^
ASTERISK-4  0x080c33f0 in pbx_thread (data=0x88c95b0) at pbx.c:2623^
ASTERISK-5  0x080fde73 in dummy_start (data=0x851fbd0) at utils.c:852^
ASTERISK-6 0x47fed45b in start_thread () from /lib/libpthread.so.0^
ASTERISK-7 0x47f4523e in clone () from /lib/libc.so.6^

Comments:By: Atis Lezdins (atis) 2008-01-30 04:38:32.000-0600

Valgrind logs processed with:
grep -F -v "Warning: invalid file descriptor" valgrind.txt | grep -F -v "Use --log-fd=<number> to select an alternative log fd."

By: Joshua C. Colp (jcolp) 2008-01-30 11:07:45.000-0600

Is this compiled with DONT_OPTIMIZE? do you have the actual console output? what are the options to Chanspy and how is it being used?

By: Joshua C. Colp (jcolp) 2008-01-30 11:36:06.000-0600

Please give this patch a try.

By: Russell Bryant (russell) 2008-01-30 17:39:39.000-0600

I noticed that the previous approach could have introduced a deadlock.  Here is another attempt at it.  Please give it a try.

It's also in a branch if you want to just grab it from there:

svn co http://svn.digium.com/svn/asterisk/team/russell/chanspy_fixes

By: Digium Subversion (svnbot) 2008-01-30 17:39:47.000-0600

Repository: asterisk
Revision: 101366

U   team/russell/chanspy_fixes/apps/app_chanspy.c

------------------------------------------------------------------------
r101366 | russell | 2008-01-30 17:39:46 -0600 (Wed, 30 Jan 2008) | 2 lines

add chanspy fixes for issue ASTERISK-11336

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=101366

By: Atis Lezdins (atis) 2008-01-30 17:51:58.000-0600

Huh, however i noticed res_monitor is not audiohook'ed.
Is there a point in testing this, or i should wait until it's ported and test with mixmonitor instead. I'm going to sleep and could leave tests running overnight..

By: Russell Bryant (russell) 2008-01-30 18:28:30.000-0600

This second patch should be no different than the first.  It is just more efficient on its memory handling (uses stack instead of heap).

By: Atis Lezdins (atis) 2008-01-31 06:06:45.000-0600

I tested with 1.4.18 rc3 and chanspy.diff2 applied.

First i got deadlock of SIP (IAX seemed to respond to IAXmodem registrations) - registrations and incoming calls timed out, "show queues" and "show channels" exited immediately. I killed asterisk with -5 and took full backtraces. See attachments 2008-01-31-a-*

After that i enabled DETECT_DEADLOCKS in menuselect, and got two segfaults (attachments 2008-01-31-b-* and 2008-01-31-c-*) on
#0  0x080890f0 in ast_channel_datastore_find (chan=0x85a8ec0, info=0xb5b9f8d0, uid=0x0) at channel.c:1356^
1356            AST_LIST_TRAVERSE_SAFE_BEGIN(&chan->datastores, datastore, entry) {^

From logs seems that it happens when chanspy is starting spying (after few successfully spied calls). Log before segfault shows:

[Jan 31 05:28:13] ERROR[19310] /usr/dist/asterisk-1.4.18-rc1/include/asterisk/lock.h: app_chanspy.c line 242 (channel_spy): Error obtaining mutex: Invalid argument
[Jan 31 05:28:13] ERROR[19310] /usr/dist/asterisk-1.4.18-rc1/include/asterisk/lock.h: app_chanspy.c line 408 (chanspy_ds_free): Error obtaining mutex: Invalid argument

By: Atis Lezdins (atis) 2008-01-31 06:16:28.000-0600

file:

Of course i have DONT_OPTIMIZE - otherwise i wouldn't get useful backtraces, right?

I'm monitoring other devices with Chanspy(SIP/12345) - that's all. Also i use Monitor() for each call. Simultaneously there are up to 5 chanspies running, they're listening all the time, and after spied devices hangs up - they are waiting for next calls (listening beeps).

By: Atis Lezdins (atis) 2008-01-31 07:44:57.000-0600

Two more crashes when running with valgrind. In first i had a custom patch - one line "chmod" in file.c - but when i saw that in first valgrind log, i removed that and recompiled. Attachments:  2008-01-31-d-* and  2008-01-31-e-*

By: Atis Lezdins (atis) 2008-01-31 09:34:39.000-0600

Another crash (named f), however this time i replaced Monitor with MixMonitor. I got all queue recordings 160 bytes in size, but i guess it's because MixMonitor doesn't like to sit in answer macro of Dial.

By: Russell Bryant (russell) 2008-02-25 17:35:41.000-0600

I'm going to merge this patch, as I am confident that it resolves the original crash.  If you are still having troubles, please open a new report.  Thanks!

By: Digium Subversion (svnbot) 2008-02-25 17:39:03.000-0600

Repository: asterisk
Revision: 104106

U   branches/1.4/apps/app_chanspy.c

------------------------------------------------------------------------
r104106 | russell | 2008-02-25 17:39:02 -0600 (Mon, 25 Feb 2008) | 10 lines

This patch fixes some pretty significant problems with how app_chanspy handles
pointers to channels that are being spied upon.  It was very likely that a
crash would occur if the channel being spied upon hung up.  This was because
the current ast_channel handling _requires_ that the object is locked or else
it could disappear at any time (except in the owning channel thread).  So, this
patch uses some channel datastore magic on the spied upon channel to be able to
detect if and when the channel goes away.
(closes issue ASTERISK-11336)
(patch written by me, but thanks to kpfleming for the idea, and to file for review)

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=104106

By: Digium Subversion (svnbot) 2008-02-25 17:44:37.000-0600

Repository: asterisk
Revision: 104107

_U  trunk/
U   trunk/apps/app_chanspy.c

------------------------------------------------------------------------
r104107 | russell | 2008-02-25 17:44:36 -0600 (Mon, 25 Feb 2008) | 18 lines

Merged revisions 104106 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r104106 | russell | 2008-02-25 17:42:42 -0600 (Mon, 25 Feb 2008) | 10 lines

This patch fixes some pretty significant problems with how app_chanspy handles
pointers to channels that are being spied upon.  It was very likely that a
crash would occur if the channel being spied upon hung up.  This was because
the current ast_channel handling _requires_ that the object is locked or else
it could disappear at any time (except in the owning channel thread).  So, this
patch uses some channel datastore magic on the spied upon channel to be able to
detect if and when the channel goes away.
(closes issue ASTERISK-11336)
(patch written by me, but thanks to kpfleming for the idea, and to file for review)

........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=104107

By: Digium Subversion (svnbot) 2008-02-25 17:45:51.000-0600

Repository: asterisk
Revision: 104108

_U  team/russell/chan_refcount/

------------------------------------------------------------------------
r104108 | russell | 2008-02-25 17:45:50 -0600 (Mon, 25 Feb 2008) | 25 lines

Blocked revisions 104107 via svnmerge

................
r104107 | russell | 2008-02-25 17:48:16 -0600 (Mon, 25 Feb 2008) | 18 lines

Merged revisions 104106 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r104106 | russell | 2008-02-25 17:42:42 -0600 (Mon, 25 Feb 2008) | 10 lines

This patch fixes some pretty significant problems with how app_chanspy handles
pointers to channels that are being spied upon.  It was very likely that a
crash would occur if the channel being spied upon hung up.  This was because
the current ast_channel handling _requires_ that the object is locked or else
it could disappear at any time (except in the owning channel thread).  So, this
patch uses some channel datastore magic on the spied upon channel to be able to
detect if and when the channel goes away.
(closes issue ASTERISK-11336)
(patch written by me, but thanks to kpfleming for the idea, and to file for review)

........

................

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=104108