Summary: | ASTERISK-11336: Crash in chanspy | ||
Reporter: | Atis Lezdins (atis) | Labels: | |
Date Opened: | 2008-01-30 04:32:12.000-0600 | Date Closed: | 2008-02-25 17:45:51.000-0600 |
Priority: | Critical | Regression? | No |
Status: | Closed/Complete | Components: | Applications/app_chanspy |
Versions: | Frequency of Occurrence | ||
Related Issues: | |||
Environment: | Attachments: | ( 0) 11877.diff ( 1) 2008-01-31-a-bt.txt ( 2) 2008-01-31-a-cli.txt ( 3) 2008-01-31-b-bt.txt ( 4) 2008-01-31-b-debug.txt ( 5) 2008-01-31-c-bt.txt ( 6) 2008-01-31-c-debug.txt ( 7) 2008-01-31-d-bt.txt ( 8) 2008-01-31-d-debug.txt.gz ( 9) 2008-01-31-d-valgrind.txt (10) 2008-01-31-e-bt.txt (11) 2008-01-31-e-debug.txt (12) 2008-01-31-e-malloc.txt (13) 2008-01-31-e-valgrind.txt (14) 2008-01-31-f-bt.txt (15) 2008-01-31-f-debug.txt (16) 2008-01-31-g-bt.txt (17) 2008-01-31-g-debug.txt (18) 2008-01-31-i-bt.txt (19) 2008-01-31-i-debug.txt (20) 2008-01-31-i-valgrind.txt (21) 2008-01-31-j-bt.txt (22) 2008-01-31-j-debug.txt (23) 2008-01-31-j-malloc.txt (24) 2008-01-31-j-valgrind.txt (25) 2008-01-31-k-bt.txt (26) 2008-01-31-k-debug.txt (27) 2008-01-31-k-valgrind.txt (28) bt_rc1.txt (29) bt_rc2.txt (30) chanspy.diff (31) chanspy.diff2 (32) chanspy.diff3 (33) valgrind_1.txt (34) valgrind_2.txt (35) valgrind_3.txt | |
Description: | I've tested 1.4.18 rc2 with automated call-center tests, and got crashes in app_chanspy for several times (after 1 hour of running). Attached full backtraces for two crashes (rc1 and rc2) and valgrind logs for rc2 running for 10 hours (without crash). ****** ADDITIONAL INFORMATION ****** #0 0x47ee3f9b in strlen () from /lib/libc.so.6^ #1 0xb61b5a04 in channel_spy (chan=0x88c95b0, spyee=0x878a440, volfactor=0xb5446b78, fd=0, flags=0xb5446df8) at app_chanspy.c:228^ #2 0xb61b6617 in common_exec (chan=0x88c95b0, flags=0xb5446df8, volfactor=0, fd=0, mygroup=0x0, spec=0xb5446b90 "SIP/22255", exten=0x0, context=0x0) at app_chanspy.c:474^ #3 0xb61b6b7a in chanspy_exec (chan=0x88c95b0, data=0xb5446b90) at app_chanspy.c:571^ #4 0x080bde76 in pbx_exec (c=0x88c95b0, app=0x8234498, data=0xb5448f50) at pbx.c:532^ ASTERISK-1 0x080c106f in pbx_extension_helper (c=0x88c95b0, con=0x0, context=0x88c9730 "static_ivr", exten=0x88c9780 "007", priority=31, label=0x0, callerid=0x89338e8 "21167", ^ action=E_SPAWN) at pbx.c:1851^ ASTERISK-2 0x080c215d in ast_spawn_extension (c=0x88c95b0, context=0x88c9730 "static_ivr", exten=0x88c9780 "007", priority=31, callerid=0x89338e8 "21167") at pbx.c:2306^ ASTERISK-3 0x080c269b in __ast_pbx_run (c=0x88c95b0) at pbx.c:2408^ ASTERISK-4 0x080c33f0 in pbx_thread (data=0x88c95b0) at pbx.c:2623^ ASTERISK-5 0x080fde73 in dummy_start (data=0x851fbd0) at utils.c:852^ ASTERISK-6 0x47fed45b in start_thread () from /lib/libpthread.so.0^ ASTERISK-7 0x47f4523e in clone () from /lib/libc.so.6^ | ||
Comments: | By: Atis Lezdins (atis) 2008-01-30 04:38:32.000-0600 Valgrind logs processed with: grep -F -v "Warning: invalid file descriptor" valgrind.txt | grep -F -v "Use --log-fd=<number> to select an alternative log fd." By: Joshua C. Colp (jcolp) 2008-01-30 11:07:45.000-0600 Is this compiled with DONT_OPTIMIZE? do you have the actual console output? what are the options to Chanspy and how is it being used? By: Joshua C. Colp (jcolp) 2008-01-30 11:36:06.000-0600 Please give this patch a try. By: Russell Bryant (russell) 2008-01-30 17:39:39.000-0600 I noticed that the previous approach could have introduced a deadlock. Here is another attempt at it. Please give it a try. It's also in a branch if you want to just grab it from there: svn co http://svn.digium.com/svn/asterisk/team/russell/chanspy_fixes By: Digium Subversion (svnbot) 2008-01-30 17:39:47.000-0600 Repository: asterisk Revision: 101366 U team/russell/chanspy_fixes/apps/app_chanspy.c ------------------------------------------------------------------------ r101366 | russell | 2008-01-30 17:39:46 -0600 (Wed, 30 Jan 2008) | 2 lines add chanspy fixes for issue ASTERISK-11336 ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=101366 By: Atis Lezdins (atis) 2008-01-30 17:51:58.000-0600 Huh, however i noticed res_monitor is not audiohook'ed. Is there a point in testing this, or i should wait until it's ported and test with mixmonitor instead. I'm going to sleep and could leave tests running overnight.. By: Russell Bryant (russell) 2008-01-30 18:28:30.000-0600 This second patch should be no different than the first. It is just more efficient on its memory handling (uses stack instead of heap). By: Atis Lezdins (atis) 2008-01-31 06:06:45.000-0600 I tested with 1.4.18 rc3 and chanspy.diff2 applied. First i got deadlock of SIP (IAX seemed to respond to IAXmodem registrations) - registrations and incoming calls timed out, "show queues" and "show channels" exited immediately. I killed asterisk with -5 and took full backtraces. See attachments 2008-01-31-a-* After that i enabled DETECT_DEADLOCKS in menuselect, and got two segfaults (attachments 2008-01-31-b-* and 2008-01-31-c-*) on #0 0x080890f0 in ast_channel_datastore_find (chan=0x85a8ec0, info=0xb5b9f8d0, uid=0x0) at channel.c:1356^ 1356 AST_LIST_TRAVERSE_SAFE_BEGIN(&chan->datastores, datastore, entry) {^ From logs seems that it happens when chanspy is starting spying (after few successfully spied calls). Log before segfault shows: [Jan 31 05:28:13] ERROR[19310] /usr/dist/asterisk-1.4.18-rc1/include/asterisk/lock.h: app_chanspy.c line 242 (channel_spy): Error obtaining mutex: Invalid argument [Jan 31 05:28:13] ERROR[19310] /usr/dist/asterisk-1.4.18-rc1/include/asterisk/lock.h: app_chanspy.c line 408 (chanspy_ds_free): Error obtaining mutex: Invalid argument By: Atis Lezdins (atis) 2008-01-31 06:16:28.000-0600 file: Of course i have DONT_OPTIMIZE - otherwise i wouldn't get useful backtraces, right? I'm monitoring other devices with Chanspy(SIP/12345) - that's all. Also i use Monitor() for each call. Simultaneously there are up to 5 chanspies running, they're listening all the time, and after spied devices hangs up - they are waiting for next calls (listening beeps). By: Atis Lezdins (atis) 2008-01-31 07:44:57.000-0600 Two more crashes when running with valgrind. In first i had a custom patch - one line "chmod" in file.c - but when i saw that in first valgrind log, i removed that and recompiled. Attachments: 2008-01-31-d-* and 2008-01-31-e-* By: Atis Lezdins (atis) 2008-01-31 09:34:39.000-0600 Another crash (named f), however this time i replaced Monitor with MixMonitor. I got all queue recordings 160 bytes in size, but i guess it's because MixMonitor doesn't like to sit in answer macro of Dial. By: Russell Bryant (russell) 2008-02-25 17:35:41.000-0600 I'm going to merge this patch, as I am confident that it resolves the original crash. If you are still having troubles, please open a new report. Thanks! By: Digium Subversion (svnbot) 2008-02-25 17:39:03.000-0600 Repository: asterisk Revision: 104106 U branches/1.4/apps/app_chanspy.c ------------------------------------------------------------------------ r104106 | russell | 2008-02-25 17:39:02 -0600 (Mon, 25 Feb 2008) | 10 lines This patch fixes some pretty significant problems with how app_chanspy handles pointers to channels that are being spied upon. It was very likely that a crash would occur if the channel being spied upon hung up. This was because the current ast_channel handling _requires_ that the object is locked or else it could disappear at any time (except in the owning channel thread). So, this patch uses some channel datastore magic on the spied upon channel to be able to detect if and when the channel goes away. (closes issue ASTERISK-11336) (patch written by me, but thanks to kpfleming for the idea, and to file for review) ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=104106 By: Digium Subversion (svnbot) 2008-02-25 17:44:37.000-0600 Repository: asterisk Revision: 104107 _U trunk/ U trunk/apps/app_chanspy.c ------------------------------------------------------------------------ r104107 | russell | 2008-02-25 17:44:36 -0600 (Mon, 25 Feb 2008) | 18 lines Merged revisions 104106 via svnmerge from https://origsvn.digium.com/svn/asterisk/branches/1.4 ........ r104106 | russell | 2008-02-25 17:42:42 -0600 (Mon, 25 Feb 2008) | 10 lines This patch fixes some pretty significant problems with how app_chanspy handles pointers to channels that are being spied upon. It was very likely that a crash would occur if the channel being spied upon hung up. This was because the current ast_channel handling _requires_ that the object is locked or else it could disappear at any time (except in the owning channel thread). So, this patch uses some channel datastore magic on the spied upon channel to be able to detect if and when the channel goes away. (closes issue ASTERISK-11336) (patch written by me, but thanks to kpfleming for the idea, and to file for review) ........ ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=104107 By: Digium Subversion (svnbot) 2008-02-25 17:45:51.000-0600 Repository: asterisk Revision: 104108 _U team/russell/chan_refcount/ ------------------------------------------------------------------------ r104108 | russell | 2008-02-25 17:45:50 -0600 (Mon, 25 Feb 2008) | 25 lines Blocked revisions 104107 via svnmerge ................ r104107 | russell | 2008-02-25 17:48:16 -0600 (Mon, 25 Feb 2008) | 18 lines Merged revisions 104106 via svnmerge from https://origsvn.digium.com/svn/asterisk/branches/1.4 ........ r104106 | russell | 2008-02-25 17:42:42 -0600 (Mon, 25 Feb 2008) | 10 lines This patch fixes some pretty significant problems with how app_chanspy handles pointers to channels that are being spied upon. It was very likely that a crash would occur if the channel being spied upon hung up. This was because the current ast_channel handling _requires_ that the object is locked or else it could disappear at any time (except in the owning channel thread). So, this patch uses some channel datastore magic on the spied upon channel to be able to detect if and when the channel goes away. (closes issue ASTERISK-11336) (patch written by me, but thanks to kpfleming for the idea, and to file for review) ........ ................ ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=104108 |