Summary: | ASTERISK-11617: Crash in ChanSpy / AST_LIST_TRAVERSE_SAFE_BEGIN | ||
Reporter: | Atis Lezdins (atis) | Labels: | |
Date Opened: | 2008-03-11 18:36:02 | Date Closed: | 2008-03-17 13:03:40 |
Priority: | Critical | Regression? | No |
Status: | Closed/Complete | Components: | Applications/app_chanspy |
Versions: | Frequency of Occurrence | ||
Related Issues: | |||
Environment: | Attachments: | ( 0) bt_1.txt ( 1) bt_2.txt ( 2) bt.core.16470.txt ( 3) bt.core.21576.txt ( 4) full_1.log ( 5) full_involved.log ( 6) full.21576.log.gz | |
Description: | I got this while running automated tests on 1.4.19-rc2 after approximately hour. Bt full attached. #0 0x08082d07 in ast_channel_datastore_find (chan=0x863ebf8, info=0xb60d009c, uid=0x0) at channel.c:1379 1379 AST_LIST_TRAVERSE_SAFE_BEGIN(&chan->datastores, datastore, entry) { (gdb) bt #0 0x08082d07 in ast_channel_datastore_find (chan=0x863ebf8, info=0xb60d009c, uid=0x0) at channel.c:1379 #1 0xb60cc484 in chanspy_ds_free (chanspy_ds=0xb5b3f900) at app_chanspy.c:409 #2 0xb60cce89 in common_exec (chan=0x8595890, flags=0xb5b3fde8, volfactor=0, fd=0, mygroup=0x0, spec=0xb5b3fb80 "SIP/22226", exten=0x0, context=0x0) at app_chanspy.c:521 #3 0xb60cd40d in chanspy_exec (chan=0x8595890, data=0xb5b3fb80) at app_chanspy.c:725 #4 0x080c110c in pbx_exec (c=0x8595890, app=0x8237da8, data=0xb5b41f50) at /usr/dist/asterisk-svn-1.4.19-rc2/include/asterisk/strings.h:35 ASTERISK-1 0x080c4355 in pbx_extension_helper (c=0x8595890, con=0x0, context=0x8595a10 "static_ivr", exten=0x8595a60 "007", priority=25, label=0x0, callerid=0x82f3080 "22231", action=E_SPAWN) at pbx.c:1859 ASTERISK-2 0x080c5443 in ast_spawn_extension (c=0x8595890, context=0x8595a10 "static_ivr", exten=0x8595a60 "007", priority=25, callerid=0x82f3080 "22231") at pbx.c:2314 ASTERISK-3 0x080c5981 in __ast_pbx_run (c=0x8595890) at pbx.c:2416 ASTERISK-4 0x080c66d6 in pbx_thread (data=0x8595890) at pbx.c:2631 ASTERISK-5 0x08101357 in dummy_start (data=0x8409190) at utils.c:865 ASTERISK-6 0x4b0db45b in start_thread () from /lib/libpthread.so.0 ASTERISK-7 0x4b03323e in clone () from /lib/libc.so.6 | ||
Comments: | By: Atis Lezdins (atis) 2008-03-11 20:45:47 another crash in chanspy, could be related to first one. By: Atis Lezdins (atis) 2008-03-11 21:03:03 For the last crash, i also posted complete log since start - it crashed in less than a minute By: Digium Subversion (svnbot) 2008-03-12 11:55:00 Repository: asterisk Revision: 108031 U branches/1.4/main/channel.c ------------------------------------------------------------------------ r108031 | russell | 2008-03-12 11:54:59 -0500 (Wed, 12 Mar 2008) | 4 lines Destroy the channel lock after the channel datastores. (inspired by issue ASTERISK-11617) ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=108031 By: Digium Subversion (svnbot) 2008-03-12 11:58:57 Repository: asterisk Revision: 108032 _U trunk/ U trunk/main/channel.c ------------------------------------------------------------------------ r108032 | russell | 2008-03-12 11:58:56 -0500 (Wed, 12 Mar 2008) | 12 lines Merged revisions 108031 via svnmerge from https://origsvn.digium.com/svn/asterisk/branches/1.4 ........ r108031 | russell | 2008-03-12 11:59:07 -0500 (Wed, 12 Mar 2008) | 4 lines Destroy the channel lock after the channel datastores. (inspired by issue ASTERISK-11617) ........ ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=108032 By: Digium Subversion (svnbot) 2008-03-12 11:59:17 Repository: asterisk Revision: 108033 _U branches/1.6.0/ U branches/1.6.0/main/channel.c ------------------------------------------------------------------------ r108033 | russell | 2008-03-12 11:59:16 -0500 (Wed, 12 Mar 2008) | 20 lines Merged revisions 108032 via svnmerge from https://origsvn.digium.com/svn/asterisk/trunk ................ r108032 | russell | 2008-03-12 12:02:57 -0500 (Wed, 12 Mar 2008) | 12 lines Merged revisions 108031 via svnmerge from https://origsvn.digium.com/svn/asterisk/branches/1.4 ........ r108031 | russell | 2008-03-12 11:59:07 -0500 (Wed, 12 Mar 2008) | 4 lines Destroy the channel lock after the channel datastores. (inspired by issue ASTERISK-11617) ........ ................ ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=108033 By: Digium Subversion (svnbot) 2008-03-12 14:53:48 Repository: asterisk Revision: 108135 U branches/1.4/apps/app_chanspy.c U branches/1.4/main/channel.c ------------------------------------------------------------------------ r108135 | russell | 2008-03-12 14:53:42 -0500 (Wed, 12 Mar 2008) | 40 lines (closes issue ASTERISK-11617, reported by atis, fixed by me after some brainstorming on the issue with mmichelson) - Update copyright info on app_chanspy. - Fix a race condition that caused app_chanspy to crash. The issue was that the chanspy datastore magic that was used to ensure that spyee channels did not disappear out from under the code did not completely solve the problem. It was actually possible for chanspy to acquire a channel reference out of its datastore to a channel that was in the middle of being destroyed. That was because datastore destruction in ast_channel_free() was done near the end. So, this left the code in app_chanspy accessing a channel that was partially, or completely invalid because it was in the process of being free'd by another thread. The following sort of shows the code path where the race occurred: ============================================================================= Thread 1 (PBX thread for spyee chan) || Thread 2 (chanspy) --------------------------------------||------------------------------------- ast_channel_free() || - remove channel from channel list || - lock/unlock the channel to ensure || that no references retrieved from || the channel list exist. || --------------------------------------||------------------------------------- || channel_spy() - destroy some channel data || - Lock chanspy datastore || - Retrieve reference to channel - destroy channel datastores || - lock channel - call chanspy datastore d'tor || - Unlock chanspy datastore which NULL's out the ds' || reference to the channel || || - Operate on the channel ... - free the channel || || || - unlock the channel --------------------------------------||------------------------------------- ============================================================================= ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=108135 By: Digium Subversion (svnbot) 2008-03-12 14:55:18 Repository: asterisk Revision: 108137 _U trunk/ U trunk/apps/app_chanspy.c U trunk/main/channel.c ------------------------------------------------------------------------ r108137 | russell | 2008-03-12 14:55:18 -0500 (Wed, 12 Mar 2008) | 48 lines Merged revisions 108135 via svnmerge from https://origsvn.digium.com/svn/asterisk/branches/1.4 ........ r108135 | russell | 2008-03-12 14:57:42 -0500 (Wed, 12 Mar 2008) | 40 lines (closes issue ASTERISK-11617, reported by atis, fixed by me after some brainstorming on the issue with mmichelson) - Update copyright info on app_chanspy. - Fix a race condition that caused app_chanspy to crash. The issue was that the chanspy datastore magic that was used to ensure that spyee channels did not disappear out from under the code did not completely solve the problem. It was actually possible for chanspy to acquire a channel reference out of its datastore to a channel that was in the middle of being destroyed. That was because datastore destruction in ast_channel_free() was done near the end. So, this left the code in app_chanspy accessing a channel that was partially, or completely invalid because it was in the process of being free'd by another thread. The following sort of shows the code path where the race occurred: ============================================================================= Thread 1 (PBX thread for spyee chan) || Thread 2 (chanspy) --------------------------------------||------------------------------------- ast_channel_free() || - remove channel from channel list || - lock/unlock the channel to ensure || that no references retrieved from || the channel list exist. || --------------------------------------||------------------------------------- || channel_spy() - destroy some channel data || - Lock chanspy datastore || - Retrieve reference to channel - destroy channel datastores || - lock channel - call chanspy datastore d'tor || - Unlock chanspy datastore which NULL's out the ds' || reference to the channel || || - Operate on the channel ... - free the channel || || || - unlock the channel --------------------------------------||------------------------------------- ============================================================================= ........ ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=108137 By: Digium Subversion (svnbot) 2008-03-12 14:55:22 Repository: asterisk Revision: 108138 _U branches/1.6.0/ U branches/1.6.0/apps/app_chanspy.c U branches/1.6.0/main/channel.c ------------------------------------------------------------------------ r108138 | russell | 2008-03-12 14:55:22 -0500 (Wed, 12 Mar 2008) | 56 lines Merged revisions 108137 via svnmerge from https://origsvn.digium.com/svn/asterisk/trunk ................ r108137 | russell | 2008-03-12 14:59:05 -0500 (Wed, 12 Mar 2008) | 48 lines Merged revisions 108135 via svnmerge from https://origsvn.digium.com/svn/asterisk/branches/1.4 ........ r108135 | russell | 2008-03-12 14:57:42 -0500 (Wed, 12 Mar 2008) | 40 lines (closes issue ASTERISK-11617, reported by atis, fixed by me after some brainstorming on the issue with mmichelson) - Update copyright info on app_chanspy. - Fix a race condition that caused app_chanspy to crash. The issue was that the chanspy datastore magic that was used to ensure that spyee channels did not disappear out from under the code did not completely solve the problem. It was actually possible for chanspy to acquire a channel reference out of its datastore to a channel that was in the middle of being destroyed. That was because datastore destruction in ast_channel_free() was done near the end. So, this left the code in app_chanspy accessing a channel that was partially, or completely invalid because it was in the process of being free'd by another thread. The following sort of shows the code path where the race occurred: ============================================================================= Thread 1 (PBX thread for spyee chan) || Thread 2 (chanspy) --------------------------------------||------------------------------------- ast_channel_free() || - remove channel from channel list || - lock/unlock the channel to ensure || that no references retrieved from || the channel list exist. || --------------------------------------||------------------------------------- || channel_spy() - destroy some channel data || - Lock chanspy datastore || - Retrieve reference to channel - destroy channel datastores || - lock channel - call chanspy datastore d'tor || - Unlock chanspy datastore which NULL's out the ds' || reference to the channel || || - Operate on the channel ... - free the channel || || || - unlock the channel --------------------------------------||------------------------------------- ============================================================================= ........ ................ ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=108138 By: Atis Lezdins (atis) 2008-03-13 16:02:26 I tested r108288 and i still got identical backtraces. I'll attach processed log of acting threads from previously posted full.21576.log.gz The basic scenario is this (in correct order) [21643]: SIP/22234-08334e40 20:40:00 -> ChanSpy("SIP/22234-08334e40", "SIP/22254") [21719]: SIP/112233-08395e40 20:40:20 -> Queue("SIP/112233-08395e40", "22901|t|||300") 20:40:20 -> Trying 'Local/22254@default_queue' [21780]: Local/22254@default_queue-85f7,2 20:40:21 -> Dial("Local/22254@default_queue-85f7,2", "SIP/22254|15|gtM(queue_call_answer^22254)") 20:40:21 -> Called 22254 [21643]: 20:40:21 -> <SIP/22234-08334e40> Playing 'spy-sip' [21780]: 20:40:21 -> SIP/22254-08314ea8 answered Local/22254@default_queue-85f7,2 20:40:21 -> s@macro-queue_call_answer [21719]: 20:40:21 -> Local/22254@default_queue-85f7,1 answered SIP/112233-08395e40 [21780]: 20:40:21 -> Planning to masquerade channel SIP/22254-08314ea8 into the structure of Local/22254@default_queue-85f7,1 [21719]: 20:40:21 -> Done Masquerading SIP/22254-08314ea8 (6) [21780]: 20:40:21 -> Soft-Hanging up channel 'Local/22254@default_queue-85f7,2' 20:40:21 -> Hanging up channel 'Local/22254@default_queue-85f7,2' [21643]: 20:40:24 -> Set channel SIP/22234-08334e40 to write format slin boom By: Digium Subversion (svnbot) 2008-03-13 16:34:21 Repository: asterisk Revision: 108583 U branches/1.4/apps/app_chanspy.c U branches/1.4/include/asterisk/channel.h U branches/1.4/main/channel.c ------------------------------------------------------------------------ r108583 | russell | 2008-03-13 16:34:13 -0500 (Thu, 13 Mar 2008) | 11 lines Fix another issue that was causing crashes in chanspy. This introduces a new datastore callback, called chan_fixup(). The concept is exactly like the fixup callback that is used in the channel technology interface. This callback gets called when the owning channel changes due to a masquerade. Before this was introduced, if a masquerade happened on a channel being spyed on, the channel pointer in the datastore became invalid. (closes issue ASTERISK-11617) (reported by, and lots of testing from atis) (props to file for the help with ideas) ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=108583 By: Digium Subversion (svnbot) 2008-03-13 16:37:09 Repository: asterisk Revision: 108584 _U trunk/ U trunk/apps/app_chanspy.c U trunk/include/asterisk/channel.h U trunk/main/channel.c ------------------------------------------------------------------------ r108584 | russell | 2008-03-13 16:36:52 -0500 (Thu, 13 Mar 2008) | 19 lines Merged revisions 108583 via svnmerge from https://origsvn.digium.com/svn/asterisk/branches/1.4 ........ r108583 | russell | 2008-03-13 16:38:16 -0500 (Thu, 13 Mar 2008) | 11 lines Fix another issue that was causing crashes in chanspy. This introduces a new datastore callback, called chan_fixup(). The concept is exactly like the fixup callback that is used in the channel technology interface. This callback gets called when the owning channel changes due to a masquerade. Before this was introduced, if a masquerade happened on a channel being spyed on, the channel pointer in the datastore became invalid. (closes issue ASTERISK-11617) (reported by, and lots of testing from atis) (props to file for the help with ideas) ........ ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=108584 By: Digium Subversion (svnbot) 2008-03-13 16:37:33 Repository: asterisk Revision: 108585 _U branches/1.6.0/ U branches/1.6.0/apps/app_chanspy.c U branches/1.6.0/include/asterisk/channel.h U branches/1.6.0/main/channel.c ------------------------------------------------------------------------ r108585 | russell | 2008-03-13 16:37:31 -0500 (Thu, 13 Mar 2008) | 27 lines Merged revisions 108584 via svnmerge from https://origsvn.digium.com/svn/asterisk/trunk ................ r108584 | russell | 2008-03-13 16:40:43 -0500 (Thu, 13 Mar 2008) | 19 lines Merged revisions 108583 via svnmerge from https://origsvn.digium.com/svn/asterisk/branches/1.4 ........ r108583 | russell | 2008-03-13 16:38:16 -0500 (Thu, 13 Mar 2008) | 11 lines Fix another issue that was causing crashes in chanspy. This introduces a new datastore callback, called chan_fixup(). The concept is exactly like the fixup callback that is used in the channel technology interface. This callback gets called when the owning channel changes due to a masquerade. Before this was introduced, if a masquerade happened on a channel being spyed on, the channel pointer in the datastore became invalid. (closes issue ASTERISK-11617) (reported by, and lots of testing from atis) (props to file for the help with ideas) ........ ................ ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=108585 By: Digium Subversion (svnbot) 2008-03-17 13:03:40 Repository: asterisk Revision: 109173 _U team/murf/bug11210/ U team/murf/bug11210/apps/app_chanspy.c U team/murf/bug11210/apps/app_mixmonitor.c U team/murf/bug11210/channels/chan_sip.c U team/murf/bug11210/funcs/func_config.c U team/murf/bug11210/include/asterisk/astobj2.h U team/murf/bug11210/include/asterisk/audiohook.h U team/murf/bug11210/include/asterisk/pbx.h U team/murf/bug11210/main/astobj2.c U team/murf/bug11210/main/audiohook.c U team/murf/bug11210/main/channel.c U team/murf/bug11210/main/pbx.c ------------------------------------------------------------------------ r109173 | murf | 2008-03-17 13:03:22 -0500 (Mon, 17 Mar 2008) | 105 lines Merged revisions 107998,108032,108034,108084,108137 via svnmerge from https://origsvn.digium.com/svn/asterisk/trunk ................ r107998 | tilghman | 2008-03-12 01:43:03 -0600 (Wed, 12 Mar 2008) | 7 lines Deadlock fixes (closes issue ASTERISK-11576) Reported by: kactus Patches: 20080312__bug12143__2.diff.txt uploaded by Corydon76 (license 14) Tested by: kactus ................ r108032 | russell | 2008-03-12 11:02:57 -0600 (Wed, 12 Mar 2008) | 12 lines Merged revisions 108031 via svnmerge from https://origsvn.digium.com/svn/asterisk/branches/1.4 ........ r108031 | russell | 2008-03-12 11:59:07 -0500 (Wed, 12 Mar 2008) | 4 lines Destroy the channel lock after the channel datastores. (inspired by issue ASTERISK-11617) ........ ................ r108034 | russell | 2008-03-12 11:06:37 -0600 (Wed, 12 Mar 2008) | 4 lines - Add Tilghman to the copyright info ... he wrote the hard part :) - Remove some magic in unload_module that isn't needed. Module use counts already ensure that the function isn't going to be in use at this point. ................ r108084 | file | 2008-03-12 12:29:33 -0600 (Wed, 12 Mar 2008) | 12 lines Merged revisions 108083 via svnmerge from https://origsvn.digium.com/svn/asterisk/branches/1.4 ........ r108083 | file | 2008-03-12 15:26:37 -0300 (Wed, 12 Mar 2008) | 4 lines Add a trigger mode that triggers on both read and write. The actual function that returns the combined audio frame though will wait until both sides have fed in audio, or until one side stops (such as the case when you call Wait). (closes issue ASTERISK-11394) Reported by: xheliox ........ ................ r108137 | russell | 2008-03-12 13:59:05 -0600 (Wed, 12 Mar 2008) | 50 lines Merged revisions 108135 via svnmerge from https://origsvn.digium.com/svn/asterisk/branches/1.4 ........ r108135 | russell | 2008-03-12 14:57:42 -0500 (Wed, 12 Mar 2008) | 40 lines (closes issue ASTERISK-11617, reported by atis, fixed by me after some brainstorming on the issue with mmichelson) - Update copyright info on app_chanspy. - Fix a race condition that caused app_chanspy to crash. The issue was that the chanspy datastore magic that was used to ensure that spyee channels did not disappear out from under the code did not completely solve the problem. It was actually possible for chanspy to acquire a channel reference out of its datastore to a channel that was in the middle of being destroyed. That was because datastore destruction in ast_channel_free() was done near the end. So, this left the code in app_chanspy accessing a channel that was partially, or completely invalid because it was in the process of being free'd by another thread. The following sort of shows the code path where the race occurred: ============================================================================= Thread 1 (PBX thread for spyee chan) || Thread 2 (chanspy) --------------------------------------||------------------------------------- ast_channel_free() || - remove channel from channel list || - lock/unlock the channel to ensure || that no references retrieved from || the channel list exist. || --------------------------------------||------------------------------------- || channel_spy() - destroy some channel data || - Lock chanspy datastore || - Retrieve reference to channel || - lock channel || - Unlock chanspy datastore --------------------------------------||------------------------------------- - destroy channel datastores || - call chanspy datastore d'tor || which NULL's out the ds' || - Operate on the channel ... reference to the channel || || - free the channel || || || - unlock the channel --------------------------------------||------------------------------------- ============================================================================= ........ ................ ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=109173 |