[Home]

Summary:ASTERISK-11617: Crash in ChanSpy / AST_LIST_TRAVERSE_SAFE_BEGIN
Reporter:Atis Lezdins (atis)Labels:
Date Opened:2008-03-11 18:36:02Date Closed:2008-03-17 13:03:40
Priority:CriticalRegression?No
Status:Closed/CompleteComponents:Applications/app_chanspy
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) bt_1.txt
( 1) bt_2.txt
( 2) bt.core.16470.txt
( 3) bt.core.21576.txt
( 4) full_1.log
( 5) full_involved.log
( 6) full.21576.log.gz
Description:I got this while running automated tests on 1.4.19-rc2 after approximately hour.

Bt full attached.

#0  0x08082d07 in ast_channel_datastore_find (chan=0x863ebf8, info=0xb60d009c, uid=0x0) at channel.c:1379
1379            AST_LIST_TRAVERSE_SAFE_BEGIN(&chan->datastores, datastore, entry) {
(gdb) bt
#0  0x08082d07 in ast_channel_datastore_find (chan=0x863ebf8, info=0xb60d009c, uid=0x0) at channel.c:1379
#1  0xb60cc484 in chanspy_ds_free (chanspy_ds=0xb5b3f900) at app_chanspy.c:409
#2  0xb60cce89 in common_exec (chan=0x8595890, flags=0xb5b3fde8, volfactor=0, fd=0, mygroup=0x0, spec=0xb5b3fb80 "SIP/22226",
   exten=0x0, context=0x0) at app_chanspy.c:521
#3  0xb60cd40d in chanspy_exec (chan=0x8595890, data=0xb5b3fb80) at app_chanspy.c:725
#4  0x080c110c in pbx_exec (c=0x8595890, app=0x8237da8, data=0xb5b41f50)
   at /usr/dist/asterisk-svn-1.4.19-rc2/include/asterisk/strings.h:35
ASTERISK-1  0x080c4355 in pbx_extension_helper (c=0x8595890, con=0x0, context=0x8595a10 "static_ivr", exten=0x8595a60 "007",
   priority=25, label=0x0, callerid=0x82f3080 "22231", action=E_SPAWN) at pbx.c:1859
ASTERISK-2  0x080c5443 in ast_spawn_extension (c=0x8595890, context=0x8595a10 "static_ivr", exten=0x8595a60 "007", priority=25,
   callerid=0x82f3080 "22231") at pbx.c:2314
ASTERISK-3  0x080c5981 in __ast_pbx_run (c=0x8595890) at pbx.c:2416
ASTERISK-4  0x080c66d6 in pbx_thread (data=0x8595890) at pbx.c:2631
ASTERISK-5  0x08101357 in dummy_start (data=0x8409190) at utils.c:865
ASTERISK-6 0x4b0db45b in start_thread () from /lib/libpthread.so.0
ASTERISK-7 0x4b03323e in clone () from /lib/libc.so.6
Comments:By: Atis Lezdins (atis) 2008-03-11 20:45:47

another crash in chanspy, could be related to first one.

By: Atis Lezdins (atis) 2008-03-11 21:03:03

For the last crash, i also posted complete log since start - it crashed in less than a minute

By: Digium Subversion (svnbot) 2008-03-12 11:55:00

Repository: asterisk
Revision: 108031

U   branches/1.4/main/channel.c

------------------------------------------------------------------------
r108031 | russell | 2008-03-12 11:54:59 -0500 (Wed, 12 Mar 2008) | 4 lines

Destroy the channel lock after the channel datastores.

(inspired by issue ASTERISK-11617)

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=108031

By: Digium Subversion (svnbot) 2008-03-12 11:58:57

Repository: asterisk
Revision: 108032

_U  trunk/
U   trunk/main/channel.c

------------------------------------------------------------------------
r108032 | russell | 2008-03-12 11:58:56 -0500 (Wed, 12 Mar 2008) | 12 lines

Merged revisions 108031 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r108031 | russell | 2008-03-12 11:59:07 -0500 (Wed, 12 Mar 2008) | 4 lines

Destroy the channel lock after the channel datastores.

(inspired by issue ASTERISK-11617)

........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=108032

By: Digium Subversion (svnbot) 2008-03-12 11:59:17

Repository: asterisk
Revision: 108033

_U  branches/1.6.0/
U   branches/1.6.0/main/channel.c

------------------------------------------------------------------------
r108033 | russell | 2008-03-12 11:59:16 -0500 (Wed, 12 Mar 2008) | 20 lines

Merged revisions 108032 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

................
r108032 | russell | 2008-03-12 12:02:57 -0500 (Wed, 12 Mar 2008) | 12 lines

Merged revisions 108031 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r108031 | russell | 2008-03-12 11:59:07 -0500 (Wed, 12 Mar 2008) | 4 lines

Destroy the channel lock after the channel datastores.

(inspired by issue ASTERISK-11617)

........

................

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=108033

By: Digium Subversion (svnbot) 2008-03-12 14:53:48

Repository: asterisk
Revision: 108135

U   branches/1.4/apps/app_chanspy.c
U   branches/1.4/main/channel.c

------------------------------------------------------------------------
r108135 | russell | 2008-03-12 14:53:42 -0500 (Wed, 12 Mar 2008) | 40 lines

(closes issue ASTERISK-11617, reported by atis, fixed by me after some brainstorming
on the issue with mmichelson)

- Update copyright info on app_chanspy.

- Fix a race condition that caused app_chanspy to crash.  The issue was that
 the chanspy datastore magic that was used to ensure that spyee channels did
 not disappear out from under the code did not completely solve the problem.
 It was actually possible for chanspy to acquire a channel reference out of
 its datastore to a channel that was in the middle of being destroyed.  That
 was because datastore destruction in ast_channel_free() was done near the
 end.  So, this left the code in app_chanspy accessing a channel that was
 partially, or completely invalid because it was in the process of being free'd
 by another thread.  The following sort of shows the code path where the race
 occurred:

 =============================================================================
 Thread 1 (PBX thread for spyee chan)  ||   Thread 2 (chanspy)
 --------------------------------------||-------------------------------------
 ast_channel_free()                    ||
   - remove channel from channel list  ||
   - lock/unlock the channel to ensure ||
     that no references retrieved from ||
     the channel list exist.           ||
 --------------------------------------||-------------------------------------
                                       || channel_spy()
   - destroy some channel data         ||  - Lock chanspy datastore
                                       ||  - Retrieve reference to channel
   - destroy channel datastores        ||  - lock channel
      - call chanspy datastore d'tor   ||  - Unlock chanspy datastore
        which NULL's out the ds'       ||
        reference to the channel       ||  
                                       ||  - Operate on the channel ...
   - free the channel                  ||
                                       ||
                                       ||  - unlock the channel
 --------------------------------------||-------------------------------------
 =============================================================================


------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=108135

By: Digium Subversion (svnbot) 2008-03-12 14:55:18

Repository: asterisk
Revision: 108137

_U  trunk/
U   trunk/apps/app_chanspy.c
U   trunk/main/channel.c

------------------------------------------------------------------------
r108137 | russell | 2008-03-12 14:55:18 -0500 (Wed, 12 Mar 2008) | 48 lines

Merged revisions 108135 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r108135 | russell | 2008-03-12 14:57:42 -0500 (Wed, 12 Mar 2008) | 40 lines

(closes issue ASTERISK-11617, reported by atis, fixed by me after some brainstorming
on the issue with mmichelson)

- Update copyright info on app_chanspy.

- Fix a race condition that caused app_chanspy to crash.  The issue was that
 the chanspy datastore magic that was used to ensure that spyee channels did
 not disappear out from under the code did not completely solve the problem.
 It was actually possible for chanspy to acquire a channel reference out of
 its datastore to a channel that was in the middle of being destroyed.  That
 was because datastore destruction in ast_channel_free() was done near the
 end.  So, this left the code in app_chanspy accessing a channel that was
 partially, or completely invalid because it was in the process of being free'd
 by another thread.  The following sort of shows the code path where the race
 occurred:

 =============================================================================
 Thread 1 (PBX thread for spyee chan)  ||   Thread 2 (chanspy)
 --------------------------------------||-------------------------------------
 ast_channel_free()                    ||
   - remove channel from channel list  ||
   - lock/unlock the channel to ensure ||
     that no references retrieved from ||
     the channel list exist.           ||
 --------------------------------------||-------------------------------------
                                       || channel_spy()
   - destroy some channel data         ||  - Lock chanspy datastore
                                       ||  - Retrieve reference to channel
   - destroy channel datastores        ||  - lock channel
      - call chanspy datastore d'tor   ||  - Unlock chanspy datastore
        which NULL's out the ds'       ||
        reference to the channel       ||  
                                       ||  - Operate on the channel ...
   - free the channel                  ||
                                       ||
                                       ||  - unlock the channel
 --------------------------------------||-------------------------------------
 =============================================================================


........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=108137

By: Digium Subversion (svnbot) 2008-03-12 14:55:22

Repository: asterisk
Revision: 108138

_U  branches/1.6.0/
U   branches/1.6.0/apps/app_chanspy.c
U   branches/1.6.0/main/channel.c

------------------------------------------------------------------------
r108138 | russell | 2008-03-12 14:55:22 -0500 (Wed, 12 Mar 2008) | 56 lines

Merged revisions 108137 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

................
r108137 | russell | 2008-03-12 14:59:05 -0500 (Wed, 12 Mar 2008) | 48 lines

Merged revisions 108135 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r108135 | russell | 2008-03-12 14:57:42 -0500 (Wed, 12 Mar 2008) | 40 lines

(closes issue ASTERISK-11617, reported by atis, fixed by me after some brainstorming
on the issue with mmichelson)

- Update copyright info on app_chanspy.

- Fix a race condition that caused app_chanspy to crash.  The issue was that
 the chanspy datastore magic that was used to ensure that spyee channels did
 not disappear out from under the code did not completely solve the problem.
 It was actually possible for chanspy to acquire a channel reference out of
 its datastore to a channel that was in the middle of being destroyed.  That
 was because datastore destruction in ast_channel_free() was done near the
 end.  So, this left the code in app_chanspy accessing a channel that was
 partially, or completely invalid because it was in the process of being free'd
 by another thread.  The following sort of shows the code path where the race
 occurred:

 =============================================================================
 Thread 1 (PBX thread for spyee chan)  ||   Thread 2 (chanspy)
 --------------------------------------||-------------------------------------
 ast_channel_free()                    ||
   - remove channel from channel list  ||
   - lock/unlock the channel to ensure ||
     that no references retrieved from ||
     the channel list exist.           ||
 --------------------------------------||-------------------------------------
                                       || channel_spy()
   - destroy some channel data         ||  - Lock chanspy datastore
                                       ||  - Retrieve reference to channel
   - destroy channel datastores        ||  - lock channel
      - call chanspy datastore d'tor   ||  - Unlock chanspy datastore
        which NULL's out the ds'       ||
        reference to the channel       ||  
                                       ||  - Operate on the channel ...
   - free the channel                  ||
                                       ||
                                       ||  - unlock the channel
 --------------------------------------||-------------------------------------
 =============================================================================


........

................

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=108138

By: Atis Lezdins (atis) 2008-03-13 16:02:26

I tested r108288 and i still got identical backtraces.

I'll attach processed log of acting threads from previously posted full.21576.log.gz

The basic scenario is this (in correct order)

[21643]: SIP/22234-08334e40
20:40:00 -> ChanSpy("SIP/22234-08334e40", "SIP/22254")

[21719]: SIP/112233-08395e40
20:40:20 -> Queue("SIP/112233-08395e40", "22901|t|||300")
20:40:20 -> Trying 'Local/22254@default_queue'

[21780]:
Local/22254@default_queue-85f7,2
20:40:21 -> Dial("Local/22254@default_queue-85f7,2", "SIP/22254|15|gtM(queue_call_answer^22254)")
20:40:21 -> Called 22254

[21643]:
20:40:21 -> <SIP/22234-08334e40> Playing 'spy-sip'

[21780]:
20:40:21 -> SIP/22254-08314ea8 answered Local/22254@default_queue-85f7,2
20:40:21 -> s@macro-queue_call_answer

[21719]:
20:40:21 -> Local/22254@default_queue-85f7,1 answered SIP/112233-08395e40

[21780]:
20:40:21 -> Planning to masquerade channel SIP/22254-08314ea8 into the structure of Local/22254@default_queue-85f7,1

[21719]:
20:40:21 -> Done Masquerading SIP/22254-08314ea8 (6)

[21780]:
20:40:21 -> Soft-Hanging up channel 'Local/22254@default_queue-85f7,2'
20:40:21 -> Hanging up channel 'Local/22254@default_queue-85f7,2'

[21643]:
20:40:24 -> Set channel SIP/22234-08334e40 to write format slin
boom


By: Digium Subversion (svnbot) 2008-03-13 16:34:21

Repository: asterisk
Revision: 108583

U   branches/1.4/apps/app_chanspy.c
U   branches/1.4/include/asterisk/channel.h
U   branches/1.4/main/channel.c

------------------------------------------------------------------------
r108583 | russell | 2008-03-13 16:34:13 -0500 (Thu, 13 Mar 2008) | 11 lines

Fix another issue that was causing crashes in chanspy.  This introduces a new
datastore callback, called chan_fixup().  The concept is exactly like the
fixup callback that is used in the channel technology interface.  This callback
gets called when the owning channel changes due to a masquerade.  Before this
was introduced, if a masquerade happened on a channel being spyed on, the
channel pointer in the datastore became invalid.

(closes issue ASTERISK-11617)
(reported by, and lots of testing from atis)
(props to file for the help with ideas)

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=108583

By: Digium Subversion (svnbot) 2008-03-13 16:37:09

Repository: asterisk
Revision: 108584

_U  trunk/
U   trunk/apps/app_chanspy.c
U   trunk/include/asterisk/channel.h
U   trunk/main/channel.c

------------------------------------------------------------------------
r108584 | russell | 2008-03-13 16:36:52 -0500 (Thu, 13 Mar 2008) | 19 lines

Merged revisions 108583 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r108583 | russell | 2008-03-13 16:38:16 -0500 (Thu, 13 Mar 2008) | 11 lines

Fix another issue that was causing crashes in chanspy.  This introduces a new
datastore callback, called chan_fixup().  The concept is exactly like the
fixup callback that is used in the channel technology interface.  This callback
gets called when the owning channel changes due to a masquerade.  Before this
was introduced, if a masquerade happened on a channel being spyed on, the
channel pointer in the datastore became invalid.

(closes issue ASTERISK-11617)
(reported by, and lots of testing from atis)
(props to file for the help with ideas)

........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=108584

By: Digium Subversion (svnbot) 2008-03-13 16:37:33

Repository: asterisk
Revision: 108585

_U  branches/1.6.0/
U   branches/1.6.0/apps/app_chanspy.c
U   branches/1.6.0/include/asterisk/channel.h
U   branches/1.6.0/main/channel.c

------------------------------------------------------------------------
r108585 | russell | 2008-03-13 16:37:31 -0500 (Thu, 13 Mar 2008) | 27 lines

Merged revisions 108584 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

................
r108584 | russell | 2008-03-13 16:40:43 -0500 (Thu, 13 Mar 2008) | 19 lines

Merged revisions 108583 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r108583 | russell | 2008-03-13 16:38:16 -0500 (Thu, 13 Mar 2008) | 11 lines

Fix another issue that was causing crashes in chanspy.  This introduces a new
datastore callback, called chan_fixup().  The concept is exactly like the
fixup callback that is used in the channel technology interface.  This callback
gets called when the owning channel changes due to a masquerade.  Before this
was introduced, if a masquerade happened on a channel being spyed on, the
channel pointer in the datastore became invalid.

(closes issue ASTERISK-11617)
(reported by, and lots of testing from atis)
(props to file for the help with ideas)

........

................

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=108585

By: Digium Subversion (svnbot) 2008-03-17 13:03:40

Repository: asterisk
Revision: 109173

_U  team/murf/bug11210/
U   team/murf/bug11210/apps/app_chanspy.c
U   team/murf/bug11210/apps/app_mixmonitor.c
U   team/murf/bug11210/channels/chan_sip.c
U   team/murf/bug11210/funcs/func_config.c
U   team/murf/bug11210/include/asterisk/astobj2.h
U   team/murf/bug11210/include/asterisk/audiohook.h
U   team/murf/bug11210/include/asterisk/pbx.h
U   team/murf/bug11210/main/astobj2.c
U   team/murf/bug11210/main/audiohook.c
U   team/murf/bug11210/main/channel.c
U   team/murf/bug11210/main/pbx.c

------------------------------------------------------------------------
r109173 | murf | 2008-03-17 13:03:22 -0500 (Mon, 17 Mar 2008) | 105 lines

Merged revisions 107998,108032,108034,108084,108137 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

................
r107998 | tilghman | 2008-03-12 01:43:03 -0600 (Wed, 12 Mar 2008) | 7 lines

Deadlock fixes
(closes issue ASTERISK-11576)
Reported by: kactus
Patches:
      20080312__bug12143__2.diff.txt uploaded by Corydon76 (license 14)
Tested by: kactus

................
r108032 | russell | 2008-03-12 11:02:57 -0600 (Wed, 12 Mar 2008) | 12 lines

Merged revisions 108031 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r108031 | russell | 2008-03-12 11:59:07 -0500 (Wed, 12 Mar 2008) | 4 lines

Destroy the channel lock after the channel datastores.

(inspired by issue ASTERISK-11617)

........

................
r108034 | russell | 2008-03-12 11:06:37 -0600 (Wed, 12 Mar 2008) | 4 lines

- Add Tilghman to the copyright info ... he wrote the hard part :)
- Remove some magic in unload_module that isn't needed.  Module use counts
  already ensure that the function isn't going to be in use at this point.

................
r108084 | file | 2008-03-12 12:29:33 -0600 (Wed, 12 Mar 2008) | 12 lines

Merged revisions 108083 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r108083 | file | 2008-03-12 15:26:37 -0300 (Wed, 12 Mar 2008) | 4 lines

Add a trigger mode that triggers on both read and write. The actual function that returns the combined audio frame though will wait until both sides have fed in audio, or until one side stops (such as the case when you call Wait).
(closes issue ASTERISK-11394)
Reported by: xheliox

........

................
r108137 | russell | 2008-03-12 13:59:05 -0600 (Wed, 12 Mar 2008) | 50 lines

Merged revisions 108135 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r108135 | russell | 2008-03-12 14:57:42 -0500 (Wed, 12 Mar 2008) | 40 lines

(closes issue ASTERISK-11617, reported by atis, fixed by me after some brainstorming
on the issue with mmichelson)

- Update copyright info on app_chanspy.

- Fix a race condition that caused app_chanspy to crash.  The issue was that
 the chanspy datastore magic that was used to ensure that spyee channels did
 not disappear out from under the code did not completely solve the problem.
 It was actually possible for chanspy to acquire a channel reference out of
 its datastore to a channel that was in the middle of being destroyed.  That
 was because datastore destruction in ast_channel_free() was done near the
 end.  So, this left the code in app_chanspy accessing a channel that was
 partially, or completely invalid because it was in the process of being free'd
 by another thread.  The following sort of shows the code path where the race
 occurred:

 =============================================================================
 Thread 1 (PBX thread for spyee chan)  ||   Thread 2 (chanspy)
 --------------------------------------||-------------------------------------
 ast_channel_free()                    ||
   - remove channel from channel list  ||
   - lock/unlock the channel to ensure ||
     that no references retrieved from ||
     the channel list exist.           ||
 --------------------------------------||-------------------------------------
                                       || channel_spy()
   - destroy some channel data         ||  - Lock chanspy datastore
                                       ||  - Retrieve reference to channel
                                       ||  - lock channel
                                       ||  - Unlock chanspy datastore
 --------------------------------------||-------------------------------------
    - destroy channel datastores       ||
       - call chanspy datastore d'tor  ||  
         which NULL's out the ds'      ||  - Operate on the channel ...
         reference to the channel      ||    
                                       ||
   - free the channel                  ||
                                       ||
                                       ||  - unlock the channel
 --------------------------------------||-------------------------------------
 =============================================================================

........

................

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=109173