[Home]

Summary:ASTERISK-11586: race condition between sip hangup and "core show channel xxx" results in crash
Reporter:Torrey Searle (tsearle)Labels:
Date Opened:2008-03-06 02:56:43.000-0600Date Closed:2008-04-11 16:14:07
Priority:CriticalRegression?No
Status:Closed/CompleteComponents:Resources/res_features
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) show_channels_crash.patch
( 1) show_channels_crash2.patch
Description:sometimes when the command "core show channel (channel name)" it results in a crash when trying to print the contents of the channels cdr object.  Analysis of the core dumps have show this to always occur at the same time the channel is in the process of hanging up.

Following the code backwards I found that the root cause was in res_features.c
After shutting down the bridge, it frees the channel cdr object without first locking the channel to make sure it's safe.

Attached is a patch that introduces locking.

****** ADDITIONAL INFORMATION ******

The show channels thread.  At this point cdr is corrupt because the original cdr->next no longer points to NULL after being freed.

#0  0x0807d842 in ast_cdr_serialize_variables (cdr=0x3f0, buf=0xb4f81e21 "",
                   size=1699, delim=61 '=', sep=10 '\n', recur=1) at cdr.c:382
#1  0x080937ad in handle_showchan (fd=947, argc=4, argv=0xb4f828f8)
   at cli.c:1156
#2  0x08095525 in ast_cli_command (fd=947,
                   s=0xb4f82c89 "core show channel SIP/5555-0881d288") at cli.c:1992
#3  0x080bca63 in action_command (s=0x87dc8e0, m=0xb4f82d00) at manager.c:1710
#4  0x080bedf6 in process_message (s=0x87dc8e0, m=0xb4f82d00) at manager.c:2163
ASTERISK-1  0x080bf322 in do_message (s=0x87dc8e0) at manager.c:2254
ASTERISK-2  0x080bf465 in session_do (data=0x87dc8e0) at manager.c:2270
ASTERISK-3  0x0810bfeb in dummy_start (data=0x84978d0) at utils.c:852
ASTERISK-4  0xb7f4346b in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
ASTERISK-5  0xb7e4a6de in clone () from /lib/tls/i686/cmov/libc.so.6



The main thread for the channel

#0  0xffffe410 in __kernel_vsyscall ()
#1  0xb7f49e1e in __lll_mutex_lock_wait ()
  from /lib/tls/i686/cmov/libpthread.so.0
#2  0xb7f45b74 in _L_mutex_lock_218 () from /lib/tls/i686/cmov/libpthread.so.0
#3  0xb7f45609 in pthread_mutex_lock () from /lib/tls/i686/cmov/libpthread.so.0
#4  0x08081cf5 in ast_mutex_lock (pmutex=0x881e890)
      at /home/torrey/asterisk/asterisk-1.4.18/include/asterisk/lock.h:701
ASTERISK-1  0x08084c8e in ast_hangup (chan=0x881e818) at channel.c:1418
ASTERISK-2  0xb702359b in dial_exec_full (chan=0x87a1db8, data=0xb6c6e0f8,
                   peerflags=0xb6c6be24, continue_exec=0x0) at app_dial.c:1726
ASTERISK-3  0xb70237ef in dial_exec (chan=0x87a1db8, data=0xb6c6e0f8)
   at app_dial.c:1760
ASTERISK-4  0x080c356c in pbx_exec (c=0x87a1db8, app=0x81ddb38, data=0xb6c6e0f8)
       at pbx.c:532
ASTERISK-5  0x080c6c42 in pbx_extension_helper (c=0x87a1db8, con=0x0,
                   context=0x87a1f38 "sip", exten=0x87a1f88 "5555", priority=1, label=0x0,
                       callerid=0x8208a68 "sipp", action=E_SPAWN) at pbx.c:1851
ASTERISK-6 0x080c7db1 in ast_spawn_extension (c=0x87a1db8, context=0x87a1f38 "sip",
                  exten=0x87a1f88 "5555", priority=1, callerid=0x8208a68 "sipp")
   at pbx.c:2306
ASTERISK-7 0x080c835f in __ast_pbx_run (c=0x87a1db8) at pbx.c:2408
ASTERISK-8 0x080c932c in pbx_thread (data=0x87a1db8) at pbx.c:2623
ASTERISK-9 0x0810bfeb in dummy_start (data=0x8748a98) at utils.c:852
ASTERISK-10 0xb7f4346b in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
ASTERISK-11 0xb7e4a6de in clone () from /lib/tls/i686/cmov/libc.so.6

Comments:By: Eliel Sardanons (eliel) 2008-03-10 09:24:08

peer->cdr = NULL;
chan->cdr = bridge_cdr; /* make this available to the rest of the world via the chan while the call is in progress */
+ ast_channel_unlock(peer);

In the second block of the patch you are unlocking the channel structure but you continue writing into it. (chan->cdr = bridge_cdr). Why you think it is not necessary to lock till the end of the 'if' block?.

Thanks in advanced

By: Torrey Searle (tsearle) 2008-03-10 09:52:15

in the case of the assignment operation the pointer is null (which is ok) and pointing to a valid cdr object after so there is no risk of a crash (assignment is an atomic operation)

I will update the patch as per your comments

By: Eliel Sardanons (eliel) 2008-03-10 11:11:49

What happen if the channel structure is freed before the assigment?

EDITED: show_channels_crash2.patch is ok.



By: Tilghman Lesher (tilghman) 2008-03-10 12:16:42

Have you already updated to 1.4.19-rc1 and tried that?  I've already fixed a recent bug like this in SVN.

By: Torrey Searle (tsearle) 2008-03-11 01:48:11

just tested in 1.4.19-rc1, the segfault is still present, this patch is for an issue not already fixed



By: Torrey Searle (tsearle) 2008-03-14 11:45:00

Are there any other questions about this patch?

By: pliew (pliew) 2008-04-09 19:55:54

I can add that we have seen this behaviour on a few of our customer sites using 1.4.12 to the latest 1.4.19. They monitor Zap channel utilisation via the AMI "command -> core show channels" and at intermittent times will cause the SMP kernel to have a "panic attack". Asterisk still runs, but the thread that deals with further AMI requests for "show channels" dies. We will try the second patch on one of those servers and report back.

By: Torrey Searle (tsearle) 2008-04-10 02:40:34

are you doing "show channels" or "show channel [channel name]"?

for me the crash was involving showing a specific channel

By: Mark Michelson (mmichelson) 2008-04-11 10:23:47

I've taken a careful look at the patch and it seems sound to me. I will get it merged into the code. I'm surprised it took this long for someone to get on this and I apologize for it.

By: Digium Subversion (svnbot) 2008-04-11 10:39:38

Repository: asterisk
Revision: 114063

U   branches/1.4/res/res_features.c

------------------------------------------------------------------------
r114063 | mmichelson | 2008-04-11 10:39:36 -0500 (Fri, 11 Apr 2008) | 11 lines

Fix a race condition that may happen between a sip hangup
and a "core show channel" command. This patch adds locking
to prevent the resulting crash.

(closes issue ASTERISK-11586)
Reported by: tsearle
Patches:
     show_channels_crash2.patch uploaded by tsearle (license 373)
Tested by: tsearle


------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=114063

By: Digium Subversion (svnbot) 2008-04-11 10:44:40

Repository: asterisk
Revision: 114064

_U  trunk/
U   trunk/main/features.c

------------------------------------------------------------------------
r114064 | mmichelson | 2008-04-11 10:44:37 -0500 (Fri, 11 Apr 2008) | 19 lines

Merged revisions 114063 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r114063 | mmichelson | 2008-04-11 10:44:28 -0500 (Fri, 11 Apr 2008) | 11 lines

Fix a race condition that may happen between a sip hangup
and a "core show channel" command. This patch adds locking
to prevent the resulting crash.

(closes issue ASTERISK-11586)
Reported by: tsearle
Patches:
     show_channels_crash2.patch uploaded by tsearle (license 373)
Tested by: tsearle


........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=114064

By: Digium Subversion (svnbot) 2008-04-11 10:46:16

Repository: asterisk
Revision: 114065

_U  branches/1.6.0/
U   branches/1.6.0/main/features.c

------------------------------------------------------------------------
r114065 | mmichelson | 2008-04-11 10:46:14 -0500 (Fri, 11 Apr 2008) | 27 lines

Merged revisions 114064 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

................
r114064 | mmichelson | 2008-04-11 10:49:35 -0500 (Fri, 11 Apr 2008) | 19 lines

Merged revisions 114063 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r114063 | mmichelson | 2008-04-11 10:44:28 -0500 (Fri, 11 Apr 2008) | 11 lines

Fix a race condition that may happen between a sip hangup
and a "core show channel" command. This patch adds locking
to prevent the resulting crash.

(closes issue ASTERISK-11586)
Reported by: tsearle
Patches:
     show_channels_crash2.patch uploaded by tsearle (license 373)
Tested by: tsearle


........

................

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=114065

By: Digium Subversion (svnbot) 2008-04-11 16:14:07

Repository: asterisk
Revision: 114070

_U  team/murf/bug11210/
U   team/murf/bug11210/CHANGES
U   team/murf/bug11210/channels/chan_local.c
U   team/murf/bug11210/channels/chan_sip.c
U   team/murf/bug11210/contrib/scripts/astcli
U   team/murf/bug11210/include/asterisk/lock.h
U   team/murf/bug11210/main/features.c
U   team/murf/bug11210/main/manager.c
U   team/murf/bug11210/main/utils.c
U   team/murf/bug11210/res/res_config_ldap.c
U   team/murf/bug11210/res/res_phoneprov.c
U   team/murf/bug11210/utils/Makefile
U   team/murf/bug11210/utils/ael_main.c
U   team/murf/bug11210/utils/astman.c
U   team/murf/bug11210/utils/check_expr.c
U   team/murf/bug11210/utils/conf2ael.c
U   team/murf/bug11210/utils/hashtest.c
U   team/murf/bug11210/utils/hashtest2.c

------------------------------------------------------------------------
r114070 | murf | 2008-04-11 16:14:03 -0500 (Fri, 11 Apr 2008) | 80 lines

Merged revisions 114042-114069 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

................
 r114042 | tilghman | 2008-04-10 13:04:29 -0600 (Thu, 10 Apr 2008) | 7 lines
 
 The hydra grows yet another head...
 (closes issue ASTERISK-11814)
  Reported by: davevg
  Patches:
        astcli.diff2 uploaded by davevg (license 209)
  Tested by: davevg, Corydon76
................
 r114046 | mmichelson | 2008-04-10 13:58:36 -0600 (Thu, 10 Apr 2008) | 14 lines
 
 Merged revisions 114045 via svnmerge from
 https://origsvn.digium.com/svn/asterisk/branches/1.4
 
 ........
 r114045 | mmichelson | 2008-04-10 14:55:33 -0500 (Thu, 10 Apr 2008) | 6 lines
 
 Be sure that we're not about to set bridgepvt NULL prior to dereferencing it.
 
 (closes issue ASTERISK-11244)
 Reported by: fujin
 
 
 ........
................
 r114049 | file | 2008-04-10 14:28:40 -0600 (Thu, 10 Apr 2008) | 2 lines
 
 A 'b' option has been added which causes chan_local to return the actual channel that is behind it when queried. This is useful for transfer scenarios as the actual channel will be transferred, not the Local channel. If you have been using Local channels as queue members and having issues when the agent did a blind transfer this option may solve the issue.
................
 r114052 | mmichelson | 2008-04-10 16:02:32 -0600 (Thu, 10 Apr 2008) | 11 lines
 
 Merged revisions 114051 via svnmerge from
 https://origsvn.digium.com/svn/asterisk/branches/1.4
 
 ........
 r114051 | mmichelson | 2008-04-10 15:59:49 -0500 (Thu, 10 Apr 2008) | 3 lines
 
 Fix 1.4 build when LOW_MEMORY is enabled.
 
 
 ........
................
 r114061 | tilghman | 2008-04-11 08:54:22 -0600 (Fri, 11 Apr 2008) | 6 lines
 
 Errors are all greater than 0
 (closes issue ASTERISK-11831)
  Reported by: nito
  Patches:
        res_config_ldap_result_check_patch.diff uploaded by nito (license 340)
................
 r114064 | mmichelson | 2008-04-11 09:49:35 -0600 (Fri, 11 Apr 2008) | 19 lines
 
 Merged revisions 114063 via svnmerge from
 https://origsvn.digium.com/svn/asterisk/branches/1.4
 
 ........
 r114063 | mmichelson | 2008-04-11 10:44:28 -0500 (Fri, 11 Apr 2008) | 11 lines
 
 Fix a race condition that may happen between a sip hangup
 and a "core show channel" command. This patch adds locking
 to prevent the resulting crash.
 
 (closes issue ASTERISK-11586)
 Reported by: tsearle
 Patches:
       show_channels_crash2.patch uploaded by tsearle (license 373)
 Tested by: tsearle
 
 
 ........
................
 r114067 | twilson | 2008-04-11 15:04:46 -0600 (Fri, 11 Apr 2008) | 3 lines
 
 Fix the fact that global_variables 1) weren't being updated on reload (thanks for the report, Doug), and 2) weren't actually being appended to the list of profile variables because build_profile was called before the list was populated. Also needed to free the contents returned by load_file().
................

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=114070