[Home]

Summary:ASTERISK-15994: [patch] System() taking excessive time to return with nonroot
Reporter:David Backeberg (dbackeberg)Labels:
Date Opened:2010-04-21 14:37:03Date Closed:2010-04-30 01:22:15
Priority:MinorRegression?No
Status:Closed/CompleteComponents:Applications/app_system
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) 20100422__debug_17223.diff.txt
( 1) 20100423__issue17223.diff.txt
Description:I've opened a discussion on asterisk-users on April 2010 regarding problems I've seen with System() calls on asterisk-1.6.2.6. I have a dialplan that does seemingly bening things that should return lightning fast, like:

exten => s,1,Verbose(EXTENSION is: ${EXTEN})
exten => s,n,Set(FIREBREAK_GENERIC=/var/lib/asterisk/sounds/firebreak/catchall)
exten => s,n,System(test -e ${FIREBREAK_GENERIC}.*)
exten => s,n,Verbose(System call result was ${SYSTEMSTATUS})
exten => s,n,ExecIf($[${SYSTEMSTATUS} = SUCCESS]?Playback(${FIREBREAK_GENERIC}))

The System() call in this example is very straightforward. Use System() to determine if file exists, if yes, play it.

As a verification that the system call runs fine...
[root@fivr03 ~]# test -e /var/lib/asterisk/sounds/firebreak/catchall.*
[root@fivr03 ~]# echo $?
1
[root@fivr03 ~]# time test -e /var/lib/asterisk/sounds/firebreak/catchall.*

real 0m0.000s
user 0m0.001s
sys 0m0.000s

However, with stock asterisk-1.6.2.6 on two of my CentOS systems, each call to System() takes upwards of half a second to return from the call. Problem was always reproducible on these systems with this dialplan. I have a third system with asterisk-1.6.2.6 on Gentoo, and I did not have this problem on that system. I don't know what that means.

Kevin Fleming suggested on asterisk-users to modify main/app.c, ast_close_fds_above_n() function.

I have done so, and this solves my problem.

****** ADDITIONAL INFORMATION ******

[root@fivr03 src]# diff -u asterisk-1.6.2.6.original.before.kevin.fleming.suggestion/main/app.c asterisk-1.6.2.6/main/app.c
--- asterisk-1.6.2.6.original.before.kevin.fleming.suggestion/main/app.c 2009-11-30 16:36:06.000000000 -0500
+++ asterisk-1.6.2.6/main/app.c 2010-04-21 15:06:19.000000000 -0400
@@ -1988,7 +1988,9 @@
struct rlimit rl;
getrlimit(RLIMIT_NOFILE, &rl);
null = open("/dev/null", O_RDONLY);
- for (x = n + 1; x < rl.rlim_cur; x++) {
+        for (x = n + 1; x < 4096; x++) {
+
+/* for (x = n + 1; x < rl.rlim_cur; x++) { */
if (x != null) {
/* Side effect of dup2 is that it closes any existing fd without error.
* This prevents valgrind and other debugging tools from sending up
Comments:By: Tilghman Lesher (tilghman) 2010-04-21 15:01:54

Please try SVN 1.6.2.  A workaround has already been committed.

By: David Backeberg (dbackeberg) 2010-04-22 09:31:51

I gave SVN a try. I'll be specific with the results of my testing. My dialplan does four calls to System() in rapid succession.

with released asterisk-1.6.2.6, watching my call timer on my SIP phone, there is a gap of six seconds between the last message playing, and when the System() calls return so the next message can play.

with Fleming-patched, released asterisk-1.6.2.6, same test, there is less than a second between with the four System() call start and when the next message plays.

with SVN, specifically (from 'core show version') SVN-branch-1.6.2-r258436, I get the same result as released, unpatched, asterisk-1.6.2.6. That is, I get a capy of six seconds between the last message playing, and when the System() call s return so the next message can play.

By: David Backeberg (dbackeberg) 2010-04-22 09:35:35

I do see that the structure of this function has changed between released 1.6.2.6 and SVN, I'll see if there is a logical way to apply the Fleming patch with the new structure.

By: David Backeberg (dbackeberg) 2010-04-22 09:43:39

apparently if you hack at your own copy of the SVN tree, it changes the output on your 'core show version'. So testing with SVN-branch-1.6.2-r258436M, with the updated Fleming patch applied, I'm again getting the successful, desirable result that System() calls complete lightning fast.

I inserted the same patch to the SVN version of main/app.c, this time it was at line 2021.

By: David Backeberg (dbackeberg) 2010-04-22 09:46:18

So just for my own edification, it looks like rl.rlim_cur is possibly coming from the max files open, set for the system?

Maybe that's why I'm seeing this problem before other people?

I've set my max files open to 1,000,000, or one million. I'll try unsetting it, leaving it at factory CentOS default, and test what happens.

By: David Backeberg (dbackeberg) 2010-04-22 09:54:08

Wow, that was dramatic.

So I had two files limits set.

There's a setting in /etc/asterisk/asterisk.conf,
called 'maxfiles'. Commenting out my change, and leaving it at default had zero effect.

I've also been forcing a high ulimit, with ulimit. I have an entry in my /etc/init.d/asterisk that forces a high ulimit. It runs asterisk as a non-root user, and does for that user,

echo -n "Setting ulimits:                                              "
ulimit -n 1000000 -c unlimited && echo "OK" || echo "FAILED"

When I comment out that entry, and just use the factory default, I solve my problem, that is, the System() calls again go lightning fast, even with released 1.6.2.6

So that begs the question. Should my changes have the effect they are having?

By: Tilghman Lesher (tilghman) 2010-04-22 09:55:36

What the SVN version does is to use the /proc filesystem, assuming you have one, which you should, to check how many actual file descriptors you have open and close each one.

By: David Backeberg (dbackeberg) 2010-04-22 10:30:32

>What the SVN version does is to use the /proc filesystem, assuming you have >one, which you should, to check how many actual file descriptors you have open >and close each one.

However, I'm reporting that even with SVN, I still have the slow System() calls unless I patch main/app.c, ast_close_fds_above_n() for loop to force a fraudulent open files value.

With released 1.6.2.6, setting ulimit lower fixes the slow System() calls, without having to patch.

With SVN-1.6.2, setting ulimit lower has no effect, and System() calls remain slow unless I patch.

Specifically, I'm saying that current SVN does not fix my problem, and that applying the patch DOES fix BOTH 1.6.2.6 and SVN 1.6.2.

By: David Backeberg (dbackeberg) 2010-04-22 11:30:36

I've been having a very frustrating testing experience, here, and I'm getting results that seemed to contradict my earlier testing. I apologize for that.

My latter testing has shown that the /etc/asterisk/asterisk.conf, maxfiles value does seems to make a difference. I'm going to have to make a table about this:

release-1.6.2.6, asterisk.conf maxfiles=1 million, ulimit set to 25000, SLOW
release-1.6.2.6, asterisk.conf maxfiles=1 million, ulimit set to 1million, SLOW
release-1.6.2.6, asterisk.conf maxfiles=default, ulimit set to 25000, FAST
release-1.6.2.6, asterisk.conf maxfiles=default, ulimit set to 1million, SLOW

patched-1.6.2.6, asterisk.conf maxfiles=1million, ulimit set to 25000, FAST
patched-1.6.2.6, asterisk.conf maxfiles=1million, ulimit set to 1million, FAST
patched-1.6.2.6, asterisk.conf maxfiles=default, ulimit set to 25000, FAST
patched-1.6.2.6, asterisk.conf maxfiles=default, ulimit set to 1million, FAST

SVN-1.6.2, asterisk.conf maxfiles=1million, ulimit set to 25000, SLOW
SVN-1.6.2, asterisk.conf maxfiles=1million, ulimit set to 1million,SLOW
SVN-1.6.2, asterisk.conf maxfiles=default, ulimit set to 25000,FAST
SVN-1.6.2, asterisk.conf maxfiles=default, ulimit set to 1million,SLOW

patched-SVN-1.6.2, asterisk.conf maxfiles=1million, ulimit set to 25000,FAST
patched-SVN-1.6.2, asterisk.conf maxfiles=1million, ulimit set to 1million,FAST
patched-SVN-1.6.2, asterisk.conf maxfiles=1million, ulimit set to 25000,FAST
patched-SVN-1.6.2, asterisk.conf maxfiles=1million, ulimit set to 1million,FAST

By: David Backeberg (dbackeberg) 2010-04-22 11:36:51

So my conclusion of the SVN escapade was that while the structure of the function in question is changed, my testing has shown that it still suffers from the same problems as release 1.6.2.6. The Fleming patch fixes both SVN and 1.6.2.6.

[root@fivr03 asterisk-1.6.2.6-svn-branch]# diff -u 1.6.2.modified.by.dave/main/app.c 1.6.2/main/app.c
--- 1.6.2.modified.by.dave/main/app.c 2010-04-22 10:38:25.000000000 -0400
+++ 1.6.2/main/app.c 2010-04-22 11:13:13.000000000 -0400
@@ -2018,8 +2018,7 @@
} else {
getrlimit(RLIMIT_NOFILE, &rl);
null = open("/dev/null", O_RDONLY);
- for (x = n + 1; x < 4096; x++) {
-/* for (x = n + 1; x < rl.rlim_cur; x++) { */
+ for (x = n + 1; x < rl.rlim_cur; x++) {
if (x != null) {
/* Side effect of dup2 is that it closes any existing fd without error.
* This prevents valgrind and other debugging tools from sending up

By: Tilghman Lesher (tilghman) 2010-04-22 13:32:46

It's very odd that that code is executing at all.  Do you have any output for the shell command:

$ ls -l /proc/$$/fd

The debug patch uploaded confirms whether, in fact, you're able to open this path at all.



By: David Backeberg (dbackeberg) 2010-04-22 16:35:29

I applied patch, and I'm not spotting where I'm supposed to be seeing the new debugging output. But I did make an interesting, probably relevant discovery....

If I startup asterisk from my daemon, I have the slow behavior, using the tilghman patch, and my asterisk.conf messages set to one million.

But if I start it up with

asterisk -vvvvvvvvvvvvvvvvvvvvvvvvvc

I get lightning fast behavior.

I tried as both root, and as asterisk user, and in both situations, it's slow running as daemon, and fast running as user process.

What on earth does that mean?

By: David Backeberg (dbackeberg) 2010-04-22 16:46:34

[root@fivr03 asterisk-1.6.2.6]# ls -l /proc/$$/fd
total 0
lrwx------ 1 root root 64 Apr 22 17:33 0 -> /dev/pts/0
lrwx------ 1 root root 64 Apr 22 17:45 1 -> /dev/pts/0
lrwx------ 1 root root 64 Apr 22 17:45 2 -> /dev/pts/0
lrwx------ 1 root root 64 Apr 22 17:45 255 -> /dev/pts/0
[root@fivr03 asterisk-1.6.2.6]# su - asterisk
[asterisk@fivr03 ~]$ ls -l /proc/$$/fd
total 0
lrwx------ 1 asterisk asterisk 64 Apr 22 17:45 0 -> /dev/pts/0
lrwx------ 1 asterisk asterisk 64 Apr 22 17:45 1 -> /dev/pts/0
lrwx------ 1 asterisk asterisk 64 Apr 22 17:45 2 -> /dev/pts/0
lrwx------ 1 asterisk asterisk 64 Apr 22 17:45 255 -> /dev/pts/0
[asterisk@fivr03 ~]$ ls -l /dev/pts/0
crw--w---- 1 root tty 136, 0 Apr 22 17:46 /dev/pts/0

By: Tilghman Lesher (tilghman) 2010-04-22 17:03:53

Aha.  What it sounds like is that your daemon is running Asterisk in a chroot'ed environment, and that environment does not have the /proc filesystem mounted within it.  Add the /proc filesystem to your chroot'ed environment, and it will once again be lightning fast.

By: David Backeberg (dbackeberg) 2010-04-23 09:53:48

Not chrooted. Not at all.

The difference was flags sent to asterisk.

with SVN-1.6.2

asterisk -vvvvvvvvvvvvvvvvvvvvc
works fast
asterisk -U root -G root
works fast
asterisk -U asterisk -G asterisk
works slow

So the slowness seems intrinsic to something that cannot write or read correctly as non-root.

By: Tilghman Lesher (tilghman) 2010-04-23 10:04:26

That seems odd, because I am likewise running as non-root, and it's still lightning fast.  Could you su to the asterisk user and try the ls command above again?

By: David Backeberg (dbackeberg) 2010-04-23 10:09:50

One more fun discovery...

su - asterisk
[asterisk@fivr03 ~]$ /usr/sbin/asterisk -vvvvvvvvvvvvvvvvvvvvvvvvvvc
works fast
[asterisk@fivr03 ~]$ /usr/sbin/asterisk
no arguments, and it will fork
works fast
[asterisk@fivr03 ~]$ /usr/sbin/asterisk -U asterisk -G asterisk
believe it or not, also works fast

it's trying to invoke asterisk with the -U asterisk -G asterisk flags, when I'm root, that's making it slow.

Isn't that weird?

By: David Backeberg (dbackeberg) 2010-04-23 10:15:14

Further information....

As root,

sudo -u asterisk /usr/sbin/asterisk -U asterisk -G asterisk
works fast
asterisk /usr/sbin/asterisk -U asterisk -G asterisk
works slow

Isn't that amazing?

Seems something intrinsic about the way asterisk is forking and changing users?

By: Tilghman Lesher (tilghman) 2010-04-23 10:57:51

Well, the -U and -G flags have no effect if you aren't root.  There must be something specific about CentOS, because this works fine on Ubuntu.  I've uploaded one more patch, which changes the maximum file descriptor to 65535, which should be more than sufficient.  After all, what takes up the most file descriptors in Asterisk are the sockets, and you have an intrinsic limit on the number of RTP sockets, because there are only 65535 possible port numbers.

By: David Backeberg (dbackeberg) 2010-04-23 11:03:56

So a really obvious workaround exists then, no patches to asterisk needed.

I've revised my init script to do a
sudo -u asterisk /usr/sbin/asterisk
(as you've mentioned the -U and -G flags have no effect as non-root)
rather than
/usr/sbin/asterisk -U asterisk -G asterisk

Thanks for the suggestion.

Do you think we should consider attacking this from the code?

My reason for a ridiculous number of open files was that this machine plays a whole lot of sound files, many of them very short. I agree that 65k is probably a reasonable maximum, and that I'll never come near 1 million.

By: David Backeberg (dbackeberg) 2010-04-23 11:26:35

I DO also need to point out that when I say this workaround works,
I'm talking about SVN-1.6.2

stock release 1.6.2.6 is definitely still broken for me, and I have to either apply my patch, or take off higher max open files, or go to SVN.

By: David Backeberg (dbackeberg) 2010-04-23 11:33:58

I'm not spotting the latest 65535 patch. Did you upload it? I only see the original patch to add debugging.

By: Tilghman Lesher (tilghman) 2010-04-23 15:47:17

Oops.  There we go.

By: David Backeberg (dbackeberg) 2010-04-26 16:02:38

Not surprisingly, the second tilghman patch works to fix my problem. I would suggest we adopt this in trunk. I'm no doubt in the minority of people who:

* set a ridiculous value for open files
* run asterisk as a daemon using -U -G flags, invoked as root

Nonetheless, I still feel like we should address this and put it in. It was certainly a behavior change, in that I had been okay with ridiculous values and daemon mode until I tried 1.6.2.6 (although this restructuring apparently happened earlier).

By: Digium Subversion (svnbot) 2010-04-30 01:19:36

Repository: asterisk
Revision: 260292

U   trunk/main/strcompat.c

------------------------------------------------------------------------
r260292 | tilghman | 2010-04-30 01:19:36 -0500 (Fri, 30 Apr 2010) | 13 lines

Don't allow file descriptors to go above 64k, when we're closing them in a fork(2).

This saves time, when, even though the system allows the process limit to be
that high, the practical limit is much lower.  Also introduce an additional
optimization, in the form of using the CLOEXEC flag to close descriptors at
the right time.

(closes issue ASTERISK-15994)
Reported by: dbackeberg
Patches:
      20100423__issue17223.diff.txt uploaded by tilghman (license 14)
Tested by: dbackeberg

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=260292

By: Digium Subversion (svnbot) 2010-04-30 01:22:15

Repository: asterisk
Revision: 260303

_U  branches/1.6.2/
U   branches/1.6.2/main/app.c

------------------------------------------------------------------------
r260303 | tilghman | 2010-04-30 01:22:14 -0500 (Fri, 30 Apr 2010) | 18 lines

Merged revisions 260292 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

........
 r260292 | tilghman | 2010-04-30 01:19:35 -0500 (Fri, 30 Apr 2010) | 13 lines
 
 Don't allow file descriptors to go above 64k, when we're closing them in a fork(2).
 
 This saves time, when, even though the system allows the process limit to be
 that high, the practical limit is much lower.
 
 (closes issue ASTERISK-15994)
  Reported by: dbackeberg
  Patches:
        20100423__issue17223.diff.txt uploaded by tilghman (license 14)
  Tested by: dbackeberg
........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=260303