
Summary: ASTERISK-11731: high SIP call volume locks Asterisk 1.4.19rc3
Reporter: Matt Florell (mflorell)
Labels:
Date Opened: 2008-03-26 11:23:18
Date Closed: 2011-06-07 14:01:01
Priority: Critical
Regression?: No
Status: Closed/Complete
Components: Channels/chan_sip/General
Versions:
Frequency of Occurrence:
Related Issues:
Environment:
Attachments:
Description: I am able to reliably lock up Asterisk and send the load average on the server to 100.00 and higher within 5 minutes of starting performance testing with 300+ channels (all SIP). With IAX, even at higher call volumes, it does not crash at all.

Here is a link to the 20,000+ line GDB output:
http://www.eflo.net/files/Asterisk_1.4.19rc3_crash_gdb_2008-03-26.txt

I tried doing a "core show threads" but Asterisk was not responding.

Comments:

By: Clod Patry (junky) 2008-03-26 12:30:30

Could you recompile * with the DONT_OPTIMIZE compiler flag?
We're seeing a lot of <value optimized out> in the backtrace, which is not really helpful.

Also, include the output of:
frame 1
p *rtp
p *rtp->rtcp

By: Matt Florell (mflorell) 2008-03-26 12:46:33

Sorry, I posted an older one from today. Here is the non-optimized GDB output:
http://www.eflo.net/files/Asterisk_1.4.19rc3_crash_gdb_2008-03-26_NO-OPTIMIZE.txt


I'm not sure what you mean by the "frame 1" stuff.

By: Clod Patry (junky) 2008-03-26 12:48:53

In gdb, type this:
frame 1 <enter>
That selects frame 1.
Then type:
p *rtp
p *rtp->rtcp

By: Matt Florell (mflorell) 2008-03-26 12:54:06

All I get when I try "frame 1" is the following:

(gdb) frame 1
#1  0xb7e43e2d in _L_lock_14621 () from /lib/libc.so.6
(gdb) rtcp
Undefined command: "rtcp".  Try "help".
(gdb)

I am sorry that I am not much of an expert with debugging using gdb.

By: Clod Patry (junky) 2008-03-26 12:56:20

You need to type the p before the expression too, like:
(gdb) p *rtp
(gdb) p *rtp->rtcp

By: Matt Florell (mflorell) 2008-03-26 13:04:50

I'm sure I'm missing something else, here's the output:

(gdb) frame 1
#1  0xb7e43e2d in _L_lock_14621 () from /lib/libc.so.6
(gdb) p *rtp
No symbol "rtp" in current context.
(gdb) p *rtp->rtcp
No symbol "rtp" in current context.
(gdb)
No symbol "rtp" in current context.
(gdb)

By: Clod Patry (junky) 2008-03-26 13:12:59

Yes, because the new gdb file has no ast_rtp_stop() in the backtrace. :)

By: Mark Michelson (mmichelson) 2008-03-26 15:46:41

mflorell:

The second backtrace you linked to has a bunch of threads waiting on locks, and so I'd be interested in seeing the output of "core show locks" once the lockup has happened. I see from your initial report, though, that this may not be possible since you said that Asterisk was not responsive when you entered commands.

As another suggestion, if you can reliably reproduce this, could you not load any modules you aren't actively using for the test? There are a lot of IAX2 threads in this backtrace, and from what I understand, you aren't actually using IAX2 for this test. If you could trim down the number of modules being used when you cause the lockup, you may be able to get a much less massive backtrace. Of course, if you are using IAX2 channels in your test, then that suggestion won't work well.

The problem right now is that with 284 threads in the backtrace, it's incredibly difficult to tell where the problems are occurring. Either "core show locks" or a smaller backtrace would make things much more manageable if possible.

By: Matt Florell (mflorell) 2008-03-26 17:51:46

I'm going to try to find the breaking point and see what call level causes the crash. Things are already a bit different since I cannot reach the same call volumes with a non-optimized build. After I get to that point I will try to remove unused modules for easier debugging.

By: Matt Florell (mflorell) 2008-03-26 18:51:22

After several trials I think I've figured it out, and it's not really a bug. I had ulimit -n (open files) set to 2048. This seemed to be enough for the IAX channel performance tests at the level of 300-350 channels, but not enough for the SIP channels at the same channel volume.

Also, it wasn't just the channel volume; it was also the rate at which the calls were being launched. When I used SIP and ramped up to the 300-350 channels much more slowly, the crash did not take place. It seems that IAX is either faster at open-file cleanup, or it uses fewer open files than SIP.

In the end, raising ulimit -n to 10000 caused the problems to go away completely, which I confirmed with 5 separate tests.

You can close this issue, thanks for your time looking at it.
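
For anyone hitting the same limit, here is a minimal Python sketch of what "ulimit -n 10000" amounts to from inside a process. It is not Asterisk code: the helper name raise_nofile_limit and the target of 10000 (taken from the test above) are illustrative assumptions. An unprivileged process can only raise its soft limit as far as the hard limit, so the sketch caps the request there.

```python
import resource


def raise_nofile_limit(target):
    """Raise the soft open-files limit toward `target` (hypothetical helper).

    The request is capped at the hard limit. Returns the soft limit that is
    in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    # Only ever raise the limit; leave an unlimited or higher soft limit alone.
    if soft != resource.RLIM_INFINITY and target > soft:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
        except (ValueError, OSError):
            pass  # the platform refused; keep the current limit
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]


if __name__ == "__main__":
    print("soft open-files limit is now", raise_nofile_limit(10000))
```

The change applies only to the calling process and its children, the same as running ulimit -n in the shell that starts the server.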

By: Olle Johansson (oej) 2008-03-27 03:09:34

I hope that you got decent error messages. We do complain when we can't open more file handles.

SIP channels die more slowly, since we have to have them hanging around for a while to handle re-transmits. That's not needed for IAX2.
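
Because descriptors held by lingering channels count against the limit, it can help to watch descriptor usage during a load test. The following is a rough sketch, assuming Linux with /proc mounted; the function names open_fd_count and fd_headroom are hypothetical and this is not part of Asterisk.

```python
import os
import resource


def open_fd_count(pid="self"):
    """Count open descriptors by listing /proc/<pid>/fd (Linux only).

    When counting the current process, the listing includes the descriptor
    that os.listdir() itself holds on the directory, so the figure is one
    higher than the count at rest.
    """
    return len(os.listdir(f"/proc/{pid}/fd"))


def fd_headroom():
    """Return (used, soft_limit, remaining) for the current process.

    `remaining` is None when the soft limit is unlimited.
    """
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    used = open_fd_count()
    remaining = None if soft == resource.RLIM_INFINITY else soft - used
    return used, soft, remaining


if __name__ == "__main__":
    used, soft, remaining = fd_headroom()
    print(f"open descriptors: {used}, soft limit: {soft}, remaining: {remaining}")
```

To watch another process, pass its PID to open_fd_count(); this needs permission to read that process's /proc entry. Note that fd_headroom() reports the limit of the calling process, not of the process being watched.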