Summary:ASTERISK-12126: Trunk version of chan_sip significantly slower than
Reporter:Chris Tooley (ctooley)Labels:
Date Opened:2008-06-02 10:09:16Date Closed:2011-06-07 14:03:16
Versions:Frequency of
Environment:Attachments:( 0) profile.1.4.higherload.nocdr.txt
( 1) profile.1.4.lowload.nocdr.txt
( 2) profile.1.4compare.txt
( 3) profile.higherload.txt
( 4) profile.higherload2.txt
( 5) profile.lowload.txt
( 6) profile.lowload2.txt
( 7) profiling.report.txt
Description:The version of chan_sip in trunk seems to perform actions quite a bit slower than, I have provided some information to Murf.
Comments:By: Russell Bryant (russell) 2008-06-02 11:14:03

If the issue is going to be on the tracker, then the information should be posted here.

By: Tilghman Lesher (tilghman) 2008-06-19 18:00:07

Suspended, due to lack of response.  If you are able to provide the requested information, please reopen this issue.

By: Steve Murphy (murf) 2008-06-23 11:47:03

uh, sorry, I got his data, but it didn't have any useful information,
so I didn't post it to the bug. Why waste the space and waste time?

I'm reopening this, and I'll assign it to myself.

By: Steve Murphy (murf) 2008-06-23 11:50:02

Russell says via irc that 1.6.0 cannot (or was it 'should not'?) be released until this issue is resolved. That would seem to give it fair amount of status.

By: Steve Murphy (murf) 2008-06-23 11:57:17

Oh, and to help debug this, I invested some personal time and created the branch http://svn.digium.com/svn/asterisk/team/murf/mtxprof, as an experiment to see if mutex locking is contributing to this problem, and specifically, which locks. There is a problem with this in that ctooley can't get it working...

By: Olle Johansson (oej) 2008-06-26 11:12:45

This bug seems very important but lacks a lot of information so that other people can understand it. Please add information on how trunk was tested and how you realized it was being slower, thank you.

By: pj (pj) 2008-06-26 11:32:12

I'm not original reporter of this bugreport, but my issue with last trunk is with excessive cpu utilization, that is actually caused by res_timing_pthread.so, consequence of this is overall performance degradation, when this module is loaded. I think it should not be loaded "by default" due this, but only in special cases, like iax trunking.

from ast-dev discusion:

----- "Pavel Jezek" <pavel.jezek@i.cz> wrote:

> > I'm using asterisk trunk on pc with 300MHz mobile pentium cpu and one
> > simple sip2sip call (without transcoding) causes cpu load about 20%!
> > seems, that new timing interface eats much cpu cycles,
> > As workaround, I put noload => res_timing_pthread.so into modules.conf
> > and now, I have utilization about 2-3%, that is like (maybe slightly
> > more) previous trunk revisions.
> > issue is probably related to suspended bugreport ASTERISK-1269772
> > Is res_timing_pthread.so actually needed for pure ip asterisk (without
> > pstn cards installed)? As I read, Meetme still need ztdummy, so what
> > res_timing is actually doing?

Timing in Asterisk is used in Asterisk for IAX2 trunking, generator timing, and file playback timing.  Without it, IAX2 trunking will not work at all.  Things like file playback, and anything that uses generators (music on hold, for example), will also not work in some cases.  They will rely on incoming audio for timing.

20% is more than I would have expected, but I'm not _too_ surprised.  The module is a bit of a hack.  There is a loop running every 5 ms.  I'm sure someone will come up with a much better userspace timing module.   :)

-- Russell Bryant Senior Software Engineer Open Source Team Lead Digium, Inc.

By: Steve Murphy (murf) 2008-07-17 17:57:41

In response to oej's comment, I have uploaded 5 text files, the first four are tables of profiling data, taken from runs of asterisk from the team/murf/mtxprof branch, which is based on trunk.

profile.lowload.txt, profile data after two non-overlapping sip phone to sip-phone calls, with cdr backends active, and the pthread timing not loaded.

profile.higherload.txt, profile data after 360 sipp generated calls at 10-90 cps, .1 sec duration, with cdr backends active, and the pthread timing not loaded.

profile.lowload2.txt, profile data after two non-overlapping sip phone to sip-phone calls, with NO cdr backends active, and the pthread timing not loaded.

profile.higherload2.txt, profile data after 1100+ sipp generated calls at 10-100 cps, .1 sec duration, with NO cdr backends active, and the pthread timing not loaded.

and lastly, a sort of summation in the file profiling.report.txt.

As to how we realized that it was slower: ctooley noticed that the call volume capability in trunk is substantially lower than in 1.4. For rigor, I separate volume performance into two opposing groups: ultra-short calls at high call/sec rates to test the call setup/teardown capabilities of the driver, and infinite-length calls in parallel to test the data-shovelling capabilities of
the driver. I noticed in the time between I had the astobj2 fixes made chan_sip operational, and the time I merged those changes in trunk, that performance dropped from 4.5x the original speed of trunk to .5x for max cps.

ctooley estimates the performance drop to be 1/10th of 1.4.

The goal is find the reason(s) for this drop and remove it.

By: Steve Murphy (murf) 2008-07-17 18:06:23

PS. I suspect lock contention may be a good chunk of this slowdown; we removed the lock in the INVITE handler, and performance doubled (a crippled doubling).
Hence my suspicion. But there's still around 4-5x to recover!

As I have noted previously, in previous runs of asterisk, (eg 1.4), chan_sip would fall apart when cpu cycle utilization reached 100%. Now, in trunk, chan_sip falls apart long before then.

REWARD: to the first person to find the reason(s) for the slowdown, I will reward them with 2 hostess cupcakes, ding-dongs, or chocodiles! (The stakes are high!).

By: Olle Johansson (oej) 2008-07-18 05:03:10

Thanks for all the information murf!

By: Steve Murphy (murf) 2008-07-24 13:49:36

OK, I backported the stuff in mtxprof to 1.4 and then ran asterisk thru the same tests as I did with trunk. I found a few interesting things, but nothing jumps at me. See the report, profile.1.4.compare.txt (attached to this bug report) for the specifics and the general conclusions.

I also attached these files:
profile.1.4.lowload.nocdr.txt   -- the raw data from two, non-overlapping
                                  calls made from sip phone to sip phone
                                  via asterisk.
profile.1.4.higherload.nocdr.txt -- raw data (o/p of 'core show profile')
                                  for 800+ calls made at a rate of 10-110
                                  calls/sec, .1 sec duration.

The software is available via the mtxprof4 branch (team/murf/mtxprof4)

By: Steve Murphy (murf) 2008-07-25 13:57:34

I've done sufficient performance testing that I'm satisfied that the changes merged to trunk and 1.6.0 (and 1.4) for bug 12771 today, have removed the main slowdown between 1.4 and trunk.

I am still not fully satisfied with the performance of chan_sip in trunk, but it is indeterminate how many man hours/days/weeks/months/years it will take to find and remove the bottlenecks. Let it suffice to say that, because of chan_sip's conversion to using astobj2 to index users, peers, and dialogs, it *should* run a lot faster than 1.4. But it's just on parity now.

The slowdown: setting the AST_DEVICE_UNKNOWN during sipp testing would trigger the device_state code to run the ast_parse_device_state() routine, which would in turn call the ast_get_channel_by_name_prefix_locked() routine, which is a front end for find_channel_locked. This is a real bad time to call this function, as the channel is locked, and it eats up huge amounts of CPU time trying to get the lock over and over again. THis little storm of activity at precisely these moments had a huge impact on chan_sip's performance. It slowed everything down, including the ability to spawn a new thread!
None of this ended up showing up in the profiling data, as the storms of activity were apparently fairly short, and didn't affect the overall cpu usage percentages much. I noticed on comparison with 1.4 activity, that the find_channel_locked routine was getting called like 7000 times in trunk, and only 3 times in 1.4; the first test reverted the change made in v.98954; which gave a significant speedup to trunk. It was judged that the 12771 changes should solve the problem in a better way, but perf testing showed it's a few percent less speedy than the gethostid code.

I suspect that further performance fixes to trunk for chan_sip will yield good speedups, but that is the subject of another bug report.