ASTERISK-09056: Manager Dropping Events Under Moderate Call Load


Summary: ASTERISK-09056: Manager Dropping Events Under Moderate Call Load

Reporter: Douglas Garstang (dgarstang)
Labels:

Date Opened: 2007-03-20 13:28:43
Date Closed: 2007-11-06 15:32:35.000-0600

Priority: Minor
Regression? No

Status: Closed/Complete
Components: Core/ManagerInterface

Versions:
Frequency of Occurrence:

Related Issues:

Environment:
Attachments:

Description: Asterisk 1.4.1.

The Manager interface seems to drop events under moderate call load.

I am initiating calls to Asterisk with SIPp:
sipp tp-test5 -sn uac -m 200 -s 1xxx2423021 -d 65000 -l 200 -r 7 -rp 1

At the same time I am running an ngrep on the manager port:
ngrep -d eth1 port 5038 > foo

Seven calls per second is not an extremely high rate. However, the manager interface appears to be dropping, or never sending, some Newchannel events.

Specifically, I am monitoring Newchannel and Hangup events. When new calls are initiated and the rate reaches a threshold of around 7 calls per second, the manager interface does not send all of the events over the socket. This is evident in the output of the ngrep command.

When calls are being disconnected, SIPp always tears them down at about 10 calls per second, which almost always results in the manager interface not sending some Hangup events over the socket. In one test, I started new calls at 5 per second, and the manager interface sent all of the corresponding events. However, on call teardown, 11 Hangup events were never sent.
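One way to quantify the loss from a capture like the one above is to count how often each event header appears in it. The following is a minimal sketch, assuming the ngrep output was saved to a file (foo in the command above), that only one manager session was connected during the test, and that no "Event:" header was split across two packets. The helper names count_occurrences and read_file are hypothetical and are not part of Asterisk or ngrep.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Count non-overlapping occurrences of needle in haystack. */
static size_t count_occurrences(const char *haystack, const char *needle)
{
    size_t count = 0, len;
    const char *p = haystack;

    assert(haystack != NULL && needle != NULL);
    len = strlen(needle);
    if (len == 0)
        return 0;
    while ((p = strstr(p, needle)) != NULL) {
        count++;
        p += len;  /* skip past this match before searching again */
    }
    return count;
}

/* Read a whole file (e.g. the saved ngrep output) into a NUL-terminated
 * buffer. Returns NULL on failure; the caller frees the result. */
static char *read_file(const char *path)
{
    FILE *fp = fopen(path, "rb");
    char *buf = NULL, *tmp;
    size_t cap = 0, used = 0, n;
    char chunk[4096];

    if (!fp)
        return NULL;
    while ((n = fread(chunk, 1, sizeof(chunk), fp)) > 0) {
        if (used + n + 1 > cap) {
            cap = (used + n + 1) * 2;
            tmp = realloc(buf, cap);
            if (!tmp) {
                free(buf);
                fclose(fp);
                return NULL;
            }
            buf = tmp;
        }
        memcpy(buf + used, chunk, n);
        used += n;
    }
    fclose(fp);
    if (!buf && !(buf = malloc(1)))  /* empty file: return "" */
        return NULL;
    buf[used] = '\0';
    return buf;
}
```

For example, with capture = read_file("foo"), compare count_occurrences(capture, "Event: Newchannel") with count_occurrences(capture, "Event: Hangup"). Once every call has ended, the two counts should normally match; a shortfall in Hangup matches the symptom described above. A single call may create more than one channel (for example an inbound SIP channel plus a dialed one), so the counts need not equal the number of SIPp calls.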

Under load, events should be delayed, not dropped!

The system is a single-CPU 2.0 GHz machine.

****** ADDITIONAL INFORMATION ******

Do I need to provide any more data? Debug output for 200 calls is a lot.

Comments: By: Rob M (nyt) 2007-03-25 10:37:14

I am also seeing this behavior. After some time, the manager stops outputting events from the dialplan, although I still see SIP events.

This was working fine until recently, when I did an svn update to 1.4.1.

I'm going to test again with SVN-branch-1.4-r56231

Running 'reload pbx_config' and 'reload manager' restores proper output.
By: Serge Vecher (serge-v) 2007-03-26 09:31:24

Assigning to bweschke, as this is possibly related to his manager event-q work. Please unassign if not the case ;)
By: John Todd (jtodd) 2007-03-27 11:49:58

This is a fairly serious problem, as many sites use the AMI to keep call state. Under moderate call load, this could lead to lost billing events, which are the lifeblood of anyone doing commercial work with Asterisk.
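To illustrate why a single dropped event matters to an AMI client that keeps call state, here is a minimal sketch of a tracker that adds a channel on Newchannel and removes it on Hangup, keyed by the Uniqueid header those events carry. The struct and function names are hypothetical, the table is fixed-size for brevity, and parsing of the AMI stream is left out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_CALLS 256
#define ID_LEN 64

/* Hypothetical call-state table keyed by the Uniqueid header that
 * Newchannel and Hangup events carry. Parsing the AMI stream is omitted. */
struct call_table {
    char ids[MAX_CALLS][ID_LEN];
    int used[MAX_CALLS];
    int active;             /* channels the client believes are up */
};

static void on_newchannel(struct call_table *t, const char *uniqueid)
{
    assert(t && uniqueid);
    for (int i = 0; i < MAX_CALLS; i++) {
        if (!t->used[i]) {
            snprintf(t->ids[i], ID_LEN, "%s", uniqueid);
            t->used[i] = 1;
            t->active++;
            return;
        }
    }
    /* Table full: a real tracker would grow the table or log an error. */
}

static void on_hangup(struct call_table *t, const char *uniqueid)
{
    assert(t && uniqueid);
    for (int i = 0; i < MAX_CALLS; i++) {
        if (t->used[i] && strcmp(t->ids[i], uniqueid) == 0) {
            t->used[i] = 0;
            t->active--;
            return;
        }
    }
    /* Unknown Uniqueid: its Newchannel event was never seen. */
}
```

If the Hangup for a channel is never delivered, that entry stays in the table and active never returns to zero, which is how a missed event turns into a phantom call or a missing billing record.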
By: BJ Weschke (bweschke) 2007-03-27 12:17:08

I'm not really convinced that my branch work is going to fix this, but I think we should still test again after it gets merged. I agree with jtodd that it is an important issue and we should probably keep this bug open separately and not make it a dupe of anything that might get closed with the branch getting merged back in.
By: jmls (jmls) 2007-09-12 16:34:38

Where are we with this? Is it still a problem with 1.4.10+?
By: John Todd (jtodd) 2007-09-12 17:21:46

FWIW, the reporter is possibly no longer in an environment where this bug is important to track, so no response should not be considered as a "bug fixed" result.

Hopefully, nyt is able to confirm/deny based on a more recent version of the code.
By: Rob M (nyt) 2007-09-12 18:28:24

I cannot run 1.4.10+ due to 2 other bugs.

1: CDR logging is very broken
2: UPDATECDR functionality does not work. http://bugs.digium.com/view.php?id=9573

If these become fixed I can test for this bug.
By: Brandon Kruse (bkruse) 2007-10-30 09:27:35

Close issue?

-bk
By: John Todd (jtodd) 2007-10-30 10:20:39

The bug still exists as far as I know. Others have discussed it (it was also raised in question-and-answer sessions at Astricon) as an impediment to building useful AMI-based state machines.
By: Russell Bryant (russell) 2007-11-06 15:31:57.000-0600

It is likely that the changes in the following two commits fixed this problem. However, if anyone still experiences the issue after running a version that includes these fixes (at least 1.4.12), then I'd like to know about it.

Thanks!

------------------------------------------------------------------------
r83121 | russell | 2007-09-19 10:10:14 -0500 (Wed, 19 Sep 2007) | 4 lines

Fix up another potential race condition. Do the loop decrementing use count
on events with the eventq protected from being changed.
(reported on IRC by Ivan)

------------------------------------------------------------------------
r82867 | russell | 2007-09-18 15:56:43 -0500 (Tue, 18 Sep 2007) | 10 lines

Fix a memory leak that can occur on systems under higher load. The issue is
that when events are appended to the master event queue, they use the number
of active sessions as a use count so it will know when all active sessions
at the time the event happened have consumed it. However, the handling of
the number of sessions was not properly synchronized, so the use count was
not always correct, causing an event to disappear early, or get stuck in
the event queue for forever.

(closes issue ASTERISK-8971, reported by bweschke, patch from Ivan, modified by me)

------------------------------------------------------------------------
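The two commit messages describe a shared event queue in which each event carries a use count equal to the number of sessions active when it was queued. The following is a simplified sketch of that idea under stated assumptions, not Asterisk's actual manager.c code: all structure and function names are hypothetical, each session tracks a sequence number rather than a pointer into the queue, and a single pthread mutex guards both the queue and the session count. Because num_sessions is read under the same lock that session_join and session_leave take, the use count cannot go stale, and session_leave decrements use counts with the queue locked for the whole loop, in the spirit of r83121.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical structures for illustration; not Asterisk's manager.c. */
struct event {
    struct event *next;
    unsigned long seq;      /* position in the master queue */
    int usecount;           /* sessions that still have to read this event */
    char text[128];
};

struct event_queue {
    pthread_mutex_t lock;   /* guards every field below together */
    struct event *head, *tail;
    unsigned long next_seq;
    int num_sessions;
};

struct session {
    unsigned long next_seq; /* first sequence number not yet read */
};

/* Free head events that no session still needs. Caller holds q->lock. */
static void purge_locked(struct event_queue *q)
{
    while (q->head && q->head->usecount == 0) {
        struct event *e = q->head;
        q->head = e->next;
        free(e);
    }
    if (!q->head)
        q->tail = NULL;
}

static void session_join(struct event_queue *q, struct session *s)
{
    pthread_mutex_lock(&q->lock);
    s->next_seq = q->next_seq;  /* only events queued from now on */
    q->num_sessions++;
    pthread_mutex_unlock(&q->lock);
}

/* num_sessions is read under the same lock that join/leave take. */
static int append_event(struct event_queue *q, const char *text)
{
    struct event *e = calloc(1, sizeof(*e));
    if (!e)
        return -1;
    snprintf(e->text, sizeof(e->text), "%s", text);
    pthread_mutex_lock(&q->lock);
    e->seq = q->next_seq++;
    e->usecount = q->num_sessions;
    if (q->tail)
        q->tail->next = e;
    else
        q->head = e;
    q->tail = e;
    purge_locked(q);        /* frees e right away if no session is connected */
    pthread_mutex_unlock(&q->lock);
    return 0;
}

/* Copy the oldest unread event into buf. Returns 1 if one was read. */
static int read_event(struct event_queue *q, struct session *s,
                      char *buf, size_t len)
{
    int found = 0;
    pthread_mutex_lock(&q->lock);
    for (struct event *e = q->head; e; e = e->next) {
        if (e->seq >= s->next_seq) {
            snprintf(buf, len, "%s", e->text);
            s->next_seq = e->seq + 1;
            e->usecount--;
            found = 1;
            break;
        }
    }
    purge_locked(q);
    pthread_mutex_unlock(&q->lock);
    return found;
}

/* Release this session's claim on every unread event, with the queue
 * locked for the whole loop. */
static void session_leave(struct event_queue *q, struct session *s)
{
    pthread_mutex_lock(&q->lock);
    assert(q->num_sessions > 0);
    for (struct event *e = q->head; e; e = e->next)
        if (e->seq >= s->next_seq)
            e->usecount--;
    q->num_sessions--;
    purge_locked(q);
    pthread_mutex_unlock(&q->lock);
}
```

An event is freed only once every session that was connected when it was queued has either read it or disconnected. If the use count were too low, the event could be freed before a slower session read it, which from the client's side looks like a dropped event; if it were too high, the event would stay in the queue forever, which is the leak r82867 describes.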