Summary: ASTERISK-00140: [request] VAD (voice activity detection + comfort noise) support
Reporter: lgoodman (lgoodman)
Labels:
Date Opened: 2003-08-21 10:13:50
Date Closed: 2011-06-07 14:04:51
Versions:
Frequency of
Environment:
Attachments: (0) rfc3389.html
Description: Request that Asterisk support VAD and Comfort Noise


Please refer to RFC 3389, "Real-time Transport Protocol (RTP) Payload for Comfort Noise (CN)", for a description of VAD support.
Comments:
By: John Todd (jtodd) 2004-01-31 17:53:08.000-0600

Wow, this is a really unpopular topic, apparently.  Nothing since it was opened.  Anyone want to put some code towards this effort?  I see four of you out there listening to these updates...

By: Paul Cadach (pcadach) 2004-02-08 00:30:26.000-0600

As far as I can see, it needs some reworking of Asterisk's code.

A related problem: Music-On-Hold stops when Zaptel isn't used and the endpoint uses VAD and stops RTP transmission on silence.

By: Olle Johansson (oej) 2004-02-08 02:12:52.000-0600

Is it possible to recognize from the Asterisk side that the client uses VAD? Are there any special packets sent when VAD is turned on that are not sent otherwise?

By: Paul Cadach (pcadach) 2004-02-08 03:04:31.000-0600

When an endpoint detects silence, it just stops sending VOICE RTP packets and MAY periodically send comfort-noise RTP packets. Distinguishing a paused RTP stream from one where some RTP packets were lost is a somewhat heuristic task. If the RTP receiver has a short buffer for incoming packets (for example, to handle jitter), then that buffer running empty could indicate a paused RTP stream. Also, RTP packets have a flag (the marker bit) which indicates that the RTP stream was resumed after a pause; it could be used to distinguish a stream with lost packets from a stream with paused traffic. And VAD/CNG requires very accurate internal timing, which is impossible for Asterisk without Zaptel hardware (maybe it's possible with 2.6 kernels, or with 2.4 kernels with low-latency patches)...

So, the full scheme of VAD/CNG (for a single direction of voice) is as follows:
Transmitter -> VAD + noise estimation -> RTP packets over network -> CNG + paused RTP stream detection -> Receiver

Because VAD is already used in some other algorithms (like echo cancellation) and codecs (for example, G.723.1A/B already have VAD and CNG blocks), VAD/CNG support for Asterisk must be done comprehensively: some at the Zaptel level, some at the application level (Asterisk's internals).
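The heuristic described above can be sketched roughly as follows. This is only an illustration in Python, not Asterisk code: `RtpHeader`, `build_rtp_header`, `parse_rtp_header`, `classify_gap` and the 160-samples-per-packet default (20 ms at 8 kHz) are hypothetical names and assumptions. Payload type 13 is the static type RFC 3389 assigns to comfort noise at an 8 kHz clock; endpoints may negotiate other types.

```python
import struct
from typing import NamedTuple

CN_PAYLOAD_TYPE = 13  # static payload type for comfort noise at 8 kHz (RFC 3389)


class RtpHeader(NamedTuple):
    marker: bool
    payload_type: int
    seq: int
    timestamp: int


def build_rtp_header(marker: bool, payload_type: int, seq: int,
                     timestamp: int, ssrc: int = 0x1234) -> bytes:
    """Pack a 12-byte RTP v2 fixed header (no CSRCs, no extension). Test helper."""
    byte0 = 2 << 6  # version 2, no padding, no extension, CSRC count 0
    byte1 = (0x80 if marker else 0) | (payload_type & 0x7F)
    return struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc)


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Extract the fields the heuristic needs from the fixed RTP header."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the 12-byte fixed header")
    byte0, byte1, seq, timestamp, _ssrc = struct.unpack("!BBHII", packet[:12])
    if byte0 >> 6 != 2:
        raise ValueError("not an RTP version 2 packet")
    return RtpHeader(bool(byte1 & 0x80), byte1 & 0x7F, seq, timestamp)


def classify_gap(prev: RtpHeader, cur: RtpHeader,
                 samples_per_packet: int = 160) -> str:
    """Guess what happened between two packets received back to back."""
    seq_delta = (cur.seq - prev.seq) & 0xFFFF            # 16-bit wraparound
    ts_delta = (cur.timestamp - prev.timestamp) & 0xFFFFFFFF
    if cur.payload_type == CN_PAYLOAD_TYPE:
        return "comfort-noise"
    if seq_delta == 0 or seq_delta >= 0x8000:
        return "duplicate-or-reordered"
    if cur.marker:
        return "resumed-after-pause"      # sender flagged a new talkspurt
    if seq_delta > 1:
        return "packet-loss"              # sequence numbers are missing
    if ts_delta > samples_per_packet:
        return "resumed-after-pause"      # no loss, but the timestamp jumped
    return "continuous"
```

The key observation is that a pause leaves the sequence numbers consecutive while the timestamp jumps, whereas packet loss leaves a hole in the sequence numbers.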

By: Olle Johansson (oej) 2004-02-08 03:14:44.000-0600

I guess the way to do this is:
1) Add error messages when receiving comfort-noise RTP packets or RTP packets that indicate restart after pause - this would help Asterisk admins to recognize the problem
2) Disconnect the outbound RTP stream from the incoming one by applying a timer to the outbound RTP stream (meaning: don't rely on incoming RTP for outbound RTP timing). This would also mean that we support various inbound RTP packet sizes as well, wouldn't it?
3) Add VAD for outbound RTP streams
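As an illustration of step 1 only, here is a minimal Python sketch that warns once per stream when a comfort-noise packet arrives. It is not Asterisk code: `CnWarner`, `cn_noise_level_dbov` and the logger name are hypothetical, and it assumes the static payload type 13 from RFC 3389 (a dynamically negotiated CN payload type would need the SDP mapping instead). Per RFC 3389, the first payload byte carries the noise level as a magnitude in -dBov.

```python
import logging

log = logging.getLogger("rtp")  # hypothetical logger name

CN_PAYLOAD_TYPE = 13  # static payload type for comfort noise at 8 kHz (RFC 3389)


def cn_noise_level_dbov(payload: bytes) -> int:
    """Return the noise level in dBov (0 down to -127) from a CN payload."""
    if not payload:
        raise ValueError("empty comfort-noise payload")
    # The low 7 bits hold the magnitude; the top bit is unused.
    return -(payload[0] & 0x7F)


class CnWarner:
    """Log one warning per SSRC the first time comfort noise is seen."""

    def __init__(self) -> None:
        self._warned = set()

    def check(self, ssrc: int, payload_type: int, payload: bytes) -> bool:
        """Return True if a warning was logged for this packet."""
        if payload_type != CN_PAYLOAD_TYPE or ssrc in self._warned:
            return False
        self._warned.add(ssrc)
        log.warning(
            "SSRC 0x%08x sends comfort noise (level %d dBov); "
            "silence suppression is active on the remote endpoint",
            ssrc, cn_noise_level_dbov(payload))
        return True
```

Warning once per stream rather than once per packet keeps the log readable, since an endpoint may send comfort-noise packets periodically during every pause.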

I think that doing #1 is quite easy for an RTP-aware coder. It's number 2 that requires a huge redesign, which I understand is not at the top of the priority list right now.

By: Olle Johansson (oej) 2004-02-08 03:17:11.000-0600

Also remember that RTP is used in H.323, so a change affects several channels

By: Paul Cadach (pcadach) 2004-02-08 03:46:58.000-0600

RTP is standardized very well, so RTP for all channels (H.323, Skinny, SIP, etc.) works the same, and VAD/CNG support would apply to all channels if it is handled at Asterisk's RTP layer (which is implemented in rtp.c).

Could you explain how error (warning) messages could help admins? Would they just tell them to disable VAD/CNG on the endpoints/gateways? This has been noted many times in this bug tracker, so until VAD/CNG support is included, it could be added to Asterisk's installation documentation...

VAD/CNG is applicable only to streams which go through Asterisk. A better way (I think) is what Cisco CallManager (CCM) does, where RTP streams go between endpoints (gateway+phone, phone+phone) directly, without CCM "intervention". Because most VoIP protocols allow separating the stream into real voice and user indications, re-sending RTP between endpoints through Asterisk is just a waste of CPU time and network bandwidth.

So, VAD/CNG must be handled only for Asterisk channels directly connected to the PSTN, PBXes, or channel banks. All other types of transmission must just relay the RTP/CN stream between endpoints, except when transcoding (from G.729 to G.711, for example). The transcoding process must handle CNG packets for each codec and translate them correctly to the destination codec.

By: Olle Johansson (oej) 2004-02-08 03:53:19.000-0600

It never hurts to explain, like the message "RFC2833 is not supported" that Asterisk outputs when someone uses it. I think such an addition would help admins.

Thinking about it, we really need to support this. I can't control every user on the Internet that may call me. It should really be an SDP option to be negotiated. Or is it required in the RTP specs to support VAD?

By: Paul Cadach (pcadach) 2004-02-08 04:44:30.000-0600

As I remember, the RTP spec allows for DTX (discontinuous transmission), i.e. pausing the RTP stream during silence. So it should not be a problem for endpoints (and must be supported by endpoints). But for Asterisk itself, DTX matters for MOH when no zap hardware (including the ztdummy module) is available and DTX is used by the endpoint. I think this is the only place where DTX causes trouble for Asterisk. The inconsistencies in other places are not serious enough to matter (for example, RTP transmission stops and the user hears REAL SILENCE: what's the problem?)... Implementing full support for VAD/DTX/CNG is a different, complex task, which is not as urgent as other problems and features that need to be solved in Asterisk ASAP.

By: Olle Johansson (oej) 2004-02-08 06:05:33.000-0600

Remember that in a lot of calls, we receive SIP from someone out there and convert to something else, like a ZAP or CAPI call. If I understand right, if the SIP UA calling us supports voice suppression, the return RTP stream from Asterisk to the SIP UA will be out of sync, since it lacks timers. The result is bad sound - or do I misunderstand something?

By: Paul Cadach (pcadach) 2004-02-08 12:13:37.000-0600

You are a little wrong - the stream will not be out of sync because:
1) The RTP endpoint (SIP, H.323, etc.) maintains the RTP timestamp independently of DTX (for example, after a 200 ms pause the next RTP packet will have its timestamp increased by 1600, i.e. 200*8 at an 8 kHz clock, and will have the "mark" bit set, so timing will not be screwed up);
2) Asterisk uses its own timing at the RTP layer, so dropped packets will be replaced by "empty" packets (not tested with any hardware, just an opinion formed from reading the sources).

I get bad sound from a Cisco-12SP+ working with G.723.1 over a dial-up line (regular V34+ connection, up to 33.6 Kbps), and want to find where it happens. If I find any misbehavior, I'll report it.
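The arithmetic in point 1 can be checked with a few lines of Python. The function names are hypothetical; the only assumptions are an 8 kHz RTP clock (as used by G.711, G.729 and G.723.1) and the 32-bit wraparound of the RTP timestamp field.

```python
def timestamp_advance(pause_ms: int, clock_rate_hz: int = 8000) -> int:
    """Number of RTP timestamp units that elapse during a pause."""
    return pause_ms * clock_rate_hz // 1000


def next_timestamp(prev_timestamp: int, pause_ms: int,
                   clock_rate_hz: int = 8000) -> int:
    """Timestamp after a pause, wrapped to the 32-bit RTP timestamp field."""
    advance = timestamp_advance(pause_ms, clock_rate_hz)
    return (prev_timestamp + advance) & 0xFFFFFFFF


# A 200 ms pause at 8 kHz advances the timestamp by 200 * 8 units.
assert timestamp_advance(200) == 1600
```

Because the timestamp keeps counting sample time during the pause, the receiver can place the next talkspurt at the right point on its playout timeline even though no packets arrived in between.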

By: Brian West (bkw918) 2004-04-24 00:28:52

Ok, does someone have an example of how we can do this?  If so, let's get some ideas going and see if we can get this DONE!  This is the one thing that keeps Asterisk out of the big boy toy box.... let's get it going, boys.

By: Paul Cadach (pcadach) 2004-04-24 01:17:09

After connecting a TE410P to the PSTN I'll report how DTX/VAD plays with the TE410P. For VoIP<->VoIP connections (through a Cisco gateway, for example) DTX doesn't cause any problems.

By: John Todd (jtodd) 2004-04-24 10:22:02

Sorry, I can't offer any concrete suggestions, but I can offer some time on Google this morning while I wait for a meeting...


I found some interesting data that libspeex (and speex in general) supports VAD.  Perhaps looking in the speex-devel source tree could reveal their methods which could be applied to generalized RTP streams?

By: Paul Cadach (pcadach) 2004-04-24 10:52:47

VAD is supported by G.723.1, G.729 and possibly other codecs (a variant of GSM with VAD is available too). Both codecs have reference code at the ITU's site, so anyone could look at the VAD algorithm and implementation for each codec (the ITU provides 3 documents per year for free).

Another interesting topic is converting CNG information between G.723.1, G.729 and RFC3389, just as regular voice packets are converted.

Also, to follow the regular usage of non-RFC3389 codecs (and for a future implementation of RFC3389 within Asterisk), Asterisk must have an internal timing source for "picking up" packets from the codec, because the codec will provide CNG-filled frames when you ask it for new data without passing any input.
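A rough Python sketch of that idea: a local clock pulls one frame per tick, and when nothing has arrived it gets a noise-filled frame instead. Everything here is hypothetical (`PlayoutClock`, the 160-sample frame, the noise amplitude), and the uniform random noise is only a stand-in for a codec's real CNG block, which would shape the noise from the received noise parameters.

```python
import random
from collections import deque

FRAME_SAMPLES = 160  # assumption: 20 ms frames at 8 kHz


class PlayoutClock:
    """Hand out exactly one frame per timer tick, whether or not input arrived."""

    def __init__(self, noise_amplitude: int = 30, seed=None) -> None:
        self._queue = deque()
        self._rng = random.Random(seed)
        self._amplitude = noise_amplitude

    def push(self, frame) -> None:
        """Called by the network side whenever a decoded voice frame arrives."""
        self._queue.append(frame)

    def tick(self):
        """Called by the local timer; returns a (kind, samples) pair."""
        if self._queue:
            return "voice", self._queue.popleft()
        # Nothing received (sender paused or packet lost): synthesize noise.
        noise = [self._rng.randint(-self._amplitude, self._amplitude)
                 for _ in range(FRAME_SAMPLES)]
        return "comfort-noise", noise
```

Driving `tick()` from a timer (for example every 20 ms) rather than from packet arrival is what keeps the output continuous when the remote side uses DTX.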

By: magg (magg) 2004-05-11 03:14:29

How much would it cost to have this implemented?
As far as I can see, this would reduce bandwidth usage by some amount, and help with potential problems with MOH from the * to a SIP phone?

By: stevekstevek (stevekstevek) 2004-05-11 16:01:32

Just a note:

Iaxclient supports VAD with IAX transmissions.  It probably suffers from more or less the same problems as RTP does, though, with applications which use incoming frames to trigger outbound frames.

IAX generally doesn't have a problem with this..

There is a decent VAD implementation (for the detection stuff only) inside of libspeex (1.1 branch); this is what we use in iaxclient, and also in app_conference (see iaxclient CVS).

By: Brian West (bkw918) 2004-06-17 22:03:10

Sponsor it or write it.. this bug is almost a year old; we need action or a plan.  If you can help get this in, then we will reopen a NEW bug.


By: Olle Johansson (oej) 2013-08-20 07:29:23.170-0500

The roibos branches in my svn repo are working on a solution to this. Nine years later, but anyway.