
Summary: ASTERISK-08062: [patch] rtp goes into an error loop on network read error
Reporter: John Riordan (john)
Date Opened: 2006-11-03 13:44:17.000-0600
Date Closed: 2007-07-09 21:20:43
Priority: Minor
Regression? No
Status: Closed/Complete
Components: Core/RTP
Attachments: (0) patch-rtp-errorloop.diff
Description:
Issue:
When an error occurs in the call to recvfrom, rtp goes into a loop (calling recvfrom and getting the same error again and again). This leads to general instability, and it can fill the message log with thousands of errors a second.

Patch:
Cause the channel to hang up on an error by returning NULL instead of a null_frame.
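A minimal C sketch of the change the patch describes, assuming a simplified read path. `sketch_rtp_read`, `struct sketch_frame`, `sketch_null_frame` and `sketch_voice_frame` are hypothetical names, not the rtp.c API. A recvfrom error returns NULL so the caller hangs up, while a packet too short to hold the 12-byte RTP fixed header still returns the null frame and leaves the channel up.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical stand-in for a frame; NOT Asterisk's struct ast_frame. */
struct sketch_frame {
    const char *kind;
    int datalen;
};

/* "Nothing to deliver, keep the channel up" (plays the role of null_frame). */
static struct sketch_frame sketch_null_frame = { "null", 0 };
/* Frame handed back when a usable packet arrived. */
static struct sketch_frame sketch_voice_frame = { "voice", 0 };

#define RTP_FIXED_HEADER_LEN 12 /* RFC 3550 fixed header size in bytes */

/* Hypothetical read routine: a NULL return tells the caller to hang up. */
static struct sketch_frame *sketch_rtp_read(int fd, unsigned char *buf, size_t len)
{
    struct sockaddr_storage from;
    socklen_t fromlen = sizeof(from);
    ssize_t res = recvfrom(fd, buf, len, 0, (struct sockaddr *)&from, &fromlen);

    if (res < 0) {
        /* Returning the null frame here would make the caller read the
         * same descriptor again and hit the same error; hang up instead. */
        fprintf(stderr, "RTP Read error: %s\n", strerror(errno));
        return NULL;
    }
    if (res < RTP_FIXED_HEADER_LEN) {
        /* A malformed packet is not fatal: skip it, keep the channel. */
        return &sketch_null_frame;
    }
    /* A real implementation would parse the RTP header here. */
    sketch_voice_frame.datalen = (int)(res - RTP_FIXED_HEADER_LEN);
    return &sketch_voice_frame;
}
```

The split matters because the null frame means "try again later"; that is only correct when a later read can succeed.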
Comments:

By: Russell Bryant (russell) 2006-11-09 21:38:29.000-0600

I see the issue here.  However, I'm not sure I'm comfortable merging the patch as it is quite yet.  I'm sure there are other conditions where you would still want to return the null frame instead of NULL.  The first one I can think of would be EINTR (interrupted system call).  There may be others, though.  Have you seen this happen on a system, or is this just from code analysis?
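Russell's point can be expressed as a small errno classifier. This is a sketch under assumptions: `sketch_read_error_is_fatal` is a hypothetical helper, and the set of transient errors (EINTR, plus EAGAIN/EWOULDBLOCK for a non-blocking socket with no data) is illustrative, not the list that was eventually merged. A read path would return the null frame when the helper yields false and NULL when it yields true.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper: decide whether a recvfrom() errno should end the call.
 * Transient conditions keep the channel up; everything else hangs up.
 */
static bool sketch_read_error_is_fatal(int err)
{
    switch (err) {
    case EINTR:   /* interrupted by a signal; the next read may succeed */
    case EAGAIN:  /* non-blocking socket with no data ready yet */
#if defined(EWOULDBLOCK) && EWOULDBLOCK != EAGAIN
    case EWOULDBLOCK:
#endif
        return false;
    default:      /* e.g. EBADF, ENOTSOCK: retrying cannot succeed */
        return true;
    }
}
```

Defaulting to "fatal" favors hanging up one call over a tight loop; the opposite default would need every fatal errno listed explicitly.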

By: John Riordan (john) 2006-11-09 22:50:14.000-0600

We are seeing it randomly, roughly once every 100,000 calls, and more often than not it would drag down the Asterisk instance if we didn't hang up; it would end up in a tight loop, getting the same read error over and over again. We've been patching Asterisk internally with this for about a year.

I agree that there may be other error conditions where continuing makes sense. The only two errors we've seen are ENOTSOCK and EBADF.

For example:

Sep 21 13:23:30 WARNING[16301] rtp.c: RTP Read error: Socket operation on non-socket
Sep 21 13:23:30 WARNING[16301] rtp.c: John thinks we should hangup now.

Oct 14 19:17:05 WARNING[30202] rtp.c: RTP Read error: Bad file descriptor
Oct 14 19:17:05 WARNING[30202] rtp.c: John thinks we should hangup now.
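Both messages match errno values that recvfrom() returns when the descriptor is not a usable socket: EBADF for a descriptor that is not open, ENOTSOCK for an open descriptor that refers to something other than a socket. The sketch below reproduces each one outside Asterisk; it does not claim this is how the descriptors went bad on the reporters' systems, and `sketch_recv_errno` is a hypothetical helper. Because the condition belongs to the descriptor rather than the network, every retry fails the same way.

```c
#include <errno.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: call recvfrom() on fd and return the errno it
 * produced, or 0 if the call succeeded. Only meant for descriptors that
 * are expected to fail, so a blocking read is not a concern here. */
static int sketch_recv_errno(int fd)
{
    unsigned char buf[64];
    errno = 0;
    if (recvfrom(fd, buf, sizeof(buf), 0, NULL, NULL) < 0)
        return errno;
    return 0;
}
```

For example, calling it twice on the read end of a pipe() yields ENOTSOCK both times, and calling it on a socket descriptor after close() yields EBADF.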

By: quid246 (quid246) 2006-12-14 10:11:51.000-0600

I've had this happen to me as well, running 1.2.12.1...

Dec 14 15:37:38 NOTICE[7405] chan_iax2.c: Avoiding IAX destroy deadlock
Dec 14 15:38:33 WARNING[7405] chan_iax2.c: Received mini frame before first full voice frame
Dec 14 15:40:58 WARNING[7405] chan_iax2.c: Received mini frame before first full voice frame
Dec 14 15:40:58 WARNING[19935] rtp.c: RTP Read error: Socket operation on non-socket
Dec 14 15:40:58 WARNING[19935] rtp.c: RTP Read error: Socket operation on non-socket
Dec 14 15:40:58 WARNING[19935] rtp.c: RTP Read error: Socket operation on non-socket
... and so on

Asterisk will spin out of control and eventually crash within a few minutes.

One of my client phones quite often produces those "mini frame before first full voice frame" warnings, but I don't believe they are the cause: I get them all the time, yet this crash happens only once in a while.

This bug rears its head when there are about 50 calls on a dual Opteron 244 server.

By: Serge Vecher (serge-v) 2006-12-14 12:37:20.000-0600

Did you try John's patch?

By: Russell Bryant (russell) 2007-03-29 12:30:00

These changes have been merged into 1.2, 1.4, and trunk in revisions 59357, 59358, and 59359.  Thanks!