[Home]

Summary:ASTERISK-12767: Partial writes on Manager API
Reporter:Stefan Reuter (srt)Labels:
Date Opened:2008-09-23 08:35:42Date Closed:2008-12-22 11:19:03.000-0600
Priority:MinorRegression?No
Status:Closed/CompleteComponents:Core/ManagerInterface
Versions:Frequency of
Occurrence
Related
Issues:
is related toASTERISK-23917 res_http_websocket: Delay in client processing large streams of data causes disconnect and stuck socket
Environment:Attachments:
Description:Sometimes writing manager events with ast_carefulwrite in manager.c only results in a partial write i.e. not all data is written to the socket. There is no check for this and no code to send the missing data. As a workaround increasing the writetimeout in manager.conf mitigates this issue.
See http://jira.reucon.org/browse/AJ-174?focusedCommentId=10522#action_10522 for details.
Comments:By: Tilghman Lesher (tilghman) 2008-10-01 18:10:48

Looking over the related issue, I have a few comments.

1) ast_carefulwrite() already does error-correction, albeit with a timeout.  Multiple levels of error detection is rather silly.

2) I agree with increasing the timeout.  It seems to be the only way to go here.  Perhaps this should be a per-peer setting, which can be increased, if a particular peer is having trouble keeping up.  Then again, if a peer is having trouble keeping up, then you should look into why (probably network issues).

By: Tilghman Lesher (tilghman) 2008-10-01 18:44:33

Actually, it's already a per-peer setting.  Simply set writetimeout in manager.conf to a higher value.

By: David Woolley (davidw) 2008-11-04 10:07:59.000-0600

I believe the real cause for this is that send_string fails to handle the EAGAIN error from fflush.  The Linux man page seems to be incomplete.  Given that glibc claims to be a POSIX superset, the opengroup.org copy of the man page ought to be more correct (the difference between their pages and the normal vendor pages is that they are careful to cover all the edge cases).

We are just trying the effect of using setbuf, to force unbuffered output (although handling fflush properly would reduce the number of system calls).  I'll update when we've done.

http://opengroup.org/onlinepubs/007908775/xsh/fflush.html

By: David Woolley (davidw) 2008-11-04 10:15:00.000-0600

Note that our case is for 1.6.0, which doesn't actually have a routine called ast_carefulwrite in manager.c, although there are still comments about it.  The one in main/utils.c, doesn't use stdio.

By: David Woolley (davidw) 2008-11-04 11:15:58.000-0600

setbuf doesn't seem to have helped.  It's still truncating at exactly the same place, (about 11K into a queue status listing)  Note that changing writetimeout didn't mitigate the problem for us, even with huge values.

By: David Woolley (davidw) 2008-11-04 11:56:22.000-0600

Another thought.  There is no EINTR case handling on the poll.  I presume this is the thread that gets woken up with signals when there are events to output.  I'll try patching one in, but the person with the test rig may have finished for the day.

By: David Woolley (davidw) 2008-11-04 12:36:03.000-0600

Excluding EINTR from the error cases didn't help either.

By: David Woolley (davidw) 2008-11-05 10:40:15.000-0600

I got confused by the reference to fdopen and put the setbuf in the wrong place; I don't think the temporary file should ever block!

Unfortunately, the right place seems to have TLS involvment, which probably means one can't do like main/utils.c, and use write, the FILE structure may not be a stdio one.

By: David Woolley (davidw) 2008-11-06 06:31:42.000-0600

Using the setbuf in the right place (just after the socket is set non-blocking), does remove the symptom.  However, I have severe reservations about whether this will work with TLS and think the right solution is to include the fflush in the poll loop (rather than just make it redundant).  (The TLS involvement precludes the use of write(2).)

Please note that our problem is not mitigated by increasing the timeout; I'm certain that it was still there with 5,000ms and I believe it was also tried with 300,000ms.

I did find that we could only reproduce it when running AMI from a remote site (remote worker testing the application against a central server).



By: Erik van Pienbroek (erik van pienbroek) 2008-12-08 09:28:17.000-0600

We've also been hit by this bug using Asterisk 1.6.0.3-rc1. We're using the AMI interface from a remote computer. If a large amount of messages need to be sent by the manager API, some messages may get lost (for example with the QueueStatus command).

We've figured out this is caused by the fact that there's no error handling performed after fflush() is called in the function send_string() in manager.c. As already mentioned by davidw, the errno = EAGAIN situation needs to be handled to solve this bug.

By: Leif Madsen (lmadsen) 2008-12-08 09:30:33.000-0600

Any additional thoughts on this Corydon?

By: Ronald Chan (loloski) 2008-12-08 09:39:44.000-0600

Yes, i can confirm this is the case. We are in the same boat as Erik van Pienbroek but this time for SVN-1.4 branch

By: Russell Bryant (russell) 2008-12-18 14:52:41.000-0600

Here is a patch to address some of the ast_carefulwrite() issues for Asterisk 1.4

http://reviewboard.digium.com/r/99/

By: Digium Subversion (svnbot) 2008-12-18 15:39:20.000-0600

Repository: asterisk
Revision: 165796

U   branches/1.4/main/utils.c

------------------------------------------------------------------------
r165796 | russell | 2008-12-18 15:39:20 -0600 (Thu, 18 Dec 2008) | 10 lines

Make ast_carefulwrite() be more careful.

This patch handles some additional cases that could result in partial writes
to the file description.  This was done to address complaints about partial
writes on AMI.

(issue ASTERISK-12767) (more changes needed to address potential problems in 1.6)
Reported by: srt
Tested by: russell

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=165796

By: Digium Subversion (svnbot) 2008-12-18 15:43:06.000-0600

Repository: asterisk
Revision: 165796

U   branches/1.4/main/utils.c

------------------------------------------------------------------------
r165796 | russell | 2008-12-18 15:39:25 -0600 (Thu, 18 Dec 2008) | 11 lines

Make ast_carefulwrite() be more careful.

This patch handles some additional cases that could result in partial writes
to the file description.  This was done to address complaints about partial
writes on AMI.

(issue ASTERISK-12767) (more changes needed to address potential problems in 1.6)
Reported by: srt
Tested by: russell
Review: http://reviewboard.digium.com/r/99/

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=165796

By: Digium Subversion (svnbot) 2008-12-18 15:44:42.000-0600

Repository: asterisk
Revision: 165801

_U  trunk/
U   trunk/main/utils.c

------------------------------------------------------------------------
r165801 | russell | 2008-12-18 15:44:42 -0600 (Thu, 18 Dec 2008) | 19 lines

Merged revisions 165796 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r165796 | russell | 2008-12-18 15:39:25 -0600 (Thu, 18 Dec 2008) | 11 lines

Make ast_carefulwrite() be more careful.

This patch handles some additional cases that could result in partial writes
to the file description.  This was done to address complaints about partial
writes on AMI.

(issue ASTERISK-12767) (more changes needed to address potential problems in 1.6)
Reported by: srt
Tested by: russell
Review: http://reviewboard.digium.com/r/99/

........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=165801

By: Digium Subversion (svnbot) 2008-12-18 15:45:33.000-0600

Repository: asterisk
Revision: 165802

_U  branches/1.6.0/
U   branches/1.6.0/main/utils.c

------------------------------------------------------------------------
r165802 | russell | 2008-12-18 15:45:33 -0600 (Thu, 18 Dec 2008) | 27 lines

Merged revisions 165801 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

................
r165801 | russell | 2008-12-18 15:44:47 -0600 (Thu, 18 Dec 2008) | 19 lines

Merged revisions 165796 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r165796 | russell | 2008-12-18 15:39:25 -0600 (Thu, 18 Dec 2008) | 11 lines

Make ast_carefulwrite() be more careful.

This patch handles some additional cases that could result in partial writes
to the file description.  This was done to address complaints about partial
writes on AMI.

(issue ASTERISK-12767) (more changes needed to address potential problems in 1.6)
Reported by: srt
Tested by: russell
Review: http://reviewboard.digium.com/r/99/

........

................

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=165802

By: Digium Subversion (svnbot) 2008-12-18 15:47:02.000-0600

Repository: asterisk
Revision: 165804

_U  branches/1.6.1/
U   branches/1.6.1/main/utils.c

------------------------------------------------------------------------
r165804 | russell | 2008-12-18 15:47:02 -0600 (Thu, 18 Dec 2008) | 27 lines

Merged revisions 165801 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

................
r165801 | russell | 2008-12-18 15:44:47 -0600 (Thu, 18 Dec 2008) | 19 lines

Merged revisions 165796 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r165796 | russell | 2008-12-18 15:39:25 -0600 (Thu, 18 Dec 2008) | 11 lines

Make ast_carefulwrite() be more careful.

This patch handles some additional cases that could result in partial writes
to the file description.  This was done to address complaints about partial
writes on AMI.

(issue ASTERISK-12767) (more changes needed to address potential problems in 1.6)
Reported by: srt
Tested by: russell
Review: http://reviewboard.digium.com/r/99/

........

................

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=165804

By: David Woolley (davidw) 2008-12-19 05:46:33.000-0600

Note, the problems we were having relate to a re-implementation of ast_carefulwrite in manager.c, under the name send_string, which uses stdio routines, because it has to cope with TLS as well as direct socket stuff.

By: Russell Bryant (russell) 2008-12-19 08:00:48.000-0600

Thanks for the feedback, davidw.  I'm aware of that, and it is the reason I haven't closed this out yet.  I'm working on a similar patch for trunk/1.6 that will replace the current send_string() function.

By: Russell Bryant (russell) 2008-12-19 08:37:45.000-0600

http://reviewboard.digium.com/r/104/

By: Digium Subversion (svnbot) 2008-12-22 11:09:31.000-0600

Repository: asterisk
Revision: 166282

U   trunk/include/asterisk/utils.h
U   trunk/main/manager.c
U   trunk/main/utils.c

------------------------------------------------------------------------
r166282 | russell | 2008-12-22 11:09:31 -0600 (Mon, 22 Dec 2008) | 12 lines

Introduce ast_careful_fwrite() and use in AMI to prevent partial writes.

This patch introduces a function to do careful writes on a file stream which
will handle timeouts and partial writes.  It is currently used in AMI to
address the issue that has been reported.  However, there are probably a few
other places where this could be used.

(closes issue ASTERISK-12767)
Reported by: srt
Tested by: russell
http://reviewboard.digium.com/r/104/

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=166282

By: Digium Subversion (svnbot) 2008-12-22 11:14:14.000-0600

Repository: asterisk
Revision: 166283

_U  branches/1.6.0/
U   branches/1.6.0/include/asterisk/utils.h
U   branches/1.6.0/main/manager.c
U   branches/1.6.0/main/utils.c

------------------------------------------------------------------------
r166283 | russell | 2008-12-22 11:14:14 -0600 (Mon, 22 Dec 2008) | 20 lines

Merged revisions 166282 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

........
r166282 | russell | 2008-12-22 11:09:36 -0600 (Mon, 22 Dec 2008) | 12 lines

Introduce ast_careful_fwrite() and use in AMI to prevent partial writes.

This patch introduces a function to do careful writes on a file stream which
will handle timeouts and partial writes.  It is currently used in AMI to
address the issue that has been reported.  However, there are probably a few
other places where this could be used.

(closes issue ASTERISK-12767)
Reported by: srt
Tested by: russell
http://reviewboard.digium.com/r/104/

........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=166283

By: Digium Subversion (svnbot) 2008-12-22 11:19:01.000-0600

Repository: asterisk
Revision: 166284

_U  branches/1.6.1/
U   branches/1.6.1/include/asterisk/utils.h
U   branches/1.6.1/main/manager.c
U   branches/1.6.1/main/utils.c

------------------------------------------------------------------------
r166284 | russell | 2008-12-22 11:19:01 -0600 (Mon, 22 Dec 2008) | 20 lines

Merged revisions 166282 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

........
r166282 | russell | 2008-12-22 11:09:36 -0600 (Mon, 22 Dec 2008) | 12 lines

Introduce ast_careful_fwrite() and use in AMI to prevent partial writes.

This patch introduces a function to do careful writes on a file stream which
will handle timeouts and partial writes.  It is currently used in AMI to
address the issue that has been reported.  However, there are probably a few
other places where this could be used.

(closes issue ASTERISK-12767)
Reported by: srt
Tested by: russell
http://reviewboard.digium.com/r/104/

........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=166284