Summary: | ASTERISK-12767: Partial writes on Manager API | ||||
Reporter: | Stefan Reuter (srt) | Labels: | |||
Date Opened: | 2008-09-23 08:35:42 | Date Closed: | 2008-12-22 11:19:03.000-0600 | ||
Priority: | Minor | Regression? | No | ||
Status: | Closed/Complete | Components: | Core/ManagerInterface | ||
Versions: | Frequency of Occurrence | ||||
Related Issues: |
| ||||
Environment: | Attachments: | ||||
Description: | Sometimes writing manager events with ast_carefulwrite in manager.c only results in a partial write i.e. not all data is written to the socket. There is no check for this and no code to send the missing data. As a workaround increasing the writetimeout in manager.conf mitigates this issue. See http://jira.reucon.org/browse/AJ-174?focusedCommentId=10522#action_10522 for details. | ||||
Comments: | By: Tilghman Lesher (tilghman) 2008-10-01 18:10:48 Looking over the related issue, I have a few comments. 1) ast_carefulwrite() already does error-correction, albeit with a timeout. Multiple levels of error detection is rather silly. 2) I agree with increasing the timeout. It seems to be the only way to go here. Perhaps this should be a per-peer setting, which can be increased, if a particular peer is having trouble keeping up. Then again, if a peer is having trouble keeping up, then you should look into why (probably network issues). By: Tilghman Lesher (tilghman) 2008-10-01 18:44:33 Actually, it's already a per-peer setting. Simply set writetimeout in manager.conf to a higher value. By: David Woolley (davidw) 2008-11-04 10:07:59.000-0600 I believe the real cause for this is that send_string fails to handle the EAGAIN error from fflush. The Linux man page seems to be incomplete. Given that glibc claims to be a POSIX superset, the opengroup.org copy of the man page ought to be more correct (the difference between their pages and the normal vendor pages is that they are careful to cover all the edge cases). We are just trying the effect of using setbuf, to force unbuffered output (although handling fflush properly would reduce the number of system calls). I'll update when we've done. http://opengroup.org/onlinepubs/007908775/xsh/fflush.html By: David Woolley (davidw) 2008-11-04 10:15:00.000-0600 Note that our case is for 1.6.0, which doesn't actually have a routine called ast_carefulwrite in manager.c, although there are still comments about it. The one in main/utils.c, doesn't use stdio. By: David Woolley (davidw) 2008-11-04 11:15:58.000-0600 setbuf doesn't seem to have helped. It's still truncating at exactly the same place, (about 11K into a queue status listing) Note that changing writetimeout didn't mitigate the problem for us, even with huge values. By: David Woolley (davidw) 2008-11-04 11:56:22.000-0600 Another thought. There is no EINTR case handling on the poll. I presume this is the thread that gets woken up with signals when there are events to output. I'll try patching one in, but the person with the test rig may have finished for the day. By: David Woolley (davidw) 2008-11-04 12:36:03.000-0600 Excluding EINTR from the error cases didn't help either. By: David Woolley (davidw) 2008-11-05 10:40:15.000-0600 I got confused by the reference to fdopen and put the setbuf in the wrong place; I don't think the temporary file should ever block! Unfortunately, the right place seems to have TLS involvment, which probably means one can't do like main/utils.c, and use write, the FILE structure may not be a stdio one. By: David Woolley (davidw) 2008-11-06 06:31:42.000-0600 Using the setbuf in the right place (just after the socket is set non-blocking), does remove the symptom. However, I have severe reservations about whether this will work with TLS and think the right solution is to include the fflush in the poll loop (rather than just make it redundant). (The TLS involvement precludes the use of write(2).) Please note that our problem is not mitigated by increasing the timeout; I'm certain that it was still there with 5,000ms and I believe it was also tried with 300,000ms. I did find that we could only reproduce it when running AMI from a remote site (remote worker testing the application against a central server). By: Erik van Pienbroek (erik van pienbroek) 2008-12-08 09:28:17.000-0600 We've also been hit by this bug using Asterisk 1.6.0.3-rc1. We're using the AMI interface from a remote computer. If a large amount of messages need to be sent by the manager API, some messages may get lost (for example with the QueueStatus command). We've figured out this is caused by the fact that there's no error handling performed after fflush() is called in the function send_string() in manager.c. As already mentioned by davidw, the errno = EAGAIN situation needs to be handled to solve this bug. By: Leif Madsen (lmadsen) 2008-12-08 09:30:33.000-0600 Any additional thoughts on this Corydon? By: Ronald Chan (loloski) 2008-12-08 09:39:44.000-0600 Yes, i can confirm this is the case. We are in the same boat as Erik van Pienbroek but this time for SVN-1.4 branch By: Russell Bryant (russell) 2008-12-18 14:52:41.000-0600 Here is a patch to address some of the ast_carefulwrite() issues for Asterisk 1.4 http://reviewboard.digium.com/r/99/ By: Digium Subversion (svnbot) 2008-12-18 15:39:20.000-0600 Repository: asterisk Revision: 165796 U branches/1.4/main/utils.c ------------------------------------------------------------------------ r165796 | russell | 2008-12-18 15:39:20 -0600 (Thu, 18 Dec 2008) | 10 lines Make ast_carefulwrite() be more careful. This patch handles some additional cases that could result in partial writes to the file description. This was done to address complaints about partial writes on AMI. (issue ASTERISK-12767) (more changes needed to address potential problems in 1.6) Reported by: srt Tested by: russell ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=165796 By: Digium Subversion (svnbot) 2008-12-18 15:43:06.000-0600 Repository: asterisk Revision: 165796 U branches/1.4/main/utils.c ------------------------------------------------------------------------ r165796 | russell | 2008-12-18 15:39:25 -0600 (Thu, 18 Dec 2008) | 11 lines Make ast_carefulwrite() be more careful. This patch handles some additional cases that could result in partial writes to the file description. This was done to address complaints about partial writes on AMI. (issue ASTERISK-12767) (more changes needed to address potential problems in 1.6) Reported by: srt Tested by: russell Review: http://reviewboard.digium.com/r/99/ ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=165796 By: Digium Subversion (svnbot) 2008-12-18 15:44:42.000-0600 Repository: asterisk Revision: 165801 _U trunk/ U trunk/main/utils.c ------------------------------------------------------------------------ r165801 | russell | 2008-12-18 15:44:42 -0600 (Thu, 18 Dec 2008) | 19 lines Merged revisions 165796 via svnmerge from https://origsvn.digium.com/svn/asterisk/branches/1.4 ........ r165796 | russell | 2008-12-18 15:39:25 -0600 (Thu, 18 Dec 2008) | 11 lines Make ast_carefulwrite() be more careful. This patch handles some additional cases that could result in partial writes to the file description. This was done to address complaints about partial writes on AMI. (issue ASTERISK-12767) (more changes needed to address potential problems in 1.6) Reported by: srt Tested by: russell Review: http://reviewboard.digium.com/r/99/ ........ ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=165801 By: Digium Subversion (svnbot) 2008-12-18 15:45:33.000-0600 Repository: asterisk Revision: 165802 _U branches/1.6.0/ U branches/1.6.0/main/utils.c ------------------------------------------------------------------------ r165802 | russell | 2008-12-18 15:45:33 -0600 (Thu, 18 Dec 2008) | 27 lines Merged revisions 165801 via svnmerge from https://origsvn.digium.com/svn/asterisk/trunk ................ r165801 | russell | 2008-12-18 15:44:47 -0600 (Thu, 18 Dec 2008) | 19 lines Merged revisions 165796 via svnmerge from https://origsvn.digium.com/svn/asterisk/branches/1.4 ........ r165796 | russell | 2008-12-18 15:39:25 -0600 (Thu, 18 Dec 2008) | 11 lines Make ast_carefulwrite() be more careful. This patch handles some additional cases that could result in partial writes to the file description. This was done to address complaints about partial writes on AMI. (issue ASTERISK-12767) (more changes needed to address potential problems in 1.6) Reported by: srt Tested by: russell Review: http://reviewboard.digium.com/r/99/ ........ ................ ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=165802 By: Digium Subversion (svnbot) 2008-12-18 15:47:02.000-0600 Repository: asterisk Revision: 165804 _U branches/1.6.1/ U branches/1.6.1/main/utils.c ------------------------------------------------------------------------ r165804 | russell | 2008-12-18 15:47:02 -0600 (Thu, 18 Dec 2008) | 27 lines Merged revisions 165801 via svnmerge from https://origsvn.digium.com/svn/asterisk/trunk ................ r165801 | russell | 2008-12-18 15:44:47 -0600 (Thu, 18 Dec 2008) | 19 lines Merged revisions 165796 via svnmerge from https://origsvn.digium.com/svn/asterisk/branches/1.4 ........ r165796 | russell | 2008-12-18 15:39:25 -0600 (Thu, 18 Dec 2008) | 11 lines Make ast_carefulwrite() be more careful. This patch handles some additional cases that could result in partial writes to the file description. This was done to address complaints about partial writes on AMI. (issue ASTERISK-12767) (more changes needed to address potential problems in 1.6) Reported by: srt Tested by: russell Review: http://reviewboard.digium.com/r/99/ ........ ................ ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=165804 By: David Woolley (davidw) 2008-12-19 05:46:33.000-0600 Note, the problems we were having relate to a re-implementation of ast_carefulwrite in manager.c, under the name send_string, which uses stdio routines, because it has to cope with TLS as well as direct socket stuff. By: Russell Bryant (russell) 2008-12-19 08:00:48.000-0600 Thanks for the feedback, davidw. I'm aware of that, and it is the reason I haven't closed this out yet. I'm working on a similar patch for trunk/1.6 that will replace the current send_string() function. By: Russell Bryant (russell) 2008-12-19 08:37:45.000-0600 http://reviewboard.digium.com/r/104/ By: Digium Subversion (svnbot) 2008-12-22 11:09:31.000-0600 Repository: asterisk Revision: 166282 U trunk/include/asterisk/utils.h U trunk/main/manager.c U trunk/main/utils.c ------------------------------------------------------------------------ r166282 | russell | 2008-12-22 11:09:31 -0600 (Mon, 22 Dec 2008) | 12 lines Introduce ast_careful_fwrite() and use in AMI to prevent partial writes. This patch introduces a function to do careful writes on a file stream which will handle timeouts and partial writes. It is currently used in AMI to address the issue that has been reported. However, there are probably a few other places where this could be used. (closes issue ASTERISK-12767) Reported by: srt Tested by: russell http://reviewboard.digium.com/r/104/ ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=166282 By: Digium Subversion (svnbot) 2008-12-22 11:14:14.000-0600 Repository: asterisk Revision: 166283 _U branches/1.6.0/ U branches/1.6.0/include/asterisk/utils.h U branches/1.6.0/main/manager.c U branches/1.6.0/main/utils.c ------------------------------------------------------------------------ r166283 | russell | 2008-12-22 11:14:14 -0600 (Mon, 22 Dec 2008) | 20 lines Merged revisions 166282 via svnmerge from https://origsvn.digium.com/svn/asterisk/trunk ........ r166282 | russell | 2008-12-22 11:09:36 -0600 (Mon, 22 Dec 2008) | 12 lines Introduce ast_careful_fwrite() and use in AMI to prevent partial writes. This patch introduces a function to do careful writes on a file stream which will handle timeouts and partial writes. It is currently used in AMI to address the issue that has been reported. However, there are probably a few other places where this could be used. (closes issue ASTERISK-12767) Reported by: srt Tested by: russell http://reviewboard.digium.com/r/104/ ........ ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=166283 By: Digium Subversion (svnbot) 2008-12-22 11:19:01.000-0600 Repository: asterisk Revision: 166284 _U branches/1.6.1/ U branches/1.6.1/include/asterisk/utils.h U branches/1.6.1/main/manager.c U branches/1.6.1/main/utils.c ------------------------------------------------------------------------ r166284 | russell | 2008-12-22 11:19:01 -0600 (Mon, 22 Dec 2008) | 20 lines Merged revisions 166282 via svnmerge from https://origsvn.digium.com/svn/asterisk/trunk ........ r166282 | russell | 2008-12-22 11:09:36 -0600 (Mon, 22 Dec 2008) | 12 lines Introduce ast_careful_fwrite() and use in AMI to prevent partial writes. This patch introduces a function to do careful writes on a file stream which will handle timeouts and partial writes. It is currently used in AMI to address the issue that has been reported. However, there are probably a few other places where this could be used. (closes issue ASTERISK-12767) Reported by: srt Tested by: russell http://reviewboard.digium.com/r/104/ ........ ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=166284 |