[Home]

Summary:ASTERISK-15196: [patch] ExternalIVR confuses AGI by double closing FDs
Reporter:dillec (dillec)Labels:
Date Opened:2009-11-22 10:38:24.000-0600Date Closed:2009-12-03 09:22:28.000-0600
Priority:MajorRegression?No
Status:Closed/CompleteComponents:Applications/app_externalivr
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) app_externalivr.c_proper-fd-closing_v2.patch.gz
( 1) app_externalivr.c_proper-fd-closing_v3.patch
( 2) trace_pipe_syscalls.stp.gz
Description:By putting a asterisk system on higher load (15 Calls per Second) as normal curios effects are arrising when using AGI and ExternalIVR together.

For getting into the issue I've added some extensive output to the code which mainly concentrates on when and which fd is opened, closed and used. Further our AGI/ExternalIVR scripts use timeouts while reading from STDIN. Their log messages are the reason why I've started investigating on it.

1.) AGI suddenly can't write to the FD of the pipe which is connected to the agi scripts STDIN (in parantheses after ast_agi_send you can see the fd which is used for writing)

[2009-11-22 16:02:48.004] WARNING[18107] res_agi.c: 003_Agi opened pipes: toast 0=38 1=106 | fromast 0=199 1=363 | errno=0
[2009-11-22 16:02:48.007] WARNING[18107] res_agi.c: ast_agi_send(363) >> agi_request: init-call.php
[2009-11-22 16:02:48.007] ERROR[18107] utils.c: write() returned error: Bad file descriptor
[2009-11-22 16:02:48.007] WARNING[18107] res_agi.c: ast_agi_send(363) >> agi_channel: SIP/sipp-ae747388
[2009-11-22 16:02:48.007] ERROR[18107] utils.c: write() returned error: Bad file descriptor
[2009-11-22 16:02:48.007] WARNING[18107] res_agi.c: ast_agi_send(363) >> agi_language: de
[2009-11-22 16:02:48.007] ERROR[18107] utils.c: write() returned error: Bad file descriptor


2.) ExternalIVR script reads content from another AGI process (log messages of our externalivr script):

17:02:48 22.11.2009 | uid=1258905766.2546 chan=SIP/sipp-b00f88a8 | ExternalIVR: Command: S,/home/voice/cgm53stage2/6000/1/Ansage_RP1
17:02:48 22.11.2009 | uid=1258905766.2546 chan=SIP/sipp-b00f88a8 | ExternalIVR: 'T,1258905768'
17:02:50 22.11.2009 | uid=1258905766.2546 chan=SIP/sipp-b00f88a8 | ExternalIVR: '200 result=1'
17:02:50 22.11.2009 | uid=1258905766.2546 chan=SIP/sipp-b00f88a8 | ExternalIVR: DTMF Digit: 200 result=1
17:02:50 22.11.2009 | uid=1258905766.2546 chan=SIP/sipp-b00f88a8 | ExternalIVR: '200 result=1'
17:02:50 22.11.2009 | uid=1258905766.2546 chan=SIP/sipp-b00f88a8 | ExternalIVR: DTMF Digit: 200 result=1


3.) ExternalIVR can't write/read from the FDs of the pipes which are connected to the externalivr script:

[2009-11-22 16:02:53.520] WARNING[18047] app_externalivr.c: ExternalIVR: sending command 'G,1258905773,sys_context=6000_1' chan_state=6
[2009-11-22 16:02:53.520] WARNING[18047] app_externalivr.c: ExternalIVR: fprintf on handle 0xae7a8738 failed with error Bad file descriptor (9), chan state=6

First it came to my mind, that the kernel may has a synchronization bug while handing out FDs for the pipes:

AGI opened FDs:
[2009-11-22 16:02:48.004] WARNING[18107] res_agi.c: 003_Agi opened pipes: toast 0=38 1=106 | fromast 0=199 1=363 | errno=0

ExternalIVR opened FDs
[2009-11-22 16:02:48.026] WARNING[18047] app_externalivr.c: Pipes created child_stdin 0=199 1=363 | child_stdout 0=484 1=489 | child_stderr 0=491 1=49

I was able to negative this thoguht by using a systemtap script (see attachements) which shows that everytime a pipe is created it is always based on formerly closed fds - so kernel was right:

Sun Nov 22 16:02:48 2009 (ms=4) process asterisk[30094][18107]: pipe opened: 0=[199, 363]
Sun Nov 22 16:02:48 2009 (ms=4) process asterisk[30094][17731]: fd closed: 363
Sun Nov 22 16:02:48 2009 (ms=11) process asterisk[30094][17731]: pipe opened: 0=[363, 478]

But, the log also showed that while one thread of a process gets the FD by a pipe, another thread (17731) of the process is closing it:

Sun Nov 22 16:02:48 2009 (ms=14) process asterisk[30094][17731]: fd closed: 363
Sun Nov 22 16:02:48 2009 (ms=26) process asterisk[30094][18047]: pipe opened: 0=[199, 363]

And thats it - thread 17731 was running ExternalIVR, which closes the FDs first in eivr_comm() by fclose (line 785) and a second time in app_exec() (line=495) by close().

The patch in the attachments passes the FDs by reference and assigns them 0 after fclose has finished. With that close() in app_exec() is not called.
Comments:By: David Ruggles (thedavidfactor) 2009-11-22 13:53:22.000-0600

Thank you for the detailed explanation, testing and patch, however when I view the patch it appears to be the whole app_externalivr.c program. Please upload a new unified patch that is limited to the specific change(s) you're proposing and I will try to get it in to testing and review as soon as possible.

By: dillec (dillec) 2009-11-22 14:48:22.000-0600

ah - sorry, messed the diff command with relative pathnames and a wrong working dir. Please find the correct patch attached.

By: David Ruggles (thedavidfactor) 2009-11-23 08:07:55.000-0600

thank you for the corrected patch.

By: David Ruggles (thedavidfactor) 2009-11-23 10:13:33.000-0600

I see this problem and this patch seems to be completely valid for the parent/child instance of EIVR, however it exposed some issues with the client/server instance of EIVR. Please attach a patch without the modifications @ -441,7 and I will recommend it for testing and review. Also, if possible please don't gzip it as that makes it impossible to view without downloading.

By: dillec (dillec) 2009-11-23 13:00:00.000-0600

ok, I was not sure about that and not easily able to test it. Since the function eivr_comm now expects references instead of integers and uses them with the dereferncing op * it is not possible to simply delete the modifications.

I think the main issue on the client/server side is that ser->fd is invalidated in that case. The recent patch now copies ser->fd into the child_* array and then passes them to the function as a reference like the parent/child part does.

By: David Ruggles (thedavidfactor) 2009-11-23 13:07:36.000-0600

Well, the rest of the client server code is broken so it would never get to that point anyway. I am working on patching that part of it and will address this issue there. However this bug should be regressed and the decision was made to only fix the client/server implementation in trunk. So therefore, I think that adding more broken code to already broken code is less of a problem then muddying the already muddy waters of the client/server issues. :-)

By: David Ruggles (thedavidfactor) 2009-11-30 11:33:28.000-0600

Should have something soon

By: David Ruggles (thedavidfactor) 2009-11-30 12:02:17.000-0600

This is pending review on the reviewboard https://reviewboard.asterisk.org/r/436

By: Digium Subversion (svnbot) 2009-12-02 16:24:58.000-0600

Repository: asterisk
Revision: 232587

U   trunk/apps/app_externalivr.c

------------------------------------------------------------------------
r232587 | diruggles | 2009-12-02 16:24:58 -0600 (Wed, 02 Dec 2009) | 12 lines

Prevent double closing of FDs by EIVR

This caused a problem when asterisk was under heavy load and running both AGI and EIVR applications.
EIVR would close an FD at which point it would be considered freed and be used by a new AGI instance
the second close would then close the FD now in use by AGI.

(closes issue ASTERISK-15196)
Reported by: diLLec
Tested by: thedavidfactor, diLLec

Review: https://reviewboard.asterisk.org/r/436/

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=232587

By: Digium Subversion (svnbot) 2009-12-03 08:55:37.000-0600

Repository: asterisk
Revision: 232811

U   branches/1.6.0/apps/app_externalivr.c

------------------------------------------------------------------------
r232811 | diruggles | 2009-12-03 08:55:36 -0600 (Thu, 03 Dec 2009) | 19 lines

Merged revisions 232587 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

........
 r232587 | diruggles | 2009-12-02 17:17:22 -0500 (Wed, 02 Dec 2009) | 12 lines
 
 Prevent double closing of FDs by EIVR
 
 This caused a problem when asterisk was under heavy load and running both AGI and EIVR applications.
 EIVR would close an FD at which point it would be considered freed and be used by a new AGI instance
 the second close would then close the FD now in use by AGI.
 
 (closes issue ASTERISK-15196)
 Reported by: diLLec
 Tested by: thedavidfactor, diLLec
 
 Review: https://reviewboard.asterisk.org/r/436/
........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=232811

By: Digium Subversion (svnbot) 2009-12-03 09:11:00.000-0600

Repository: asterisk
Revision: 232812

U   branches/1.6.1/apps/app_externalivr.c

------------------------------------------------------------------------
r232812 | diruggles | 2009-12-03 09:11:00 -0600 (Thu, 03 Dec 2009) | 19 lines

Merged revisions 232587 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

........
 r232587 | diruggles | 2009-12-02 17:17:22 -0500 (Wed, 02 Dec 2009) | 12 lines
 
 Prevent double closing of FDs by EIVR
 
 This caused a problem when asterisk was under heavy load and running both AGI and EIVR applications.
 EIVR would close an FD at which point it would be considered freed and be used by a new AGI instance
 the second close would then close the FD now in use by AGI.
 
 (closes issue ASTERISK-15196)
 Reported by: diLLec
 Tested by: thedavidfactor, diLLec
 
 Review: https://reviewboard.asterisk.org/r/436/
........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=232812

By: Digium Subversion (svnbot) 2009-12-03 09:22:27.000-0600

Repository: asterisk
Revision: 232813

U   branches/1.6.2/apps/app_externalivr.c

------------------------------------------------------------------------
r232813 | diruggles | 2009-12-03 09:22:27 -0600 (Thu, 03 Dec 2009) | 19 lines

Merged revisions 232587 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

........
 r232587 | diruggles | 2009-12-02 17:17:22 -0500 (Wed, 02 Dec 2009) | 12 lines
 
 Prevent double closing of FDs by EIVR
 
 This caused a problem when asterisk was under heavy load and running both AGI and EIVR applications.
 EIVR would close an FD at which point it would be considered freed and be used by a new AGI instance
 the second close would then close the FD now in use by AGI.
 
 (closes issue ASTERISK-15196)
 Reported by: diLLec
 Tested by: thedavidfactor, diLLec
 
 Review: https://reviewboard.asterisk.org/r/436/
........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=232813