|Summary:||ASTERISK-12695: wait_for_answer never receives HANGUP frame sent via ast_queue_hangup|
|Reporter:||guy viviers (gui)||Labels:|
|Date Opened:||2008-09-08 14:24:27||Date Closed:||2011-06-07 14:02:49|
|Description:||1) A user calls into our Asterisk pbx via one of our PSTN lines and dials|
a SIP extension.
2) The caller hangs up the PSTN line before anyone answers the SIP extension.
3) Our channel driver calls ast_queue_hangup to inform Asterisk of the hangup.
4) Asterisk never acknowledges the hangup and the call eventually times out.
****** ADDITIONAL INFORMATION ******
I made a change in our code that fixes this problem but I wanted to run this
by you guys because I think it points to a larger problem. The change is ...
<Code removed. Code _must_ be included as an attachment.>
The wait_for_answer function sleeps in ast_waitfor_n until something
causes the poll system call to return. When the caller hangs up before
anyone answers, our channel driver calls ast_queue_hangup, which queues
a HANGUP control frame on the channel's read queue and sends an interrupt
signal to the sleeping thread.
The poll function in ast_waitfor_n returns -1 because of the interrupt,
which causes ast_waitfor_n to return 0. When wait_for_answer receives a 0
return value from ast_waitfor_n it doesn't check the caller's read queue.
The change I made causes wait_for_answer to check the calling channel's
read queue upon return from ast_waitfor_n, but I believe the burden should
be on ast_waitfor_n to return successful status not only when it receives
an RTP frame but when it receives a frame via a channel's read queue too.
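The behavior described above can be reproduced outside Asterisk with a small sketch. This is not Asterisk code: model_waitfor, demo_missed_hangup and queued_hangup are hypothetical names, a one-shot SIGALRM timer stands in for the channel driver thread that queues the frame and interrupts the sleeper, and a single flag stands in for the channel's read queue. It only illustrates how a wait that reports an interrupted poll as "nothing ready" can hide a queued frame unless the caller also looks at the queue.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif
#include <poll.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical stand-in for a channel's read queue:
 * 1 means a HANGUP frame has been queued. */
static volatile sig_atomic_t queued_hangup = 0;

/* Stand-in for the channel driver: "queues" the frame when the
 * signal that interrupts the sleeping thread is delivered. */
static void on_signal(int sig)
{
    (void)sig;
    queued_hangup = 1;
}

/* Hypothetical model of the wait: returns the number of ready
 * descriptors, and 0 when poll() fails (for example with EINTR). */
static int model_waitfor(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int res = poll(&pfd, 1, timeout_ms);

    if (res < 0) {
        return 0; /* interrupted: reported as "nothing ready" */
    }
    return res;
}

/* Returns 1 if the caller notices the hangup, 0 if it is missed,
 * and -1 on setup failure. With check_queue == 0 the caller trusts
 * only the wait result; with check_queue != 0 it also inspects the
 * queue after the wait returns. */
int demo_missed_hangup(int check_queue)
{
    int fds[2];
    struct sigaction sa;
    struct itimerval timer;
    int res, noticed;

    if (pipe(fds) != 0) {
        return -1;
    }
    queued_hangup = 0;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) != 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    /* Fire once after 100 ms, while poll() is sleeping on a pipe
     * that never becomes readable. */
    memset(&timer, 0, sizeof(timer));
    timer.it_value.tv_usec = 100000;
    setitimer(ITIMER_REAL, &timer, NULL);

    res = model_waitfor(fds[0], 2000);
    noticed = (res > 0) || (check_queue && queued_hangup);

    close(fds[0]);
    close(fds[1]);
    return noticed ? 1 : 0;
}
```

In this model demo_missed_hangup(0) misses the queued frame and demo_missed_hangup(1) notices it, which corresponds to the caller-side check described above.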
I didn't make this change to ast_waitfor_n (actually ast_waitfor_nandfds)
myself because it is called by many other functions and the changes that
I make could cause unintended side-effects to code that is used to its
|Comments:||By: Mark Michelson (mmichelson) 2008-09-08 17:38:06|
Hmm, I have a feeling that there's nothing broken, per se, but that documentation regarding expected return values of certain functions is lacking.
For instance, you've stated that the AST_CONTROL_HANGUP frame is not detected properly in app_dial's wait_for_answer function. Here's something important to note. If a channel is ever returned by ast_waitfor_n and then an ast_read of that channel returns NULL, then you can safely take that to mean that the channel has hung up.
For an example, look at app_dial's wait_for_answer function. If you look at the 188.8.131.52 tag and look at line 563, you'll see that a null frame pointer is taken to mean that a hangup has occurred. Another example of this is on line 703.
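As a rough illustration of that convention, here is a self-contained sketch. It does not use the Asterisk API: model_channel, model_read and frames_before_hangup are hypothetical names, and an array of frame types stands in for a channel's frame source. The only point is the shape of the loop, where a NULL result from the read is treated as a hangup.

```c
#include <stddef.h>

/* Hypothetical frame kinds for the model. */
enum model_frame_type { MODEL_FRAME_VOICE, MODEL_FRAME_HANGUP };

/* Hypothetical channel: a fixed array of pending frames. */
struct model_channel {
    const enum model_frame_type *queue;
    size_t len;
    size_t next;
};

/* Returns the next frame, or NULL when the next frame is a hangup.
 * In this model an exhausted queue also yields NULL. */
static const enum model_frame_type *model_read(struct model_channel *chan)
{
    if (chan->next >= chan->len) {
        return NULL;
    }
    const enum model_frame_type *frame = &chan->queue[chan->next++];
    if (*frame == MODEL_FRAME_HANGUP) {
        return NULL;
    }
    return frame;
}

/* Processes frames until the read returns NULL, which the caller
 * takes to mean the channel has hung up. Returns the number of
 * frames handled before that point. */
int frames_before_hangup(struct model_channel *chan)
{
    int handled = 0;

    while (model_read(chan) != NULL) {
        handled++;
    }
    return handled;
}
```

With a queue of two voice frames followed by a hangup, the loop stops at the hangup and anything queued after it is never read.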
I hope this has been helpful. If it hasn't and I'm completely missing the point of this bug report, let me know.
By: guy viviers (gui) 2008-09-09 09:43:54
Thanks for your reply.
Having spent 2 full days tracking down the source of this bug I am aware
of how ast_read behaves when it is called and finds an AST_CONTROL_HANGUP
frame on its read queue. The change that I made simply causes ast_read to
be called when ast_waitfor_n returns and the code finds something is on
its input channel's read queue.
The only point that I was trying to make was that it would be nice if
ast_waitfor_n returned successful status any time it was safe to call
ast_read, which is what is implied.
The reason I mentioned this is because I suspect that there are other
instances of code within Asterisk that call ast_waitfor_n which, like
wait_for_answer, don't behave as expected.
As far as we're concerned this issue is closed because wait_for_answer
is now behaving as we expect. I only made this bug report in the "spirit
of sharing" (yeeewwww ... being an evil capitalist at heart, that phrase
gives me the creeps!)
By: Joel Vandal (jvandal) 2008-09-16 09:02:44
Using the latest branches/1.4 (SVN rev 143202), I got these locks. Maybe this is related to this ticket?
=== Currently Held Locks ==============================================
=== <file> <line num> <function> <lock name> <lock addr> (times locked)
=== Thread ID: 2997132192 (pbx_thread started at [ 2645] pbx.c ast_pbx_start())
=== ---> Lock #0 (channel.c): MUTEX 1451 ast_hangup &chan->lock 0xb28b59a8 (1)
=== ---> Lock #1 (chan_local.c): MUTEX 519 local_hangup &p->lock 0xb28ff118 (1)
=== ---> Tried and failed to get Lock #2 (channel.c): MUTEX 962 ast_queue_hangup &chan->lock 0xb2823b10 (1)
=== Thread ID: 2960927648 (pbx_thread started at [ 2645] pbx.c ast_pbx_start())
=== ---> Lock #0 (channel.c): MUTEX 2613 ast_write &chan->lock 0xb2823b10 (1)
=== ---> Waiting for Lock #1 (chan_local.c): MUTEX 313 local_write &p->lock 0xb28ff118 (1)
=== --- ---> Locked Here: chan_local.c line 519 (local_hangup)
By: Russell Bryant (russell) 2008-10-05 16:37:29
There are a number of problems that could be the cause of this within the channel driver itself. Since you're using a custom channel driver, we cannot support you here. If you are able to reproduce this without any custom code in use, then feel free to reopen this issue.