Summary: ASTERISK-12695: wait_for_answer never receives HANGUP frame sent via ast_queue_hangup
Reporter: guy viviers (gui)
Labels:
Date Opened: 2008-09-08 14:24:27
Date Closed: 2011-06-07 14:02:49
Versions:Frequency of
Description:
1) A user calls into our Asterisk PBX via one of our PSTN lines and dials
  a SIP extension.
2) The caller hangs up the PSTN line before anyone answers the SIP extension.
3) Our channel driver calls ast_queue_hangup to inform Asterisk of the hangup.
4) Asterisk never acknowledges the hangup and the call eventually times out.


I made a change in our code that fixes this problem but I wanted to run this
by you guys because I think it points to a larger problem. The change is ...

<Code removed.  Code _must_ be included as an attachment.>

The wait_for_answer function sleeps in ast_waitfor_n until something
causes the poll system call to return. When the caller hangs up before
anyone answers, our channel driver calls ast_queue_hangup, which queues
a HANGUP control frame on the channel's read queue and sends an interrupt
signal to the sleeping thread.

The poll function in ast_waitfor_n returns -1 because of the interrupt,
which causes ast_waitfor_n to return 0. When wait_for_answer receives a 0
return value from ast_waitfor_n, it doesn't check the caller's read queue.
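
The interrupted-poll behaviour described above can be reproduced outside
Asterisk. The following is a minimal sketch, not Asterisk code: the names
wait_for_fd, demo_interrupted_wait and on_wakeup_signal are hypothetical, and
a SIGALRM from a one-shot timer stands in for the signal that wakes the
sleeping thread. It only shows that poll() fails with EINTR when a handled
signal arrives, and that a wrapper which maps that failure to 0 makes the
interruption look like "nothing ready".

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

/* Empty handler: it exists only so the signal interrupts poll(). */
static void on_wakeup_signal(int signo)
{
    (void)signo;
}

/*
 * Hypothetical wait helper (an assumption, not Asterisk code). Returns 1 if
 * fd is readable and 0 otherwise, so a timeout and an interrupted poll()
 * look the same to the caller. *interrupted reports whether poll() failed
 * with EINTR.
 */
static int wait_for_fd(int fd, int timeout_ms, int *interrupted)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int res = poll(&pfd, 1, timeout_ms);

    *interrupted = (res < 0 && errno == EINTR);
    return res > 0 ? 1 : 0;
}

/*
 * Waits on an idle pipe while a one-shot timer delivers SIGALRM after
 * delay_ms. Returns 1 if poll() was interrupted, 0 if it was not, and -1 on
 * setup failure. *wait_result receives what wait_for_fd() returned.
 */
static int demo_interrupted_wait(int delay_ms, int timeout_ms, int *wait_result)
{
    int fds[2];
    int interrupted = 0;
    struct sigaction sa;
    struct itimerval timer;

    assert(delay_ms > 0 && timeout_ms > delay_ms);
    if (pipe(fds) != 0) {
        return -1;
    }

    /* Install a handler so SIGALRM interrupts poll() instead of killing us. */
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_wakeup_signal;
    sigemptyset(&sa.sa_mask);

    /* One-shot timer: it_interval stays zero, so it fires only once. */
    memset(&timer, 0, sizeof(timer));
    timer.it_value.tv_sec = delay_ms / 1000;
    timer.it_value.tv_usec = (delay_ms % 1000) * 1000;

    if (sigaction(SIGALRM, &sa, NULL) != 0 ||
        setitimer(ITIMER_REAL, &timer, NULL) != 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    /* Nothing is ever written to the pipe, so only the signal can wake us. */
    *wait_result = wait_for_fd(fds[0], timeout_ms, &interrupted);

    close(fds[0]);
    close(fds[1]);
    return interrupted;
}
```

Calling demo_interrupted_wait(50, 2000, &result) should report an interrupted
poll() while result stays 0, which is the situation the caller cannot tell
apart from a plain timeout.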

The change I made causes wait_for_answer to check the calling channel's
read queue upon return from ast_waitfor_n, but I believe the burden should
be on ast_waitfor_n to return successful status not only when it receives
an RTP frame but when it receives a frame via a channel's read queue too.
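
Since the original patch was removed from this report, the following is only
an illustrative sketch of the idea, using mock types rather than Asterisk's
own structures. Every name here (mock_channel, mock_frame, mock_waitfor,
mock_queue_frame and the two waiter functions) is hypothetical. It assumes a
wait function that looks only at descriptor readiness, and contrasts a waiter
that trusts an empty result with one that also inspects the caller's read
queue.

```c
#include <assert.h>
#include <stddef.h>

enum mock_frame_type { MOCK_FRAME_VOICE, MOCK_FRAME_HANGUP };

struct mock_frame {
    enum mock_frame_type type;
    struct mock_frame *next;
};

struct mock_channel {
    struct mock_frame *readq;  /* frames queued by the channel driver */
    int fd_ready;              /* nonzero if the descriptor is readable */
};

/* Driver side: append a frame to the tail of the channel's read queue. */
static void mock_queue_frame(struct mock_channel *chan, struct mock_frame *f)
{
    struct mock_frame **tail = &chan->readq;

    assert(chan != NULL && f != NULL);
    while (*tail) {
        tail = &(*tail)->next;
    }
    f->next = NULL;
    *tail = f;
}

/* Stand-in for the wait call: reports a channel only if its fd is ready. */
static struct mock_channel *mock_waitfor(struct mock_channel *chan)
{
    return chan->fd_ready ? chan : NULL;
}

/* Waiter without the extra check: "no winner" means nothing to process. */
static int waiter_sees_hangup_unpatched(struct mock_channel *caller)
{
    struct mock_channel *winner = mock_waitfor(caller);

    if (!winner) {
        return 0;
    }
    return winner->readq && winner->readq->type == MOCK_FRAME_HANGUP;
}

/* Waiter with the check: a non-empty read queue also counts as ready. */
static int waiter_sees_hangup_patched(struct mock_channel *caller)
{
    struct mock_channel *winner = mock_waitfor(caller);

    if (!winner && caller->readq) {
        winner = caller;
    }
    if (!winner) {
        return 0;
    }
    return winner->readq && winner->readq->type == MOCK_FRAME_HANGUP;
}
```

With a HANGUP frame queued and fd_ready left at 0, the unpatched waiter
reports nothing while the patched one notices the hangup. Moving the same
check into the wait function itself would give every caller this behaviour,
at the cost of changing what existing callers see.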

I didn't make this change to ast_waitfor_n (actually ast_waitfor_nandfds)
myself because it is called by many other functions and the changes that
I make could cause unintended side-effects to code that is used to its
current behavior.

Comments:
By: Mark Michelson (mmichelson) 2008-09-08 17:38:06

Hmm, I have a feeling that there's nothing broken, per se, but that documentation regarding expected return values of certain functions is lacking.

For instance, you've stated that the AST_CONTROL_HANGUP frame is not detected properly in app_dial's wait_for_answer function. Here's something important to note. If a channel is ever returned by ast_waitfor_n and then an ast_read of that channel returns NULL, then you can safely take that to mean that the channel has hung up.
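
A rough sketch of that convention, using mock types instead of the real Asterisk API: demo_channel, demo_frame, demo_read and demo_ready_channel_hung_up are all hypothetical names, and the mock assumes that a queued hangup control frame makes the read function return NULL.

```c
#include <assert.h>
#include <stddef.h>

enum demo_frame_type {
    DEMO_FRAME_NULL,            /* placeholder when nothing is pending */
    DEMO_FRAME_VOICE,
    DEMO_FRAME_CONTROL_HANGUP
};

struct demo_frame {
    enum demo_frame_type type;
};

struct demo_channel {
    const struct demo_frame *pending;  /* next frame to deliver, or NULL */
    int hung_up;                       /* set once a hangup has been seen */
};

static const struct demo_frame demo_null_frame = { DEMO_FRAME_NULL };

/*
 * Mock read: returns NULL once the channel has hung up, otherwise the
 * pending frame (or a placeholder frame if nothing is pending).
 */
static const struct demo_frame *demo_read(struct demo_channel *chan)
{
    const struct demo_frame *f;

    assert(chan != NULL);
    f = chan->pending;
    chan->pending = NULL;

    if (chan->hung_up || (f && f->type == DEMO_FRAME_CONTROL_HANGUP)) {
        chan->hung_up = 1;
        return NULL;
    }
    return f ? f : &demo_null_frame;
}

/* Caller side: a NULL frame from a ready channel is treated as a hangup. */
static int demo_ready_channel_hung_up(struct demo_channel *ready)
{
    return demo_read(ready) == NULL;
}
```

The caller never inspects the frame type to detect the hangup; it relies only on the NULL return from the read on a channel that the wait call reported as ready.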

For an example, look at app_dial's wait_for_answer function. If you look at the tag and look at line 563, you'll see that a null frame pointer is taken to mean that a hangup has occurred. Another example of this is on line 703.

I hope this has been helpful. If it hasn't and I'm completely missing the point of this bug report, let me know.

By: guy viviers (gui) 2008-09-09 09:43:54

Hi putnopvut,

Thanks for your reply.

Having spent 2 full days tracking down the source of this bug I am aware
of how ast_read behaves when it is called and finds an AST_CONTROL_HANGUP
frame on its read queue. The change that I made simply causes ast_read to
be called when ast_waitfor_n returns and the code finds something is on
its input channel's read queue.

The only point that I was trying to make was that it would be nice if
ast_waitfor_n returned successful status any time it was safe to call
ast_read, which is what is implied.

The reason I mentioned this is because I suspect that there are other
instances of code within Asterisk that call ast_waitfor_n which, like
wait_for_answer, don't behave as expected.

As far as we're concerned this issue is closed because wait_for_answer
is now behaving as we expect. I only made this bug report in the "spirit
of sharing" (yeeewwww ... being an evil capitalist at heart, that phrase
gives me the creeps!)


By: Joel Vandal (jvandal) 2008-09-16 09:02:44

Using the latest branches/1.4 (SVN rev 143202), I got these locks. Maybe this is related to this ticket?

=== Currently Held Locks ==============================================
=== <file> <line num> <function> <lock name> <lock addr> (times locked)
=== Thread ID: 2997132192 (pbx_thread           started at [ 2645] pbx.c ast_pbx_start())
=== ---> Lock #0 (channel.c): MUTEX 1451 ast_hangup &chan->lock 0xb28b59a8 (1)
=== ---> Lock #1 (chan_local.c): MUTEX 519 local_hangup &p->lock 0xb28ff118 (1)
=== ---> Tried and failed to get Lock #2 (channel.c): MUTEX 962 ast_queue_hangup &chan->lock 0xb2823b10 (1)
=== -------------------------------------------------------------------
=== Thread ID: 2960927648 (pbx_thread           started at [ 2645] pbx.c ast_pbx_start())
=== ---> Lock #0 (channel.c): MUTEX 2613 ast_write &chan->lock 0xb2823b10 (1)
=== ---> Waiting for Lock #1 (chan_local.c): MUTEX 313 local_write &p->lock 0xb28ff118 (1)
=== --- ---> Locked Here: chan_local.c line 519 (local_hangup)
=== -------------------------------------------------------------------

By: Russell Bryant (russell) 2008-10-05 16:37:29

There are a number of problems that could be the cause of this within the channel driver itself.  Since you're using a custom channel driver, we cannot support you here.  If you are able to reproduce this without any custom code in use, then feel free to reopen this issue.