Summary:ASTERISK-16352: [patch] Dialplan execution stops after awhile on an attended transfer by the calling party (with use of Dial 'g' option)
Reporter:Ramon Peek-Fares (ramonpeek)Labels:
Date Opened:2010-07-12 06:42:08Date Closed:2015-03-27 15:48:44
Versions:Frequency of
Environment:Attachments:( 0) debug_log2.tar.gz
( 1) patch.diff
Description:Dialplan execution stops after awhile on an attended transfer by the calling party  (with use of Dial() command 'g' option)
This applies to systems with large dialplans (which we use), small dialplans do not seem to encounter much problems, due to the delay in SIP messaging.

If traced, you can see the after Asterisk receives a SIP BYE message, the dialplan execution is immediately stopped even though there are other priorities to evaluate in the dialplan.

SEE STEPS TO REPRODUCE (in advanced view mode)


I use this dialplan:

exten = 801,1,Dial(SIP/401,15,g)
exten = 801,n,Goto(test,s,1)

exten = 802,1,Dial(SIP/402,15,g)
exten = 802,n,Goto(test,s,1)

exten = 803,1,Dial(SIP/403,15,g)
exten = 803,n,Goto(test,s,1)

exten = h,1,Goto(test,s,1)

exten = s,1,Set(COUNT=1)
exten = s,n,While($[${COUNT} <= 1000])
exten = s,n,Noop(Waiting loop started for issue 1321 - counter value = ${COUNT})
exten = s,n,Set(COUNT=$[${COUNT} + 1])
exten = s,n,EndWhile
exten = s,n,Hangup


* From peer 403 (extension 803) dial 801 (peer 401) and answer.
* Put call on Hold at 403
* From peer 403 dial 802 (peer 402) and answer.
* transfer the call.. (SIP REFER)

Notice that the dialplan execution stops before the counter reaches 1000.

Please note that this example may seem ludicrous. However in my dialplan I use many ODBC calls that take up some time to execute, which is why I was confronted with this issue.
Comments:By: Ramon Peek-Fares (ramonpeek) 2010-07-12 06:53:56

See attached debug log in which the result of the "Steps to reproduce" are shown.

By: Ramon Peek-Fares (ramonpeek) 2010-07-12 07:00:51

I've noticed something strange in "pbx.c".

At line: 2442 it says:  "/* end while  - from here on we can use 'break' to go out */"
So why do we use break within the while loop???

I was able to resolve this issue by simply removing the "break" command at line 2439. (see attached patch)

But whether that's the right thing to do???
If it is, we should also check if all the other break commands in the while loop are correct?

By: Ramon Peek-Fares (ramonpeek) 2010-07-12 08:57:32

please remove old "debug_log.tar.gz".....  It's the wrong one.. :-(
The new one is "debug_log2.tar.gz"

By: Ramon Peek-Fares (ramonpeek) 2010-07-12 09:56:29

Mmm... weird behavior on my side..
Last Friday and this Monday morning I was able to reproduce the problem easily time after time (see the trace..)
But this afternoon I'm not able to reproduce it anymore.
I'm afraid it has to do with the phones too and the way they respond.
Anyway... I'm not letting go so easily ;-)
I'll try again tomorrow..

In the meantime, who can explain the behavior shown in the debug_log2 file?
(At least that was taken when the error was occurring.)

Please note:
On the CLI things do seems to stop responding. (at least at my system)
But I've seen that behavior before and its a console issue, since the full log in /var/log/asterisk/ shows clearly that every priority as been executed.

By: Ramon Peek-Fares (ramonpeek) 2010-07-13 01:55:10

It appears my machine just got a lot quicker (If I knew how, I would have sold my knowledge to Intel ;-) )

Anyway, it explains why I wasn't able to reproduce the problem.
Simply increasing the While loop to count until 2000 does the trick or like I did, include a command that takes a little longer to execute. Like this ODBC call:

exten = s,n,Set(RESULT=${ODBC_CST(SELECT callerid FROM device WHERE id= '\1'\)})

All of which will help if you want to see the error occur...

HINT: Don't run Asterisk in the background, it could make your CLI output incomplete. just start it using "asterisk -vvvc"

By: Ramon Peek-Fares (ramonpeek) 2010-07-23 11:28:47

Has anyone yet reviewed the patch I've uploaded?
I really need someone with the right knowledge to confirm (or not) that my code is OK

By: Matt Jordan (mjordan) 2015-03-27 15:48:36.311-0500

So, first, sorry no one ever replied back here. That kind of sucks.

Looking at your log, this never was actually a bug:
[Jul 12 13:06:42] VERBOSE[22529] logger.c:     -- Executing [s@test:4] Set("SIP/403-00000000", "COUNT=245") in new stack
[Jul 12 13:06:42] DEBUG[22529] pbx.c: Launching 'EndWhile'
[Jul 12 13:06:42] VERBOSE[22529] logger.c:     -- Executing [s@test:5] EndWhile("SIP/403-00000000", "") in new stack
[Jul 12 13:06:42] DEBUG[22464] chan_sip.c: = No match Their Call ID: 661e34bd326a41513b75a0177b0a7e36@pbx.example.net Their Tag 5m8wetfyfp Our tag: as6fde7297
[Jul 12 13:06:42] DEBUG[22464] chan_sip.c: = No match Their Call ID: 3c2e49bd00dd-k1oqkgshy5g4 Their Tag 8rtcqs97zi Our tag: as0d3c611e
[Jul 12 13:06:42] DEBUG[22464] chan_sip.c: = No match Their Call ID: 30d126e27bda6e9d6fc66bfc3723b050@pbx.example.net Their Tag n0xmr3nd78 Our tag: as4b3b9a42
[Jul 12 13:06:42] DEBUG[22464] chan_sip.c: = Found Their Call ID: 3c2e49b707ef-pezt1god6wj3 Their Tag pq023yzfo8 Our tag: as158b66b1
[Jul 12 13:06:42] DEBUG[22464] chan_sip.c: **** Received BYE (8) - Command in SIP BYE
[Jul 12 13:06:42] DEBUG[22464] chan_sip.c: Setting SIP_ALREADYGONE on dialog 3c2e49b707ef-pezt1god6wj3
[Jul 12 13:06:42] DEBUG[22464] chan_sip.c: Received bye, issuing owner hangup
[Jul 12 13:06:42] DEBUG[22529] pbx.c: Extension s, priority 1 returned normally even though call was hung up
[Jul 12 13:06:42] DEBUG[22529] channel.c: Soft-Hanging up channel 'SIP/403-00000000'
[Jul 12 13:06:42] DEBUG[22529] channel.c: Hanging up channel 'SIP/403-00000000'

When Asterisk receives a BYE, that hangs up a channel. When the channel hangs up, it stops executing dialplan. That's expected behaviour. If there are any hangup handlers or an {{h}} extension, we would go execute there - but the current execution is done. We only keep executing dialplan so long as the channel is up.

The reason why your patch "worked" is because it removed the {{break}} in the PBX core that stops dialplan execution on hangup. However, that is inherently unsafe. Many, many dialplan applications would crash if they were invoked on a hungup channel, as they would attempt to manipulate a channel that was already dead. What's more, there would be no good way for us to end dialplan execution until someone finally ran out of dialplan to execute - which means most Asterisk dialplans in the world would break.

As such, I'm closing this out as Not a Bug.