Summary:ASTERISK-24015: app_transfer fails with PJSIP channels
Reporter:Private Name (falves11)Labels:
Date Opened:2014-07-09 20:59:35Date Closed:2015-02-12 14:34:43.000-0600
Versions:SVN 12.3.2 12.5.0 Frequency of
is a clone ofASTERISK-24792 CLONE - app_transfer fails with PJSIP channels
is related toASTERISK-24703 ARI: Add the ability to "transfer" (redirect) a channel
Environment:Linux Fedora 20Attachments:( 0) backtrace.txt
( 1) full_answered.txt
( 2) full_no_answer.txt
( 3) myDebugLog
( 4) pjsip_trace.txt
( 5) valgrind.core.txt
( 6) valgrind.txt
( 7) valgrind.txt
Description:When using PJSIP, the Transfer application does not behave like the when using the old SIP channel, i.e., generate 301 Redirect messages

Here is the trace
[Jul  9 21:39:29] DEBUG[47716][C-00000002]: pbx.c:4869 pbx_extension_helper: Launching 'Transfer'
   -- Executing [17274428141@redirect:30] Transfer("PJSIP/Client.", "PJSIP/17274428141;rn=+18134029999;npdi@") in new stack
[Jul  9 21:39:29] DEBUG[47716][C-00000002]: pbx.c:4869 pbx_extension_helper: Launching 'Verbose'
   -- Executing [17274428141@redirect:31] Verbose("PJSIP/Client.", "2,Transferred: 17274428141;rn=+18134029999;npdi@") in new stack
 == Transferred: 17274428141;rn=+18134029999;npdi@
   -- Auto fallthrough, channel 'PJSIP/Client.' status is 'UNKNOWN'
[Jul  9 21:39:29] DEBUG[47716][C-00000002]: channel.c:2597 ast_softhangup_nolock: Soft-Hanging (0x10) up channel 'PJSIP/Client.'
[Jul  9 21:39:29] DEBUG[47716][C-00000002]: channel.c:2753 ast_hangup: Hanging up channel 'PJSIP/Client.'
[Jul  9 21:39:29] DEBUG[47716][C-00000002]: chan_pjsip.c:1578 hangup_cause2sip: AST hangup cause 0 (no match found in PJSIP)
<--- Transmitting SIP response (369 bytes) to UDP: --->
SIP/2.0 603 Decline
v: SIP/2.0/UDP;rport;received=;branch=z9hG4bK-d8754z-22994e127365d474-1---d8754z-
f: "9544447408" <sip:9544447408@>;tag=82c82c1d
t: <sip:17274428141@>;tag=09f3a67a-f457-46d1-8d16-243478ac3859
Reason: Q.850;cause=0
l:  0
Note: it makes no difference if the endpoint has "allow_transfer" in false or true, yes or now. The behavior is identical.

This issue is a blocker for me, since I process several million redirects per day. Hence the importance. I already converted everything else and it works perfectly,
Comments:By: Matt Jordan (mjordan) 2014-07-10 06:51:27.730-0500

We require a complete debug log to help triage the issue. This document will provide instructions on how to collect debugging logs from an Asterisk machine for the purpose of helping bug marshals troubleshoot an issue: https://wiki.asterisk.org/wiki/display/AST/Collecting+Debug+Information

By: Private Name (falves11) 2014-07-10 20:29:39.386-0500

Please look in line #523, where I execute Transfer()

By: Rusty Newton (rnewton) 2014-08-04 18:21:54.210-0500

Can you attach a packet capture to accompany the log?

By: Private Name (falves11) 2014-08-04 21:13:46.205-0500

I am not an engineer, but a business man. If you give me precise instructions, I will follow them.
However, the trace shows the sip response, or lack thereof.
I cannot imagine what else can be seen in a packet capture that the SIP trace is not showing.

By: Private Name (falves11) 2014-08-20 18:15:40.158-0500

Do I need to test a patch?
I could not find it.

By: Rusty Newton (rnewton) 2014-08-20 20:44:52.526-0500

I was able to reproduce and further define the issue.

Attaching full_answered.txt showing debug and trace for a PJSIP call that is answered before calling app_transfer. Then, full_no_answer.txt showing an unanswered call hitting app_transfer.

I used the same phone (Linksys SPA942), and similar configuration in chan_sip and res_pjsip:

When using SIP channels with app_transfer:

* Call is *NOT* answered, hits Transfer, Asterisk responds with "302 Moved Temporarily"
* Call is *answered*, his Transfer, Asterisk responds with REFER and the phone responds appropriately.

When using PJSIP channels with app_transfer:

* Call is *NOT* answered, hits Transfer, Asterisk responds with "603 Decline"
* Call is *answered*, his Transfer, Asterisk responds with REFER and the phone *fails to process it*.

I noted that in the PJSIP REFER it does not include a Referred-by header and the Refer-to header does not include the domain/IP part of the address.

Dialplan used:

exten => 2000,1,Answer
same => n,Transfer(PJSIP/6002)

exten => 6001,1,Dial(PJSIP/6001)
exten => 6002,1,Dial(PJSIP/6002)

The sip/pjsip configuration itself doesn't appear to matter.

By: Private Name (falves11) 2014-08-20 20:51:39.703-0500

As noted by Rusty, the correct behavior is how the old SIP channels worked

   Call is NOT answered, hits Transfer, Asterisk responds with "302 Moved Temporarily"
   Call is answered, his Transfer, Asterisk responds with REFER and the phone responds appropriately.

By: Private Name (falves11) 2014-11-09 14:30:01.162-0600

I am ready to try a patch for this case.

By: Rusty Newton (rnewton) 2014-11-13 09:58:35.145-0600

Unfortunately there is no patch to try yet. If a developer takes on the issue you'll see updates here and a patch attached to the issue.

By: Matt Jordan (mjordan) 2015-01-18 21:35:48.202-0600

A patch for this is now up for review at https://reviewboard.asterisk.org/r/4316/, and has been attached to ASTERISK-24703.

Note that since this was fixed as a result of adding a new feature to ARI, the patch contains both bug fixes for this issue as well as the new feature. Both will get merged into the 13 branch (most likely in time for 13.3.0).

By: Private Name (falves11) 2015-01-19 20:05:17.320-0600

Version 13  does not compile pjsip
  [CC] res_pjsip/pjsip_distributor.c -> res_pjsip/pjsip_distributor.o
res_pjsip/pjsip_distributor.c: In function ‘find_dialog’:
res_pjsip/pjsip_distributor.c:145:21: error: ‘pjsip_transaction’ has no member named ‘mutex’
make[1]: *** [res_pjsip/pjsip_distributor.o] Error 1
make: *** [res] Error 2
version SVN-branch-13-r430776M
I did
git clone https://github.com/asterisk/pjproject pjproject
svn co http://svn.digium.com/svn/asterisk/branches/13 asterisk
I did compile pjproject like this
./configure --prefix=/usr --enable-shared --disable-sound --disable-resample --disable-video --disable-opencore-amr
make dep
make install
Note: I also tried downloading
svn co http://svn.pjsip.org/repos/pjproject/trunk/ pjproject-trunk
but it fails to compile with a larger number of errors.

By: Private Name (falves11) 2015-01-30 18:57:00.043-0600

I confirm that it does work as expected.

By: Private Name (falves11) 2015-02-08 08:09:38.445-0600

There is a leak handle in this patch:

lsof | awk '/asterisk/ && /FIFO/ {print $6}' | sort|uniq -c
278146 FIFO

after only
asterisk]# asterisk -rx "core show channels"
Channel              Location             State   Application(Data)
0 active channels
0 active calls
2869 calls processed

I am only using this functionality, app_transfer, not a single voice call.
After a few hours the app needs to be killed and  if you wait too much, the box needs to be restarted.

By: Private Name (falves11) 2015-02-08 15:06:46.331-0600

Valgrind core file analized wit gdb.

By: Private Name (falves11) 2015-02-08 15:08:12.484-0600

valgrind log file up to the moment when it crashed

valgrind --leak-check=full --log-file=valgrind.txt /usr/sbin/asterisk -vvvvvvgc

By: Private Name (falves11) 2015-02-08 15:10:13.610-0600

I just uploaded two valgrind trace files. I run Asterisk under valgrind in order to identify the module which is leaking handles. After a few calls, it crashed.
This was my command.
valgrind --leak-check=full --log-file=valgrind.txt /usr/sbin/asterisk -vvvvvvgc
kindly let me know if I need to do something different.

By: Private Name (falves11) 2015-02-08 15:33:17.844-0600

Valgrind core trace bt and threa all bt

By: Private Name (falves11) 2015-02-08 15:33:45.036-0600

Valgrind leak-log

By: Private Name (falves11) 2015-02-08 15:38:05.727-0600

The version used to trace is

asterisk -rx "core show uptime"
System uptime: 1 minute, 19 seconds
Last reload: 1 minute, 19 seconds
lsof | grep asterisk | grep FIFO | wc -l

By: Private Name (falves11) 2015-02-08 15:42:58.095-0600

I realize that valgrind cannot be used to debug this, since asterisk always crashes with "illegal instruction"
Any idea where to go from here?

By: Matt Jordan (mjordan) 2015-02-08 15:52:47.862-0600

I don't know, but your backtrace shows you using a {{codec_g729}} module that did not originate from Digium:

Thread 219 (LWP 21608):
#0  0x00000000338c6877 in y8_ownsDotProd_32f64f_M7 () from /usr/lib/asterisk/modules/codec_g729.so
#1  0x00000000338b0bc4 in y8_ippsDotProd_32f64f () from /usr/lib/asterisk/modules/codec_g729.so
#2  0x000000003388ab6e in DecodeGain_G729 () from /usr/lib/asterisk/modules/codec_g729.so
#3  0x000000003387dfef in apiG729FPDecode () from /usr/lib/asterisk/modules/codec_g729.so
#4  0x000000003387c146 in ?? () from /usr/lib/asterisk/modules/codec_g729.so
#5  0x00000000005e54cb in framein (pvt=0xb281ce0, f=0x33b1c620) at translate.c:408
#6  0x00000000005e5ff7 in generate_computational_cost (t=0x33b10e60, seconds=1) at translate.c:642
#7  0x00000000005e78e1 in __ast_register_translator (t=0x33b10e60, mod=0xdca69c0) at translate.c:1187
#8  0x000000003387c492 in ?? () from /usr/lib/asterisk/modules/codec_g729.so
#9  0x000000000053460e in start_resource (mod=0xdca69c0) at loader.c:980
#10 0x0000000000534f5b in load_resource_list (load_order=0xffeffefc0, global_symbols=0, mod_count=0xffeffefac) at loader.c:1178
#11 0x00000000005355ab in load_modules (preload_only=0) at loader.c:1331
#12 0x000000000045b0c3 in main (argc=2, argv=0xfff000558) at asterisk.c:4557

Given that you are using a module which is quite possibly illegal, I don't think either I or anyone who works for Digium will be responding to you any further on the mailing lists or on the issue tracker.

Good luck in your usage of Asterisk in the future.

By: Private Name (falves11) 2015-02-09 13:06:19.458-0600

After less than two minutes of traffic
lsof | grep asterisk | grep -i fifo | wc -l
with regular sip, in more or less the same time

By: Private Name (falves11) 2015-02-11 15:45:39.970-0600

after 1 minute of traffic
lsof | grep asterisk | grep -i fifo |wc -l

By: Private Name (falves11) 2015-02-12 15:07:56.002-0600

It is still not working
after 2 minutes of traffic
lsof | grep asterisk | grep -i fifo |wc -l


By: Private Name (falves11) 2015-02-12 15:14:59.836-0600

If I can be of help to find out where the source of the leak is, please give me detailed instructions. I have a script that changes from SIP to PJSIP in a few seconds, swapping directories. I believe that my app is the only one  to show the issue, since it is probably the only one that uses the new Transfer app several hundred times per second.

By: Private Name (falves11) 2015-02-13 20:45:19.278-0600

Still leaking handles, at