Summary: | ASTERISK-24015: app_transfer fails with PJSIP channels | ||||||
Reporter: | Private Name (falves11) | Labels: | |||||
Date Opened: | 2014-07-09 20:59:35 | Date Closed: | 2015-02-12 14:34:43.000-0600 | ||||
Priority: | Major | Regression? | |||||
Status: | Closed/Complete | Components: | Applications/app_transfer | ||||
Versions: | SVN 12.3.2 12.5.0 | Frequency of Occurrence | Constant | ||||
Related Issues: |
| ||||||
Environment: | Linux Fedora 20 | Attachments: | ( 0) backtrace.txt ( 1) full_answered.txt ( 2) full_no_answer.txt ( 3) myDebugLog ( 4) pjsip_trace.txt ( 5) valgrind.core.txt ( 6) valgrind.txt ( 7) valgrind.txt | ||||
Description: | When using PJSIP, the Transfer application does not behave like the when using the old SIP channel, i.e., generate 301 Redirect messages
Here is the trace {noformat} [Jul 9 21:39:29] DEBUG[47716][C-00000002]: pbx.c:4869 pbx_extension_helper: Launching 'Transfer' -- Executing [17274428141@redirect:30] Transfer("PJSIP/Client.1.1.1.1-00000002", "PJSIP/17274428141;rn=+18134029999;npdi@1.1.1.1") in new stack [Jul 9 21:39:29] DEBUG[47716][C-00000002]: pbx.c:4869 pbx_extension_helper: Launching 'Verbose' -- Executing [17274428141@redirect:31] Verbose("PJSIP/Client.1.1.1.1-00000002", "2,Transferred: 17274428141;rn=+18134029999;npdi@1.1.1.1") in new stack == Transferred: 17274428141;rn=+18134029999;npdi@1.1.1.1 -- Auto fallthrough, channel 'PJSIP/Client.1.1.1.1-00000002' status is 'UNKNOWN' [Jul 9 21:39:29] DEBUG[47716][C-00000002]: channel.c:2597 ast_softhangup_nolock: Soft-Hanging (0x10) up channel 'PJSIP/Client.1.1.1.1-00000002' [Jul 9 21:39:29] DEBUG[47716][C-00000002]: channel.c:2753 ast_hangup: Hanging up channel 'PJSIP/Client.1.1.1.1-00000002' [Jul 9 21:39:29] DEBUG[47716][C-00000002]: chan_pjsip.c:1578 hangup_cause2sip: AST hangup cause 0 (no match found in PJSIP) <--- Transmitting SIP response (369 bytes) to UDP:1.1.1.1:49260 ---> SIP/2.0 603 Decline v: SIP/2.0/UDP 1.1.1.1:49260;rport;received=1.1.1.1;branch=z9hG4bK-d8754z-22994e127365d474-1---d8754z- i: MmFjNDM4NDc2NmFhZWNiYTU2MDQ1YmNjNGVmYmMyOTY f: "9544447408" <sip:9544447408@8.26.191.189>;tag=82c82c1d t: <sip:17274428141@8.26.191.189>;tag=09f3a67a-f457-46d1-8d16-243478ac3859 CSeq: 1 INVITE Reason: Q.850;cause=0 l: 0 {noformat} Note: it makes no difference if the endpoint has "allow_transfer" in false or true, yes or now. The behavior is identical. This issue is a blocker for me, since I process several million redirects per day. Hence the importance. I already converted everything else and it works perfectly, | ||||||
Comments: | By: Matt Jordan (mjordan) 2014-07-10 06:51:27.730-0500 We require a complete debug log to help triage the issue. This document will provide instructions on how to collect debugging logs from an Asterisk machine for the purpose of helping bug marshals troubleshoot an issue: https://wiki.asterisk.org/wiki/display/AST/Collecting+Debug+Information By: Private Name (falves11) 2014-07-10 20:29:39.386-0500 Please look in line #523, where I execute Transfer() By: Rusty Newton (rnewton) 2014-08-04 18:21:54.210-0500 Can you attach a packet capture to accompany the log? By: Private Name (falves11) 2014-08-04 21:13:46.205-0500 I am not an engineer, but a business man. If you give me precise instructions, I will follow them. However, the trace shows the sip response, or lack thereof. I cannot imagine what else can be seen in a packet capture that the SIP trace is not showing. By: Private Name (falves11) 2014-08-20 18:15:40.158-0500 Do I need to test a patch? I could not find it. By: Rusty Newton (rnewton) 2014-08-20 20:44:52.526-0500 I was able to reproduce and further define the issue. Attaching full_answered.txt showing debug and trace for a PJSIP call that is answered before calling app_transfer. Then, full_no_answer.txt showing an unanswered call hitting app_transfer. I used the same phone (Linksys SPA942), and similar configuration in chan_sip and res_pjsip: When using SIP channels with app_transfer: * Call is *NOT* answered, hits Transfer, Asterisk responds with "302 Moved Temporarily" * Call is *answered*, his Transfer, Asterisk responds with REFER and the phone responds appropriately. When using PJSIP channels with app_transfer: * Call is *NOT* answered, hits Transfer, Asterisk responds with "603 Decline" * Call is *answered*, his Transfer, Asterisk responds with REFER and the phone *fails to process it*. I noted that in the PJSIP REFER it does not include a Referred-by header and the Refer-to header does not include the domain/IP part of the address. Dialplan used: {noformat} exten => 2000,1,Answer same => n,Transfer(PJSIP/6002) exten => 6001,1,Dial(PJSIP/6001) exten => 6002,1,Dial(PJSIP/6002) {noformat} The sip/pjsip configuration itself doesn't appear to matter. By: Private Name (falves11) 2014-08-20 20:51:39.703-0500 As noted by Rusty, the correct behavior is how the old SIP channels worked Call is NOT answered, hits Transfer, Asterisk responds with "302 Moved Temporarily" Call is answered, his Transfer, Asterisk responds with REFER and the phone responds appropriately. By: Private Name (falves11) 2014-11-09 14:30:01.162-0600 I am ready to try a patch for this case. By: Rusty Newton (rnewton) 2014-11-13 09:58:35.145-0600 Unfortunately there is no patch to try yet. If a developer takes on the issue you'll see updates here and a patch attached to the issue. By: Matt Jordan (mjordan) 2015-01-18 21:35:48.202-0600 A patch for this is now up for review at https://reviewboard.asterisk.org/r/4316/, and has been attached to ASTERISK-24703. Note that since this was fixed as a result of adding a new feature to ARI, the patch contains both bug fixes for this issue as well as the new feature. Both will get merged into the 13 branch (most likely in time for 13.3.0). By: Private Name (falves11) 2015-01-19 20:05:17.320-0600 Version 13 does not compile pjsip [CC] res_pjsip/pjsip_distributor.c -> res_pjsip/pjsip_distributor.o res_pjsip/pjsip_distributor.c: In function ‘find_dialog’: res_pjsip/pjsip_distributor.c:145:21: error: ‘pjsip_transaction’ has no member named ‘mutex’ make[1]: *** [res_pjsip/pjsip_distributor.o] Error 1 make: *** [res] Error 2 version SVN-branch-13-r430776M I did git clone https://github.com/asterisk/pjproject pjproject and svn co http://svn.digium.com/svn/asterisk/branches/13 asterisk I did compile pjproject like this ./configure --prefix=/usr --enable-shared --disable-sound --disable-resample --disable-video --disable-opencore-amr make dep make make install ldconfig Note: I also tried downloading svn co http://svn.pjsip.org/repos/pjproject/trunk/ pjproject-trunk but it fails to compile with a larger number of errors. By: Private Name (falves11) 2015-01-30 18:57:00.043-0600 I confirm that it does work as expected. By: Private Name (falves11) 2015-02-08 08:09:38.445-0600 There is a leak handle in this patch: lsof | awk '/asterisk/ && /FIFO/ {print $6}' | sort|uniq -c 278146 FIFO after only asterisk]# asterisk -rx "core show channels" Channel Location State Application(Data) 0 active channels 0 active calls 2869 calls processed Note: SVN-branch-13-r431583M I am only using this functionality, app_transfer, not a single voice call. After a few hours the app needs to be killed and if you wait too much, the box needs to be restarted. By: Private Name (falves11) 2015-02-08 15:06:46.331-0600 Valgrind core file analized wit gdb. By: Private Name (falves11) 2015-02-08 15:08:12.484-0600 valgrind log file up to the moment when it crashed valgrind --leak-check=full --log-file=valgrind.txt /usr/sbin/asterisk -vvvvvvgc By: Private Name (falves11) 2015-02-08 15:10:13.610-0600 I just uploaded two valgrind trace files. I run Asterisk under valgrind in order to identify the module which is leaking handles. After a few calls, it crashed. This was my command. valgrind --leak-check=full --log-file=valgrind.txt /usr/sbin/asterisk -vvvvvvgc kindly let me know if I need to do something different. By: Private Name (falves11) 2015-02-08 15:33:17.844-0600 Valgrind core trace bt and threa all bt By: Private Name (falves11) 2015-02-08 15:33:45.036-0600 Valgrind leak-log By: Private Name (falves11) 2015-02-08 15:38:05.727-0600 The version used to trace is SVN-branch-13-r431600M asterisk -rx "core show uptime" System uptime: 1 minute, 19 seconds Last reload: 1 minute, 19 seconds lsof | grep asterisk | grep FIFO | wc -l 239334 By: Private Name (falves11) 2015-02-08 15:42:58.095-0600 I realize that valgrind cannot be used to debug this, since asterisk always crashes with "illegal instruction" Any idea where to go from here? By: Matt Jordan (mjordan) 2015-02-08 15:52:47.862-0600 I don't know, but your backtrace shows you using a {{codec_g729}} module that did not originate from Digium: {code} Thread 219 (LWP 21608): #0 0x00000000338c6877 in y8_ownsDotProd_32f64f_M7 () from /usr/lib/asterisk/modules/codec_g729.so #1 0x00000000338b0bc4 in y8_ippsDotProd_32f64f () from /usr/lib/asterisk/modules/codec_g729.so #2 0x000000003388ab6e in DecodeGain_G729 () from /usr/lib/asterisk/modules/codec_g729.so #3 0x000000003387dfef in apiG729FPDecode () from /usr/lib/asterisk/modules/codec_g729.so #4 0x000000003387c146 in ?? () from /usr/lib/asterisk/modules/codec_g729.so #5 0x00000000005e54cb in framein (pvt=0xb281ce0, f=0x33b1c620) at translate.c:408 #6 0x00000000005e5ff7 in generate_computational_cost (t=0x33b10e60, seconds=1) at translate.c:642 #7 0x00000000005e78e1 in __ast_register_translator (t=0x33b10e60, mod=0xdca69c0) at translate.c:1187 #8 0x000000003387c492 in ?? () from /usr/lib/asterisk/modules/codec_g729.so #9 0x000000000053460e in start_resource (mod=0xdca69c0) at loader.c:980 #10 0x0000000000534f5b in load_resource_list (load_order=0xffeffefc0, global_symbols=0, mod_count=0xffeffefac) at loader.c:1178 #11 0x00000000005355ab in load_modules (preload_only=0) at loader.c:1331 #12 0x000000000045b0c3 in main (argc=2, argv=0xfff000558) at asterisk.c:4557 {code} Given that you are using a module which is quite possibly illegal, I don't think either I or anyone who works for Digium will be responding to you any further on the mailing lists or on the issue tracker. Good luck in your usage of Asterisk in the future. By: Private Name (falves11) 2015-02-09 13:06:19.458-0600 SVN-branch-13-r431622M After less than two minutes of traffic lsof | grep asterisk | grep -i fifo | wc -l 1892968 with regular sip, in more or less the same time 325 By: Private Name (falves11) 2015-02-11 15:45:39.970-0600 SVN-branch-13-r431698M after 1 minute of traffic lsof | grep asterisk | grep -i fifo |wc -l 1132668 Thanks. By: Private Name (falves11) 2015-02-12 15:07:56.002-0600 It is still not working after 2 minutes of traffic lsof | grep asterisk | grep -i fifo |wc -l 1628716 SVN-branch-13-r431717M By: Private Name (falves11) 2015-02-12 15:14:59.836-0600 If I can be of help to find out where the source of the leak is, please give me detailed instructions. I have a script that changes from SIP to PJSIP in a few seconds, swapping directories. I believe that my app is the only one to show the issue, since it is probably the only one that uses the new Transfer app several hundred times per second. By: Private Name (falves11) 2015-02-13 20:45:19.278-0600 Still leaking handles, at SVN-branch-13-r431734M |