Summary: | ASTERISK-28811: Crash occurs when fax session switches from T.38 to audio | ||||||
Reporter: | Alexey Vasilyev (vasilevalex) | Labels: | fax patch | ||||
Date Opened: | 2020-04-07 02:13:02 | Date Closed: | 2020-04-22 10:10:23 | ||||
Priority: | Major | Regression? | Yes | ||||
Status: | Closed/Complete | Components: | pjproject/pjsip | ||||
Versions: | 16.9.0 | Frequency of Occurrence | One Time | ||||
Related Issues: |
| ||||||
Environment: | CentOS Linux release 7.7.1908 (Core) 3.10.0-1062.4.3.el7.x86_64 | Attachments: | ( 0) ASTERISK-28811-2.diff ( 1) cisco-pbx.txt ( 2) core.28811.tar.gz ( 3) crash1-backtrace.txt ( 4) crash1-sip-trace.txt ( 5) crash2-backtrace.txt ( 6) crash2-sip-trace.txt ( 7) fax_491.txt ( 8) pbx-fax.txt ( 9) pjsip.conf (10) sip-flow-488.txt (11) sip-trace-488.txt | ||||
Description: | While a fax was being sent from a Cisco SPA112 device through several Asterisk servers, the most recently updated server (Asterisk 16.9.0) crashed. We can't reproduce the crash: sometimes faxes send fine, while from other Cisco SPA112 devices faxes just stopped sending (receiving works fine). After downgrading to 16.8.0 everything works fine again. | ||||||
Comments: | By: Asterisk Team (asteriskteam) 2020-04-07 02:13:03.153-0500
Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution. A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report. Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process]. Please note that once your issue enters an open state it has been accepted. As Asterisk is an open source project there is no guarantee or timeframe on when your issue will be looked into. If you need expedient resolution you will need to find and pay a suitable developer. Asking for an update on your issue will not yield any progress on it and will not result in a response. All updates are posted to the issue when they occur.

By: Alexey Vasilyev (vasilevalex) 2020-04-07 03:05:02.801-0500
Backtraces

By: George Joseph (gjoseph) 2020-04-07 10:39:40.082-0500
Did this issue happen with earlier versions of Asterisk? If not, can you pinpoint which version the issue first appeared?

By: Alexey Vasilyev (vasilevalex) 2020-04-07 12:55:41.764-0500
This first happened in version 16.9.0. Downgrading to 16.8.0 fixed the issue. The problem happened in function sip_session_refresh() in the file /usr/lib64/asterisk/modules/res_pjsip_session.so which was significantly modified from 16.8.0 to 16.9.0.

By: Kevin Harwell (kharwell) 2020-04-13 13:55:57.792-0500
[~vasilevalex], could you attach your pjsip.conf file, or at least the endpoint configurations for the involved parties, along with the relevant dialplan.
Also, please attach an Asterisk debug log with SIP tracing enabled [1] for a good run of the scenario (one where Asterisk does not crash) and, if possible (although it might be hard due to the sporadic nature of the problem), similar logging for a bad run (one where Asterisk does crash).

[1] https://wiki.asterisk.org/wiki/display/AST/Collecting+Debug+Information

Thanks!

By: Alexey Vasilyev (vasilevalex) 2020-04-14 06:58:45.270-0500
Two endpoints. The call was from pbx to fax. Due to legal reasons, I can't attach the dialplan and debug log, but there is nothing special there.

By: Kevin Harwell (kharwell) 2020-04-14 11:13:28.253-0500
What about a pcap of a "good run", then? Seeing the expected call flow and SDPs would probably be helpful. I might then be able to set up a test on my end to replicate. I understand there are some things that you can't make public. If you can't attach it here, would it be a problem to email that information to asteriskteam@digium.com? We could then attach it to our associated, restricted (non-public) internal issue.

By: Alexey Vasilyev (vasilevalex) 2020-04-15 08:36:21.255-0500
I attached SIP traces that were captured while Asterisk 16.9.0 was running; the crash happened later. cisco-pbx.txt - SIP call from the Cisco SPA112 to the first Asterisk server (pbx1) in the chain. The call then goes to Asterisk 16.9.0, then to another server (pbx2), and then to the fax server. pbx-fax is the SIP trace for the same call, but for the last leg. I don't know if it can help, but all these calls failed; the Cisco could not send the fax. When we downgraded the server in the middle to 16.8.0, all the faxes started working again. And it looks like a similar call crashed 16.9.0.

By: Joshua C. Colp (jcolp) 2020-04-20 07:05:33.131-0500
Can you please try applying the attached patch and retrying your faxing?

By: Joshua C. Colp (jcolp) 2020-04-20 10:08:05.575-0500
Here is a slightly updated version.

By: Alexei Gradinari (alexei gradinari) 2020-04-20 11:37:11.848-0500
We observed the same or related crashes with 2 different types of off-nominal re-negotiation.
1. 491 Another INVITE transaction in progress. Files: crash1-sip-trace.txt, crash1-backtrace.txt
2. 488 Not Acceptable Here. Files: crash2-sip-trace.txt, crash2-backtrace.txt
All files were edited to remove private information about IP addresses, usernames and caller IDs, so the length values are incorrect.

By: Joshua C. Colp (jcolp) 2020-04-20 11:42:22.151-0500
Are these before the current patch that is up, or with the patch applied?

By: Kevin Harwell (kharwell) 2020-04-20 11:43:54.817-0500
[~alexei gradinari] Is that with or without the attached patch ([^ASTERISK-28811-2.diff]) applied?

By: Alexei Gradinari (alexei gradinari) 2020-04-20 12:02:35.866-0500
Without the patch. I'm compiling Asterisk with this patch right now; I will run the patched Asterisk and let you know about further results. I uploaded 2 SIP traces and backtraces so you know that there are 2 places where it crashes.

By: Joshua C. Colp (jcolp) 2020-04-20 12:11:31.628-0500
The core of the issue is the same for both.

By: Alexey Vasilyev (vasilevalex) 2020-04-20 13:04:40.766-0500
Thanks. I'll try to test with the patch tomorrow.

By: Alexei Gradinari (alexei gradinari) 2020-04-20 14:12:49.244-0500
With the patch, in case "2. 488 Not Acceptable Here" there is no crash, but Asterisk did not send a re-INVITE with SDP on 488.

By: Joshua C. Colp (jcolp) 2020-04-20 14:18:09.996-0500
I'm not sure what you mean by that. Can you clarify further what you are expecting/what is seen in previous versions in comparison to this?

By: Alexei Gradinari (alexei gradinari) 2020-04-20 14:46:08.697-0500
FAX1 sends INVITE (VOICE) to Asterisk, Asterisk sends INVITE (VOICE) to FAX2. FAX2 replies with 200 to Asterisk, Asterisk replies with 200 to FAX1. FAX1 detects fax tone and sends re-INVITE (T.38) to Asterisk, Asterisk sends re-INVITE (T.38) to FAX2.
FAX2 replies 488 to Asterisk, Asterisk replies 488 to FAX1. FAX1 switches back to voice, as T.38 is not supported, and sends re-INVITE (VOICE) to Asterisk; Asterisk DOES NOT send re-INVITE (VOICE) to FAX2. FAX1 sends BYE to Asterisk, Asterisk sends BYE to FAX2. FAX2 replies 481 Call Leg/Transaction Does Not Exist to Asterisk... I think because Asterisk didn't switch to VOICE. In version 16.9.0 without the patch, Asterisk always crashed on 488 (my files crash2-backtrace.txt and crash2-sip-trace.txt). I didn't check this scenario with version 16.8.0. The "481 Call Leg/Transaction Does Not Exist" bothers me in this scenario. I think Asterisk should switch to voice after 488. But maybe this is not a related issue.

By: Joshua C. Colp (jcolp) 2020-04-20 14:51:12.777-0500
I don't believe that issue is related, and switching to voice in that scenario isn't required and I don't believe Asterisk has ever done so. This is because when a re-INVITE is sent the previous SDP negotiation and state is kept and only replaced if it was successful. This means that if you send a re-INVITE and it receives a 488 then things continue on, as if the re-INVITE was never attempted.

"During the session, either Alice or Bob may decide to change the characteristics of the media session. This is accomplished by sending a re-INVITE containing a new media description. This re-INVITE references the existing dialog so that the other party knows that it is to modify an existing session instead of establishing a new session. The other party sends a 200 (OK) to accept the change. The requestor responds to the 200 (OK) with an ACK. If the other party does not accept the change, he sends an error response such as 488 (Not Acceptable Here), which also receives an ACK. However, the failure of the re-INVITE does not cause the existing call to fail - the session continues using the previously negotiated characteristics."

The 481 would be concerning, but I don't think anything that has been done would have changed anything there.

By: Alexei Gradinari (alexei gradinari) 2020-04-21 16:49:05.366-0500
[~jcolp], today I applied both of your latest patches: "fax: Fix crashes in PJSIP re-negotiation scenarios." and "stream: Enforce formats immutability and ensure formats exist.". I can confirm there are no more crashes and even no more 481 on BYE.

By: Alexey Vasilyev (vasilevalex) 2020-04-22 04:23:14.595-0500
I've applied only one patch, ASTERISK-28811-2.diff. We have tested the same scenario. Faxes from the Cisco SPA work fine now. In the test we used the same device, and before the patch we could not send any faxes at all. Thanks!

By: Joshua C. Colp (jcolp) 2020-04-22 04:51:28.756-0500
Glad to hear it everyone!

By: Friendly Automation (friendly-automation) 2020-04-22 10:10:23.672-0500
Change 14274 merged by Friendly Automation: fax: Fix crashes in PJSIP re-negotiation scenarios. [https://gerrit.asterisk.org/c/asterisk/+/14274|https://gerrit.asterisk.org/c/asterisk/+/14274]

By: Friendly Automation (friendly-automation) 2020-04-22 10:10:27.916-0500
Change 14299 merged by Friendly Automation: fax: Fix crashes in PJSIP re-negotiation scenarios. [https://gerrit.asterisk.org/c/asterisk/+/14299|https://gerrit.asterisk.org/c/asterisk/+/14299]

By: Friendly Automation (friendly-automation) 2020-04-22 10:15:33.032-0500
Change 14298 merged by Joshua Colp: fax: Fix crashes in PJSIP re-negotiation scenarios. [https://gerrit.asterisk.org/c/asterisk/+/14298|https://gerrit.asterisk.org/c/asterisk/+/14298]

By: Alexei Gradinari (alexei gradinari) 2020-04-29 15:01:04.476-0500
[~jcolp], I was able to catch the case "491 Another INVITE transaction in progress" with version 16.10.0-rc2 (file fax_491.txt). The good news: Asterisk did not crash. The bad news: the T.38 re-INVITE failed. Should I open a new issue?

By: Joshua C. Colp (jcolp) 2020-04-29 15:49:53.125-0500
Yes, that would be a separate unrelated issue.

By: Friendly Automation (friendly-automation) 2020-04-30 10:52:22.853-0500 Change 14371 merged by George Joseph: fax: Fix crashes in PJSIP re-negotiation scenarios. [https://gerrit.asterisk.org/c/asterisk/+/14371|https://gerrit.asterisk.org/c/asterisk/+/14371] |
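
Illustration of the re-INVITE rule discussed in the comments (a failed re-INVITE leaves the previously negotiated media in place). This is only a minimal sketch in Python, not Asterisk or PJSIP code. The class and method names (`MediaSession`, `send_reinvite`, `on_response`) and the plain-string media descriptions are hypothetical, chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MediaSession:
    """Toy model of one dialog's media state (hypothetical, not Asterisk code)."""
    active: str                    # media currently negotiated, e.g. "audio"
    pending: Optional[str] = None  # offer carried by an outstanding re-INVITE

    def send_reinvite(self, offer: str) -> None:
        # Only one offer may be outstanding at a time in this model.
        if self.pending is not None:
            raise RuntimeError("a re-INVITE is already outstanding")
        self.pending = offer

    def on_response(self, status: int) -> str:
        if self.pending is None:
            raise RuntimeError("no re-INVITE outstanding")
        if status < 200:
            # Provisional response: the offer is still pending.
            return self.active
        if status < 300:
            # 2xx: the offer was accepted, so it replaces the active media.
            self.active = self.pending
        # Any final response (including 488) ends the transaction; on failure
        # the previously negotiated media stays in effect.
        self.pending = None
        return self.active


session = MediaSession(active="audio")
session.send_reinvite("t38")
print(session.on_response(488))  # rejected: still "audio"
session.send_reinvite("t38")
print(session.on_response(200))  # accepted: now "t38"
```

In this model a 488 simply discards the pending offer, which matches the quoted statement that the session continues using the previously negotiated characteristics.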