Summary: | ASTERISK-26686: res_pjsip: Lock inversion in transport management | ||||
Reporter: | Ross Beer (rossbeer) | Labels: | pjsip | ||
Date Opened: | 2017-01-03 01:44:23.000-0600 | Date Closed: | 2018-07-09 06:56:08 | ||
Priority: | Major | Regression? | |||
Status: | Closed/Complete | Components: | Resources/res_pjsip | ||
Versions: | 13.13.1 | Frequency of Occurrence | Frequent | ||
Related Issues: |
| ||||
Environment: | Fedora Server 23, SQLite 3.11.0 | Attachments: | ( 0) backtrace_20160103.txt | ||
Description: | Lock inversion (deadlock) in the Asterisk PJSIP transport management code that keeps transports alive.
Workaround: set 'keep_alive_interval=0' in pjsip.conf, which disables the keepalive functionality that triggers the problem. | ||||
Comments: |
By: Asterisk Team (asteriskteam) 2017-01-03 01:44:25.193-0600
Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution. A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report. Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

By: Joshua C. Colp (jcolp) 2017-01-03 06:12:28.134-0600
This has nothing to do with astdb. It's a lock inversion in the PJSIP transport management code we have for keeping transports alive. One thread has our lock and is trying to get a transport lock, another thread has the transport lock and is trying to get our lock.

By: Ross Beer (rossbeer) 2017-01-03 08:07:18.891-0600
As a temporary fix, would the following setting resolve the issue: keep_alive_interval=0

By: Joshua C. Colp (jcolp) 2017-01-03 08:12:03.365-0600
Yes, that should disable the functionality which causes the problem.

By: Ross Beer (rossbeer) 2017-02-13 10:13:30.667-0600
The temporary fix has stopped the issue, however, the underlying issue remains.

By: Richard Mudgett (rmudgett) 2018-07-03 11:22:10.343-0500
Thread 70 and 71 are deadlocked. The locks involved are the pjproject transport manager group lock and the monitored_transports container lock. The deadlocking code is still present in 13.21.0 even though that code moved to a new file.

By: Ross Beer (rossbeer) 2018-07-04 06:35:52.522-0500
Does the PJSIP config PJSIP_TCP_KEEP_ALIVE_INTERVAL and PJSIP_TLS_KEEP_ALIVE_INTERVAL need to be defined and set to 0 to stop PJSIP also sending keepalives every 90 seconds? According to the documentation, these values have a default of 90 and will, therefore, send keepalives also, see: http://www.pjsip.org/pjsip/docs/html/group__PJSIP__CONFIG.htm#ga02217f4919a7c575d71eed407be63d04

By: Richard Mudgett (rmudgett) 2018-07-06 14:30:07.922-0500
https://blogs.asterisk.org/2018/01/27/wanted-dead-or-alive/

By: Ross Beer (rossbeer) 2018-07-06 17:17:46.234-0500
My point exactly... Does the PJSIP implementation of the keep alive need to be disabled with the bundled version?

By: Ross Beer (rossbeer) 2018-07-09 04:49:21.986-0500
Looking through the issue tracker, there is also an open ticket regarding PJSIP also sending keepalives which means that there are more keepalives sent than expected. See ASTERISK-27347

By: Friendly Automation (friendly-automation) 2018-07-09 06:56:10.944-0500
Change 9330 merged by Jenkins2: res_pjsip/pjsip_transport_management.c: Fix deadlock with transport keep alive. [https://gerrit.asterisk.org/9330|https://gerrit.asterisk.org/9330]

By: Friendly Automation (friendly-automation) 2018-07-09 07:11:49.810-0500
Change 9331 merged by Joshua Colp: res_pjsip/pjsip_transport_management.c: Fix deadlock with transport keep alive. [https://gerrit.asterisk.org/9331|https://gerrit.asterisk.org/9331]

By: Friendly Automation (friendly-automation) 2018-07-09 07:16:10.471-0500
Change 9332 merged by Joshua Colp: res_pjsip/pjsip_transport_management.c: Fix deadlock with transport keep alive. [https://gerrit.asterisk.org/9332|https://gerrit.asterisk.org/9332]

By: Friendly Automation (friendly-automation) 2018-08-28 11:58:40.842-0500
Change 10003 merged by Kevin Harwell: res_pjsip/pjsip_transport_management.c: Fix deadlock with transport keep alive. [https://gerrit.asterisk.org/10003|https://gerrit.asterisk.org/10003] |
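
For readers unfamiliar with the term, the lock inversion described in the comments (one thread holds the container lock and wants the transport manager lock, while another holds the transport manager lock and wants the container lock) can be illustrated with a small pthreads sketch. Everything below is hypothetical: `tpmgr_lock`, `monitored_lock`, `keepalive_thread`, `state_callback_thread` and `run_demo` are stand-in names, not Asterisk or pjproject identifiers, and the snapshot approach shown is only one common way to avoid such an inversion, not necessarily what the merged changes do.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

#define MAX_TRANSPORTS 8
#define ITERATIONS 10000

/* Hypothetical stand-ins for the two locks named in the ticket. */
static pthread_mutex_t tpmgr_lock = PTHREAD_MUTEX_INITIALIZER;     /* "transport manager" lock */
static pthread_mutex_t monitored_lock = PTHREAD_MUTEX_INITIALIZER; /* "monitored container" lock */

static int monitored[MAX_TRANSPORTS]; /* protected by monitored_lock */
static int monitored_count;           /* protected by monitored_lock */
static long keepalives_sent;          /* protected by tpmgr_lock */

/*
 * Keepalive side. The deadlock-prone version would hold monitored_lock
 * while taking tpmgr_lock for every transport. Here the list is copied
 * under monitored_lock, the lock is released, and only then is
 * tpmgr_lock taken, so this thread never holds both locks at once.
 */
static void *keepalive_thread(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        int snapshot[MAX_TRANSPORTS];
        int n;

        pthread_mutex_lock(&monitored_lock);
        n = monitored_count;
        memcpy(snapshot, monitored, sizeof(int) * (size_t)n);
        pthread_mutex_unlock(&monitored_lock);

        for (int j = 0; j < n; j++) {
            pthread_mutex_lock(&tpmgr_lock);
            keepalives_sent++; /* pretend to send a keepalive on snapshot[j] */
            pthread_mutex_unlock(&tpmgr_lock);
        }
    }
    return NULL;
}

/*
 * State-callback side. It is entered with tpmgr_lock held and then
 * needs monitored_lock to add or remove a transport, which is the
 * opposite order to the deadlock-prone keepalive loop.
 */
static void *state_callback_thread(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        pthread_mutex_lock(&tpmgr_lock);
        pthread_mutex_lock(&monitored_lock);
        if (monitored_count > 1) {
            monitored_count--;                /* transport went away */
        } else {
            monitored[monitored_count++] = i; /* new transport appeared */
        }
        pthread_mutex_unlock(&monitored_lock);
        pthread_mutex_unlock(&tpmgr_lock);
    }
    return NULL;
}

/* Runs both threads to completion; returns 0 on success, -1 on error. */
int run_demo(void)
{
    pthread_t ka, cb;

    monitored[0] = 1;
    monitored_count = 1;
    keepalives_sent = 0;

    if (pthread_create(&ka, NULL, keepalive_thread, NULL) != 0) {
        return -1;
    }
    if (pthread_create(&cb, NULL, state_callback_thread, NULL) != 0) {
        pthread_join(ka, NULL);
        return -1;
    }
    pthread_join(ka, NULL);
    pthread_join(cb, NULL);

    assert(monitored_count >= 1 && monitored_count <= MAX_TRANSPORTS);
    return 0;
}
```

Calling `run_demo()` from a `main` and building with `-pthread` should complete without hanging. If `keepalive_thread` instead kept `monitored_lock` held across the inner loop, the two threads could each end up waiting on the lock the other holds, which is the situation reported for threads 70 and 71.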