Summary: ASTERISK-27706: PJSIP: Deadlock shutting down subscription TCP connection and sending subscription message.
Reporter: Ross Beer (rossbeer)
Labels: pjsip
Date Opened: 2018-02-27 21:52:07.000-0600
Date Closed: 2018-04-18 17:22:05
Versions: 13.19.2 GIT
Frequency of
Environment: Fedora 23
Attachments:
( 0) core.174244-thread1.txt
( 1) core.6829-thread1.txt
( 2) Thread_103.txt
Description: A deadlock can occur when the PJSIP monitor thread shuts down a connection-oriented transport (TCP/TLS) used by a subscription at the same time as another thread tries to send something for that subscription.  The two threads end up waiting on each other: the PJSIP monitor thread, while shutting down the transport, attempts to get the dialog lock, and the thread sending something for that dialog attempts to get the transport manager lock.
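
The lock pattern described above can be reproduced in isolation. The following is a minimal sketch using plain POSIX threads, not Asterisk or PJSIP code: dlg_lock and tpmgr_lock are hypothetical stand-ins for the dialog and transport manager locks, and the helper names are invented for illustration. It assumes each thread already holds one of the locks when it asks for the other. It uses pthread_mutex_trylock() for the second lock so the demonstration reports the conflict instead of hanging.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-ins for the two locks named in the report. */
static pthread_mutex_t dlg_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t tpmgr_lock = PTHREAD_MUTEX_INITIALIZER;

/* Small rendezvous helper so both threads reach the same point. */
static pthread_mutex_t sync_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t sync_cond = PTHREAD_COND_INITIALIZER;
static int arrived;

static void rendezvous(int target)
{
    pthread_mutex_lock(&sync_lock);
    arrived++;
    pthread_cond_broadcast(&sync_cond);
    while (arrived < target) {
        pthread_cond_wait(&sync_cond, &sync_lock);
    }
    pthread_mutex_unlock(&sync_lock);
}

struct role {
    pthread_mutex_t *first;   /* lock taken first and held */
    pthread_mutex_t *second;  /* lock requested while holding the first */
    int second_result;        /* return value of the trylock attempt */
};

static void *run_role(void *arg)
{
    struct role *r = arg;

    pthread_mutex_lock(r->first);
    rendezvous(2);  /* both threads now hold their first lock */
    r->second_result = pthread_mutex_trylock(r->second);
    rendezvous(4);  /* keep the first lock held until both have tried */
    if (r->second_result == 0) {
        pthread_mutex_unlock(r->second);
    }
    pthread_mutex_unlock(r->first);
    return NULL;
}

/* Returns 1 when both threads were refused their second lock, which is
 * the point where blocking lock calls would deadlock. */
int lock_order_inversion_detected(void)
{
    struct role monitor = { &tpmgr_lock, &dlg_lock, 0 };
    struct role sender = { &dlg_lock, &tpmgr_lock, 0 };
    pthread_t t1, t2;
    int rc;

    arrived = 0;
    rc = pthread_create(&t1, NULL, run_role, &monitor);
    assert(rc == 0);
    rc = pthread_create(&t2, NULL, run_role, &sender);
    assert(rc == 0);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    return monitor.second_result == EBUSY && sender.second_result == EBUSY;
}
```

Calling lock_order_inversion_detected() returns 1: each thread gets EBUSY for the lock the other one holds. With pthread_mutex_lock() in place of the trylock, neither call would ever return, which matches the state seen in the backtraces.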

To verify this deadlock:
* Get a full backtrace of all threads when in deadlock.
* Search for the pjsip monitor thread.  It will be the one executing monitor_thread_exec().
* The monitor thread will be attempting to get the dlg (dialog) lock.
* Search for another thread doing something with the same dlg pointer.  It will be trying to send something.
* That thread will be attempting to get the tpmgr/mgr (transport manager) lock.
* The tpmgr/mgr pointer will be the same as the one being used by the pjsip monitor thread since it is trying to shut that transport down.
Comments:
By: Asterisk Team (asteriskteam) 2018-02-27 21:52:08.075-0600

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

By: Joshua C. Colp (jcolp) 2018-03-01 06:21:09.526-0600

That thread isn't deadlocked.

The ast_cond_wait function gives up the lock it is provided and waits on the condition for another thread to signal it to wake up and run.
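
As an illustration of that point, here is a small sketch using pthread_cond_wait() directly, on the assumption that ast_cond_wait behaves the same way with respect to the lock it is given. The function and variable names are hypothetical. The main thread can take the mutex while the waiter is parked on the condition, so a thread seen in a condition wait in a backtrace is not holding that lock.

```c
#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static int waiting;  /* set by the waiter just before it sleeps */
static int ready;    /* predicate the waiter sleeps on */

static void *waiter(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    waiting = 1;
    pthread_cond_broadcast(&cond);
    while (!ready) {
        /* Atomically releases lock while asleep, reacquires on wakeup. */
        pthread_cond_wait(&cond, &lock);
    }
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Returns 1 if the mutex could be taken while the waiter was blocked in
 * pthread_cond_wait(), or -1 if the thread could not be created. */
int lock_available_while_waiting(void)
{
    pthread_t t;
    int got_lock = 0;

    waiting = 0;
    ready = 0;
    if (pthread_create(&t, NULL, waiter, NULL) != 0) {
        return -1;
    }

    pthread_mutex_lock(&lock);
    while (!waiting) {
        pthread_cond_wait(&cond, &lock);
    }
    /* We hold the mutex with waiting set and ready clear, so the waiter
     * has given the mutex up inside pthread_cond_wait(). */
    got_lock = 1;

    ready = 1;  /* let the waiter finish */
    pthread_cond_broadcast(&cond);
    pthread_mutex_unlock(&lock);
    pthread_join(t, NULL);

    return got_lock;
}
```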

By: Ross Beer (rossbeer) 2018-03-01 06:43:55.386-0600

Ok, PJSIP stops processing when this happens. I've sent George Joseph the full core dump to attach to the internal ticket. Hopefully, this will provide more information.

By: Ross Beer (rossbeer) 2018-03-21 04:34:28.104-0500

After rolling back the change for issue ASTERISK-27568, the deadlock has not occurred again.

By: Friendly Automation (friendly-automation) 2018-04-18 17:22:07.753-0500

Change 8703 merged by Jenkins2:
res_pjsip: Fix deadlock on reliable transport shutdown.


By: Friendly Automation (friendly-automation) 2018-04-18 17:34:20.225-0500

Change 8705 merged by Joshua Colp:
res_pjsip: Fix deadlock on reliable transport shutdown.


By: Friendly Automation (friendly-automation) 2018-04-18 17:36:11.240-0500

Change 8704 merged by Jenkins2:
res_pjsip: Fix deadlock on reliable transport shutdown.