Summary: ASTERISK-26291: res_pjsip_session: segfault on already disconnected session
Reporter: Alexei Gradinari (alexei gradinari)
Labels:
Date Opened: 2016-08-11 16:47:08
Date Closed: 2017-03-03 06:20:24.000-0600
Versions: 13.10.0
Frequency of
Environment:
Attachments:
( 0) bt_20160812.txt
( 1) bt_full_208160811.txt
( 2) pjproject_log.txt
Description: On a heavily loaded system, incoming TCP/TLS calls can be
disconnected by pjproject while Asterisk is still processing them
and may still be using the session's memory pools.
If the session is already in the disconnected state, its memory
pools have already been freed, so accessing them causes a segfault.
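To illustrate the failure mode described above, here is a minimal, hypothetical C model (not Asterisk or pjproject code). The names `struct session`, `enum inv_state`, `disconnect_session`, `process_incoming_sdp` and `POOL_SIZE` are invented for this sketch; `pool` merely stands in for the session's memory pool. It shows one plausible guard: a queued task checks whether the session is disconnected before touching the pool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define POOL_SIZE 64

/* Hypothetical stand-in for the invite session state; only two states modeled. */
enum inv_state { INV_STATE_CONFIRMED, INV_STATE_DISCONNECTED };

/* Hypothetical session: 'pool' stands in for the session's memory pool. */
struct session {
    enum inv_state state;
    char *pool;
};

/* Models the disconnect path: the pool is released and the state becomes
 * DISCONNECTED. This model NULLs the pointer; in the reported crash the
 * task simply used memory that had already been freed. */
void disconnect_session(struct session *s)
{
    assert(s != NULL);
    free(s->pool);
    s->pool = NULL;
    s->state = INV_STATE_DISCONNECTED;
}

/* Models a queued task that copies incoming SDP into the session pool.
 * Returns false without touching the pool if the session is disconnected. */
bool process_incoming_sdp(struct session *s, const char *sdp)
{
    assert(s != NULL && sdp != NULL);
    if (s->state == INV_STATE_DISCONNECTED) {
        return false; /* pool already released: using it would be a use-after-free */
    }
    strncpy(s->pool, sdp, POOL_SIZE - 1);
    s->pool[POOL_SIZE - 1] = '\0';
    return true;
}
```

A state check like this is only safe if the state change and the task cannot interleave while the task is running (for example, because both run on the same serializer, or the task holds a reference that keeps the pool alive); the model does not show that part.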
Comments:
By: Asterisk Team (asteriskteam) 2016-08-11 16:47:09

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

By: Alexei Gradinari (alexei gradinari) 2016-08-11 16:47:59.017-0500

Full backtrace

By: Alexei Gradinari (alexei gradinari) 2016-08-11 16:49:21.578-0500

pjproject WARNING/ERROR log showing send failures caused by a broken pipe just before the segfault

By: Alexei Gradinari (alexei gradinari) 2016-08-12 15:23:28.234-0500

New segfault backtrace in handle_incoming_sdp

By: Joshua C. Colp (jcolp) 2016-08-15 05:15:04.869-0500

Per my comment on the review, I think we need a full Asterisk log and a full backtrace of all threads to understand exactly how the off-nominal situation happened and whether this is the appropriate fix.

By: Joshua C. Colp (jcolp) 2016-08-17 05:10:07.670-0500

Copy/pasting from Gerrit:
I used SIPp to stress-test Asterisk over TLS.
The scenario:
SIPp-sender: INVITE transport:TLS -> ASTERISK
ASTERISK: INVITE transport:TLS -> SIPp-receiver
SIPp-receiver: 200 OK with sdp -> ASTERISK
ASTERISK: 200 OK with sdp -> SIPp-sender
If SIPp-sender terminates the TCP connection, then
pjproject calls on_tsx_state_changed with an event of type PJSIP_EVENT_TRANSPORT_ERROR.
I think session_inv_on_tsx_state_changed runs on the pjsip monitor thread,
while at the same time there may be a task in the session serializer's queue.
So when the taskprocessor executes the function new_invite,
the session is already in the disconnected state.
the function session_inv_on_tsx_state_changed.
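The interleaving described above can be reproduced deterministically in a small, hypothetical single-threaded C model (not Asterisk code). `struct serializer`, `serializer_push`, `serializer_drain`, `on_transport_error` and `queued_new_invite` are invented stand-ins for the session serializer, the transport-error handling on the pjsip monitor thread, and the queued new_invite work. In the real system these run on different threads; here the bad ordering (task queued, then transport error, then task executed) is simply forced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_MAX 8

/* Hypothetical session model: only what is needed to show the ordering. */
struct session {
    bool disconnected;   /* set by the transport-error path */
    bool pool_alive;     /* false once the session's pool is released */
    bool task_ran;       /* set when the queued task executes */
    bool saw_freed_pool; /* the queued task ran after the pool was released */
};

typedef void (*task_fn)(struct session *);

/* Minimal FIFO standing in for the per-session serializer. */
struct serializer {
    task_fn tasks[QUEUE_MAX];
    size_t count;
};

void serializer_push(struct serializer *q, task_fn fn)
{
    assert(q != NULL && q->count < QUEUE_MAX);
    q->tasks[q->count++] = fn;
}

/* Runs the queued tasks in FIFO order, as the taskprocessor thread would. */
void serializer_drain(struct serializer *q, struct session *s)
{
    for (size_t i = 0; i < q->count; i++) {
        q->tasks[i](s);
    }
    q->count = 0;
}

/* Stand-in for the transport-error handling on the monitor thread. */
void on_transport_error(struct session *s)
{
    s->disconnected = true;
    s->pool_alive = false;
}

/* Stand-in for the queued new_invite work. It does not check the state,
 * so it only records whether the pool was already gone when it ran. */
void queued_new_invite(struct session *s)
{
    s->task_ran = true;
    s->saw_freed_pool = !s->pool_alive;
}
```

With the ordering push, transport error, drain, the queued task runs against a session whose pool is already gone; with push then drain and no error, it does not.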

By: Joshua C. Colp (jcolp) 2016-08-17 05:11:26.839-0500

[~alexei gradinari] If you could attach what I mentioned, that would be great, so that others can take a look and arrive at a complete solution. If not, someone else will have to lab it up as you did and see.

By: Rusty Newton (rnewton) 2016-08-24 09:51:58.825-0500

Opening this up since discussion is happening in Gerrit.