Summary: | ASTERISK-25602: chan_sip deadlocks after INVITE processing while calling sip_report_security_event | ||||
Reporter: | Antonis Psaras (apsaras) | Labels: | |||
Date Opened: | 2015-12-02 03:50:26.000-0600 | Date Closed: | 2016-09-07 07:46:09 | ||
Priority: | Major | Regression? | |||
Status: | Closed/Complete | Components: | Channels/chan_sip/General Channels/chan_sip/Security Framework | ||
Versions: | 13.5.0 | Frequency of Occurrence | Frequent | ||
Related Issues: |
| ||||
Environment: | Centos 6 64bit on XenServer VM, | Attachments: | ( 0) backtrace-threads.txt ( 1) backtrace-threads.txt ( 2) core-show-locks.txt ( 3) core-show-locks.txt ( 4) full.bz2 ( 5) full.txt ( 6) myDebugLog.txt | ||
Description: | Asterisk is freezing randomly and not responding to cli commands or SIP messages.
Restarting the service the system comes up | ||||
Comments: | By: Asterisk Team (asteriskteam) 2015-12-02 03:50:29.219-0600 Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution. A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report. Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process]. By: Antonis Psaras (apsaras) 2015-12-02 03:52:08.345-0600 The locks and backtrace By: Rusty Newton (rnewton) 2015-12-02 11:55:22.589-0600 We require additional debug to continue with triage of your issue. Please follow the instructions on the wiki [1] for how to collect debugging information from Asterisk. For expediency, where possible, attach the debug with a '.txt' file extension so that the debug will be usable for further analysis. Thanks! [1] https://wiki.asterisk.org/wiki/display/AST/Collecting+Debug+Information By: Rusty Newton (rnewton) 2015-12-02 11:56:05.508-0600 Thanks for the trace and core show locks. Can you also provide a debug log as described in the previous comment to demonstrate what is happening at the time of the locking situation? Are you able to reproduce the issue at will? By: Antonis Psaras (apsaras) 2015-12-03 02:02:00.725-0600 Will do next time it happens. Thank you for your quick response. By: Rusty Newton (rnewton) 2015-12-07 18:43:03.530-0600 Thanks! By: Antonis Psaras (apsaras) 2015-12-22 04:00:02.859-0600 New backtrace files and locks By: Antonis Psaras (apsaras) 2015-12-22 04:04:54.167-0600 DebugLog By: Antonis Psaras (apsaras) 2015-12-22 04:11:39.900-0600 Full Trace By: Rusty Newton (rnewton) 2015-12-22 18:58:06.275-0600 Looks like it locked up within the first few minutes after starting. Is that the case? Did the log output halt then? By: Matt Jordan (mjordan) 2015-12-24 08:10:30.037-0600 The issue is this thread: {code} === Thread ID: 0x7f8cd17f3700 LWP:32073 (do_monitor started at [28915] chan_sip.c restart_monitor()) === ---> Lock #0 (chan_sip.c): MUTEX 28289 handle_request_do &netlock 0x7f8cee317de0 (1) main/backtrace.c:59 __ast_bt_get_addresses() (0x467928+1D) main/lock.c:258 __ast_pthread_mutex_lock() (0x536a5b+BA) channels/chan_sip.c:28292 handle_request_do() channels/chan_sip.c:28251 sipsock_read() main/io.c:292 ast_io_wait() (0x52e197+19C) channels/chan_sip.c:28882 do_monitor() main/utils.c:1237 dummy_start() pthread_create.c:0 start_thread() :0 __clone() (0x7f8d33c158d0+6D) === ---> Lock #1 (chan_sip.c): MUTEX 8049 sip_new tmp 0x7f8d2c374400 (1) main/backtrace.c:59 __ast_bt_get_addresses() (0x467928+1D) main/lock.c:258 __ast_pthread_mutex_lock() (0x536a5b+BA) main/astobj2.c:167 __ao2_lock() (0x45c6d2+96) channels/chan_sip.c:8050 sip_new() channels/chan_sip.c:25678 handle_request_invite() channels/chan_sip.c:28110 handle_incoming() channels/chan_sip.c:28319 handle_request_do() channels/chan_sip.c:28251 sipsock_read() main/io.c:292 ast_io_wait() (0x52e197+19C) channels/chan_sip.c:28882 do_monitor() main/utils.c:1237 dummy_start() pthread_create.c:0 start_thread() :0 __clone() (0x7f8d33c158d0+6D) === ---> Lock #2 (chan_sip.c): MUTEX 8050 sip_new i 0x7f8d2c2f2a70 (1) main/backtrace.c:59 __ast_bt_get_addresses() (0x467928+1D) main/lock.c:258 __ast_pthread_mutex_lock() (0x536a5b+BA) main/astobj2.c:167 __ao2_lock() (0x45c6d2+96) channels/chan_sip.c:8051 sip_new() channels/chan_sip.c:25678 handle_request_invite() channels/chan_sip.c:28110 handle_incoming() channels/chan_sip.c:28319 handle_request_do() channels/chan_sip.c:28251 sipsock_read() main/io.c:292 ast_io_wait() (0x52e197+19C) channels/chan_sip.c:28882 do_monitor() main/utils.c:1237 dummy_start() pthread_create.c:0 start_thread() :0 __clone() (0x7f8d33c158d0+6D) === ---> Waiting for Lock #3 (pbx.c): MUTEX 12251 ast_rdlock_contexts &conlock 0x979f20 (1) main/backtrace.c:59 __ast_bt_get_addresses() (0x467928+1D) main/lock.c:258 __ast_pthread_mutex_lock() (0x536a5b+BA) main/pbx.c:12252 ast_rdlock_contexts() (0x58cfee+23) main/pbx.c:4940 pbx_extension_helper() main/pbx.c:6034 ast_spawn_extension() (0x578c1a+65) main/pbx.c:6450 __ast_pbx_run() main/pbx.c:6771 pbx_thread() main/utils.c:1237 dummy_start() pthread_create.c:0 start_thread() :0 __clone() (0x7f8d33c158d0+6D) === --- ---> Locked Here: pbx.c line 12251 (ast_rdlock_contexts) {code} We can't hold the channel/pvt lock while accessing the dialplan. Interestingly, it appears as if we're doing that not via one of the usual places - in this case, we're looking up the peer via a SIP security event due to the INVITE request, apparently after creating the channel but while the channel lock is still held: {code} Thread 48 (Thread 0x7f8cd17f3700 (LWP 32073)): #0 0x00007f8d348942e4 in __lll_lock_wait () from /lib64/libpthread.so.0 #1 0x00007f8d3488f5a3 in _L_lock_892 () from /lib64/libpthread.so.0 #2 0x00007f8d3488f487 in pthread_mutex_lock () from /lib64/libpthread.so.0 #3 0x0000000000536bef in __ast_pthread_mutex_lock (filename=0x6d612b "pbx.c", lineno=12251, func=0x6dc410 "ast_rdlock_contexts", mutex_name=0x6dc23d "&conlock", t=0x979f20) at lock.c:313 #4 0x000000000058d011 in ast_rdlock_contexts () at pbx.c:12251 #5 0x0000000000575937 in pbx_extension_helper (c=0x0, con=0x0, context=0x7f8cee31a2f0 "sipregistrations", exten=0x7f8cd17eef40 "6004", priority=1, label=0x0, callerid=0x0, action=E_MATCH, found=0x0, combined_find_spawn=0) at pbx.c:4939 #6 0x0000000000578a65 in ast_exists_extension (c=0x0, context=0x7f8cee31a2f0 "sipregistrations", exten=0x7f8cd17eef40 "6004", priority=1, callerid=0x0) at pbx.c:6008 #7 0x00007f8cee04d51c in register_peer_exten (peer=0x7f8d2c29e7b8, onoff=1) at chan_sip.c:5007 #8 0x00007f8cee083206 in reg_source_db (peer=0x7f8d2c29e7b8) at chan_sip.c:16069 #9 0x00007f8cee0d6835 in build_peer (name=0x7f8d2c3273b4 "6004", v=0x0, alt=0x0, realtime=1, devstate_only=0) at chan_sip.c:31112 #10 0x00007f8cee04e9dc in realtime_peer (newpeername=0x7f8d2c3273b4 "6004", addr=0x0, callbackexten=0x0, devstate_only=0, which_objects=2) at chan_sip.c:5450 #11 0x00007f8cee04f017 in sip_find_peer_full (peer=0x7f8d2c3273b4 "6004", addr=0x0, callbackexten=0x0, realtime=1, which_objects=2, devstate_only=0, transport=0) at chan_sip.c:5545 #12 0x00007f8cee04f0d9 in sip_find_peer (peer=0x7f8d2c3273b4 "6004", addr=0x0, realtime=1, which_objects=2, devstate_only=0, transport=0) at chan_sip.c:5584 #13 0x00007f8cee0f0900 in sip_report_security_event (p=0x7f8d2c2f2ac8, req=0x7f8cd17f21b0, res=0) at sip/security_events.c:291 #14 0x00007f8cee0c1baa in handle_incoming (p=0x7f8d2c2f2ac8, req=0x7f8cd17f21b0, addr=0x7f8cd17f2120, recount=0x7f8cd17f20cc, nounlock=0x7f8cd17f20c8) at chan_sip.c:28113 #15 0x00007f8cee0c26d8 in handle_request_do (req=0x7f8cd17f21b0, addr=0x7f8cd17f2120) at chan_sip.c:28319 {code} I don't think this would happen on a nominal path; I suspect that something off nominal happens and we don't properly unlock the channel/pvt on one of the code paths. [~apsaras]: can you provide a debug log with 'sip set debug' enabled? That would show the INVITE request and its processing. By: Asterisk Team (asteriskteam) 2016-01-07 12:00:24.055-0600 Suspended due to lack of activity. This issue will be automatically re-opened if the reporter posts a comment. If you are not the reporter and would like this re-opened please create a new issue instead. If the new issue is related to this one a link will be created during the triage process. Further information on issue tracker usage can be found in the Asterisk Issue Guidlines [1]. [1] https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines By: Rusty Newton (rnewton) 2016-01-21 09:01:57.654-0600 This was closed too early. Reopening. By: Antonis Psaras (apsaras) 2016-01-21 13:35:55.385-0600 Will provide a debug log with 'sip set debug' enabled next time it happens. Thank you By: Antonis Psaras (apsaras) 2016-01-26 10:52:45.735-0600 Please find attached a new log with 'sip set debug on' By: Antonis Psaras (apsaras) 2016-05-11 04:30:26.430-0500 Any update? By: Matt Jordan (mjordan) 2016-09-07 07:45:39.862-0500 This deadlock is being caused by usage of {{pbx_realtime}}, as shown in this part of the backtrace: {noformat} #12 0x000000000056b612 in pbx_exec (c=0x7f8d2c19e7a8, app=0x29754d0, data=0x7f8cce7f1ea0 "SIP/eps01/6007,45,b(11)") at pbx.c:1720 #13 0x00007f8cde23de70 in realtime_exec (chan=0x7f8d2c19e7a8, context=0x7f8d2c19f160 "DefaultPlan", exten=0x7f8d2c19f1b0 "6007", priority=2, callerid=0x7f8d2c0a36c0 "2111057800", data=0x296f6a1 "DefaultPlan") at pbx_realtime.c:352 #14 0x0000000000575f25 in pbx_extension_helper (c=0x7f8d2c19e7a8, con=0x0, context=0x7f8d2c19f160 "DefaultPlan", exten=0x7f8d2c19f1b0 "6007", priority=2, label=0x0, callerid=0x7f8d2c0a36c0 "2111057800", action=E_SPAWN, found=0x7f8cce7f4b00, combined_find_spawn=1) at pbx.c:5005 {noformat} Closing out as a duplicate. |