Summary: | ASTERISK-29411: Crash in pjsip_msg_find_hdr_by_name | ||||||
Reporter: | LA (learbia) | Labels: | patch pjsip | ||||
Date Opened: | 2021-04-30 12:42:43 | Date Closed: | 2022-03-30 15:13:04 | ||||
Priority: | Critical | Regression? | |||||
Status: | Closed/Complete | Components: | Resources/res_pjsip_nat | ||||
Versions: | 18.9.0 19.1.0 | Frequency of Occurrence | Constant | ||||
Related Issues: |
| ||||||
Environment: | Attachments: | ( 0) core-brief.txt ( 1) core-info.txt ( 2) core-locks.txt ( 3) core-thread1.txt ( 4) Log_full_asterisk_complete.txt ( 5) use_tdata_pool.diff | |||||
Description: | Link with original gdb https://drive.google.com/file/d/1TjoMHSIOnhJNg4Q8ytid5g0_SUzMKCrB/view?usp=sharing
| ||||||
Comments: | By: Asterisk Team (asteriskteam) 2021-04-30 12:42:43.684-0500 Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution. Please note that log messages and other files should not be sent to the Sangoma Asterisk Team unless explicitly asked for. All files should be placed on this issue in a sanitized fashion as needed. A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report. Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process]. Please note that once your issue enters an open state it has been accepted. As Asterisk is an open source project there is no guarantee or timeframe on when your issue will be looked into. If you need expedient resolution you will need to find and pay a suitable developer. Asking for an update on your issue will not yield any progress on it and will not result in a response. All updates are posted to the issue when they occur. Please note that by submitting data, code, or documentation to Sangoma through JIRA, you accept the Terms of Use present at [https://www.asterisk.org/terms-of-use/|https://www.asterisk.org/terms-of-use/]. By: Joshua C. Colp (jcolp) 2021-05-02 10:19:53.458-0500 We require additional debug to continue with triage of your issue. Please follow the instructions on the wiki [1] for how to collect debugging information from Asterisk. For expediency, where possible, attach the debug with a '.txt' file extension so that the debug will be usable for further analysis. Thanks! [1] https://wiki.asterisk.org/wiki/display/AST/Collecting+Debug+Information By: LA (learbia) 2021-05-02 11:22:49.127-0500 The hour of crash is Apr 30 14:21:07 VERBOSE[35982] I Started asterisk again in Apr 30 14:23:18 Attachment /var/log/asterisk/full core-info.txt core-locks.txt core-full.txt core-brief.txt By: Joshua C. Colp (jcolp) 2021-05-02 12:22:53.215-0500 The attached full log contains only verbose messages, and not the debug messages as requested. By: LA (learbia) 2021-05-03 15:19:11.508-0500 Done By: Benjamin Keith Ford (bford) 2021-05-04 11:38:12.410-0500 The logs didn't have any information on what's going on. From the backtrace, it looks like something is going on in pjproject when checking headers. Can we get a core file to examine in gdb? By: LA (learbia) 2021-05-04 12:30:08.999-0500 Yes: Link with core GDB https://drive.google.com/file/d/1TjoMHSIOnhJNg4Q8ytid5g0_SUzMKCrB/view?usp=sharing By: Benjamin Keith Ford (bford) 2021-05-04 14:02:18.123-0500 Thanks! I've uploaded the core file to our internal drive. You can remove it from yours now if you'd like. Also created an internal issue to investigate this further. By: Nathan Bruning (nathanb) 2021-06-28 06:08:15.489-0500 We believe this is the same segfault as we're seeing in ASTERISK-27952. This crash happens about once a week for our asterisk cluster. We've seen the crash happening in asterisk 15.3.0, then 16.3.0, then 16.13.0. What we've gathered so far: - During most of the crashes, we see some kind of network connectivity issues. - {{msg}} contains this header list: {{Via,From,To,Contact,Callid,Cseq,Allow,Supported,Session-Expires,Min-SE,<corrupt>}} - We have set a custom header on the call, which is not present in the header list - This happens during an outbound dial attempt We've tried reproducing by simulating packet loss, but no success so far. Any updates in https://jira.digium.com/browse/SWP-11499 perhaps? By: Joshua C. Colp (jcolp) 2021-06-28 06:14:41.718-0500 Any updates are posted on the ASTERISK issue, there is nothing on the SWP. By: LA (learbia) 2021-07-06 16:10:40.627-0500 I did more tests and this happen when I lost connection with Internet. Maybe Reverse DNS , or DNS LOST By: LA (learbia) 2021-10-22 10:32:35.182-0500 Always we have problem with connections, (DNS, Lost pacteks) we have a crash. By: LA (learbia) 2021-10-22 10:34:24.918-0500 The same problem related by @Nathan Bruning By: LA (learbia) 2022-02-02 15:26:59.100-0600 Tests with last version, we have the same problems: {noformat} !@!@!@! thread1.txt !@!@!@! $1 = {si_signo = 11, si_errno = 0, si_code = 1, _sifields = {_pad = {33, 0 <repeats 27 times>}, _kill = {si_pid = 33, si_uid = 0}, _timer = {si_tid = 33, si_overrun = 0, si_sigval = {sival_int = 0, sival_ptr = 0x0}}, _rt = {si_pid = 33, si_uid = 0, si_sigval = {sival_int = 0, sival_ptr = 0x0}}, _sigchld = {si_pid = 33, si_uid = 0, si_status = 0, si_utime = 0, si_stime = 0}, _sigfault = {si_addr = 0x21}, _sigpoll = {si_band = 33, si_fd = 0}}} Signal Stop Print Pass to program Description SIGSEGV Yes Yes Yes Segmentation fault Thread 1 (Thread 0x7fd4f48d2700 (LWP 2756)): #0 pj_stricmp (str1=str1@entry=0x19, str2=str2@entry=0x7fd4f48d1bc0) at ../include/pj/string_i.h:216 No locals. #1 0x00007fdabbe25164 in pjsip_msg_find_hdr_by_name (msg=0x7fd88403ea80, name=name@entry=0x7fd4f48d1bc0, start=start@entry=0x0) at ../src/pjsip/sip_msg.c:362 hdr = 0x1 end = 0x7fd88403eaa8 #2 0x00007fda21e9778b in add_headers_to_message (headers=headers@entry=0x7fda220b9bc0 <request_headers>, tdata=0x7fd8843e9598) at res_pjsip/pjsip_global_headers.c:93 name = {ptr = 0x7fd9b026f1aa "Max-Forwards", slen = 12} hdr = <optimized out> iter = 0x7fd9b026f140 lock = 0x7fda220b9bc0 <request_headers> __PRETTY_FUNCTION__ = "add_headers_to_message" #3 0x00007fda21e97843 in add_request_headers (tdata=<optimized out>) at res_pjsip/pjsip_global_headers.c:105 No locals. #4 0x00007fdabbe2df76 in endpt_on_tx_msg (endpt=<optimized out>, tdata=0x7fd8843e9598) at ../src/pjsip/sip_endpoint.c:1116 status = 0 mod = 0x7fda220b8e40 <global_header_mod> #5 0x00007fdabbe35120 in pjsip_transport_send (tr=0x7058c48, tdata=0x7fd8843e9598, addr=0x7fd8843e9788, addr_len=16, token=0x7fd88403f480, cb=0x7fdabbe2f730 <stateless_send_transport_cb>) at ../src/pjsip/sip_transport.c:931 status = <optimized out> #6 0x00007fdabbe2f8d0 in stateless_send_transport_cb (token=0x7fd88403f480, tdata=0x7fd8843e9598, sent=16) at ../src/pjsip/sip_util.c:1277 cont = 1 cur_addr = 0x7fd8843e9788 cur_addr_len = 16 via = 0x7fd88403eb90 need_update_via = 1 #7 0x00007fdabbe2fc92 in stateless_send_resolver_callback (status=<optimized out>, token=0x7fd88403f480, addr=<optimized out>) at ../src/pjsip/sip_util.c:1378 stateless_data = 0x7fd88403f480 tdata = 0x7fd8843e9598 #8 0x00007fda21e7f374 in sip_resolve_invoke_user_callback (data=0x7fd884100eb8) at res_pjsip/pjsip_resolver.c:208 __FUNCTION__ = "sip_resolve_invoke_user_callback" __PRETTY_FUNCTION__ = "sip_resolve_invoke_user_callback" #9 0x000000000059e11e in ast_taskprocessor_execute (tps=tps@entry=0x7fda6c5ee4f0) at taskprocessor.c:1237 local = {local_data = 0x7fd9ac2b6420, data = 0x7fda6c5ee4f0} t = 0x7fda4c01a4a0 __PRETTY_FUNCTION__ = "ast_taskprocessor_execute" {noformat} By: N A (InterLinked) 2022-02-10 06:20:43.599-0600 FYI, this backtrace from ASTERISK-29846 appears to be related to this issue: https://issues.asterisk.org/jira/secure/attachment/61194/core-asterisk-2022-02-10T00-46-55Z-full.txt By: Kevin Harwell (kharwell) 2022-02-28 12:59:46.484-0600 The previous core dump didn't have any of the supporting Asterisk modules, so I'm unable to load it locally. Next time it crashes please use the {{ast_coredumper}} script to collect the full core file and libraries, and then host the output somewhere accessible. Here's an example of the command to run after a crash: {noformat} $ /var/lib/asterisk/scripts/ast_coredumper --tarball-coredumps --no-default-search <path_to_coredump> {noformat} Also what OS is in use? Thanks! By: LA (learbia) 2022-03-05 14:24:50.101-0600 I use debian This a full dump https://drive.google.com/file/d/1F-GO25T-LH5va9A31ujTWkvfWIYBGppV/view?usp=sharing By: Kevin Harwell (kharwell) 2022-03-09 10:48:15.148-0600 [~learbia] Thanks for the full dump it helped! I have still been unable to replicate the issue, but after some investigation I found an area of suspect in the code. Please apply the attached [^use_tdata_pool.diff] patch, and let me know if it fixes the issue for you. By: LA (learbia) 2022-03-09 11:10:34.927-0600 Ok, I will apply. Tks. By: LA (learbia) 2022-03-17 10:00:10.202-0500 Today I have a crash: traps: asterisk[14122] general protection ip:45edfc sp:7fd3c9fc43a0 error:0 in asterisk[400000+2f0000] By: Kevin Harwell (kharwell) 2022-03-17 10:39:48.340-0500 Argh was hoping that would have fixed it. Thanks for testing. When you get a chance could you run {{ast_coredumper}} against it and upload the files someplace again? By: LA (learbia) 2022-03-17 13:09:24.155-0500 Yes, I will to do it. I think it could be another problem By: Kevin Harwell (kharwell) 2022-03-17 13:23:25.161-0500 If you feel fairly confident that the problem is something else then go ahead and create a new issue for that, and add the core files/backtrace information there. As well as any relevant logging if you have it. By: Kevin Harwell (kharwell) 2022-03-23 18:01:34.320-0500 I've put the attached patch up for review as it seems like it might have fixed this problem (even if later found to not be the exact cause it's something that still needed to be fixed). Note, once the patch is submitted this issue will auto-close. If later it's found the issue is still a problem then commenting as such will reopen the issue. By: LA (learbia) 2022-03-25 11:28:58.591-0500 What do you think? {noformat} !@!@!@! thread1.txt !@!@!@! $1 = {si_signo = 11, si_errno = 0, si_code = 1, _sifields = {_pad = {1, 0 <repeats 27 times>}, _kill = {si_pid = 1, si_uid = 0}, _timer = {si_tid = 1, si_overrun = 0, si_sigval = {sival_int = 0, sival_ptr = 0x0}}, _rt = {si_pid = 1, si_uid = 0, si_sigval = {sival_int = 0, sival_ptr = 0x0}}, _sigchld = {si_pid = 1, si_uid = 0, si_status = 0, si_utime = 0, si_stime = 0}, _sigfault = {si_addr = 0x1}, _sigpoll = {si_band = 1, si_fd = 0}}} Signal Stop Print Pass to program Description SIGSEGV Yes Yes Yes Segmentation fault Thread 1 (Thread 0x7fb5a98ef700 (LWP 5307)): #0 0x000000000045ec1c in __ao2_ref (user_data=0x5, delta=1, tag=tag@entry=0x0, file=file@entry=0x7fb682484452 "res_pjsip_session.c", line=3639, func=0x7fb682487f40 <__PRETTY_FUNCTION__.35305> "ast_sip_dialog_get_session") at astobj2.c:501 obj = 0xffffffffffffffed current_value = -2109178048 weakproxy = 0x7fb6e4245798 __PRETTY_FUNCTION__ = "__ao2_ref" #1 0x00007fb682483272 in ast_sip_dialog_get_session (dlg=<optimized out>) at res_pjsip_session.c:3639 inv_session = <optimized out> session = 0x5 __PRETTY_FUNCTION__ = "ast_sip_dialog_get_session" #2 0x00007fb6824839a2 in session_outgoing_nat_hook (tdata=0x7fb4993af628, transport=0x7fb6e4245798) at res_pjsip_session.c:5486 transport_state = 0x7fb6e4245c30 hook = 0x0 sdp_info = 0x7fb550843de8 dlg = 0x7fb550843de8 session = 0x0 __FUNCTION__ = "session_outgoing_nat_hook" __PRETTY_FUNCTION__ = "session_outgoing_nat_hook" #3 0x00007fb6822635d9 in nat_invoke_hook (obj=<optimized out>, arg=<optimized out>, flags=<optimized out>) at res_pjsip_nat.c:300 hook = <optimized out> details = <optimized out> #4 0x00000000004614dd in internal_ao2_traverse (self=self@entry=0x7fb5b13dc348, flags=flags@entry=OBJ_SEARCH_NONE, cb_fn=cb_fn@entry=0x7fb6822635c0 <nat_invoke_hook>, arg=arg@entry=0x7fb5a98eeb00, tag=tag@entry=0x0, file=file@entry=0x7fb6822649b9 "res_pjsip_nat.c", line=471, func=0x7fb682264e5e <__PRETTY_FUNCTION__.26624> "process_nat", type=AO2_CALLBACK_DEFAULT, data=0x0) at astobj2_container.c:328 match = 3 ret = 0x0 cb_default = 0x7fb6822635c0 <nat_invoke_hook> node = 0x7fb5b02d45d8 traversal_state = 0x7fb5a98ee9a0 orig_lock = AO2_LOCK_REQ_MUTEX multi_container = 0x0 multi_iterator = 0x0 #5 0x0000000000461965 in __ao2_callback (c=c@entry=0x7fb5b13dc348, flags=flags@entry=OBJ_SEARCH_NONE, cb_fn=cb_fn@entry=0x7fb6822635c0 <nat_invoke_hook>, arg=arg@entry=0x7fb5a98eeb00, tag=tag@entry=0x0, file=file@entry=0x7fb6822649b9 "res_pjsip_nat.c", line=471, func=0x7fb682264e5e <__PRETTY_FUNCTION__.26624> "process_nat") at astobj2_container.c:414 No locals. {noformat} By: Kevin Harwell (kharwell) 2022-03-25 11:41:19.820-0500 That's crashing in a different spot. It looks like the Asterisk sip session itself has gone way. It is possible though that while the attached patch might have fixed the original issue it uncovered another crash further along. Either way I'd say go ahead and create a new issue for this new crash. When you do please attach the relevant backtrace and any other information you might think would be helpful. Thanks! By: LA (learbia) 2022-03-25 14:58:23.974-0500 Thank you so much. I will create a new issue. By: Friendly Automation (friendly-automation) 2022-03-30 15:13:05.568-0500 Change 18253 merged by Friendly Automation: res_pjsip_header_funcs: wrong pool used tdata headers [https://gerrit.asterisk.org/c/asterisk/+/18253|https://gerrit.asterisk.org/c/asterisk/+/18253] By: Friendly Automation (friendly-automation) 2022-03-30 15:13:56.959-0500 Change 18273 merged by Friendly Automation: res_pjsip_header_funcs: wrong pool used tdata headers [https://gerrit.asterisk.org/c/asterisk/+/18273|https://gerrit.asterisk.org/c/asterisk/+/18273] By: Friendly Automation (friendly-automation) 2022-03-30 15:15:53.058-0500 Change 18272 merged by Kevin Harwell: res_pjsip_header_funcs: wrong pool used tdata headers [https://gerrit.asterisk.org/c/asterisk/+/18272|https://gerrit.asterisk.org/c/asterisk/+/18272] By: Friendly Automation (friendly-automation) 2022-03-30 16:09:31.675-0500 Change 18274 merged by Kevin Harwell: res_pjsip_header_funcs: wrong pool used tdata headers [https://gerrit.asterisk.org/c/asterisk/+/18274|https://gerrit.asterisk.org/c/asterisk/+/18274] |