Summary: ASTERISK-25274: A11 SIGSEGV 'Double free or corruption' in backtrace from pj_pool_release (sip_destroy -> pj_ice_sess_destroy)
Reporter: Dade Brandon (dade)
Labels:
Date Opened: 2015-07-22 12:37:40
Date Closed: 2020-01-14 11:13:45.000-0600
Priority: Major
Regression?
Status: Closed/Complete
Components:
Versions: 11.18.0
Frequency of Occurrence: Frequent
Related Issues:
Environment: Ubuntu 14.04.2; Linux 3.13.0-24-generic SMP; Intel E3-1231; OpenSSL 1.0.1f-1ubuntu2.15 (Jun 11 2015; most recent available); libsrtp0 / libsrtp0-dev 1.4.5~20130609~dfsg-1
Attachments:
( 0) 7-2-phx-debug-aug18c.txt.gz
( 1) 7-2-phx-fullbt-aug18c.txt
( 2) fenrir-debug-july23.txt.gz
( 3) fenrir-fullbt-jul23.txt
( 4) narvi-backtrace-july_22_2015.txt
( 5) Narvi_debug_log_jul_22_917.p.txt.gz
Description: We have the patch from ASTERISK-25103 added to trunk 11, along with a few custom patches (mostly just debug messages). The following crash occurs infrequently (1-5 times per week, usually batched together and on the same server(s)). Based on that pattern, I imagine there is a remote factor in whether or not the crash occurs, such as a slow peer.
The full backtrace, with some added print *var's, and the debug log will be attached shortly after I create this issue. Below is the top chunk of the backtrace to assist with reviewing this issue.
{noformat}
Program terminated with signal SIGABRT, Aborted.
#0 __GI_raise (sig=sig@entry=6)
#1 __GI_abort ()
#2 __libc_message (do_abort=do_abort@entry=1, fmt=fmt@entry=0x7f548a7b6b28 "*** Error in `%s': %s: 0x%s ***\n")
#3 malloc_printerr (ptr=<optimized out>, str=0x7f548a7b6c58 "double free or corruption (out)", action=1)
#4 _int_free (av=<optimized out>, p=<optimized out>, have_lock=0)
#5 default_block_free ()
#6 pj_pool_destroy_int ()
#7 cpool_release_pool ()
#8 pj_pool_release ()
#9 destroy_tdata ()
#10 pj_stun_session_destroy ()
{noformat}
Comments:

By: Asterisk Team (asteriskteam) 2015-07-22 12:37:42.389-0500
Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution. A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report. Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

By: Dade Brandon (dade) 2015-07-22 12:47:09.800-0500
Gzip of the debug log --- this is the last five minutes before the crash (identified by asterisk starting back up on the last line) -- the spam of "No remote address on RTP instance '....' so dropping frame" is unique to this issue, noting that the call IDs and RTP instances are different - we occasionally see this on one RTP instance, but lately we've been getting this across multiple RTP instances right before a crash.

By: Nicole McIntosh (atna99) 2015-07-23 16:05:36.681-0500
Another crash, looks like the same source issue. Debug and full backtrace added.

By: Rusty Newton (rnewton) 2015-07-23 17:37:49.064-0500
In the case of potential memory corruption we typically need Valgrind or MALLOC_DEBUG output to make any progress. If the issue only occurs on a production system then MALLOC_DEBUG may be your only option. https://wiki.asterisk.org/wiki/display/AST/MALLOC_DEBUG+Compiler+Flag

By: Dade Brandon (dade) 2015-07-23 19:49:05.165-0500
We will need to sleep this issue for ~ a week when we can get MALLOC_DEBUG in on all servers, and then from there until the crash is reproduced.

By: Rusty Newton (rnewton) 2015-07-24 09:05:08.873-0500
I'll ask a developer to take a look at it in the meantime as well.

By: Mark Michelson (mmichelson) 2015-07-24 09:56:04.132-0500
I'm going to jump in here and say that MALLOC_DEBUG is not going to help here since the malloc error is down inside PJLib. MALLOC_DEBUG does not intercept those allocations.

By: Rusty Newton (rnewton) 2015-07-24 18:51:18.732-0500
Dade, can you post new logs when the issue occurs next, with the new logs including a SIP trace? (sip set debug on)

By: Asterisk Team (asteriskteam) 2015-08-15 12:00:23.010-0500
Suspended due to lack of activity. This issue will be automatically re-opened if the reporter posts a comment. If you are not the reporter and would like this re-opened please create a new issue instead. If the new issue is related to this one a link will be created during the triage process. Further information on issue tracker usage can be found in the Asterisk Issue Guidelines [1].
[1] https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines

By: Nicole McIntosh (atna99) 2015-08-18 17:16:03.310-0500
Same issue "Double free or corruption" in backtrace. Full debug with sip tracing on, also full backtrace attached.
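
Editorial note: the backtrace in the description ends in glibc's "double free or corruption" abort underneath pj_pool_release(). The thread does not establish the root cause, so the following is only a generic sketch of one way a pool can end up released twice and a common defensive pattern against it. Everything here (struct toy_pool, toy_pool_create, toy_pool_release, toy_pool_demo) is hypothetical and is not the PJLIB pool API or Asterisk code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy pool for illustration only; NOT PJLIB's pj_pool_t. */
struct toy_pool {
    void *block;   /* single backing allocation */
    size_t size;   /* size of the backing allocation in bytes */
};

/* Counts how many times memory was actually freed. */
static int toy_release_count = 0;

struct toy_pool *toy_pool_create(size_t size)
{
    struct toy_pool *pool = malloc(sizeof(*pool));
    if (!pool) {
        return NULL;
    }
    pool->block = malloc(size);
    if (!pool->block) {
        free(pool);
        return NULL;
    }
    pool->size = size;
    return pool;
}

/*
 * Release takes the ADDRESS of the owner's pointer and clears it, so a
 * second release through the same owner becomes a no-op instead of a
 * second free() of the same block.
 */
void toy_pool_release(struct toy_pool **poolp)
{
    struct toy_pool *pool;

    if (!poolp || !*poolp) {
        return; /* already released, or never created */
    }
    pool = *poolp;
    *poolp = NULL; /* clear before freeing */
    free(pool->block);
    free(pool);
    toy_release_count++;
}

/* Releases the same pool twice; returns the number of real frees. */
int toy_pool_demo(void)
{
    struct toy_pool *p = toy_pool_create(64);
    if (!p) {
        return -1;
    }
    toy_pool_release(&p);
    assert(p == NULL);
    toy_pool_release(&p); /* harmless: pointer was cleared */
    return toy_release_count;
}
```

Note that this pattern only protects a single owner. If two separate objects each hold their own copy of the pool pointer, clearing one copy does not stop the other from freeing the block again, and that case generally needs reference counting or locking around teardown instead.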