Summary: | ASTERISK-30445: stasis: Off-nominal channel leave causes bridge to be destroyed | ||||
Reporter: | Quality Unit (adminQU) | Labels: | |||
Date Opened: | 2023-02-28 04:30:27.000-0600 | Date Closed: | |||
Priority: | Minor | Regression? | |||
Status: | Open/New | Components: | Resources/res_stasis | ||
Versions: | 18.16.0 20.1.0 | Frequency of Occurrence | Occasional | ||
Related Issues: |
| ||||
Environment: | CentOS 7, kernel 3.10.0-1160.83.1.el7.x86_64. | Attachments: | ( 0) asterisk_dump_logs_2023-04-17_core.18037.zip ( 1) asterisk_dump_logs_2023-04-17_core.26817.zip ( 2) asterisk_dump_logs_2023-04-17-10_core.1913.zip ( 3) asterisk_incident_1000_lines_before_and_after.txt ( 4) core_asterisk.zip ( 5) module_show_output&logs_before_incident.txt.zip | ||
Description: | This started to happen after upgrade to `18.16.0` from `16.29.0`. Asterisk starting to spam logs with similar messages like:
{code:none} [2023-02-24 09:30:57.001] ERROR[11361] stasis.c: FRACK!, Failed assertion bad magic number 0x0 for object 0x7f7ec4031d10 (0) [2023-02-24 09:30:57.006] ERROR[11361] : Got 14 backtrace records # 0: /usr/sbin/asterisk() [0x459ae6] # 1: /usr/sbin/asterisk(__ao2_ref+0x79) [0x459fc9] # 2: /usr/sbin/asterisk(stasis_forward_all+0x83) [0x555283] # 3: /usr/lib64/asterisk/modules/res_stasis.so(+0x107e1) [0x7f7f62a7e7e1] # 4: /usr/lib64/asterisk/modules/res_stasis.so(+0x14e2a) [0x7f7f62a82e2a] # 5: /usr/lib64/asterisk/modules/res_stasis.so(+0x12966) [0x7f7f62a80966] # 6: /usr/lib64/asterisk/modules/res_stasis.so(+0x1554e) [0x7f7f62a8354e] # 7: /usr/lib64/asterisk/modules/res_stasis.so(stasis_app_exec+0x569) [0x7f7f62a79789] # 8: /usr/lib64/asterisk/modules/app_stasis.so(+0xbd7) [0x7f7f4d6f4bd7] # 9: /usr/sbin/asterisk(pbx_exec+0xb9) [0x50db89] #10: /usr/lib64/asterisk/modules/res_ari_channels.so(+0xa528) [0x7f7f48594528] #11: /usr/sbin/asterisk() [0x57e80d] #12: /lib64/libpthread.so.0(+0x7ea5) [0x7f7fb932fea5] #13: /lib64/libc.so.6(clone+0x6d) [0x7f7fb88ff98d] {code} After around ~3 minutes of these (Asterisk during that time is already unresponsive), we have final errors before service crash: {code:none} [2023-02-24 09:32:48.153] ERROR[11361] astmm.c: Memory Allocation Failure in function topic_add_subscription at line 1214 of stasis.c {code} And, from `/var/log/messages`: {code:none} Feb 24 09:32:48 localhost kernel: asterisk[11361]: segfault at 1 ip 0000000000552827 sp 00007f7ed3e0a6b0 error 4 in asterisk[400000+2a2000] {code} It happens randomly, once, twice a day, sometime doesn't happen at all. Though, it's always happening when Asterisk is processing calls, so it's not happening during night (when asterisk is almost idle) | ||||
Comments: | By: Asterisk Team (asteriskteam) 2023-02-28 04:30:30.636-0600 Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution. Please note that log messages and other files should not be sent to the Sangoma Asterisk Team unless explicitly asked for. All files should be placed on this issue in a sanitized fashion as needed. A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report. Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process]. Please note that once your issue enters an open state it has been accepted. As Asterisk is an open source project there is no guarantee or timeframe on when your issue will be looked into. If you need expedient resolution you will need to find and pay a suitable developer. Asking for an update on your issue will not yield any progress on it and will not result in a response. All updates are posted to the issue when they occur. Please note that by submitting data, code, or documentation to Sangoma through JIRA, you accept the Terms of Use present at [https://www.asterisk.org/terms-of-use/|https://www.asterisk.org/terms-of-use/]. By: Quality Unit (adminQU) 2023-02-28 04:31:46.040-0600 Attaching core files after crash. By: Joshua C. Colp (jcolp) 2023-02-28 04:38:57.569-0600 You stated this is in Asterisk 16.29.0, yet this is tagged against 18.16.0. Has this occurred in 18? Asterisk 16 is security fix only. By: Quality Unit (adminQU) 2023-02-28 04:44:13.799-0600 Yes, this is occurring in 18.16.0, I said that it has started after upgrade, we haven't spotted that on Asterisk 16. By: Joshua C. Colp (jcolp) 2023-02-28 04:55:23.567-0600 I misread the description. This crash is most likely a symptom of a memory leak somewhere. It's crashing because of a failure to allocate memory. We'd need memory leak information[1] to try to determine where it is leaking. Information about how exactly you use Asterisk would also be useful - what parts of Asterisk, what channel drivers, etc. [1] https://wiki.asterisk.org/wiki/display/AST/Memory+Leak+Debugging By: Quality Unit (adminQU) 2023-02-28 09:21:46.161-0600 That's interesting, I reported possible memory leak issue in https://community.asterisk.org/t/possible-memory-leak-in-asterisk-v16-29-0/94828/2, but it was low priority since adding more memory was enough to mitigate it. We use a self-written Python application, that uses ARI to control call flows, manage channels, play audio files, and much more. OpenSIPS host used as a load balancer and two Asterisk hosts are behind it, registrations processed on OpenSIPS. I'm attaching another archive with `module show` output and VERBOSE logs seconds before the incident. Please, let me know if you want me to provide more details, I'll get memory leak info in the meantime. By: Asterisk Team (asteriskteam) 2023-03-16 12:00:19.658-0500 Suspended due to lack of activity. This issue will be automatically re-opened if the reporter posts a comment. If you are not the reporter and would like this re-opened please create a new issue instead. If the new issue is related to this one a link will be created during the triage process. Further information on issue tracker usage can be found in the Asterisk Issue Guidlines [1]. [1] https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines By: Quality Unit (adminQU) 2023-04-18 04:54:37.648-0500 It was reproduced on version `20.1.0`, "Failed assertion..." happened 3 times in a row: {code:none} $ grep 'Failed assertion' full [2023-04-17 07:26:46.332] ERROR[1600] stasis.c: FRACK!, Failed assertion bad magic number 0xdeaddea for object 0x7fd6def2ca60 (0) [2023-04-17 07:26:46.334] ERROR[1600] stasis.c: FRACK!, Failed assertion bad magic number 0xdeaddea for object 0x7fd6def2ca60 (0) [2023-04-17 08:27:29.982] ERROR[17890] stasis.c: FRACK!, Failed assertion bad magic number 0xdeaddea for object 0x7f36b809f520 (0) [2023-04-17 08:27:29.983] ERROR[17890] stasis.c: FRACK!, Failed assertion bad magic number 0xdeaddea for object 0x7f36b809f520 (0) [2023-04-17 08:59:43.084] ERROR[13188] stasis.c: FRACK!, Failed assertion bad magic number 0xdeaddea for object 0x7f354c04a3d0 (0) [2023-04-17 08:59:43.085] ERROR[13188] stasis.c: FRACK!, Failed assertion bad magic number 0xdeaddea for object 0x7f354c04a3d0 (0) {code} Yet, there is nothing logged in `mmlog`: {code:none} $ sudo cat /var/log/asterisk/mmlog 1680796806 - New session 1680800798 - New session 1680801419 - New session 1680802085 - New session 1680802457 - New session 1680803107 - New session 1680805268 - New session 1680886819 - New session 1680887275 - New session 1681117108 - New session 1681124183 - New session 1681127745 - New session 1681131634 - New session 1681458248 - New session 1681716666 - New session 1681720295 - New session 1681722233 - New session {code} Asterisk was compiled with `DON'T OPTIMIZE` flag, so core dumps probably will have more info, attaching them here. By: Asterisk Team (asteriskteam) 2023-04-18 04:54:38.479-0500 This issue has been reopened as a result of your commenting on it as the reporter. It will be triaged once again as applicable. By: Joshua C. Colp (jcolp) 2023-04-18 05:03:02.925-0500 The backtraces just show the trying to publish a message failed, due to the subscriber being NULL when it shouldn't be. It doesn't show how things got to that state and with the available information would most likely be impossible to determine. Providing all of the logging around the assertions (100-1000 lines, depending on usage) MAY provide further insight. Essentially have to determine the history of things. By: Joshua C. Colp (jcolp) 2023-04-18 05:09:23.876-0500 Does the memory allocation failure message still occur? By: Joshua C. Colp (jcolp) 2023-04-18 05:10:34.224-0500 And what is the output of "memory show summary"? By: Quality Unit (adminQU) 2023-04-18 05:11:04.848-0500 Was obtained with: {code:none} $ grep -A 1000 -B 1000 'Failed assertion' full > asterisk_incident_1000_lines_before_and_after.txt {code} Full log size is 1.6GB, let me know if you need more, I can try to compress/truncate it and share full. Asterisk is running with `-vvvvvdddddg` verbosity levels, so hopefully logs are as verbose as possible. By: Quality Unit (adminQU) 2023-04-18 05:16:03.540-0500 {code:none} $ sudo asterisk -x 'memory show summary' 48662696 bytes in 4049 allocations in file ../src/pj/pool_policy_malloc.c 480135 bytes in 635 allocations in file /opt/asterisk/include/asterisk/strings.h 158260 bytes in 520 allocations in file /opt/asterisk/include/asterisk/threadstorage.h 157508 bytes in 521 allocations in file abstract_jb.c 712 bytes in 1 allocations in file app_bridgewait.c 80 bytes in 1 allocations in file ari/ari_websockets.c 1057 bytes in 6 allocations in file ari/config.c 4174 bytes in 117 allocations in file ari/resource_channels.c 813 bytes in 3 allocations in file ari/resource_events.c 7730 bytes in 245 allocations in file asterisk.c 849408 bytes in 35392 allocations in file astobj2.c 12876768 bytes in 178844 allocations in file astobj2_hash.c 10800 bytes in 135 allocations in file astobj2_rbtree.c 198392 bytes in 605 allocations in file bridge.c 5760 bytes in 108 allocations in file bridge_after.c 20400 bytes in 51 allocations in file bridge_channel.c 4080 bytes in 120 allocations in file bridge_roles.c 1622200 bytes in 106 allocations in file bridge_softmix.c 1283 bytes in 4 allocations in file bucket.c 816 bytes in 4 allocations in file ccss.c 131819 bytes in 151 allocations in file cdr.c 6816 bytes in 6 allocations in file cel.c 80 bytes in 1 allocations in file chan_audiosocket.c 160 bytes in 2 allocations in file chan_bridge_media.c 175898 bytes in 321 allocations in file chan_pjsip.c 160 bytes in 2 allocations in file chan_rtp.c 35208 bytes in 552 allocations in file channel.c 105990 bytes in 269 allocations in file channel_internal_api.c 11298 bytes in 530 allocations in file cli.c 6144 bytes in 47 allocations in file codec.c 88 bytes in 1 allocations in file codec_opus_config.c 26496 bytes in 1 allocations in file codec_resample.c 322730 bytes in 2811 allocations in file config.c 107256 bytes in 701 allocations in file config_options.c 216 bytes in 2 allocations in file core_local.c 99840 bytes in 45 allocations in file core_unreal.c 1778956 bytes in 23497 allocations in file devicestate.c 418 bytes in 4 allocations in file dial.c 92160 bytes in 64 allocations in file dsp.c 14287856 bytes in 46992 allocations in file endpoints.c 41175 bytes in 277 allocations in file features_config.c 516210 bytes in 85 allocations in file file.c 28352 bytes in 443 allocations in file fixedjitterbuf.c 5143 bytes in 70 allocations in file format.c 992 bytes in 1 allocations in file format_cache.c 1573056 bytes in 12296 allocations in file format_cap.c 8136 bytes in 123 allocations in file framehook.c 712 bytes in 1 allocations in file func_dialgroup.c 87592 bytes in 1 allocations in file hashtab.c 2769 bytes in 16 allocations in file http.c 968 bytes in 1 allocations in file indications.c 4256 bytes in 2 allocations in file iostream.c 312 bytes in 5 allocations in file json.c 2296 bytes in 1 allocations in file libasteriskssl.c 74902 bytes in 463 allocations in file loader.c 16746 bytes in 7 allocations in file logger.c 294 bytes in 10 allocations in file logger_category.c 169503 bytes in 209 allocations in file manager.c 968 bytes in 36 allocations in file max_forwards.c 1096 bytes in 1 allocations in file media_cache.c 96 bytes in 2 allocations in file message.c 3776 bytes in 21 allocations in file named_locks.c 14009618 bytes in 110441 allocations in file pbx.c 494316 bytes in 392 allocations in file pbx_app.c 604576 bytes in 24179 allocations in file pbx_config.c 482240 bytes in 268 allocations in file pbx_functions.c 9288 bytes in 1 allocations in file pbx_realtime.c 600 bytes in 12 allocations in file pbx_sw.c 6998 bytes in 131 allocations in file pbx_variables.c 240 bytes in 2 allocations in file pjsip/cli_commands.c 56 bytes in 1 allocations in file presencestate.c 88 bytes in 1 allocations in file res_aeap.c 504 bytes in 2 allocations in file res_ari.c 55 bytes in 3 allocations in file res_ari_events.c 968 bytes in 1 allocations in file res_clialiases.c 24 bytes in 6 allocations in file res_format_attr_ilbc.c 336 bytes in 7 allocations in file res_format_attr_opus.c 1944 bytes in 5 allocations in file res_geolocation/geoloc_config.c 640 bytes in 4 allocations in file res_hep.c 66915 bytes in 11 allocations in file res_http_websocket.c 57317 bytes in 22 allocations in file res_musiconhold.c 754 bytes in 5 allocations in file res_odbc.c 2090 bytes in 35 allocations in file res_pjproject.c 192 bytes in 6 allocations in file res_pjsip.c 584 bytes in 2 allocations in file res_pjsip/config_auth.c 465 bytes in 2 allocations in file res_pjsip/config_global.c 88 bytes in 1 allocations in file res_pjsip/config_system.c 8540 bytes in 17 allocations in file res_pjsip/config_transport.c 11514248 bytes in 11873 allocations in file res_pjsip/location.c 352 bytes in 1 allocations in file res_pjsip/pjsip_cli.c 515499 bytes in 12344 allocations in file res_pjsip/pjsip_configuration.c 10888 bytes in 69 allocations in file res_pjsip/pjsip_distributor.c 1392 bytes in 3 allocations in file res_pjsip/pjsip_global_headers.c 25089849 bytes in 105602 allocations in file res_pjsip/pjsip_options.c 992 bytes in 1 allocations in file res_pjsip/pjsip_scheduler.c 165888 bytes in 1728 allocations in file res_pjsip/pjsip_session.c 2152 bytes in 1 allocations in file res_pjsip/pjsip_transport_events.c 2152 bytes in 1 allocations in file res_pjsip/pjsip_transport_management.c 101 bytes in 1 allocations in file res_pjsip_authenticator_digest.c 591 bytes in 48 allocations in file res_pjsip_caller_id.c 296 bytes in 6 allocations in file res_pjsip_config_wizard.c 120 bytes in 1 allocations in file res_pjsip_endpoint_identifier_ip.c 616 bytes in 1 allocations in file res_pjsip_exten_state.c 2048 bytes in 1 allocations in file res_pjsip_history.c 4232 bytes in 1 allocations in file res_pjsip_logger.c 1936 bytes in 2 allocations in file res_pjsip_mwi.c 1088 bytes in 2 allocations in file res_pjsip_outbound_registration.c 1424 bytes in 2 allocations in file res_pjsip_pubsub.c 4560 bytes in 57 allocations in file res_pjsip_sdp_rtp.c 156530 bytes in 942 allocations in file res_pjsip_session.c 1520 bytes in 19 allocations in file res_pjsip_session/pjsip_session_caps.c 1968 bytes in 41 allocations in file res_pjsip_t38.c 585929 bytes in 581 allocations in file res_rtp_asterisk.c 25 bytes in 1 allocations in file res_sorcery_astdb.c 6070 bytes in 45 allocations in file res_sorcery_config.c 6776 bytes in 7 allocations in file res_sorcery_memory.c 968 bytes in 1 allocations in file res_sorcery_memory_cache.c 124 bytes in 7 allocations in file res_sorcery_realtime.c 329604 bytes in 131 allocations in file res_srtp.c 11816 bytes in 87 allocations in file res_stasis.c 712 bytes in 1 allocations in file res_stasis_device_state.c 6227 bytes in 21 allocations in file res_stasis_playback.c 13882 bytes in 61 allocations in file res_stasis_recording.c 328 bytes in 2 allocations in file res_statsd.c 88 bytes in 1 allocations in file res_stir_shaken.c 2000 bytes in 1 allocations in file res_stir_shaken/general.c 9128 bytes in 1 allocations in file res_timing_pthread.c 15232 bytes in 136 allocations in file res_timing_timerfd.c 153927 bytes in 539 allocations in file rtp_engine.c 223424 bytes in 425 allocations in file sched.c 408 bytes in 17 allocations in file sdp_srtp.c 376 bytes in 4 allocations in file serializer.c 22260 bytes in 42 allocations in file slinfactory.c 4552043 bytes in 24483 allocations in file sorcery.c 22068831 bytes in 236249 allocations in file stasis.c 6680 bytes in 55 allocations in file stasis/app.c 4000 bytes in 20 allocations in file stasis/command.c 32696 bytes in 134 allocations in file stasis/control.c 2208 bytes in 2 allocations in file stasis/messaging.c 51192 bytes in 162 allocations in file stasis_bridges.c 3343182 bytes in 58751 allocations in file stasis_cache.c 1127904 bytes in 11749 allocations in file stasis_cache_pattern.c 367436 bytes in 1029 allocations in file stasis_channels.c 2544751 bytes in 35421 allocations in file stasis_message.c 2256720 bytes in 23503 allocations in file stasis_message_router.c 1184 bytes in 3 allocations in file stasis_state.c 37104 bytes in 2 allocations in file stdtime/localtime.c 812740 bytes in 15092 allocations in file stream.c 1672856 bytes in 49269 allocations in file stringfields.c 24835429 bytes in 12267 allocations in file strings.c 7139359 bytes in 47269 allocations in file taskprocessor.c 1820 bytes in 11 allocations in file tcptls.c 960256 bytes in 23621 allocations in file threadpool.c 2360 bytes in 140 allocations in file timing.c 1131456 bytes in 138 allocations in file translate.c 160 bytes in 2 allocations in file udptl.c 15153 bytes in 185 allocations in file utils.c 1253746 bytes in 6322 allocations in file xmldoc.c 213632459 bytes allocated in 1127831 selected allocations 213632459 bytes in all allocations 583543 bytes in deferred free large allocations 35188 bytes in deferred free small allocations 618731 bytes in deferred free allocations 214251190 bytes in all allocations and deferred free allocations {code} > Does the memory allocation failure message still occur? It does, it looks like it's tied to number of channels/calls processed. It usually happens once during the day, when we process the most calls (up to 30-40 per one Asterisk machine, we have two machines behind OpenSIPS). Sometimes it might get more often, like 2-3 times per day, sometime might not happen at all. As I noticed, it's tied to high calls volume, at least it looks like. By: Joshua C. Colp (jcolp) 2023-04-18 05:24:50.887-0500 There is an off-nominal scenario where channel departure fails, and the bridge is destroyed while it is subsequently still used: {code} [2023-04-17 08:59:43.075] DEBUG[2971] stasis/control.c: 1681721248.708: Channel is no longer in departable state [2023-04-17 08:59:43.075] DEBUG[2971] bridge.c: Bridge 1681721248.708: actually destroying stasis bridge, nobody wants it anymore [2023-04-17 08:59:43.075] DEBUG[2971] stasis_bridges.c: Update: 0x7f35400b72f8 Old: 1681721248.708 New: <none> [2023-04-17 08:59:43.075] DEBUG[2971] stasis_bridges.c: Update: 0x7f35400b72f8 Old: 1681721248.708 New: <none> [2023-04-17 08:59:43.075] DEBUG[2971] stasis.c: Destroying topic. name: bridge:all/bridge:1681721248.708, detail: [2023-04-17 08:59:43.076] DEBUG[2971] stasis.c: Topic 'bridge:all/bridge:1681721248.708': 0x7f354c04a3d0 destroyed {code} That's what is causing this. I've accepted the issue. By: Quality Unit (adminQU) 2023-04-18 05:50:23.181-0500 Great news! Do I understand correctly, we can expect a fix in the nearest minor release? By: Joshua C. Colp (jcolp) 2023-04-18 05:53:37.533-0500 There is no time frame on when a fix would occur. By: Quality Unit (adminQU) 2023-04-18 05:55:29.002-0500 I expected that answer, but had to make sure. OK then, thanks, we'll be waiting. By: Abhay Gupta (agupta) 2023-04-24 07:47:30.112-0500 We are also facing this issue and server is highly loaded . Were you able to find the sequence of event that could lead to this crash so we can try to stop the bug from hitting till a solution is found . By: Abhay Gupta (agupta) 2023-04-26 01:22:32.239-0500 We are able to generate the issue at our will and can do the testing if some patch can be suggested 1. Create bridge , put channel in stasis bridge 2. Delete the channel 3. Delete the bridge If we do a few calls in succession it gives this FRACK error in while loop and asterisk crashes after sometime . By: Abhay Gupta (agupta) 2023-04-26 04:07:16.062-0500 Also confirmed that the issue is not present in asterisk 16.30 |