Summary: | ASTERISK-22083: res_musiconhold segfault in free, in moh_scan_files | ||||||||
Reporter: | Sébastien Couture (sysreq) | Labels: | |||||||
Date Opened: | 2013-07-16 15:15:18 | Date Closed: | 2020-08-26 16:02:54 | ||||||
Priority: | Major | Regression? | No | ||||||
Status: | Closed/Complete | Components: | Resources/res_musiconhold | ||||||
Versions: | 13.11.2 14.0.2 | Frequency of Occurrence | Occasional | ||||||
Related Issues: |
| ||||||||
Environment: | Linux vgw1 2.6.32-35-generic #78-Ubuntu SMP Tue Oct 11 16:11:24 UTC 2011 x86_64 GNU/Linux | Attachments: | ( 0) backtrace.txt | ||||||
Description: | We've had Asterisk segfault with an error in res_musiconhold.c; more specifically in the 'moh_scan_files' function (line 1066):
{code} for (i = 0; i < class->total_files; i++) ast_free(class->filearray[i]); {code} I've attached a backtrace of the core dump. I'm wondering if this could be related to ASTERISK-21775. I haven't yet tried the patch mentioned in that issue. | ||||||||
Comments: | By: Rusty Newton (rnewton) 2013-07-25 16:03:32.423-0500 The crash looks different than the one in ASTERISK-21775. The segfault happens in free. There is likely memory corruption. Please follow the instructions on https://wiki.asterisk.org/wiki/display/AST/Valgrind and provide Valgrind debug. Get an Asterisk log leading up to the crash if possible. (with VERBOSE and DEBUG turned up to 5) Attach it to the issue and hit Send Back when you've got those items. By: Rusty Newton (rnewton) 2013-08-14 09:42:04.088-0500 Suspended due to lack of activity. Please request a bug marshal in #asterisk-bugs on the IRC network irc.freenode.net to reopen the issue should you have the additional information requested. Further information can be found at http://www.asterisk.org/developers/bug-guidelines By: Shlomi Gutman (voicenter) 2014-04-22 03:47:00.208-0500 Have this issues, it happens very rarely, so it's hard to reproduce. Asterisk 1.8.23.0 Not sure if i would be able to use high debug and verbose and valgrind on productive server. I have few core dumps i can attach if it would help By: Shlomi Gutman (voicenter) 2014-04-27 06:07:15.199-0500 occurred on 11.6-cert1 as well By: Rusty Newton (rnewton) 2014-05-22 08:29:42.568-0500 Please attach the additional traces in .txt format. Though if you can't provide the MALLOC_DEBUG output or valgrind output there may not be much we can do. By: Walter Doekes (wdoekes) 2014-05-22 08:48:38.509-0500 Okay. So that backtrace from July was useful after all: {noformat} Thread 101 (Thread 17685): #0 0x00007faf5d6de16e in ?? () from /lib/libc.so.6 #1 0x00007faf5d6673dc in ?? () from /lib/libc.so.6 #2 0x00007faf5d665e78 in free () from /lib/libc.so.6 #3 0x00007faf4876099f in moh_scan_files (class=0x353c898) at res_musiconhold.c:1066 #4 0x00007faf4876265e in local_ast_moh_start (chan=0x7faf4d526f18, mclass=0x7faf3fc19268 "vascomain", interpclass=0x0) at res_musiconhold.c:1516 #5 0x00000000004894cd in ast_moh_start (chan=0x7faf4d526f18, mclass=0x7faf3fc19268 "vascomain", interpclass=0x0) at channel.c:7964 #6 0x00007faf4772485f in queue_exec (chan=0x7faf4d526f18, data=0x7faf3fc1c9a0 "vascomain-premium,i") at app_queue.c:6228 #7 0x000000000050e1fb in pbx_exec (c=0x7faf4d526f18, app=0x7faf54ac10e0, data=0x7faf3fc1c9a0 "vascomain-premium,i") at pbx.c:1446 #8 0x00000000005184f5 in pbx_extension_helper (c=0x7faf4d526f18, con=0x0, context=0x7faf4d527470 "func-queue", exten=0x7faf4d5274c0 "s", priority=15, label=0x0, callerid=0x7faf4cf34bb0 "4504315181", action=E_SPAWN, found=0x7faf3fc1ec8c, combined_find_spawn=1) at pbx.c:4489 ... {noformat} {noformat} Thread 57 (Thread 17682): #0 0x00007faf5d6de16e in ?? () from /lib/libc.so.6 #1 0x00007faf5d6673dc in ?? () from /lib/libc.so.6 #2 0x00007faf5d665e78 in free () from /lib/libc.so.6 #3 0x00007faf4876099f in moh_scan_files (class=0x353c898) at res_musiconhold.c:1066 #4 0x00007faf4876265e in local_ast_moh_start (chan=0x7faf4d164618, mclass=0x7faf3fb9d268 "vascomain", interpclass=0x0) at res_musiconhold.c:1516 #5 0x00000000004894cd in ast_moh_start (chan=0x7faf4d164618, mclass=0x7faf3fb9d268 "vascomain", interpclass=0x0) at channel.c:7964 #6 0x00007faf4772485f in queue_exec (chan=0x7faf4d164618, data=0x7faf3fba09a0 "vascomain-premium,i") at app_queue.c:6228 #7 0x000000000050e1fb in pbx_exec (c=0x7faf4d164618, app=0x7faf54ac10e0, data=0x7faf3fba09a0 "vascomain-premium,i") at pbx.c:1446 #8 0x00000000005184f5 in pbx_extension_helper (c=0x7faf4d164618, con=0x0, context=0x7faf4d164b70 "func-queue", exten=0x7faf4d164bc0 "s", priority=15, label=0x0, callerid=0x7faf4d4377d0 "5143527328", action=E_SPAWN, found=0x7faf3fba2c8c, combined_find_spawn=1) at pbx.c:4489 #9 0x000000000051a26d in ast_spawn_extension (c=0x7faf4d164618, context=0x7faf4d164b70 "func-queue", exten=0x7faf4d164bc0 "s", priority=15, callerid=0x7faf4d4377d0 "5143527328", found=0x7faf3fba2c8c, combined_find_spawn=1) at pbx.c:5127 ... {noformat} {noformat} Thread 1 (Thread 17758): #0 0x00007faf5d61ba75 in raise () from /lib/libc.so.6 #1 0x00007faf5d61f5c0 in abort () from /lib/libc.so.6 #2 0x00007faf5d6554fb in ?? () from /lib/libc.so.6 #3 0x00007faf5d65f5b6 in ?? () from /lib/libc.so.6 #4 0x00007faf5d665e83 in free () from /lib/libc.so.6 #5 0x00007faf4876099f in moh_scan_files (class=0x353c898) at res_musiconhold.c:1066 #6 0x00007faf4876265e in local_ast_moh_start (chan=0x3923dd8, mclass=0x7faf3ee0d268 "vascomain", interpclass=0x0) at res_musiconhold.c:1516 #7 0x00000000004894cd in ast_moh_start (chan=0x3923dd8, mclass=0x7faf3ee0d268 "vascomain", interpclass=0x0) at channel.c:7964 #8 0x00007faf4772485f in queue_exec (chan=0x3923dd8, data=0x7faf3ee109a0 "vascomain-premium,i") at app_queue.c:6228 #9 0x000000000050e1fb in pbx_exec (c=0x3923dd8, app=0x7faf54ac10e0, data=0x7faf3ee109a0 "vascomain-premium,i") at pbx.c:1446 #10 0x00000000005184f5 in pbx_extension_helper (c=0x3923dd8, con=0x0, context=0x3924330 "func-queue", exten=0x3924380 "s", priority=15, label=0x0, callerid=0x3874a40 "4505158115", action=E_SPAWN, found=0x7faf3ee12c8c, combined_find_spawn=1) at pbx.c:4489 #11 0x000000000051a26d in ast_spawn_extension (c=0x3923dd8, context=0x3924330 "func-queue", exten=0x3924380 "s", priority=15, callerid=0x3874a40 "4505158115", found=0x7faf3ee12c8c, combined_find_spawn=1) at pbx.c:5127 ... {noformat} Observe how class is the same in all cases. Three threads are attempting to free the same memory at once. {noformat} for (i = 0; i < class->total_files; i++) { ast_free(class->filearray[i]); } class->total_files = 0; {noformat} This doesn't look like the same bug as ASTERISK-21775 because that memory isn't used in multiple threads at once there, according to the backtraces. Although I'm not sure. {noformat} #0 0x00002b9ac0a708e0 in moh_files_alloc (chan=0x2b9ae0071fc8, params=0x2b9ae0071268) at res_musiconhold.c:432 432 state->pos = ast_random() % class->total_files; {noformat} The fix: - add a mutex in class so we can lock the scanning/reinitialization - explain why the same memory can be used and don't do it By: Shlomi Gutman (voicenter) 2014-05-22 09:39:52.148-0500 Is there any relation to that all crashes of music on hold is happening when using from queue application? {noformat} #0 0x0000003a5ca30285 in raise () from /lib64/libc.so.6 (gdb) bt #0 0x0000003a5ca30285 in raise () from /lib64/libc.so.6 #1 0x0000003a5ca31d30 in abort () from /lib64/libc.so.6 #2 0x0000003a5ca6971b in __libc_message () from /lib64/libc.so.6 #3 0x0000003a5ca711df in _int_free () from /lib64/libc.so.6 #4 0x0000003a5ca7163b in free () from /lib64/libc.so.6 #5 0x00002aaace58669d in moh_scan_files (class=0x2aaab12ed278) at res_musiconhold.c:1066 #6 0x00002aaace58add7 in local_ast_moh_start (chan=0x2aab680ac3b8, mclass=0x42692668 "moh_389", interpclass=0x0) at res_musiconhold.c:1518 #7 0x00002aaab40176d2 in say_periodic_announcement (qe=0x42692660, ringing=0) at app_queue.c:3602 #8 0x00002aaab40201c7 in wait_our_turn (chan=0x2aab680ac3b8, data=<value optimized out>) at app_queue.c:4334 #9 queue_exec (chan=0x2aab680ac3b8, data=<value optimized out>) at app_queue.c:6325 #10 0x00000000004ee7bf in pbx_exec (c=0x2aab680ac3b8, app=0x2aaaac0ddb70, data=0x42695990 "queue_10131,t,,blank,1800,,,sub-queue-onanswer") at pbx.c:1446 #11 0x00000000004fe99c in pbx_extension_helper (c=0x2aab680ac3b8, con=<value optimized out>, context=0x2aab680ac910 "asterisk-queue-blc", exten=0x2aab680ac960 "036476661", priority=17, label=0x0, callerid=0x2aab680a7310 "05000000", action=E_SPAWN, found=0x4269800c, combined_find_spawn=1) at pbx.c:4489 #12 0x000000000050198b in ast_spawn_extension (c=0x2aab680ac3b8, args=0x0) at pbx.c:5127 #13 __ast_pbx_run (c=0x2aab680ac3b8, args=0x0) at pbx.c:5230 #14 0x0000000000503e6b in pbx_thread (data=0x52e7) at pbx.c:5571 #15 0x000000000053beac in dummy_start (data=<value optimized out>) at utils.c:1075 #16 0x0000003a5d20677d in start_thread () from /lib64/libpthread.so.0 #17 0x0000003a5cad3c1d in clone () from /lib64/libc.so.6 (gdb) bt full #0 0x0000003a5ca30285 in raise () from /lib64/libc.so.6 No symbol table info available. #1 0x0000003a5ca31d30 in abort () from /lib64/libc.so.6 No symbol table info available. #2 0x0000003a5ca6971b in __libc_message () from /lib64/libc.so.6 No symbol table info available. #3 0x0000003a5ca711df in _int_free () from /lib64/libc.so.6 No symbol table info available. #4 0x0000003a5ca7163b in free () from /lib64/libc.so.6 No symbol table info available. #5 0x00002aaace58669d in moh_scan_files (class=0x2aaab12ed278) at res_musiconhold.c:1066 files_DIR = 0x2aab2802e070 files_dirent = <value optimized out> dir_path = "/var/lib/asterisk/mohmp3/moh_389/\000\000B\000\000\000\000@\021iB\000\000\000\000`#iB\000\000\000\000\035XK\000\000\000\000\000\070?DS\000hC\024 \313\304\364f\366\304\364\212\250\304\372r\306\304\354\272\347?\002\301\304\020p\"\305\034\232^\305P;\004\305@\256\033\303\030H@D\000\236A¼\201\275D\351\205ZE\216\017\260E\260\365\315E\357\370\300E\211v\205E<\005?D\323\064\227?\344\067\305D\352U\305\376\321+\305$\201\262\304V*\243\304 8_\302=qmD\274\310\361D\233\333fD\320a\025½\004\314\304\000\244\357\304!\027:\305\000m>\305,|N?\301\244\304?\254\272\304D\372\026\304X\346\307\303\360\327\020CP\370\332BC\b\260C\344\375-EhshE\317&\233E\250\022\237E,\367\262ER\343\312E<\003\304E4\363\346E\202\316\bF&\363\017F-\207\fF<\203\322Esiren14\000\334\000\n \000\000\000\000\334\000\n \000\000\000\000"... path = "/var/lib/asterisk/mohmp3/moh_389\000L9,\253*\000\000\001\000\000\000\000\000\000\000\200\002iB\000\000\000\000;\026\247\\:\000\000\000/tmp\000\000\000\200\370\016iB\000\000\000\000A\000\000\000\000\000\000\000\377\037\000\000\000\000\000\000\260L9,\253*\000\000\370\016iB", '\000' <repeats 12 times>"\266, \346B\000\000\000\000\000\300\026iB\000\000\000\000\060\025iB\000\000\000\000 \000\000\000\060\000\000\000\002\000\000\000\000\000\000\000\320\027iB\000\000\000\000\240\002iB\000\000\000\000\000\000\000\000\376\377\377\377\v\000\000\000\000\000\000\000\060\025iB\000\000\000\000\246\376\243]:", '\000' <repeats 17 times>, "`A\000\000\000\000\000\000\000\000"... filepath = "/var/lib/asterisk/mohmp3/moh_389//26905_389_12032013215341\000g729\000/var/lib/asterisk/mohmp3/moh_389//26905_389_12032013215341\000g729\000\220\362hB", '\000' <repeats 36 times>"\226, \000\000\000\n", '\000' <repeats 11 times>"\360, \361hB", '\000' <repeats 12 times>"\357, \023iB\000\000\000\000)\364hB\000\000\000\000)\364hB\000\000\000\000\200$iB\000\000\000\000b\037\002\264\252*\000\000\306"... statbuf = {st_dev = 1114186256, st_ino = 0, st_nlink = 0, st_mode = 0, st_uid = 0, st_gid = 0, pad0 = 53, st_rdev = 227633266688, st_size = 46912605442680, st_blksize = 46913094705280, st_blocks = 0, st_atim = {tv_sec = 0, tv_nsec = 0}, st_mtim = {tv_sec = 525996840, tv_nsec = 0}, st_ctim = {tv_sec = 8394054, tv_nsec = 1114187368}, __unused = {0, 0, 1114186256}} i = 1 __PRETTY_FUNCTION__ = "moh_scan_files" #6 0x00002aaace58add7 in local_ast_moh_start (chan=0x2aab680ac3b8, mclass=0x42692668 "moh_389", interpclass=0x0) at res_musiconhold.c:1518 mohclass = 0x2aaab12ed278 var = 0x0 res = -1 realtime_possible = 1 __PRETTY_FUNCTION__ = "local_ast_moh_start" __FUNCTION__ = "local_ast_moh_start" #7 0x00002aaab40176d2 in say_periodic_announcement (qe=0x42692660, ringing=0) at app_queue.c:3602 res = 0 now = 1397039323 __PRETTY_FUNCTION__ = "say_periodic_announcement" #8 0x00002aaab40201c7 in wait_our_turn (chan=0x2aab680ac3b8, data=<value optimized out>) at app_queue.c:4334 No locals. #9 queue_exec (chan=0x2aab680ac3b8, data=<value optimized out>) at app_queue.c:6325 res = 0 ringing = 0 user_priority = <value optimized out> max_penalty_str = <value optimized out> min_penalty_str = <value optimized out> prio = 0 qcontinue = 0 max_penalty = 0 min_penalty = 0 reason = QUEUE_UNKNOWN tries = 0 noption = 0 makeannouncement = <value optimized out> position = <value optimized out> args = {argc = 8, argv = 0x426937f8, queuename = 0x42692610 "queue_10131", options = 0x4269261c "t", url = 0x4269261e "", announceoverride = 0x4269261f "blank", queuetimeoutstr = 0x42692625 "1800", agi = 0x4269262a "", macro = 0x4269262b "", gosub = 0x4269262c "sub-queue-onanswer", rule = 0x0, position = 0x0} ---Type <return> to continue, or q <return> to quit--- qe = {parent = 0x2aaad4334fd8, moh = "moh_389", '\000' <repeats 72 times>, announce = "/var/lib/asterisk/sounds/blc/user_rec/26905_04032013170704", '\000' <repeats 4037 times>, context = '\000' <repeats 79 times>, digits = '\000' <repeats 79 times>, valid_digits = 0, pos = 2, prio = 0, last_pos_said = 0, ring_when_ringing = 0, last_periodic_announce_time = 1397039293, last_periodic_announce_sound = 0, last_pos = 0, opos = 4, handled = 0, pending = 0, max_penalty = 0, min_penalty = 0, linpos = 0, linwrapped = 0, start = 1397039113, expire = 1397040913, cancel_answered_elsewhere = 0, chan = 0x2aab680ac3b8, qe_rules = {first = 0x0, last = 0x0}, pr = 0x0, next = 0x47586660} __PRETTY_FUNCTION__ = "queue_exec" #10 0x00000000004ee7bf in pbx_exec (c=0x2aab680ac3b8, app=0x2aaaac0ddb70, data=0x42695990 "queue_10131,t,,blank,1800,,,sub-queue-onanswer") at pbx.c:1446 res = <value optimized out> u = 0x2aab2c134c60 saved_c_appl = 0x0 saved_c_data = 0x0 __PRETTY_FUNCTION__ = "pbx_exec" #11 0x00000000004fe99c in pbx_extension_helper (c=0x2aab680ac3b8, con=<value optimized out>, context=0x2aab680ac910 "asterisk-queue-blc", exten=0x2aab680ac960 "030000000000", priority=17, label=0x0, callerid=0x2aab680a7310 "0500000000", action=E_SPAWN, found=0x4269800c, combined_find_spawn=1) at pbx.c:4489 e = <value optimized out> app = 0x2aaaac0ddb70 res = <value optimized out> q = {incstack = {0x0 <repeats 128 times>}, stacklen = 0, status = 5, swo = 0x0, data = 0x0, foundcontext = 0x2aab680ac910 "asterisk-queue-blc"} passdata = "queue_10131,t,,blank,1800,,,sub-queue-onanswer\000/FDcwXzuO\000e89\000\000\000\000``iB\000\000\000\000\340giB\000\000\000\000\340giB\000\000\000\000\340giB", '\000' <repeats 12 times>"\377, \377\377\377\377\377\377\377\377giB\023\000\000\000\026\242X", '\000' <repeats 13 times>, "\030_iB\000\000\000\000\000\000\000\000\001\000\000\000\v", '\000' <repeats 31 times>, " [iB", '\000' <repeats 52 times>, " \000\000\000\000\000\000\000P\000\000\000\000\000\000\000\377\377\377s", '\000' <repeats 12 times>"\377, \377\377\377\060\000\000\000`\004\325\\:", '\000' <repeats 11 times>"\340"... matching_action = 0 __PRETTY_FUNCTION__ = "pbx_extension_helper" #12 0x000000000050198b in ast_spawn_extension (c=0x2aab680ac3b8, args=0x0) at pbx.c:5127 No locals. #13 __ast_pbx_run (c=0x2aab680ac3b8, args=0x0) at pbx.c:5230 digit = <value optimized out> invalid = <value optimized out> dst_exten = "\000\200iB\000\000\000\000\020\200iB", '\000' <repeats 12 times>, "\030\200iB", '\000' <repeats 20 times>, "\030", '\000' <repeats 47 times>, " \200iB\000\000\000\000(\200iB\000\000\000\000\060\200iB\000\000\000\000\070\200iB\000\000\000\000\000\000\000\000\001\000\000\000\000\000\000\000\004\000\000\000P\200iB\000\000\000\000\244\323 ]:", '\000' <repeats 27 times>"\340, )\325\\:", '\000' <repeats 11 times>"\240, \367\236 \000\000\000\000`\b\002\000\000\000\000\000\340)\325\\:\000\000\000\030\000\000\000\000\000\000\000-8\247\\:", '\000' <repeats 11 times>, "@d\030h\253*\000" pos = 5440934 found = 1 res = 0 error = 0 __PRETTY_FUNCTION__ = "__ast_pbx_run" #14 0x0000000000503e6b in pbx_thread (data=0x52e7) at pbx.c:5571 No locals. #15 0x000000000053beac in dummy_start (data=<value optimized out>) at utils.c:1075 __cancel_buf = {__cancel_jmp_buf = {{__cancel_jmp_buf = {46915674203200, 4875765709081475290, 0, 3, 4096, 46915674089632, 4875765708039747770, 4875765709076383304}, __mask_was_saved = 0}}, __pad = {0x426981a0, 0x0, 0x3a5cd4d890, 0x3a5cd4d898}} __cancel_arg = 0x42698940 not_first_call = <value optimized out> ret = <value optimized out> #16 0x0000003a5d20677d in start_thread () from /lib64/libpthread.so.0 No symbol table info available. #17 0x0000003a5cad3c1d in clone () from /lib64/libc.so.6 No symbol table info available. {noformat} By: Walter Doekes (wdoekes) 2014-05-22 09:48:34.464-0500 Shlomi are there other threads referencing the same mem? {{class=0x2aaab12ed278}} By: Shlomi Gutman (voicenter) 2014-05-25 09:22:42.308-0500 I do not have that core file anymore. But i have 3-4 others i saved. Do you mind to tell how do i check it or/and link me to good manual for "dummies" :) Thanks By: Shlomi Gutman (voicenter) 2014-05-25 09:54:52.597-0500 As i understand, the moh that is used is realtime and from cache. And it everytime rescans the files before playing. The problem is when there are a lot of calls to the same Q, it tries to rescan (free and readd ) files at the same time from several threads/calls? Maybe there should be lock added or maybe the best practice is to change array of files to save thread list ? By: Joshua C. Colp (jcolp) 2016-10-17 14:23:32.314-0500 Still applicable to later versions as well. By: Niek Klein Kromhof (nkkromhof) 2016-11-23 03:59:39.410-0600 I'm currently experiencing this issue as well on 13.8-cert2. I've disabled 'cachertclasses=yes' and restarted Asterisk but was still experiencing segfaults on initialization of MoH. It may or may not be pertinent that storage used for MoH files is a GlusterFS volume. By: Sean Bright (seanbright) 2020-08-26 16:02:54.494-0500 Technically this is the original bug and ASTERISK-28927 is the duplicate, but it is easier to just mark this as such and move on. |