Summary: | ASTERISK-15271: [patch] New music on hold patches cause asterisk + full system hard lock | ||
Reporter: | Andrew Parisio (parisioa) | Labels: | |
Date Opened: | 2009-12-03 15:24:19.000-0600 | Date Closed: | 2009-12-18 17:52:00.000-0600 |
Priority: | Critical | Regression? | Yes |
Status: | Closed/Complete | Components: | Resources/res_musiconhold |
Versions: | Frequency of Occurrence | ||
Related Issues: | |||
Environment: | Attachments: | ( 0) 20091207__issue16279__2.diff.txt ( 1) 20091207__issue16279__3.diff.txt ( 2) 20091207__issue16388.diff.txt ( 3) 20091208__issue16388.diff.txt ( 4) blockingerror.jpg ( 5) photo.jpg ( 6) refs.tar.gz ( 7) refs.txt ( 8) refs2.txt ( 9) ss.jpg (10) valgrind.supp (11) valgrind.tar.gz | |
Description: | After all the changes we made to music on hold, my asterisk system now hard locks after a while. I had listened to a few streams, we hit what we thught may be a memory corruption issue in bug 16279 and we in fact did. Check out the cameraphone picture of the console of the server attached. I have no idea what this bug should be filed under so i'm going under music on hold where we were before. | ||
Comments: | By: Andrew Parisio (parisioa) 2009-12-03 15:49:47.000-0600 please note that the reproduceability always is set for the MOH crashing. i've had one hardlock. i'm currently recompiling and will post valgrind logs as requested in the prior bug thread shortly. By: Andrew Parisio (parisioa) 2009-12-03 15:51:58.000-0600 hmmm, not sure what to do with this. csgtacsip1:/usr/src/asterisk-1.6.1.svn# valgrind --suppressions=/usr/src/asterisk-1.6.1.svn/contrib/valgrind.supp --log-fd=9 asterisk -vvvvcg 9>valgrind.txt csgtacsip1:/usr/src/asterisk-1.6.1.svn# csgtacsip1:/usr/src/asterisk-1.6.1.svn# csgtacsip1:/usr/src/asterisk-1.6.1.svn# cat valgrind.txt ==32676== Memcheck, a memory error detector. ==32676== Copyright (C) 2002-2007, and GNU GPL'd, by Julian Seward et al. ==32676== Using LibVEX rev 1854, a library for dynamic binary translation. ==32676== Copyright (C) 2004-2007, and GNU GPL'd, by OpenWorks LLP. ==32676== Using valgrind-3.3.1-Debian, a dynamic binary instrumentation framework. ==32676== Copyright (C) 2000-2007, and GNU GPL'd, by Julian Seward et al. ==32676== For more details, rerun with: -v ==32676== location should start with fun: or obj: ==32676== FATAL: in suppressions file '/usr/src/asterisk-1.6.1.svn/contrib/valgrind.supp': location should start with 'fun:' or 'obj:' ==32676== exiting now. By: Atis Lezdins (atis) 2009-12-03 19:53:59.000-0600 Try with attached suppressions file. It seems to work for both 3.3 and 3.4 versions of valgrind. If unsuccessful, you may run without suppressions file, but there will be some irrelevant debug. Btw, i don't know anything about your issue, just fixing my valgrind suppressions file :) By: Andrew Parisio (parisioa) 2009-12-03 21:10:03.000-0600 new supressions file worked. thanks! i did a few listens, some reloads, moh reloads in there. if you want me to do anything else, let me know. also i enabled dont_optimize and the malloc flags as the readme says. By: Andrew Parisio (parisioa) 2009-12-04 19:03:33.000-0600 it happened again on SVN-branch-1.6.1-r232865 edit: nothing was happening either. it's 5pm on a friday and it exploded with nobody using it. By: Tilghman Lesher (tilghman) 2009-12-07 10:36:43.000-0600 Please delete your /tmp/refs file, patch, and restart Asterisk. After you either get the warning (hopefully preventing your system from crashing) or a crash, please upload your /tmp/refs file. By: Andrew Parisio (parisioa) 2009-12-07 11:23:06.000-0600 okay, done. no crashes (it randomly hard locks the system after many hours), but the stream gets killed. By: Tilghman Lesher (tilghman) 2009-12-07 11:58:20.000-0600 Could you try that again, without compressing the /tmp/refs file? It appears your upload was corrupt. By: Andrew Parisio (parisioa) 2009-12-07 12:11:48.000-0600 i recompiled a clean tree and it doesn't generate the refs file anymore. i did that because i got compile errors last time, probably from a prior patch that had been applied. By: Tilghman Lesher (tilghman) 2009-12-07 12:29:23.000-0600 Right, that means you don't have the patch applied. I've regenerated the patch without warnings, so please try using that, and grab the resulting refs file. By: Andrew Parisio (parisioa) 2009-12-07 12:36:58.000-0600 okay, patch applied and new log uploaded. it's small enough now it doesnt need to be compressed. By: Tilghman Lesher (tilghman) 2009-12-07 15:43:46.000-0600 New patch uploaded. It's still not clear to me where this leak is occurring. By: Andrew Parisio (parisioa) 2009-12-07 15:45:17.000-0600 the blank one is the one i listened too. csgtacsip1*CLI> moh show classes Class: 100368-moh Mode: files Directory: /var/lib/asterisk/moh/custom/100368/ Class: hot108jamz Mode: custom Directory: nodir Application: /usr/bin/mpg123 -q -s --mono -r 8000 -f 8192 -b 0 http://scfire-mtc-aa04.stream.aol.com:80/stream/1071 Format: slin Class: Mode: <none> Directory: <none> Format: unknown Class: default Mode: files Directory: /var/lib/asterisk/moh Class: 977rock Mode: custom Directory: nodir Application: /usr/bin/mpg123 -q -s --mono -r 8000 -f 8192 -b 0 http://scfire-mtc-aa04.stream.aol.com:80/stream/1071 Format: slin Class: Hitz Mode: custom Directory: nodir Application: /usr/bin/mpg123 -q -s --mono -r 8000 -f 8192 -b 0 http://scfire-mtc-aa02.stream.aol.com:80/stream/1074 Format: slin [Dec 7 13:46:53] ERROR[27223]: astobj2.c:116 INTERNAL_OBJ: bad magic number 0x6567656c for 0x9086c58 [Dec 7 13:46:53] ERROR[27223]: astobj2.c:116 INTERNAL_OBJ: bad magic number 0x0 for 0x9086c58 By: Tilghman Lesher (tilghman) 2009-12-07 16:32:20.000-0600 New patch uploaded. This one still contains all the debugging, in case I still haven't caught up to this, but I have fixed something in this patch. To be clear, the refs file is not helpful unless you have the words "BAD ATTEMPT" in the file (which means a crash was averted, though a bug still remains). By: Andrew Parisio (parisioa) 2009-12-07 16:36:27.000-0600 holy crap it works! i can connect multime times in succession and it works. multiple people can connect at the same time no problem. I'll hammer it a bit and see if i can break it now. By: Andrew Parisio (parisioa) 2009-12-07 16:37:05.000-0600 each time i connect to the stream i get this on the console, all 4 lines each time: [Dec 7 14:38:27] ERROR[28190]: utils.c:1173 ast_careful_fwrite: fwrite() returned error: Broken pipe [Dec 7 14:38:27] ERROR[28190]: utils.c:1173 ast_careful_fwrite: fwrite() returned error: Broken pipe [Dec 7 14:38:27] ERROR[28190]: utils.c:1173 ast_careful_fwrite: fwrite() returned error: Broken pipe [Dec 7 14:38:27] ERROR[28190]: utils.c:1173 ast_careful_fwrite: fwrite() returned error: Broken pipe By: Tilghman Lesher (tilghman) 2009-12-07 16:50:42.000-0600 I think those errors are mostly unrelated. That routine is only called from manager.c and relates to being unable to send data to a remote socket, usually because the connection has closed. While MOH certainly queues manager events, which could cause this error, if you had dead AMI sessions, the error indicates nothing wrong with MOH itself. By: Andrew Parisio (parisioa) 2009-12-07 17:07:03.000-0600 oh okay, that's probably the way i drive MOH. the phone goes to a webpage which telnets into the AMI and tells asterisk to call the phone with the select MOH stream. what is the clean way to exit from the AMI? edit: i just RTFM'd and found it Action: Logoff. thx By: Digium Subversion (svnbot) 2009-12-07 17:10:13.000-0600 Repository: asterisk Revision: 233577 U trunk/contrib/valgrind.supp ------------------------------------------------------------------------ r233577 | atis | 2009-12-07 17:10:13 -0600 (Mon, 07 Dec 2009) | 8 lines Fix compatibility with valgrind 3.3 and older. (noticed in issue ASTERISK-15271) Reported by: parisioa Patches: valgrind.supp uloaded by atis (license 242) Tested by: atis, parisioa ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=233577 By: Digium Subversion (svnbot) 2009-12-07 17:54:11.000-0600 Repository: asterisk Revision: 233615 _U branches/1.6.2/ U branches/1.6.2/contrib/valgrind.supp ------------------------------------------------------------------------ r233615 | atis | 2009-12-07 17:54:11 -0600 (Mon, 07 Dec 2009) | 15 lines Merged revisions 233577 via svnmerge from https://origsvn.digium.com/svn/asterisk/trunk ........ r233577 | atis | 2009-12-08 01:10:13 +0200 (Tue, 08 Dec 2009) | 8 lines Fix compatibility with valgrind 3.3 and older. (noticed in issue ASTERISK-15271) Reported by: parisioa Patches: valgrind.supp uloaded by atis (license 242) Tested by: atis, parisioa ........ ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=233615 By: Digium Subversion (svnbot) 2009-12-07 17:56:05.000-0600 Repository: asterisk Revision: 233616 _U branches/1.6.1/ U branches/1.6.1/contrib/valgrind.supp ------------------------------------------------------------------------ r233616 | atis | 2009-12-07 17:56:05 -0600 (Mon, 07 Dec 2009) | 15 lines Merged revisions 233577 via svnmerge from https://origsvn.digium.com/svn/asterisk/trunk ........ r233577 | atis | 2009-12-08 01:10:13 +0200 (Tue, 08 Dec 2009) | 8 lines Fix compatibility with valgrind 3.3 and older. (noticed in issue ASTERISK-15271) Reported by: parisioa Patches: valgrind.supp uloaded by atis (license 242) Tested by: atis, parisioa ........ ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=233616 By: Digium Subversion (svnbot) 2009-12-07 17:57:22.000-0600 Repository: asterisk Revision: 233617 _U branches/1.6.0/ U branches/1.6.0/contrib/valgrind.supp ------------------------------------------------------------------------ r233617 | atis | 2009-12-07 17:57:21 -0600 (Mon, 07 Dec 2009) | 15 lines Merged revisions 233577 via svnmerge from https://origsvn.digium.com/svn/asterisk/trunk ........ r233577 | atis | 2009-12-08 01:10:13 +0200 (Tue, 08 Dec 2009) | 8 lines Fix compatibility with valgrind 3.3 and older. (noticed in issue ASTERISK-15271) Reported by: parisioa Patches: valgrind.supp uloaded by atis (license 242) Tested by: atis, parisioa ........ ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=233617 By: Digium Subversion (svnbot) 2009-12-07 18:02:44.000-0600 Repository: asterisk Revision: 233618 U branches/1.4/contrib/valgrind.supp ------------------------------------------------------------------------ r233618 | atis | 2009-12-07 18:02:43 -0600 (Mon, 07 Dec 2009) | 15 lines Merged revisions 233577 via svnmerge from https://origsvn.digium.com/svn/asterisk/trunk ........ r233577 | atis | 2009-12-08 01:10:13 +0200 (Tue, 08 Dec 2009) | 8 lines Fix compatibility with valgrind 3.3 and older. (noticed in issue ASTERISK-15271) Reported by: parisioa Patches: valgrind.supp uloaded by atis (license 242) Tested by: atis, parisioa ........ ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=233618 By: Digium Subversion (svnbot) 2009-12-07 18:09:34.000-0600 Repository: asterisk Revision: 233619 _U trunk/ ------------------------------------------------------------------------ r233619 | atis | 2009-12-07 18:09:34 -0600 (Mon, 07 Dec 2009) | 21 lines Blocked revisions 233618 via svnmerge ................ r233618 | atis | 2009-12-08 02:02:43 +0200 (Tue, 08 Dec 2009) | 15 lines Merged revisions 233577 via svnmerge from https://origsvn.digium.com/svn/asterisk/trunk ........ r233577 | atis | 2009-12-08 01:10:13 +0200 (Tue, 08 Dec 2009) | 8 lines Fix compatibility with valgrind 3.3 and older. (noticed in issue ASTERISK-15271) Reported by: parisioa Patches: valgrind.supp uloaded by atis (license 242) Tested by: atis, parisioa ........ ................ ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=233619 By: Digium Subversion (svnbot) 2009-12-08 12:22:45.000-0600 Repository: asterisk Revision: 233718 U trunk/res/res_musiconhold.c ------------------------------------------------------------------------ r233718 | tilghman | 2009-12-08 12:22:45 -0600 (Tue, 08 Dec 2009) | 8 lines Find another ref leak and change how we manage module references. (closes issue ASTERISK-15271) Reported by: parisioa Patches: 20091208__issue16388.diff.txt uploaded by tilghman (license 14) Tested by: parisioa, tilghman Review: https://reviewboard.asterisk.org/r/442/ ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=233718 By: Digium Subversion (svnbot) 2009-12-08 12:28:19.000-0600 Repository: asterisk Revision: 233729 _U branches/1.6.0/ U branches/1.6.0/res/res_musiconhold.c ------------------------------------------------------------------------ r233729 | tilghman | 2009-12-08 12:28:18 -0600 (Tue, 08 Dec 2009) | 15 lines Merged revisions 233718 via svnmerge from https://origsvn.digium.com/svn/asterisk/trunk ........ r233718 | tilghman | 2009-12-08 12:22:44 -0600 (Tue, 08 Dec 2009) | 8 lines Find another ref leak and change how we manage module references. (closes issue ASTERISK-15271) Reported by: parisioa Patches: 20091208__issue16388.diff.txt uploaded by tilghman (license 14) Tested by: parisioa, tilghman Review: https://reviewboard.asterisk.org/r/442/ ........ ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=233729 By: Digium Subversion (svnbot) 2009-12-08 12:31:04.000-0600 Repository: asterisk Revision: 233730 _U branches/1.6.1/ U branches/1.6.1/res/res_musiconhold.c ------------------------------------------------------------------------ r233730 | tilghman | 2009-12-08 12:31:03 -0600 (Tue, 08 Dec 2009) | 15 lines Merged revisions 233718 via svnmerge from https://origsvn.digium.com/svn/asterisk/trunk ........ r233718 | tilghman | 2009-12-08 12:22:44 -0600 (Tue, 08 Dec 2009) | 8 lines Find another ref leak and change how we manage module references. (closes issue ASTERISK-15271) Reported by: parisioa Patches: 20091208__issue16388.diff.txt uploaded by tilghman (license 14) Tested by: parisioa, tilghman Review: https://reviewboard.asterisk.org/r/442/ ........ ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=233730 By: Digium Subversion (svnbot) 2009-12-08 12:33:55.000-0600 Repository: asterisk Revision: 233731 _U branches/1.6.2/ U branches/1.6.2/res/res_musiconhold.c ------------------------------------------------------------------------ r233731 | tilghman | 2009-12-08 12:33:54 -0600 (Tue, 08 Dec 2009) | 15 lines Merged revisions 233718 via svnmerge from https://origsvn.digium.com/svn/asterisk/trunk ........ r233718 | tilghman | 2009-12-08 12:22:44 -0600 (Tue, 08 Dec 2009) | 8 lines Find another ref leak and change how we manage module references. (closes issue ASTERISK-15271) Reported by: parisioa Patches: 20091208__issue16388.diff.txt uploaded by tilghman (license 14) Tested by: parisioa, tilghman Review: https://reviewboard.asterisk.org/r/442/ ........ ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=233731 By: Digium Subversion (svnbot) 2009-12-08 12:35:15.000-0600 Repository: asterisk Revision: 233718 U trunk/res/res_musiconhold.c ------------------------------------------------------------------------ r233718 | tilghman | 2009-12-08 12:22:44 -0600 (Tue, 08 Dec 2009) | 8 lines Find another ref leak and change how we manage module references. (closes issue ASTERISK-15271, closes issue ASTERISK-15170, closes issue ASTERISK-15273) Reported by: parisioa Patches: 20091208__issue16388.diff.txt uploaded by tilghman (license 14) Tested by: parisioa, tilghman Review: https://reviewboard.asterisk.org/r/442/ ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=233718 By: Andrew Parisio (parisioa) 2009-12-09 18:50:54.000-0600 A few hours ago my system hard locked again and quit pinging / responding. I attached the screenshot from the console again. These hardlocks started happening with the MOH patches. Note that the screenshot references an Asterisk Blocking state. A few minutes ago i got about 30 of these lines at my console: [Dec 9 16:45:47] WARNING[3419]: res_musiconhold.c:620 monmp3thread: Unable to send a SIGHUP to MOH process?!!: No such process [Dec 9 16:45:50] WARNING[3419]: res_musiconhold.c:620 monmp3thread: Unable to send a SIGHUP to MOH process?!!: No such process By: Tilghman Lesher (tilghman) 2009-12-10 12:41:05.000-0600 There is literally no reason why a user process should cause a CPU lockup. Ever. This is a kernel bug. We may have inadvertently triggered it, but it's still a kernel bug. By: Tilghman Lesher (tilghman) 2009-12-10 12:53:18.000-0600 I would really recommend a kernel upgrade at this point. By: Andrew Parisio (parisioa) 2009-12-10 14:36:52.000-0600 Linux csgtacsip1 2.6.26-2-amd64 #1 SMP Wed Aug 19 07:01:17 UTC 2009 x86_64 GNU/Linux Not that old. By: Tilghman Lesher (tilghman) 2009-12-10 17:36:40.000-0600 Kernel 2.6.32 was just released. By: Tilghman Lesher (tilghman) 2009-12-10 17:47:02.000-0600 Combing through similar reports shows that the kernel scheduler was improved in 2.6.27 to avoid this exact situation. By: Andrew Parisio (parisioa) 2009-12-10 17:50:23.000-0600 ok, donwloading 2.6.32 now to compile / test. will report back next week. By: Tilghman Lesher (tilghman) 2009-12-16 23:08:24.000-0600 Has the new kernel scheduler relieved your lockup? By: Andrew Parisio (parisioa) 2009-12-16 23:09:47.000-0600 i haven't seen it happen since last week (it happened twice in a week), and i haven't gone to 2.6.32 yet either, so i have no idea. i'm a little worried, i need to find time to install 2.6.32. By: Tilghman Lesher (tilghman) 2009-12-17 11:32:04.000-0600 Honestly, just upgrading to 2.6.27 should be sufficient. However, given that you are no longer experiencing this, it's difficult to see why we need to keep this open. |