Summary: | ASTERISK-11707: Started to crash every 2-3 hours | ||
Reporter: | Private Name (falves11) | Labels: | |
Date Opened: | 2008-03-23 17:28:40 | Date Closed: | 2008-05-15 11:42:30 |
Priority: | Critical | Regression? | No |
Status: | Closed/Complete | Components: | Core/General |
Versions: | Frequency of Occurrence | ||
Related Issues: | |||
Environment: | Attachments: | ( 0) blowup.txt ( 1) blowup1.txt ( 2) crassssh.txt ( 3) malloc_debug.txt ( 4) valgrind_4-4-08-2-14PM.log ( 5) valgrind_btfull_new_3.txt ( 6) valgrind_core_4-4-08-3-15PM.txt ( 7) valgrind_core_5.zip ( 8) valgrind_core1.txt ( 9) valgrind_core2.txt (10) valgrind_core3.txt (11) valgrind_crash_new (12) valgrind_crash_new_1 (13) valgrind_crash_new_2 (14) valgrind_crash_new_3.zip (15) valgrind_txt_3.txt (16) valgrind.txt (17) valgrind1.txt (18) valgrind2.txt (19) valgrind3.txt (20) valgrind5.txt | |
Description: | I upgraded yesterday to version SVN-trunk-r110578M, and it crashes every few hours. The box had not seen any real traffic before and has only now started to be used. | ||
Comments: | By: Private Name (falves11) 2008-03-24 11:56:01 Please help, it crashed almost 100 times today.
By: Mark Michelson (mmichelson) 2008-03-25 11:07:01 This is an issue of memory corruption, so valgrind output would be helpful. Please see doc/valgrind.txt.
By: Private Name (falves11) 2008-03-27 20:50:22 When I start valgrind with this command:
valgrind --log-file-exactly=valgrind.txt asterisk -cg 2>malloc_debug.txt
I see:
.....SIP channel loading... .......................[Mar 27 21:40:52] ERROR[1442]: pbx.c:2448 ast_func_read: Function STRFTIME not registered
Is this normal?
By: Mark Michelson (mmichelson) 2008-03-28 10:03:43 This means that you are attempting to use the STRFTIME function, but it did not get registered. This function is registered by the func_strings module, so please be sure that you are loading that module. Another possibility is a timing issue: you may be attempting to use the STRFTIME function before the module has been loaded. If you only see the message once, I would assume that this is what is happening. Thank you for the valgrind output!
By: Digium Subversion (svnbot) 2008-03-28 11:32:27 Repository: asterisk Revision: 111662
U trunk/channels/chan_sip.c
U trunk/include/asterisk/strings.h
------------------------------------------------------------------------
r111662 | mmichelson | 2008-03-28 11:32:25 -0500 (Fri, 28 Mar 2008) | 9 lines
The copy_request function did not take into account the necessary null terminator for the string to be copied into. This resulted in parse_request reading invalid memory beyond the end of the string, and in some cases led to crashes. Thanks to falves11 for providing the valgrind output which led to the closure of this issue.
(closes issue ASTERISK-11707)
Reported by: falves11
------------------------------------------------------------------------
http://svn.digium.com/view/asterisk?view=rev&revision=111662
By: Digium Subversion (svnbot) 2008-03-28 11:33:23 Repository: asterisk Revision: 111663
_U branches/1.6.0/
------------------------------------------------------------------------
r111663 | mmichelson | 2008-03-28 11:33:21 -0500 (Fri, 28 Mar 2008) | 16 lines
Blocked revisions 111662 via svnmerge
........
r111662 | mmichelson | 2008-03-28 11:36:59 -0500 (Fri, 28 Mar 2008) | 9 lines
The copy_request function did not take into account the necessary null terminator for the string to be copied into. This resulted in parse_request reading invalid memory beyond the end of the string, and in some cases led to crashes. Thanks to falves11 for providing the valgrind output which led to the closure of this issue.
(closes issue ASTERISK-11707)
Reported by: falves11
........
------------------------------------------------------------------------
http://svn.digium.com/view/asterisk?view=rev&revision=111663
By: Private Name (falves11) 2008-03-28 12:05:06 I downloaded the latest release via SVN, hoping to get 111663, but I only get this in main/version.c:
static const char asterisk_version[] = "SVN-trunk-r111662";
Is this normal, or is revision 111663 not yet available for download? My system keeps restarting.
By: Mark Michelson (mmichelson) 2008-03-28 12:18:39 As of this moment, the latest revision to trunk is 111662, so there is no 111663 revision to check out. Re-closing.
By: Private Name (falves11) 2008-03-28 12:35:44 There is something wrong with revision 111662. I cannot get a single call through. If somebody wants to access the system and try to see what happens, please contact me at falves1@hotmail.com. Nothing changed on my end.
By: Private Name (falves11) 2008-03-28 13:00:39 I installed the new version SVN-trunk-r111721 on two boxes. No call gets through, and I get no verbose messages about the arriving calls, just:
== Using SIP RTP CoS mark 5
== Using UDPTL CoS mark 5
no matter what verbose level I choose.
By: Mark Michelson (mmichelson) 2008-03-28 13:25:09 Okay, yes, there is definitely something wrong here. I'll get this fixed.
By: Digium Subversion (svnbot) 2008-03-28 14:58:57 Repository: asterisk Revision: 111811
U trunk/channels/chan_sip.c
------------------------------------------------------------------------
r111811 | mmichelson | 2008-03-28 14:58:49 -0500 (Fri, 28 Mar 2008) | 11 lines
This time the fix is proper for issue 12284. I have tested it thoroughly and found that valgrind no longer complains and that calls do complete correctly. The fix is along the same lines as before: Make sure the final null terminator gets copied into the new sip_request's data pointer. Without it, parse_request will read and potentially write past the end of the string, causing potential crashes.
(closes issue ASTERISK-11707...for real this time!)
reported by falves11
------------------------------------------------------------------------
http://svn.digium.com/view/asterisk?view=rev&revision=111811
By: Digium Subversion (svnbot) 2008-03-28 14:59:27 Repository: asterisk Revision: 111812
_U branches/1.6.0/
------------------------------------------------------------------------
r111812 | mmichelson | 2008-03-28 14:59:25 -0500 (Fri, 28 Mar 2008) | 18 lines
Blocked revisions 111811 via svnmerge
........
r111811 | mmichelson | 2008-03-28 15:03:16 -0500 (Fri, 28 Mar 2008) | 11 lines
This time the fix is proper for issue 12284. I have tested it thoroughly and found that valgrind no longer complains and that calls do complete correctly. The fix is along the same lines as before: Make sure the final null terminator gets copied into the new sip_request's data pointer.
Without it, parse_request will read and potentially write past the end of the string, causing potential crashes.
(closes issue ASTERISK-11707...for real this time!)
reported by falves11
........
------------------------------------------------------------------------
http://svn.digium.com/view/asterisk?view=rev&revision=111812
By: Private Name (falves11) 2008-03-28 15:41:57 It does not crash, but it restarts itself inside valgrind. In fact, I am going to keep running it under valgrind until the final release. I am attaching valgrind2.txt.
By: Private Name (falves11) 2008-03-28 15:55:01 I took it out of valgrind. It restarts itself after a few minutes. I think that valgrind does not let it crash. I don't know what to do. Can somebody log into my box and see if we can figure it out? Please contact me at falves1@otmail.com
By: Abhay Gupta (agupta) 2008-03-28 22:22:20 Until now, the problem was mostly caused by a socket error in the OS. Now that it is resolved, we will get a correct picture if the crash occurs again.
By: Private Name (falves11) 2008-04-01 10:44:10 It keeps crashing. This morning it crashed twice. I uploaded the "bt full" and "thread apply all bt full" output from the valgrind core. I also uploaded the file valgrind.txt.
By: Digium Subversion (svnbot) 2008-04-01 12:16:53 Repository: asterisk Revision: 112138
U branches/1.4/main/dns.c
------------------------------------------------------------------------
r112138 | mmichelson | 2008-04-01 12:16:52 -0500 (Tue, 01 Apr 2008) | 10 lines
Initialize the __res_state structure used for dns purposes to all 0's prior to using it.
This is due to valgrind's complaints on issue ASTERISK-11707 as well as an excerpt found in "Description" portion of the online man page found here: http://www.iti.cs.tu-bs.de/cgi-bin/UNIXhelp/man-cgi?res_nquery+3RESOLV
(pertains to issue ASTERISK-11707 but does not necessarily close it)
------------------------------------------------------------------------
http://svn.digium.com/view/asterisk?view=rev&revision=112138
By: Digium Subversion (svnbot) 2008-04-01 12:18:38 Repository: asterisk Revision: 112148
_U trunk/
U trunk/main/dns.c
------------------------------------------------------------------------
r112148 | mmichelson | 2008-04-01 12:18:38 -0500 (Tue, 01 Apr 2008) | 18 lines
Merged revisions 112138 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4
........
r112138 | mmichelson | 2008-04-01 12:21:21 -0500 (Tue, 01 Apr 2008) | 10 lines
Initialize the __res_state structure used for dns purposes to all 0's prior to using it. This is due to valgrind's complaints on issue ASTERISK-11707 as well as an excerpt found in "Description" portion of the online man page found here: http://www.iti.cs.tu-bs.de/cgi-bin/UNIXhelp/man-cgi?res_nquery+3RESOLV
(pertains to issue ASTERISK-11707 but does not necessarily close it)
........
------------------------------------------------------------------------
http://svn.digium.com/view/asterisk?view=rev&revision=112148
By: Digium Subversion (svnbot) 2008-04-01 12:20:30 Repository: asterisk Revision: 112157
_U branches/1.6.0/
U branches/1.6.0/main/dns.c
------------------------------------------------------------------------
r112157 | mmichelson | 2008-04-01 12:20:30 -0500 (Tue, 01 Apr 2008) | 26 lines
Merged revisions 112148 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk
................
r112148 | mmichelson | 2008-04-01 12:23:19 -0500 (Tue, 01 Apr 2008) | 18 lines
Merged revisions 112138 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4
........
r112138 | mmichelson | 2008-04-01 12:21:21 -0500 (Tue, 01 Apr 2008) | 10 lines
Initialize the __res_state structure used for dns purposes to all 0's prior to using it. This is due to valgrind's complaints on issue ASTERISK-11707 as well as an excerpt found in "Description" portion of the online man page found here: http://www.iti.cs.tu-bs.de/cgi-bin/UNIXhelp/man-cgi?res_nquery+3RESOLV
(pertains to issue ASTERISK-11707 but does not necessarily close it)
........
................
------------------------------------------------------------------------
http://svn.digium.com/view/asterisk?view=rev&revision=112157
By: Mark Michelson (mmichelson) 2008-04-01 12:23:29 I committed a fix which should silence valgrind's complaints regarding DNS searches. However, I'm not certain that this will solve the crash for which you posted your latest backtrace. Usually, when a crash happens in the poll() system function, the problem has to do with exceeding the maximum open file limit. If you use ulimit -n to increase this limit, do you still experience the crashes?
By: Private Name (falves11) 2008-04-01 12:29:57 I have always had the ulimit settings below. I think they are plenty, so the crash does not seem related.
[root@sipserver ~]# ulimit -a
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
max nice (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 400000
max locked memory (kbytes, -l) 32
max memory size (kbytes, -m) unlimited
open files (-n) 400000
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
max rt priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 268288
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
By: Mark Michelson (mmichelson) 2008-04-01 15:12:48 Yes, that number should be plenty. Does the crash still happen after upgrading to revision 112148?
By: Abhay Gupta (agupta) 2008-04-01 21:34:19 putnopvut, please note that this machine is a container on a virtualization server. I am not sure what happens if you increase the limits of a container beyond the levels set in the base kernel; do they still take effect? I am sure that the problem faced by falves11 is still related to the OS and not to the Asterisk code.
By: Mark Michelson (mmichelson) 2008-04-03 09:53:13 Okay, since it is assumed that this is an OS problem, I am going to suspend this issue. If it is discovered that this is an Asterisk issue after all, please feel free to reopen.
By: Private Name (falves11) 2008-04-03 17:18:44 It keeps crashing, although at a higher volume. The ulimit file limit is not an issue. The manufacturer of the OS says that the host OS has a 262000-file limit, and we are using 0.2% of it. My individual container has plenty of room. I am attaching two new valgrind captures.
By: Private Name (falves11) 2008-04-03 17:59:49 The version of trunk corresponding to the crash is 112289 (valgrind_core_5.zip).
By: Private Name (falves11) 2008-04-03 19:09:53 The latest file, "crassssh.txt", was produced when I had started asterisk with -vvvgc, so there is no valgrind information.
By: Private Name (falves11) 2008-04-04 13:12:21 It keeps blowing up...
By: Private Name (falves11) 2008-04-08 16:45:15 Dear Gentlemen, is there any way to know whether we have a chance to fix this instability, or do we need deeper research? I offer full access to my box to the Asterisk developers. If I concentrate the traffic on one box, it blows up.
By: Mark Michelson (mmichelson) 2008-05-15 11:42:15 This appears to be the same crash experienced in 12463, which was closed. I'm going to close this too. |
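The copy_request fix described in r111662 and r111811 comes down to reserving room for, and writing, the terminating NUL byte when a request buffer is duplicated. The following is a minimal sketch of that pattern only; `struct req_sketch` and `copy_req_sketch()` are hypothetical names and do not reflect the actual layout of Asterisk's `struct sip_request` or the real `copy_request()` code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a message buffer; this is NOT the layout
 * of Asterisk's struct sip_request. */
struct req_sketch {
    char *data;  /* message text */
    size_t len;  /* number of bytes of text, excluding any '\0' */
};

/* Copy the first src->len bytes of src->data into a fresh buffer and
 * always NUL-terminate it, so string functions used while parsing
 * stop inside the allocation instead of reading past its end.
 * Returns 0 on success, -1 if the allocation fails. */
int copy_req_sketch(struct req_sketch *dst, const struct req_sketch *src)
{
    char *buf;

    assert(dst != NULL && src != NULL && src->data != NULL);
    buf = malloc(src->len + 1);      /* + 1 reserves room for '\0' */
    if (buf == NULL)
        return -1;
    memcpy(buf, src->data, src->len);
    buf[src->len] = '\0';            /* the terminator the commit messages say was missing */
    dst->data = buf;
    dst->len = src->len;
    return 0;
}
```

If only `len` bytes were copied into a `len`-byte buffer, the destination would be unterminated, and any strlen()- or strsep()-style parsing would run past the allocation, which is the kind of invalid read the commit messages attribute to parse_request. The caller owns `dst->data` and must free() it.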
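Mark's suggestion to rule out the open-file limit can also be checked from inside the process itself, since the shell's `ulimit -a` output shows the shell's limits, not necessarily the limits the Asterisk process inherited. A minimal sketch using the POSIX getrlimit()/setrlimit() calls with RLIMIT_NOFILE; the helper names `get_nofile_limit()` and `raise_nofile_to_hard()` are hypothetical and not part of Asterisk.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: read the calling process's open-file limits.
 * The soft limit (rlim_cur) is the value `ulimit -n` reports. */
int get_nofile_limit(rlim_t *soft, rlim_t *hard)
{
    struct rlimit rl;

    assert(soft != NULL && hard != NULL);
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0) {
        perror("getrlimit(RLIMIT_NOFILE)");
        return -1;
    }
    *soft = rl.rlim_cur;
    *hard = rl.rlim_max;
    return 0;
}

/* Hypothetical helper: raise the soft limit up to the hard limit, the
 * in-process counterpart of `ulimit -n`. An unprivileged process
 * cannot raise the soft limit above rlim_max. */
int raise_nofile_to_hard(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    rl.rlim_cur = rl.rlim_max;
    if (setrlimit(RLIMIT_NOFILE, &rl) != 0) {
        perror("setrlimit(RLIMIT_NOFILE)");
        return -1;
    }
    return 0;
}
```

This reports only per-process limits. Limits enforced by a virtualization layer on the container as a whole, such as the host-level file limit falves11 mentions, are not necessarily visible through getrlimit(), which is the distinction Abhay raises.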