Summary: ASTERISK-11707: Started to crash every 2-3 hours
Reporter: Private Name (falves11)
Labels:
Date Opened: 2008-03-23 17:28:40
Date Closed: 2008-05-15 11:42:30
Priority: Critical
Regression?: No
Status: Closed/Complete
Components: Core/General
Versions:
Frequency of Occurrence:
Related Issues:
Environment:
Attachments:
( 0) blowup.txt
( 1) blowup1.txt
( 2) crassssh.txt
( 3) malloc_debug.txt
( 4) valgrind_4-4-08-2-14PM.log
( 5) valgrind_btfull_new_3.txt
( 6) valgrind_core_4-4-08-3-15PM.txt
( 7) valgrind_core_5.zip
( 8) valgrind_core1.txt
( 9) valgrind_core2.txt
(10) valgrind_core3.txt
(11) valgrind_crash_new
(12) valgrind_crash_new_1
(13) valgrind_crash_new_2
(14) valgrind_crash_new_3.zip
(15) valgrind_txt_3.txt
(16) valgrind.txt
(17) valgrind1.txt
(18) valgrind2.txt
(19) valgrind3.txt
(20) valgrind5.txt
Description: I upgraded yesterday to version SVN-trunk-r110578M and it crashes every few hours. The box had not seen action before and has only now started to be used.

Comments:

By: Private Name (falves11) 2008-03-24 11:56:01

Please help, it crashed almost 100 times today.

By: Mark Michelson (mmichelson) 2008-03-25 11:07:01

This is an issue of memory corruption, so valgrind output would be helpful. Please see doc/valgrind.txt

By: Private Name (falves11) 2008-03-27 20:50:22

When I start valgrind with this command:
valgrind --log-file-exactly=valgrind.txt asterisk -cg 2>malloc_debug.txt

.....SIP channel loading...
.......................[Mar 27 21:40:52] ERROR[1442]: pbx.c:2448 ast_func_read: Function STRFTIME not registered

Is this normal????

By: Mark Michelson (mmichelson) 2008-03-28 10:03:43

This means that you are attempting to use the STRFTIME function but it did not get registered. This function is registered in the func_strings module. Please be sure that you are loading this module.

Another possibility is that the timing is off: you may be attempting to use the STRFTIME function before its module has been loaded. If you only see the message once, I would assume that this is what is happening.

Thank you for the valgrind output!

By: Digium Subversion (svnbot) 2008-03-28 11:32:27

Repository: asterisk
Revision: 111662

U   trunk/channels/chan_sip.c
U   trunk/include/asterisk/strings.h

------------------------------------------------------------------------
r111662 | mmichelson | 2008-03-28 11:32:25 -0500 (Fri, 28 Mar 2008) | 9 lines

The copy_request function did not take into account the necessary null terminator
for the string to be copied into. This resulted in parse_request reading invalid
memory beyond the end of the string, and in some cases led to crashes. Thanks
to falves11 for providing the valgrind output which led to the closure of this issue.

(closes issue ASTERISK-11707)
Reported by: falves11


------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=111662

By: Digium Subversion (svnbot) 2008-03-28 11:33:23

Repository: asterisk
Revision: 111663

_U  branches/1.6.0/

------------------------------------------------------------------------
r111663 | mmichelson | 2008-03-28 11:33:21 -0500 (Fri, 28 Mar 2008) | 16 lines

Blocked revisions 111662 via svnmerge

........
r111662 | mmichelson | 2008-03-28 11:36:59 -0500 (Fri, 28 Mar 2008) | 9 lines

The copy_request function did not take into account the necessary null terminator
for the string to be copied into. This resulted in parse_request reading invalid
memory beyond the end of the string, and in some cases led to crashes. Thanks
to falves11 for providing the valgrind output which led to the closure of this issue.

(closes issue ASTERISK-11707)
Reported by: falves11


........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=111663

By: Private Name (falves11) 2008-03-28 12:05:06

I downloaded the latest release via SVN, hoping to catch 111663, but I only get this in main/version.c:
static const char asterisk_version[] = "SVN-trunk-r111662";

Is this normal, or is revision 111663 not yet available for download?
My system keeps restarting.

By: Mark Michelson (mmichelson) 2008-03-28 12:18:39

As of this moment, the latest revision to trunk is 111662, so there is no 111663 revision to check out. Re-closing.

By: Private Name (falves11) 2008-03-28 12:35:44

There is something wrong with revision 111662. I cannot get a single call through. If somebody wants to access the system and try to see what happens, please contact me at falves1@hotmail.com. Nothing changed on my end.

By: Private Name (falves11) 2008-03-28 13:00:39

I installed the new version SVN-trunk-r111721 on two boxes and no call gets through and I get no verbose messages of the calls arriving, just
== Using SIP RTP CoS mark 5
 == Using UDPTL CoS mark 5
no matter what verbose level I choose.

By: Mark Michelson (mmichelson) 2008-03-28 13:25:09

Okay, yes, there is definitely something wrong here. I'll get this fixed.

By: Digium Subversion (svnbot) 2008-03-28 14:58:57

Repository: asterisk
Revision: 111811

U   trunk/channels/chan_sip.c

------------------------------------------------------------------------
r111811 | mmichelson | 2008-03-28 14:58:49 -0500 (Fri, 28 Mar 2008) | 11 lines

This time the fix is proper for issue 12284. I have tested it thoroughly and found
that valgrind no longer complains and that calls do complete correctly.

The fix is along the same lines as before: Make sure the final null terminator gets copied
into the new sip_request's data pointer. Without it, parse_request will read and potentially
write past the end of the string, causing potential crashes.

(closes issue ASTERISK-11707...for real this time!)
reported by falves11


------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=111811
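
For readers unfamiliar with this class of bug, here is a minimal sketch of the pattern the commit message describes. It is not the actual chan_sip.c code: `copy_with_terminator` is a hypothetical helper, and it assumes the source is a buffer of `len` bytes that may not be terminated. The point is only that the destination needs one byte more than the payload, and that the terminator must be written explicitly so a later parser stops at the end of the string.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (NOT Asterisk's copy_request): duplicate the first
 * `len` bytes of `src` into a freshly allocated, null-terminated string.
 * Returns NULL if allocation fails; the caller must free() the result.
 */
char *copy_with_terminator(const char *src, size_t len)
{
    /* Reserve len bytes for the payload plus 1 byte for the '\0'. */
    char *dst = malloc(len + 1);
    if (dst == NULL) {
        return NULL;
    }

    memcpy(dst, src, len);

    /* Without this byte, string functions would read past the buffer. */
    dst[len] = '\0';
    return dst;
}
```

Allocating only `len` bytes, or copying `len` bytes without writing the final `'\0'`, leaves any later `strlen`-style scan free to run into unrelated memory, which is the kind of invalid read valgrind reports.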

By: Digium Subversion (svnbot) 2008-03-28 14:59:27

Repository: asterisk
Revision: 111812

_U  branches/1.6.0/

------------------------------------------------------------------------
r111812 | mmichelson | 2008-03-28 14:59:25 -0500 (Fri, 28 Mar 2008) | 18 lines

Blocked revisions 111811 via svnmerge

........
r111811 | mmichelson | 2008-03-28 15:03:16 -0500 (Fri, 28 Mar 2008) | 11 lines

This time the fix is proper for issue 12284. I have tested it thoroughly and found
that valgrind no longer complains and that calls do complete correctly.

The fix is along the same lines as before: Make sure the final null terminator gets copied
into the new sip_request's data pointer. Without it, parse_request will read and potentially
write past the end of the string, causing potential crashes.

(closes issue ASTERISK-11707...for real this time!)
reported by falves11


........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=111812

By: Private Name (falves11) 2008-03-28 15:41:57

It does not crash, but it restarts itself inside valgrind. In fact, I am going to keep running under valgrind until the final release. I am attaching valgrind2.txt.



By: Private Name (falves11) 2008-03-28 15:55:01

I took it out of valgrind. It is restarting itself after a few minutes. I think that valgrind does not let it crash. I don't know what to do. Can somebody log into my box and see if we can figure it out? Please contact me at falves1@otmail.com

By: Abhay Gupta (agupta) 2008-03-28 22:22:20

Until now it was more because of a socket error in the OS. Now that this is resolved, a crash, if one occurs, will give the correct picture.

By: Private Name (falves11) 2008-04-01 10:44:10

It keeps crashing. This morning it crashed twice. I uploaded the valgrind core "bt full" and "thread apply all bt full" output. I also uploaded the file valgrind.txt

By: Digium Subversion (svnbot) 2008-04-01 12:16:53

Repository: asterisk
Revision: 112138

U   branches/1.4/main/dns.c

------------------------------------------------------------------------
r112138 | mmichelson | 2008-04-01 12:16:52 -0500 (Tue, 01 Apr 2008) | 10 lines

Initialize the __res_state structure used for dns purposes
to all 0's prior to using it. This is due to valgrind's complaints
on issue ASTERISK-11707 as well as an excerpt found in "Description" portion
of the online man page found here:

http://www.iti.cs.tu-bs.de/cgi-bin/UNIXhelp/man-cgi?res_nquery+3RESOLV

(pertains to issue ASTERISK-11707 but does not necessarily close it)


------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=112138
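
As a rough illustration of what this commit describes, the sketch below zeroes a resolver state structure before handing it to `res_ninit()`. It is not the code from main/dns.c: `init_resolver_state` is a hypothetical wrapper, and it assumes a platform whose `<resolv.h>` provides the thread-safe `res_ninit()`/`res_nclose()` interface (some systems need `-lresolv` at link time).

```c
#include <string.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <arpa/nameser.h>
#include <resolv.h>

/*
 * Hypothetical wrapper (NOT the Asterisk implementation): clear the
 * caller-provided resolver state, then initialize it.
 * Returns 0 on success and -1 on failure, as res_ninit() does.
 */
int init_resolver_state(struct __res_state *state)
{
    /* Set every byte of the structure to 0 before its first use, so the
     * resolver never reads uninitialized fields. */
    memset(state, 0, sizeof(*state));

    return res_ninit(state);
}
```

A caller would declare a `struct __res_state` on the stack, pass its address to the wrapper, run its queries against that state, and release it with `res_nclose()` when done.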

By: Digium Subversion (svnbot) 2008-04-01 12:18:38

Repository: asterisk
Revision: 112148

_U  trunk/
U   trunk/main/dns.c

------------------------------------------------------------------------
r112148 | mmichelson | 2008-04-01 12:18:38 -0500 (Tue, 01 Apr 2008) | 18 lines

Merged revisions 112138 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r112138 | mmichelson | 2008-04-01 12:21:21 -0500 (Tue, 01 Apr 2008) | 10 lines

Initialize the __res_state structure used for dns purposes
to all 0's prior to using it. This is due to valgrind's complaints
on issue ASTERISK-11707 as well as an excerpt found in "Description" portion
of the online man page found here:

http://www.iti.cs.tu-bs.de/cgi-bin/UNIXhelp/man-cgi?res_nquery+3RESOLV

(pertains to issue ASTERISK-11707 but does not necessarily close it)


........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=112148

By: Digium Subversion (svnbot) 2008-04-01 12:20:30

Repository: asterisk
Revision: 112157

_U  branches/1.6.0/
U   branches/1.6.0/main/dns.c

------------------------------------------------------------------------
r112157 | mmichelson | 2008-04-01 12:20:30 -0500 (Tue, 01 Apr 2008) | 26 lines

Merged revisions 112148 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

................
r112148 | mmichelson | 2008-04-01 12:23:19 -0500 (Tue, 01 Apr 2008) | 18 lines

Merged revisions 112138 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r112138 | mmichelson | 2008-04-01 12:21:21 -0500 (Tue, 01 Apr 2008) | 10 lines

Initialize the __res_state structure used for dns purposes
to all 0's prior to using it. This is due to valgrind's complaints
on issue ASTERISK-11707 as well as an excerpt found in "Description" portion
of the online man page found here:

http://www.iti.cs.tu-bs.de/cgi-bin/UNIXhelp/man-cgi?res_nquery+3RESOLV

(pertains to issue ASTERISK-11707 but does not necessarily close it)


........

................

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=112157

By: Mark Michelson (mmichelson) 2008-04-01 12:23:29

I committed a fix which should silence valgrind's problems regarding dns searches. However, I'm not certain that this will solve the crash for which you posted your latest backtrace. Usually, when a crash happens in the poll() system function, the problem has to do with exceeding the maximum open file limit. If you use ulimit -n to increase this limit, do you still experience the crashes?
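
For reference, the limit that `ulimit -n` reports can also be read from inside a process. The sketch below is only an illustration under that assumption: `open_file_limits` is a hypothetical helper, not an Asterisk function, and it uses the POSIX `getrlimit()` call with `RLIMIT_NOFILE`.

```c
#include <sys/resource.h>

/*
 * Hypothetical helper: fetch the soft and hard limits on open file
 * descriptors for the current process (the soft limit is the value
 * that `ulimit -n` prints). Returns 0 on success, -1 on failure.
 */
int open_file_limits(rlim_t *soft, rlim_t *hard)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0) {
        return -1;
    }

    *soft = rl.rlim_cur;  /* limit currently enforced */
    *hard = rl.rlim_max;  /* ceiling the soft limit may be raised to */
    return 0;
}
```

Comparing the soft limit with the number of descriptors the process actually holds is a quick way to confirm or rule out descriptor exhaustion.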

By: Private Name (falves11) 2008-04-01 12:29:57

I have always had the ulimit settings below. I think that is plenty, so the crash does not seem related.
[root@sipserver ~]# ulimit -a
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
max nice                        (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 400000
max locked memory       (kbytes, -l) 32
max memory size         (kbytes, -m) unlimited
open files                      (-n) 400000
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
max rt priority                 (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 268288
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

By: Mark Michelson (mmichelson) 2008-04-01 15:12:48

Yes that number should be plenty. Does the crash still happen after upgrading to revision 112148?

By: Abhay Gupta (agupta) 2008-04-01 21:34:19

putnopvut, please note that this server is running on a virtualisation server. I am not sure what happens if you increase the limits of a container beyond the levels set in the base kernel. Does the increase still take effect?

I am sure that the problem faced by falves11 is still related to the OS and not to the Asterisk code.

By: Mark Michelson (mmichelson) 2008-04-03 09:53:13

Okay, since it is assumed that this is an OS problem, I am going to suspend this issue. If it is discovered that this is an Asterisk issue after all, please feel free to reopen.

By: Private Name (falves11) 2008-04-03 17:18:44

It keeps crashing, although at a higher volume. The ulimit on open files is not an issue. The manufacturer of the OS says that the host OS has a 262000-file limit, and we are using 0.2% of it. My individual container has plenty of room. I am attaching two new valgrind captures.

By: Private Name (falves11) 2008-04-03 17:59:49

The version of trunk corresponding to the crash is 112289 (valgrind_core_5.zip).

By: Private Name (falves11) 2008-04-03 19:09:53

The latest file, "crashhhhh.txt", happened when I had started asterisk with
-vvvgc, so there is no valgrind information.

By: Private Name (falves11) 2008-04-04 13:12:21

It keeps blowing...

By: Private Name (falves11) 2008-04-08 16:45:15

Dear Gentlemen,
Is there any way to know whether we have a chance to fix this instability, or do we need deeper research? I offer full access to my box to the Asterisk developers. If I concentrate the traffic on one box, it blows up.

By: Mark Michelson (mmichelson) 2008-05-15 11:42:15

This appears to be the same crash experienced in 12463, which was closed. I'm going to close this too.