|Summary:||ASTERISK-19923: Asterisk crashing due to memory corruptions in chan_sip/voicemail|
|Reporter:||Dan Delaney (drdelaney)||Labels:|
|Date Opened:||2012-05-29 14:52:45||Date Closed:||2012-06-21 12:38:30|
|Environment:||CentOS 5.4, Intel(R) Xeon(R) CPU E31220 @ 3.10GHz, 4 GB RAM. asterisk 184.108.40.206-RC1||Attachments:||( 0) core.10435.backtrace.txt|
( 1) core.10435.console.txt
( 2) core.32003.backtrace.txt
( 3) core.32003.console.txt
( 4) core.3467.backtrace.txt
( 5) core.3467.console.txt
( 6) core.3698.backtrace.txt
( 7) core.5849.backtrace.txt
( 8) core.7477.backtrace.txt
( 9) core.7477.console.txt
|Description:||Intermittent crashes while using voicemail. Have been unable to intentionally reproduce. Backtraces and consoles will be attached.|
|Comments:||By: Dan Delaney (drdelaney) 2012-05-29 15:19:33.270-0500|
all backtraces attached, and consoles for some matching backtraces.
By: Rusty Newton (rnewton) 2012-05-30 16:01:12.148-0500
Please provide any applicable configuration files. voicemail.conf and config files for any channels that have related mailboxes, such as sip.conf.
By: Dan Delaney (drdelaney) 2012-05-30 18:05:21.296-0500
skip_sip.conf is the sip configuration for known problematic extensions.
skip_voicemail.conf is the vm configuration.
Entries for the pin, domains, ips and sip secrets have been changed for security reasons.
By: Dan Delaney (drdelaney) 2012-05-30 18:08:18.870-0500
attached requested information.
By: Rusty Newton (rnewton) 2012-06-04 09:27:32.336-0500
Your backtrace appears to contain memory corruption and we require valgrind output in order to move this issue forward. Please see https://wiki.asterisk.org/wiki/display/AST/Valgrind for more information about how to produce debugging information. Thanks!
By: Dan Delaney (drdelaney) 2012-06-04 11:23:27.957-0500
Using 220.127.116.11-rc2 on a different machine, I was not able to get it to crash under valgrind. I was able to reproduce it on the same machine with RC1; however, it was sporadic.
I will attempt to gather data on the production system today and see if I can capture memory info from a crash.
Attached is the valgrind data from the run that did not crash.
By: Dan Delaney (drdelaney) 2012-06-04 22:32:29.291-0500
Attached is a copy of the valgrind data.
I was not able to get the process to crash; however, my test call did end once (not sure whether it was related).
This is with 18.104.22.168-rc2.
The only known way to reproduce is to listen to a bunch of voicemail files, and it will randomly crash. In this instance I listened to about 36 voicemails.
Will try to obtain crash data again if needed. Since this is a production system, I have to do testing after hours due to the strain valgrind puts on the system.
By: Dan Delaney (drdelaney) 2012-06-04 22:33:22.257-0500
Attached a few valgrind data dumps. It was not crashing in tests; however, the attached data may help.
By: Dan Delaney (drdelaney) 2012-06-12 15:40:04.700-0500
I have been attempting to further replicate this. Using valgrind makes this almost impossible, as asterisk can barely process the calls. I am using the details from the above link, and CPU goes to 100% from memcheck.
One symptom is that a large number of new messages is reported when only two exist; the crash follows afterwards.
At this time, all I can do is provide more crash reports, not valgrind memory dumps (unless the memory dumps from non-crashing runs are helping).
If there are any other known tweaks to get this working without a high CPU load, I can attempt to run this in a production environment to get the needed data.
By: Kinsey Moore (kmoore) 2012-06-13 13:43:36.588-0500
Would you mind giving the attached patch a try? It was created for another issue, but this seems to be very similar.
By: Dan Delaney (drdelaney) 2012-06-13 14:26:17.820-0500
Also something to note is that we have upgraded to 22.214.171.124rc2 and 126.96.36.199rc1 respectively and are still seeing the same issue. This patch applies cleanly to 14.0rc1. Will roll it out to the system and check whether it resolves the issue.
By: Kinsey Moore (kmoore) 2012-06-15 13:13:40.008-0500
How are things looking on your end with the patch applied?
By: Dan Delaney (drdelaney) 2012-06-15 14:07:46.182-0500
So far so good. The patch has been applied cleanly to 14rc1 and 13.0 stable (we had to downgrade due to a bug with the parking groups).
There have been no crashes yet. I would like to keep this open over the weekend, as the customer this is fixing stated they do more VM work on Mondays. I have not been able to reproduce this on another system on demand. I would say if there are no updates by Monday we are good.
By: Matt Jordan (mjordan) 2012-06-15 15:32:26.411-0500
Reading your last comment - is there a bug in 188.8.131.52-rc1 with respect to parking? If so, would you mind opening another JIRA issue?
By: Dan Delaney (drdelaney) 2012-06-15 16:10:26.161-0500
Opened up at ASTERISK-20012
By: Dan Delaney (drdelaney) 2012-06-19 11:03:15.256-0500
So far this appears to have fixed the issue. I am going to keep an eye on it for a while longer, as the customer sometimes went a day or two without issues.
By: Julian Yap (jyap) 2012-06-21 00:31:53.063-0500
I have fully tested this patch on a system which had this issue on certified-asterisk-1.8.11-cert2 as well as Asterisk 184.108.40.206. This patch has been tested and is working fine on a production system running: certified-asterisk-1.8.11-cert2
By: Dan Delaney (drdelaney) 2012-06-21 11:24:30.123-0500
It has been working for almost a week with no issues or crashes. I would say this can be closed.
By: Kinsey Moore (kmoore) 2012-06-21 12:21:26.763-0500
Added link to relevant issue.
By: Dan Delaney (drdelaney) 2012-06-21 12:38:30.992-0500
the patch resolved the issue