
Summary: ASTERISK-09250: coredump in 1.2.16 in app_voicemail.c:5565
Reporter: dtyoo (dtyoo)
Labels:
Date Opened: 2007-04-12 08:49:54
Date Closed: 2007-05-08 01:26:12
Priority: Critical
Regression?: No
Status: Closed/Complete
Components: Applications/app_voicemail
Versions:
Frequency of Occurrence:
Related Issues:
Environment:
Attachments:
( 0) 1.2.18-vm-crash-050407.txt
( 1) 20070507__bug9527.diff.txt
( 2) crash_bt_no_optimizations.txt
( 3) crash_server1.txt
( 4) crash_server2.txt
( 5) possible_crash_scenario.txt
Description: Every 2 to 3 weeks at least one of our asterisk servers crashes.  We have a farm of asterisk servers that run at reasonably high load during the day (50-100 concurrent calls on any server at any given time).  It is not something that we can reproduce at will, but it has been happening fairly reliably over at least the last 6 months (all on the 1.2.X branch).

I have backtraces from 2 core dumps from 2 different servers that both seem to point in the same direction.  Asterisk in our production environment is compiled with optimizations turned on, so I'm not sure how useful these backtraces will be.  I am going to turn off optimizations on at least a subset of our servers to capture better trace info the next time this happens.

This issue looks like it could be similar to 9103, but the bt seems to point in a different direction, so I thought it best to open a new bug.  I am happy to provide any additional info needed.

****** ADDITIONAL INFORMATION ******

Asterisk 1.2.16, CentOS 4.4, Dell 1950
Comments:

By: dtyoo (dtyoo) 2007-04-20 07:44:08

I have a new backtrace produced from a crash where optimizations were turned off.  It looks similar to the previous ones I had reported, but I'm attaching it just in case someone else can learn anything else from it.

By: dtyoo (dtyoo) 2007-04-20 08:02:10

Just one more note.  This definitely happened when one of our users was checking voicemail.  We are making use of unison to synchronize voicemail messages across different servers.  Perhaps there is some sort of condition where unison is making changes that asterisk is not expecting?

By: Tilghman Lesher (tilghman) 2007-04-20 08:05:19

Given that there have been 6 changes to app_voicemail since the release of 1.2.16, it is not entirely unreasonable to ask you to upgrade to the latest SVN 1.2 to see if this is still an issue.

By: callguy (callguy) 2007-04-20 11:46:02

The same issue exists in 1.2 trunk.

We've looked into this a bit more.  I noticed that the code tests whether the variable is not NULL and frees it, but does not reset the freed pointer to NULL (so the second time that block of code is run, it will break).

typically:

vms.deleted = calloc(vmu->maxmsg, sizeof(int));

...

if (vms.deleted)
  free(vms.deleted);


should be:

if (vms.deleted) {
  free(vms.deleted);
  vms.deleted = NULL;
}
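
A minimal sketch of the free-and-reset pattern proposed above, written as a standalone example rather than as code from app_voicemail.c. The struct `vm_state_sketch` and the functions `vm_state_init` and `vm_state_cleanup` are hypothetical names introduced only for illustration. It shows that once the pointer is reset after `free()`, running the cleanup a second time is harmless, because `free(NULL)` is defined to do nothing.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-call voicemail state. Only the
 * heap-allocated "deleted" flag array matters for this illustration. */
struct vm_state_sketch {
    int *deleted;
};

/* Allocate one zeroed flag per message slot.
 * Returns 0 on success, -1 if the allocation failed. */
int vm_state_init(struct vm_state_sketch *s, size_t maxmsg)
{
    assert(s != NULL);
    s->deleted = calloc(maxmsg, sizeof(int));
    return s->deleted ? 0 : -1;
}

/* Free the array and reset the pointer. Because free(NULL) is a no-op,
 * no explicit NULL test is needed, and a repeated call cannot free the
 * same block twice. */
void vm_state_cleanup(struct vm_state_sketch *s)
{
    assert(s != NULL);
    free(s->deleted);
    s->deleted = NULL;
}
```

Note that this pattern only protects against freeing the same pointer twice. It does not help if the heap was already corrupted by an earlier out-of-bounds write, in which case the first `free()` can still crash.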

By: Jason Parker (jparker) 2007-04-20 15:32:52

vms is local to the vm_execmain function.  Once it leaves that function, vms goes away.  There is nothing after the free()'s that would still be using vms.

Though, I will agree that your backtrace does, in fact, show that it's one of those two free()'s that is causing the crashes.

Are you able to reproduce this reliably?  Can you determine a sequence of events (likely going through certain parts of the menu, in a certain order) that cause this?  That would make it a bit easier to find.

By: callguy (callguy) 2007-04-22 21:20:15

We are able to reproduce this reliably only in the sense that it recurs roughly once every 2-3 weeks per server, and all of our backtraces point to the same thing.

The only commonality is it does appear that it is happening during voicemail deletion, but beyond that the specific sequence is unclear.

By: dtyoo (dtyoo) 2007-04-25 14:29:32

Corydon76-

Are all the changes you were alluding to contained in the new 1.2.18 release?  We try not to run svn trunk in our production environment if at all possible.  The thought being that the named releases are being tested more thoroughly and are thus more stable / less risky.

By: Tilghman Lesher (tilghman) 2007-04-25 16:46:21

Yes, those changes are in 1.2.18, and I was not suggesting that you upgrade to SVN trunk, only SVN 1.2 (which is the same branch from which releases are generated).

By: mustardman (mustardman) 2007-05-02 16:01:33

callguy,

Have you confirmed that your changes rectify the problem?  I had an Asterisk server crash today using v1.2.13 and it seems to be the exact same problem you see.  It happened during a second voicemail deletion.

By: callguy (callguy) 2007-05-02 16:04:15

We haven't experienced it yet in 1.2.18, but I'd say that's inconclusive; not enough time has passed.  Can you post the exact steps that you used to reproduce it?  If we have those, we'll test against 1.2.18.

By: mustardman (mustardman) 2007-05-02 16:33:15

I uploaded the "possible_crash_scenario.txt" log file.  The spaces are where I deleted irrelevant log information.

I am not on site so I cannot tell you the exact key sequence while in front of a phone but it appears relatively easy to figure that out from the log file.

The last line is the very last entry before Asterisk stopped logging/responding.  I'm not running a debugger so the log file is the best I have.



By: callguy (callguy) 2007-05-04 16:25:30

We've confirmed that this problem still exists in 1.2.18. I've uploaded the file 1.2.18-vm-crash-050407.txt, which shows the bt & bt full for this. We believe that the scenario causing this is either exactly the one mustardman reported or very similar to it.

By: dtyoo (dtyoo) 2007-05-07 18:08:06

Corydon76-

I have been analyzing mustardman's log file to try to get to the bottom of this.  I set up a test environment to replicate exactly the steps described in the log.  I am now able to consistently reproduce the issue.  The issue happens when you delete a message in one folder, then switch folders, then undelete the message in the new folder.  The undelete option is not announced after you switch folders, but you can still hit 7 to undelete the message in the new folder.  Then delete the message again and hit # to exit.  Asterisk will crash when you exit.

Here is an example set of steps that reproduce the issue:

Initial Setup:  2 new messages, empty folder 7 (Cust3).

1.  Login
2.  1 for new messages
3.  9 to save, save into folder 7
4.  6 for next message, 7 to delete
5.  * for menu, 2 to switch folder, change to folder 7.
6.  7 (unannounced) to undelete the message.
7.  1 to play message
8.  7 to delete the message
9.  # to exit.  Asterisk will crash at this point.
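
The thread does not say what the eventual patch changed. One plausible reading of the steps above, and it is only an assumption, is that the unannounced 7 toggles a per-message "deleted" flag while no message is selected in the newly opened folder, so the write lands outside the flag array and corrupts the heap, and the later free() on exit crashes. A minimal sketch of guarding such a toggle follows. The struct `folder_view` and the function `toggle_deleted` are hypothetical names for illustration and are not taken from app_voicemail.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one open voicemail folder. */
struct folder_view {
    int *deleted;   /* one flag per message, msg_count entries */
    int msg_count;  /* number of messages in the folder */
    int curmsg;     /* index of the selected message, -1 if none */
};

/* Toggle the deleted flag of the selected message.
 * Returns false, and writes nothing, when no valid message is selected,
 * for example right after a folder switch when curmsg is still -1. */
bool toggle_deleted(struct folder_view *v)
{
    assert(v != NULL && v->deleted != NULL);
    if (v->curmsg < 0 || v->curmsg >= v->msg_count) {
        return false;   /* an unchecked write here would be out of bounds */
    }
    v->deleted[v->curmsg] = !v->deleted[v->curmsg];
    return true;
}
```

With a check like this, a keypress that arrives when no message is selected is rejected instead of modifying memory next to the array.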

By: mustardman (mustardman) 2007-05-07 20:27:24

Nice work dtyoo!

By: Tilghman Lesher (tilghman) 2007-05-07 22:13:55

dtyoo:  thank you for your research.  This is not something I would have found otherwise, because it involved using DTMF sequences that were not listed as choices.

See patch, which fixes the issue.  Please test and confirm the fix.

By: callguy (callguy) 2007-05-07 22:48:41

Corydon76-

We tested the patch and it does resolve this issue. Thanks for the help getting this fixed.

By: callguy (callguy) 2007-05-07 22:51:36

Corydon76-

One additional note, I believe the same bug exists in the 1.4 branch as well, do you think you could patch app_voicemail.c in both branches when you commit?

By: Tilghman Lesher (tilghman) 2007-05-08 01:26:12

Committed, revisions 63359, 63360, 63361.