[Home]

Summary:ASTERISK-04145: [patch] Voicemail Synchronization Issue
Reporter:ssljivic (ssljivic)Labels:
Date Opened:2005-05-12 05:27:50Date Closed:2008-01-15 15:41:21.000-0600
Priority:MajorRegression?No
Status:Closed/CompleteComponents:Applications/app_voicemail
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) log.txt
( 1) vm-sync-issue-1.0.diff.txt
( 2) vm-sync-issue-1.1.patch
Description:It happens that concurrent calls overwrite each others voicemail files.

I have generated 40 concurrent calls to the voicemail application. I have received 40 email notifications, but the INBOX folder had only 38 messages.

This issue was treated in bug with ID 0003394, but it seems that it has not been resolved.

I'm uploading the log file for this test.

Regards,
Stojan Sljivic
Comments:By: ssljivic (ssljivic) 2005-05-12 07:17:35

I have found the error.
It is never checked whether the path lock was obtained and there are several situations when the function ast_lock_path() will return without successful lock. One of them is if 5 second timeout occur.

That is what happened in my test (since Asterisk was overloaded).

I'll fix this and upload the patch soon.

Regards,
Stojan Sljivic

By: ssljivic (ssljivic) 2005-05-13 06:55:32

I have uploaded the patch that resolves this issue.
My disclaimer is on file.

Regards,
Stojan Sljivic

By: Mark Spencer (markster) 2005-05-14 19:26:20

Why would the lock take over 5 seconds to free?

By: Kevin P. Fleming (kpfleming) 2005-05-14 20:14:51

There are some issues with your patch, but first, I'd agree with Mark that there shouldn't be any reason why it should take more than 5 seconds to take the lock in the voicemail spool directory for the mailbox.

There are no long-running operations in app_voicemail during the time that the lock is held, unless the system that Asterisk is running on is severely loaded and the filesystem is taking long periods of time to respond to file creation/removal.

With that said, I don't disagree with the patch on principle; if the lock can fail to be taken (as it can for other reasons besides just a timeout), then the code should take that into account rather than ignoring it.

Mark, what do you say? Should I work with him to get his patch cleaned up for merging?

By: Mark Spencer (markster) 2005-05-15 19:05:23

I'm happy with checking the lock, but are we sure that's really the cause and that there isn't some other sort of race, potentially?

By: Kevin P. Fleming (kpfleming) 2005-05-15 19:50:00

It is theoretically possible, but not very likely, as his log trace specifically shows locking failures due to the file already existing (and the only way ast_lock_path() can report that is if it times out).

Stojan: what platform/operating system/filesystem are you running this on?

By: ssljivic (ssljivic) 2005-05-16 03:46:38

Hi,

We are using the Fedora Core 3 (kernel 2.6). File system is ext3.
The machine we use is an old PIII 800MHz, 512 MB RAM.

Regarding the issue it occurs when the Asterisk is overloaded and cannot get the lock within the 5 sec interval. Since this machine is not some high performance computer that is exactly what happened with 40 concurrent calls.

Regards,
Stojan Sljivic

By: Mark Spencer (markster) 2005-05-25 13:05:56

This needs a functionality test and code review before i can put this in.

By: Michael Jerris (mikej) 2005-05-25 13:05:58

We need code review and functionality test on this one.

By: ssljivic (ssljivic) 2005-06-10 09:48:59

Hi,

What is the current status of this patch?
Will it be included in the next STABLE release?

Regards,
Stojan Sljivic

By: Michael Jerris (mikej) 2005-06-10 10:18:58

The current status is the same as when mark and I last commented.  We need functionality testing on this, and people to comment that they have been testing this without any unforseen issues.  As for stable, inclusion will be left up to the stable maintainer, but I see no reason this would not be merged into stable as well, however as we have already had a freeze for 1.0.8RC this will likely not make it into 1.0.8.  Please find some people to run this patch on their systems and comment on the bug.

By: Michael Jerris (mikej) 2005-06-19 13:22:16

We still need testers on this.  Anyone?

By: Boris Zolotarev (boriszolotarev) 2005-06-27 05:23:27

Hi,

I tested this on Intel Celeron (600Mhz) using SIP/RTP stress test tool.
With 40 simultaneous calls to voicemail box the machine was overloaded and some of voicemails were not saved well (voicemail file was overwritten with next voicemail).
Then I applied the patch and there was no overwritten files anymore.

However, the same test, with even more simultaneous calls went fine on Intel P4 on 3.2Mhz with SCSI discs.
I sent 200 calls to asterisk VM box (5 calls per second) and 166 voicemails were saved well while remaining got timeout.
There was no overwritten files on this (stronger) machine although I did not apply patch on this machine.

Cheers,
Boris Zolotarev



By: Michael Jerris (mikej) 2005-07-05 22:29:21

This is ready to go in but no longer applies to cvs head.  Could you please get an updated patch for head so that this can get in.

Thanks

By: ssljivic (ssljivic) 2005-07-07 07:01:25

I will update patch for the latest HEAD and post it soon.

By: ssljivic (ssljivic) 2005-07-11 08:25:20

Hi,

I have uploaded the patch against latest HEAD version (CVS HEAD 07/11/05-08:53:54).

Regards,
Stojan Sljivic

By: Michael Jerris (mikej) 2005-07-11 08:27:09

kevin-  This one was ready to go pending patch update, now updated.

By: Kevin P. Fleming (kpfleming) 2005-07-11 20:56:53

Committed to CVS HEAD, thanks!

By: twisted (twisted) 2005-07-13 13:06:15

The patch commited created locking issues - reopening.  

We have a customer who was getting a certain mailbox, (141) with the following scenario:

log in with password, immediate disconnect, however the code was sent through the goto for "out".  This made me dig further and find that it was returning with the ERROR_LOCK_PATH result, which, should not happen, since the box failed even when it was simply just created, and also when nothing else was locking it.

I rolled the voicemail code back on the box in question from 1.224 to 1.222 (pre-locking change) to get it resolved.  

Mark requested a code review/testing phase - WHY didn't this happen before it was commited? ...kp?

By: Kevin P. Fleming (kpfleming) 2005-07-13 15:19:40

Yeah, this patch was badly broken. It should be fixed up now, sorry for the breakage.

By: Digium Subversion (svnbot) 2008-01-15 15:40:50.000-0600

Repository: asterisk
Revision: 6091

U   trunk/apps/app_voicemail.c

------------------------------------------------------------------------
r6091 | kpfleming | 2008-01-15 15:40:50 -0600 (Tue, 15 Jan 2008) | 2 lines

fix voicemail path locking problems (bug ASTERISK-4145)

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=6091

By: Digium Subversion (svnbot) 2008-01-15 15:41:21.000-0600

Repository: asterisk
Revision: 6125

U   trunk/app.c
U   trunk/apps/app_voicemail.c
U   trunk/include/asterisk/app.h

------------------------------------------------------------------------
r6125 | kpfleming | 2008-01-15 15:41:20 -0600 (Tue, 15 Jan 2008) | 2 lines

fix up lock breakage from bug ASTERISK-4145

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=6125