[Home]

Summary:ASTERISK-15564: [patch] Multiple segfaults in leave_voicemail at app_voicemail.c:4451 Asterisk 1.4.29
Reporter:David Brillert (aragon)Labels:
Date Opened:2010-02-02 13:08:38.000-0600Date Closed:2011-06-07 14:01:02
Priority:CriticalRegression?No
Status:Closed/CompleteComponents:Applications/app_voicemail
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) 20100205__issue16754.diff.txt
( 1) asteriskclir245044.txt
( 2) asterisklog02122010.txt
( 3) gdb_r245044.txt
( 4) segfault02122010.txt
( 5) valgrind02122010.txt
( 6) valgrindr245044.txt
Description:Asterisk 1.4.27 was locking at leave_voicemail and after upgrading to 1.4.29 in a desperate attempt to clear the locking Asterisk 1.4.29 started segfaulting every 2 or 3 minutes.  I had to downgrade to 1.4.27 to prevent the segfaults.
Here is the evidence of the lock in the ast CLI
app.c: Failed to lock path '/var/spool/asterisk/voicemail/customer/3329/INBOX': File exists.
The permissions on this folder look normal.
Under 1.4.27 core show locks was useless, core show channels was useless.
When voicemail directories start locking htop showed CPU utilization at 300 where normal load is about 1 on 1.4.27

Attaching Asterisk CLI history.
Ran 1.4.29 under Valgrind for a few minutes until it crashed so I could upload valgrind capture.
Uploading gdb file with bt, bt full, thread apply all bt full
I believe that while Asterisk was locked on mailbox leaving a message in mailbox 3329 caused the segfault.  Other mailboxes display similar locking symptoms in Asterisk CLI.



****** ADDITIONAL INFORMATION ******

Summary:
*1.4.27 is plagued by voicemail folder locking but does not crash, but CPU goes crazy.
*1.4.29 seems to experience same locking but segfaults.
Comments:By: David Brillert (aragon) 2010-02-02 13:20:00.000-0600

I am seeing .lock files with 0600 permissions in various voicemail box paths.
I have never seen .lock files in voicemail directories before...
The time stamp of .lock coincides with the time I upgraded to *1.4.29
Example:
/var/spool/asterisk/voicemail/customer/3329/INBOX/.lock

Edit: I forgot to post the full path to .lock file



By: David Brillert (aragon) 2010-02-02 15:05:11.000-0600

When the .lock file exists in the mailbox it is impossible to login to that voicemail box with voicemail main.  Once the .lock file is deleted from the voicemail directory then the voicemail box can be accessed.
What would cause a .lock file of 0 bytes to be written to a mailbox directory and why would not it not be deleted? The .lock file was written by 1.4.29 but not by 1.4.27.

By: David Brillert (aragon) 2010-02-05 08:23:32.000-0600

2.6.9-78.0.8.ELsmp #1 SMP Wed Nov 19 20:05:04 EST 2008 i686 i686 i386 GNU/Linux
CentOS 4.7
Dual Pentium Xeon 2.4GHz
2GB ECC RAM



By: David Brillert (aragon) 2010-02-05 09:25:18.000-0600

This is unbelievably easy to reproduce:
create a .lock file in any voicemail inbox folder like
/var/spool/asterisk/voicemail/default/6010/INBOX/.lock
Try to leave a message in mailbox 6010.
After you hangup to finish leaving message in 6010 Asterisk will segfault every time.

Attaching GDB to running Asterisk PID spits out some output like:
Attaching to program: /usr/sbin/asterisk, process 13064

warning: .dynamic section for "/lib/librt.so.1" is not at the expected address

warning: difference appears to be caused by prelink, adjusting expectations

warning: .dynamic section for "/usr/lib/libvorbis.so.0" is not at the expected address

warning: difference appears to be caused by prelink, adjusting expectations

warning: .dynamic section for "/usr/lib/libvorbisenc.so.2" is not at the expected address

warning: difference appears to be caused by prelink, adjusting expectations

warning: .dynamic section for "/usr/lib/libodbc.so.1" is not at the expected address

warning: difference appears to be caused by prelink, adjusting expectations

warning: .dynamic section for "/lib/libcrypt.so.1" is not at the expected address

warning: difference appears to be caused by prelink, adjusting expectations

warning: .dynamic section for "/lib/libnsl.so.1" is not at the expected address

warning: difference appears to be caused by prelink, adjusting expectations

warning: .dynamic section for "/lib/libssl.so.6" is not at the expected address

warning: difference appears to be caused by prelink, adjusting expectations

warning: .dynamic section for "/usr/lib/libodbcinst.so.1" is not at the expected address

warning: difference appears to be caused by prelink, adjusting expectations

warning: .dynamic section for "/usr/lib/libkrb5.so.3" is not at the expected address

warning: difference appears to be caused by prelink, adjusting expectations

warning: .dynamic section for "/usr/lib/libk5crypto.so.3" is not at the expected address

warning: difference appears to be caused by prelink, adjusting expectations

warning: .dynamic section for "/lib/libselinux.so.1" is not at the expected address

warning: difference appears to be caused by prelink, adjusting expectations

warning: .dynamic section for "/lib/libsepol.so.1" is not at the expected address

warning: difference appears to be caused by prelink, adjusting expectations

warning: .dynamic section for "/usr/lib/libpopt.so.0" is not at the expected address

warning: difference appears to be caused by prelink, adjusting expectations

warning: .dynamic section for "/lib/libutil.so.1" is not at the expected address

warning: difference appears to be caused by prelink, adjusting expectations

warning: .dynamic section for "/usr/lib/libbz2.so.1" is not at the expected address

The Asterisk CLI during the test looks like:
   -- Executing [*980@default-super:1] Answer("SIP/6010-00000000", "") in new stack
   -- Executing [*980@default-super:2] Wait("SIP/6010-00000000", "1") in new stack
lab*CLI>
   -- Executing [*980@default-super:3] Read("SIP/6010-00000000", "VMBOX|vm-extension") in new stack
   -- <SIP/6010-00000000> Playing 'vm-extension' (language 'en')
Really destroying SIP dialog 'cfa1a9b3-673aee68-91ed215@192.168.30.188' Method: SUBSCRIBE
   -- User entered '6010'
   -- Executing [*980@default-super:4] GotoIf("SIP/6010-00000000", "0?10") in new stack
   -- Executing [*980@default-super:5] Dial("SIP/6010-00000000", "Local/6010@default-local-voicemail") in new stack
   -- Called 6010@default-local-voicemail
   -- Executing [6010@default-local-voicemail:1] VoiceMail("Local/6010@default-local-voicemail-d3b2,2", "6010@default|b") in new stack
   -- Local/6010@default-local-voicemail-d3b2,1 answered SIP/6010-00000000
   -- <Local/6010@default-local-voicemail-d3b2,2> Playing '/var/spool/asterisk/voicemail/default/6010/busy' (language 'en')
Really destroying SIP dialog '494937892@192.168.30.90' Method: REGISTER
Really destroying SIP dialog 'e208bd1f-b2f77ce8-f9d6ce3d@192.168.30.194' Method: SUBSCRIBE
[Feb  5 10:22:34] WARNING[17703]: channel.c:946 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/6010@default-local-voicemail-d3b2,2
[Feb  5 10:22:35] WARNING[17703]: channel.c:946 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/6010@default-local-voicemail-d3b2,2
 == Spawn extension (default-super, *980, 5) exited non-zero on 'SIP/6010-00000000'
Extension Changed 6010[default-local] new state Idle for Notify User 6010
Extension Changed 6010[default-local] new state Idle for Notify User 6000
[Feb  5 10:22:37] WARNING[17710]: app.c:1048 ast_lock_path: Failed to lock path '/var/spool/asterisk/voicemail/default/6010/INBOX': File exists
[Feb  5 10:22:37] WARNING[17710]: file.c:764 ast_readaudio_callback: Failed to write frame
   -- <Local/6010@default-local-voicemail-d3b2,2> Playing 'beep' (language 'en')
   -- Recording the message
   -- x=0, open writing:  /var/spool/asterisk/voicemail/default/6010/tmp/zLJiMc format: wav49, 0x9a24b98
   -- User hung up
   -- Recording was 0 seconds long but needs to be at least 4 - abandoning

The interesting part of the CLI is that the actual duration of time spent recording the voicemail was about 10 seconds of spoken message and CLI reports 0 seconds.  Then the crash is immediate.

Same results in CentOS 5.4 with kernel 2.6.18-92.1.22.el5 #1 SMP

By: David Brillert (aragon) 2010-02-05 09:47:15.000-0600

channel.c:946 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/6010@default-local-voicemail-d3b2,2

These warnings are generated constantly while the caller is speaking into the voicemail box to leave a message in a voicemail INBOX with a .lock file in its directory.
For this reason I think this bug is a result of the fix for bug ASTERISK-14558 so I think this is a regression.  Since this bug is also results in a segfault every time Linux or Asterisk (I don't know which) writes a .lock file to a mailbox INBOX folder and a caller tries to leave a message in that folder I think this issue should also be a block.  I cannot put 1.4.29 into production because of this issue.

I have no idea why the .lock is written in the first place but manually creating the .lock file is enough to easily reproduce the segfault.

By: David Brillert (aragon) 2010-02-05 11:44:07.000-0600

Also if I record voicemail in /var/spool/asterisk/voicemail/default/3329/INBOX/.lock and wait until I see st_queue_frame: Exceptionally long voice queue length queuing to Local/6010@default-local-voicemail and then type core show locks then no locks appear in core show locks.

By: David Brillert (aragon) 2010-02-05 13:52:38.000-0600

Corydon76 requested on irc bugs channel that I test r245044.
Asterisk still crashes on presence of .lock file in INBOX folder.
I don't know what can cause the .lock file to be written and I don't think that issue was probably fixed in r245044 so when I re-tested r245044 I left the .lock file in the INBOX and Asterisk still segfaults.
I am uploading a new gdb capture to report as gdb_r245044.txt and a new valgrind capture as valgrindr245044.txt and asterisk cli as asteriskclir245044.txt



By: Tilghman Lesher (tilghman) 2010-02-05 15:03:14.000-0600

Okay, I have no idea how the userdata portion within astobj2 could ever be NULL, but that should fix it.  In other news, I had the two arguments reversed in the definition in app_voicemail, but that shouldn't affect anything.

By: David Brillert (aragon) 2010-02-05 15:26:19.000-0600

tilghman: Very cool, no crash with your patch but how to get rid of those pesky .lock files once they get written by whatever?  No crash, but I can't leave any messages in the mailbox and I still get those pesky "Exceptionally long voice queue length queuing" warnings because Asterisk can't write to inbox.

i.e.

Connected to Asterisk SVN-branch-1.4-r245044M currently running on lab (pid = 30690)
Verbosity is at least 3
Core debug is at least 3
   -- Executing [*980@default-super:1] Answer("SIP/6010-00000000", "") in new stack
   -- Executing [*980@default-super:2] Wait("SIP/6010-00000000", "1") in new stack
   -- Executing [*980@default-super:3] Read("SIP/6010-00000000", "VMBOX|vm-extension") in new stack
   -- <SIP/6010-00000000> Playing 'vm-extension' (language 'en')
   -- User entered '6010'
   -- Executing [*980@default-super:4] GotoIf("SIP/6010-00000000", "0?10") in new stack
   -- Executing [*980@default-super:5] Dial("SIP/6010-00000000", "Local/6010@default-local-voicemail") in new stack
   -- Called 6010@default-local-voicemail
   -- Executing [6010@default-local-voicemail:1] VoiceMail("Local/6010@default-local-voicemail-849a,2", "6010@default|b") in new stack
   -- Local/6010@default-local-voicemail-849a,1 answered SIP/6010-00000000
   -- <Local/6010@default-local-voicemail-849a,2> Playing '/var/spool/asterisk/voicemail/default/6010/busy' (language 'en')
Really destroying SIP dialog '1315406759-46424-1@BJC.BGI.DA.BJH' Method: REGISTER
[Feb  5 16:22:34] WARNING[31358]: channel.c:946 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/6010@default-local-voicemail-849a,2
[Feb  5 16:22:35] WARNING[31358]: channel.c:946 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/6010@default-local-voicemail-849a,2
[Feb  5 16:22:36] WARNING[31358]: channel.c:946 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/6010@default-local-voicemail-849a,2
[Feb  5 16:22:37] WARNING[31389]: app.c:1048 ast_lock_path: Failed to lock path '/var/spool/asterisk/voicemail/default/6010/INBOX': File exists
   -- <Local/6010@default-local-voicemail-849a,2> Playing 'beep' (language 'en')
   -- Recording the message
   -- x=0, open writing:  /var/spool/asterisk/voicemail/default/6010/tmp/Be2YXH format: wav49, 0x99ea678
   -- User ended message by pressing #
   -- <Local/6010@default-local-voicemail-849a,2> Playing 'auth-thankyou' (language 'en')
   -- <Local/6010@default-local-voicemail-849a,2> Playing 'vm-review' (language 'en')
   -- Saving message as is
   -- <Local/6010@default-local-voicemail-849a,2> Playing 'vm-msgsaved' (language 'en')
 == Spawn extension (default-super, *980, 5) exited non-zero on 'SIP/6010-00000000'
[Feb  5 16:22:53] WARNING[31389]: app.c:1048 ast_lock_path: Failed to lock path '/var/spool/asterisk/voicemail/default/6010/INBOX': File exists
[Feb  5 16:22:53] ERROR[31389]: app_voicemail.c:4347 leave_voicemail: Couldn't lock directory /var/spool/asterisk/voicemail/default/6010/INBOX.  Voicemail will be lost.



By: David Brillert (aragon) 2010-02-08 12:15:54.000-0600

tilghman: BTW all segfault data is uploaded with DON'T_OPTIMIZED checked in compile options so I don't know why sccp module shows values optimized out. And I'm not using any sccp in my configuration.

By: David Brillert (aragon) 2010-02-08 13:29:06.000-0600

tilghman:

Here is my hypothesis of what happened:
.lock is created by app_voicemail.c on count_messages function ...
-create a .lock
-look for all .txt files and increment by 1 the counter
-unlock / remove the lock file
-return result
-called when someone open his voicemail or when someone leave a message on a mailbox
-.lock files are written constantly but in some case (probably because of CPU load) the .lock file was not removed
-next access to INBOX caused crash because of orphaned .lock file

You mentioned you "have no idea how the userdata portion within astobj2 could ever be NULL".
I suspect that the null is because nothing was able to delete the orphaned .lock file in the mailbox due to CPU load.  It would be nice to see some sanity check to auto delete the .lock file if it shouldn't be there rather than a null... (If you have something you want me to test that does this then I will happily test).

Anyway, your patch prevents the segfault in the worst case scenario and I have no proof that the huge load which started all of this was caused by Asterisk. Therefore I'm happy if you commit your patch and close out this bug report.  I'll open another bug report if I find that Asterisk is causing the high CPU load.

By: David Brillert (aragon) 2010-02-11 08:32:33.000-0600

Tilghman:
I found root cause of the intense CPU load and the locking which caused this huge load.  I opened another bug report for this root cause which is ASTERISK-15611
The intense locking is in manager.c

By: Tilghman Lesher (tilghman) 2010-03-04 23:02:52.000-0600

I believe this issue is actually now fixed by the patch in ASTERISK-15706, which has been already committed.  Could you test the current branch without the patch in this issue and verify whether it indeed fixes the issue for you?

By: David Brillert (aragon) 2010-03-05 08:44:00.000-0600

Tested Asterisk SVN-branch-1.4-r250786

To reproduce the crash I must create a dummy .lock file into INBOX folder.  This is because the .lock file somehow got written to the INBOX and was never deleted.
/var/spool/asterisk/voicemail/default/6002/INBOX/.lock

I use feature code *980 to leave a message.

Now when I leave a message in that folder I no longer experience a crash.
I do get the dreaded "Exceptionally long voice queue length..." WARNING and the message is not saved because Asterisk cannot lock the folder.

[Mar  5 09:26:07] WARNING[30797]: channel.c:946 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/6002@default-local-voicemail-ac8f,2
[Mar  5 09:26:08] WARNING[30831]: app.c:1048 ast_lock_path: Failed to lock path '/var/spool/asterisk/voicemail/default/6002/INBOX': File exists
   -- <Local/6002@default-local-voicemail-ac8f,2> Playing 'beep' (language 'en')
   -- Recording the message
   -- x=0, open writing:  /var/spool/asterisk/voicemail/default/6002/tmp/k4n7pE format: wav49, 0xa25d838
Really destroying SIP dialog '077d6ef72e28ed814d08cdd744c93772@192.168.30.254' Method: OPTIONS

[Mar  5 09:26:24] WARNING[30831]: app.c:1048 ast_lock_path: Failed to lock path '/var/spool/asterisk/voicemail/default/6002/INBOX': File exists
[Mar  5 09:26:24] ERROR[30831]: app_voicemail.c:4349 leave_voicemail: Couldn't lock directory /var/spool/asterisk/voicemail/default/6002/INBOX.  Voicemail will be lost.

 == Spawn extension (default-super, *980, 5) exited non-zero on 'SIP/6010-00000003'
Extension Changed 6010[default-local] new state Idle for Notify User 6000
[Mar  5 09:26:19] WARNING[30831]: file.c:764 ast_readaudio_callback: Failed to write frame
   -- <Local/6002@default-local-voicemail-ac8f,2> Playing 'vm-goodbye' (language 'en')
Really destroying SIP dialog 'f077f17e-15da597d-cac1a50c@192.168.30.194' Method:BYE