Summary:ASTERISK-22540: WARNING[2324] asterisk.c: Fork failed: Cannot allocate memory in /var/log/asterisk/messages then segfault
Reporter:David Brillert (aragon)Labels:
Date Opened:2013-09-16 10:47:07Date Closed:2013-09-26 12:14:17
Versions:11.5.1 Frequency of
duplicatesASTERISK-23616 Big memory leak in logger.c
is related toASTERISK-22560 Memory leak in logger.c
Environment:SVN r398885Attachments:( 0) gdb-c_core_asterisk.txt
( 1) messages.rar
( 2) valgrind.rar
Description:Asterisk eats memory until it crashes.

Attached valgrind log and bt full from core file.
Comments:By: David Brillert (aragon) 2013-09-16 10:48:41.100-0500

Asterisk did not crash under Valgrind.

By: Matt Jordan (mjordan) 2013-09-16 11:00:37.279-0500

Your valgrind log makes it look like you were running Asterisk 11. What version were you running?

A large number of the memory leaks look to have occurred when a dialplan reload was issued. What type of dialplan are you using? Did the crash occur following a reload?

By: David Brillert (aragon) 2013-09-16 11:29:04.613-0500

There was a reload approximately 1 minute before the segfault.

svn info
Path: .
URL: http://svn.digium.com/svn/asterisk/branches/11
Repository Root: http://svn.digium.com/svn/asterisk
Repository UUID: f38db490-d61c-443f-a65b-d21fe96a405b
Revision: 398939
Node Kind: directory
Schedule: normal
Last Changed Author: newtonr
Last Changed Rev: 398885
Last Changed Date: 2013-09-11 20:02:37 -0400 (Wed, 11 Sep 2013)

I am uploading verbose debug log of dialplan reload.

By: David Brillert (aragon) 2013-09-16 11:42:44.410-0500

reload info in messages file attachment

By: Richard Mudgett (rmudgett) 2013-09-16 13:40:47.514-0500

FYI: Please use *.txt files not *.rar files since you have to download the *.rar file to view it.

By: David Brillert (aragon) 2013-09-16 13:52:25.901-0500

Sorry, but the messages file with verbose and debug enabled was huge unless I compress it (11.5 MB)
Valgrind.txt was over 3 MB

But I get the hint...

By: David Brillert (aragon) 2013-09-16 14:24:32.969-0500

Why am I the assignee on this bug report?

By: David Woolley (davidw) 2013-09-17 05:55:40.775-0500

You were an assignee because feedback was being awaited from you.  You cleared that when you asked why.

By: Rusty Newton (rnewton) 2013-09-17 13:23:02.657-0500

David Brillert, Did this happen after you upgraded from a previous revision to 398939 ?  If so, what verison were you revision were you using previous?

By: David Brillert (aragon) 2013-09-17 18:30:27.905-0500

Rev 398939 was installed to combat memory leaks we were seeing when we were already at Rev 397604
We were hoping commit 398103 was going to solve those leaks.
So I presume to answer your question, I don't know which commit may have caused the leaks or if this is a current regression.

#0  0x0000003d3fc45071 in sqlite3VdbeMemRelease () from /usr/lib64/libsqlite3.so.0
No symbol table info available.
#1  0x0000003d3fc45821 in sqlite3VdbeMemSetStr () from /usr/lib64/libsqlite3.so.0
No symbol table info available.
#2  0x0000003d3fc3c5d8 in sqlite3Error () from /usr/lib64/libsqlite3.so.0
No symbol table info available.
#3  0x0000003d3fc41f39 in ?? () from /usr/lib64/libsqlite3.so.0
No symbol table info available.
#4  0x0000003d3fc41ff8 in ?? () from /usr/lib64/libsqlite3.so.0
No symbol table info available.
#5  0x0000000000493fc1 in ast_db_gettree ()
No symbol table info available.
#6  0x00002b5c1dd995b3 in process_clearcache (ignore=<value optimized out>) at pbx_dundi.c:2208
       db_entry = 0x2b5bf91e9cd0
       db_tree = 0x0
       now = 1379297393
       __PRETTY_FUNCTION__ = "process_clearcache"

The core dump leads me to believe the problem is in sqlite3 and the last commit according to the logs that touched sqlite3 was

Revision 375763 - Directory Listing
Modified Sun Nov 4 01:17:25 2012 UTC (10 months, 1 week ago) by mjordan

Properly finalize prepared SQLite3 statements to prevent memory leak

The AstDB uses prepared SQLite3 statements to retrieve data from the SQLite3
database.  These statements should be finalized during Asterisk shutdown so
that the SQLite3 database can be properly closed.  Failure to finalize the
statements results in a memory leak and a failure when closing the database.

This patch fixes those issues by ensuring that all prepared statements are
properly finalized at shutdown.

(closes issue ASTERISK-20647)
Reported by: Corey Farrell
 astdb-sqlite3_close.patch uploaded by Corey Farrell (license 5909)

Merged revisions 375761 from http://svn.asterisk.org/svn/asterisk/branches/10

By: David Brillert (aragon) 2013-09-17 18:31:25.563-0500

feedback entered

By: David Brillert (aragon) 2013-09-17 18:32:56.500-0500

David Wolley:
I entered feedback and am still assignee
I entered a comment and am still assignee.

I don't get it.

By: Richard Mudgett (rmudgett) 2013-09-17 18:51:06.375-0500

It is a quirk of the issue tracker.  It seems to be loath to have issues unassigned.

When feedback is requested, the issue is assigned to the requested person.  If the issue was previously unassigned it does not unassign the issue when the issue sent back.

By: David Brillert (aragon) 2013-09-20 08:50:44.577-0500

This is a really service affecting problem, so if there is anything I can do to move this forward then I need to know what that is.

*CLI> core show uptime
System uptime: 7 minutes, 19 seconds
Last reload: 7 minutes, 19 seconds

And went from 0.9% ram to 63.8%.

No reloads and only 1 outgoing call was placed.
The memory leak happens when the system is idle.

By: David Brillert (aragon) 2013-09-24 08:22:25.136-0500

A couple of updates:

1. Upgraded the server to SVN r399516 and since then no more 'Fork failed: Cannot allocate memory' warnings and no new segfault as of 4 days.

2. /var/log/asterisk/security shows the server under some sort of attack which may be responsible for the problems.  Thousands of requests per second and 1 GB per day of logs in /var/log/asterisk/security and they look like this (IP's scrubbed):

[2013-09-14 23:47:26] SECURITY[32620] res_security_log.c: SecurityEvent="ChallengeSent",EventTV="1379224046-127489",Severity="Informationa
[2013-09-14 23:47:26] SECURITY[32620] res_security_log.c: SecurityEvent="ChallengeSent",EventTV="1379224046-147487",Severity="Informationa
[2013-09-14 23:47:26] SECURITY[32620] res_security_log.c: SecurityEvent="ChallengeSent",EventTV="1379224046-165780",Severity="Informationa
[2013-09-14 23:47:26] SECURITY[32620] res_security_log.c: SecurityEvent="ChallengeSent",EventTV="1379224046-188493",Severity="Informationa
[2013-09-14 23:47:26] SECURITY[32620] res_security_log.c: SecurityEvent="ChallengeSent",EventTV="1379224046-209056",Severity="Informationa
[2013-09-14 23:47:26] SECURITY[32620] res_security_log.c: SecurityEvent="ChallengeSent",EventTV="1379224046-228397",Severity="Informationa
[2013-09-14 23:47:26] SECURITY[32620] res_security_log.c: SecurityEvent="ChallengeSent",EventTV="1379224046-246259",Severity="Informationa

By: Joel Vandal (joel_vandal) 2013-09-24 12:42:34.813-0500

We added a new rules on Fail2ban in order to block the attack that cause server to crash.

We will try to get SIP logs next time we got flooded .. and check if we can easily reproduce the flood, but look easy to crash any servers.

By: Matt Jordan (mjordan) 2013-09-24 13:06:10.326-0500

This is a really service affecting problem, so if there is anything I can do to move this forward then I need to know what that is.
*CLI> core show uptime
System uptime: 7 minutes, 19 seconds
Last reload: 7 minutes, 19 seconds
And went from 0.9% ram to 63.8%.
No reloads and only 1 outgoing call was placed.

The memory leak happens when the system is idle.

Since you were also testing the log system changes for remote consoles, I have a feeling you ran smack into ASTERISK-22560. I'm guessing that with the system updates, you also updated Asterisk to the latest SVN?

If not, definitely grab the patch for commit 399513 and apply it to your system:

svn diff -c 399513 https://origsvn.digium.com/svn/asterisk/branches/11

By: David Brillert (aragon) 2013-09-24 13:13:11.427-0500

We already include that fix since it was committed to our SVN r399516 install.
That's probably what made some of the memleaks disappear and allowed us to focus on the remote attack as the culprit.
Now this is starting to look like a different bug report and I should think about closing this one out.  Since SVN r399516 has only been in for 4 days I'll leave it open for a while longer in case the segfault does come back.

By: Matt Jordan (mjordan) 2013-09-24 14:41:35.409-0500

Sounds good; just update the ticket if the original "really bad" mem leak crops back up.

By: David Brillert (aragon) 2013-09-26 12:14:17.445-0500

Haven't seen the leak or crash since updating to svn r399516