Summary: | ASTERISK-22540: WARNING[2324] asterisk.c: Fork failed: Cannot allocate memory in /var/log/asterisk/messages then segfault | ||||||
Reporter: | David Brillert (aragon) | Labels: | |||||
Date Opened: | 2013-09-16 10:47:07 | Date Closed: | 2013-09-26 12:14:17 | ||||
Priority: | Major | Regression? | |||||
Status: | Closed/Complete | Components: | General | ||||
Versions: | 11.5.1 | Frequency of Occurrence | |||||
Related Issues: |
| ||||||
Environment: | SVN r398885 | Attachments: | ( 0) gdb-c_core_asterisk.txt ( 1) messages.rar ( 2) valgrind.rar | ||||
Description: | Asterisk eats memory until it crashes.
Attached valgrind log and bt full from core file. | ||||||
Comments: | By: David Brillert (aragon) 2013-09-16 10:48:41.100-0500 Asterisk did not crash under Valgrind. By: Matt Jordan (mjordan) 2013-09-16 11:00:37.279-0500 Your valgrind log makes it look like you were running Asterisk 11. What version were you running? A large number of the memory leaks look to have occurred when a dialplan reload was issued. What type of dialplan are you using? Did the crash occur following a reload? By: David Brillert (aragon) 2013-09-16 11:29:04.613-0500 There was a reload approximately 1 minute before the segfault. svn info Path: . URL: http://svn.digium.com/svn/asterisk/branches/11 Repository Root: http://svn.digium.com/svn/asterisk Repository UUID: f38db490-d61c-443f-a65b-d21fe96a405b Revision: 398939 Node Kind: directory Schedule: normal Last Changed Author: newtonr Last Changed Rev: 398885 Last Changed Date: 2013-09-11 20:02:37 -0400 (Wed, 11 Sep 2013) I am uploading verbose debug log of dialplan reload. By: David Brillert (aragon) 2013-09-16 11:42:44.410-0500 reload info in messages file attachment By: Richard Mudgett (rmudgett) 2013-09-16 13:40:47.514-0500 FYI: Please use *.txt files not *.rar files since you have to download the *.rar file to view it. By: David Brillert (aragon) 2013-09-16 13:52:25.901-0500 Sorry, but the messages file with verbose and debug enabled was huge unless I compress it (11.5 MB) Valgrind.txt was over 3 MB But I get the hint... By: David Brillert (aragon) 2013-09-16 14:24:32.969-0500 Why am I the assignee on this bug report? By: David Woolley (davidw) 2013-09-17 05:55:40.775-0500 You were an assignee because feedback was being awaited from you. You cleared that when you asked why. By: Rusty Newton (rnewton) 2013-09-17 13:23:02.657-0500 David Brillert, Did this happen after you upgraded from a previous revision to 398939 ? If so, what verison were you revision were you using previous? By: David Brillert (aragon) 2013-09-17 18:30:27.905-0500 Rev 398939 was installed to combat memory leaks we were seeing when we were already at Rev 397604 We were hoping commit 398103 was going to solve those leaks. So I presume to answer your question, I don't know which commit may have caused the leaks or if this is a current regression. #0 0x0000003d3fc45071 in sqlite3VdbeMemRelease () from /usr/lib64/libsqlite3.so.0 No symbol table info available. #1 0x0000003d3fc45821 in sqlite3VdbeMemSetStr () from /usr/lib64/libsqlite3.so.0 No symbol table info available. #2 0x0000003d3fc3c5d8 in sqlite3Error () from /usr/lib64/libsqlite3.so.0 No symbol table info available. #3 0x0000003d3fc41f39 in ?? () from /usr/lib64/libsqlite3.so.0 No symbol table info available. #4 0x0000003d3fc41ff8 in ?? () from /usr/lib64/libsqlite3.so.0 No symbol table info available. #5 0x0000000000493fc1 in ast_db_gettree () No symbol table info available. #6 0x00002b5c1dd995b3 in process_clearcache (ignore=<value optimized out>) at pbx_dundi.c:2208 db_entry = 0x2b5bf91e9cd0 db_tree = 0x0 now = 1379297393 __PRETTY_FUNCTION__ = "process_clearcache" The core dump leads me to believe the problem is in sqlite3 and the last commit according to the logs that touched sqlite3 was Revision 375763 - Directory Listing Modified Sun Nov 4 01:17:25 2012 UTC (10 months, 1 week ago) by mjordan Properly finalize prepared SQLite3 statements to prevent memory leak The AstDB uses prepared SQLite3 statements to retrieve data from the SQLite3 database. These statements should be finalized during Asterisk shutdown so that the SQLite3 database can be properly closed. Failure to finalize the statements results in a memory leak and a failure when closing the database. This patch fixes those issues by ensuring that all prepared statements are properly finalized at shutdown. (closes issue ASTERISK-20647) Reported by: Corey Farrell patches: astdb-sqlite3_close.patch uploaded by Corey Farrell (license 5909) ........ Merged revisions 375761 from http://svn.asterisk.org/svn/asterisk/branches/10 By: David Brillert (aragon) 2013-09-17 18:31:25.563-0500 feedback entered By: David Brillert (aragon) 2013-09-17 18:32:56.500-0500 David Wolley: I entered feedback and am still assignee I entered a comment and am still assignee. I don't get it. By: Richard Mudgett (rmudgett) 2013-09-17 18:51:06.375-0500 It is a quirk of the issue tracker. It seems to be loath to have issues unassigned. When feedback is requested, the issue is assigned to the requested person. If the issue was previously unassigned it does not unassign the issue when the issue sent back. By: David Brillert (aragon) 2013-09-20 08:50:44.577-0500 This is a really service affecting problem, so if there is anything I can do to move this forward then I need to know what that is. *CLI> core show uptime System uptime: 7 minutes, 19 seconds Last reload: 7 minutes, 19 seconds And went from 0.9% ram to 63.8%. No reloads and only 1 outgoing call was placed. The memory leak happens when the system is idle. By: David Brillert (aragon) 2013-09-24 08:22:25.136-0500 A couple of updates: 1. Upgraded the server to SVN r399516 and since then no more 'Fork failed: Cannot allocate memory' warnings and no new segfault as of 4 days. 2. /var/log/asterisk/security shows the server under some sort of attack which may be responsible for the problems. Thousands of requests per second and 1 GB per day of logs in /var/log/asterisk/security and they look like this (IP's scrubbed): [2013-09-14 23:47:26] SECURITY[32620] res_security_log.c: SecurityEvent="ChallengeSent",EventTV="1379224046-127489",Severity="Informationa l",Service="SIP",EventVersion="1",AccountID="sip:ATTACKERIP:9",SessionID="0x3de88648",LocalAddress="IPV4/UDP/SERVERIP/5060",Remot eAddress="IPV4/UDP/ATTACKERIP/9",Challenge="63654509" [2013-09-14 23:47:26] SECURITY[32620] res_security_log.c: SecurityEvent="ChallengeSent",EventTV="1379224046-147487",Severity="Informationa l",Service="SIP",EventVersion="1",AccountID="sip:ATTACKERIP:9",SessionID="0x3de90138",LocalAddress="IPV4/UDP/SERVERIP/5060",Remot eAddress="IPV4/UDP/ATTACKERIP/9",Challenge="3b0da3ef" [2013-09-14 23:47:26] SECURITY[32620] res_security_log.c: SecurityEvent="ChallengeSent",EventTV="1379224046-165780",Severity="Informationa l",Service="SIP",EventVersion="1",AccountID="sip:ATTACKERIP:9",SessionID="0x3de97c28",LocalAddress="IPV4/UDP/SERVERIP/5060",Remot eAddress="IPV4/UDP/ATTACKERIP/9",Challenge="2ba8f0a7" [2013-09-14 23:47:26] SECURITY[32620] res_security_log.c: SecurityEvent="ChallengeSent",EventTV="1379224046-188493",Severity="Informationa l",Service="SIP",EventVersion="1",AccountID="sip:ATTACKERIP:9",SessionID="0x3de9f718",LocalAddress="IPV4/UDP/SERVERIP/5060",Remot eAddress="IPV4/UDP/ATTACKERIP/9",Challenge="2ef10a9f" [2013-09-14 23:47:26] SECURITY[32620] res_security_log.c: SecurityEvent="ChallengeSent",EventTV="1379224046-209056",Severity="Informationa l",Service="SIP",EventVersion="1",AccountID="sip:ATTACKERIP:9",SessionID="0x3dea7208",LocalAddress="IPV4/UDP/SERVERIP/5060",Remot eAddress="IPV4/UDP/ATTACKERIP/9",Challenge="05ed0e4a" [2013-09-14 23:47:26] SECURITY[32620] res_security_log.c: SecurityEvent="ChallengeSent",EventTV="1379224046-228397",Severity="Informationa l",Service="SIP",EventVersion="1",AccountID="sip:ATTACKERIP:9",SessionID="0x3deaecf8",LocalAddress="IPV4/UDP/SERVERIP/5060",Remot eAddress="IPV4/UDP/ATTACKERIP/9",Challenge="61d0cfbb" [2013-09-14 23:47:26] SECURITY[32620] res_security_log.c: SecurityEvent="ChallengeSent",EventTV="1379224046-246259",Severity="Informationa l",Service="SIP",EventVersion="1",AccountID="sip:ATTACKERIP:9",SessionID="0x3deb67e8",LocalAddress="IPV4/UDP/SERVERIP/5060",Remot eAddress="IPV4/UDP/ATTACKERIP/9",Challenge="4af6412c" By: Joel Vandal (joel_vandal) 2013-09-24 12:42:34.813-0500 We added a new rules on Fail2ban in order to block the attack that cause server to crash. We will try to get SIP logs next time we got flooded .. and check if we can easily reproduce the flood, but look easy to crash any servers. By: Matt Jordan (mjordan) 2013-09-24 13:06:10.326-0500 {quote} This is a really service affecting problem, so if there is anything I can do to move this forward then I need to know what that is. *CLI> core show uptime System uptime: 7 minutes, 19 seconds Last reload: 7 minutes, 19 seconds And went from 0.9% ram to 63.8%. No reloads and only 1 outgoing call was placed. The memory leak happens when the system is idle. {quote} Since you were also testing the log system changes for remote consoles, I have a feeling you ran smack into ASTERISK-22560. I'm guessing that with the system updates, you also updated Asterisk to the latest SVN? If not, definitely grab the patch for commit 399513 and apply it to your system: {noformat} svn diff -c 399513 https://origsvn.digium.com/svn/asterisk/branches/11 {noformat} By: David Brillert (aragon) 2013-09-24 13:13:11.427-0500 We already include that fix since it was committed to our SVN r399516 install. That's probably what made some of the memleaks disappear and allowed us to focus on the remote attack as the culprit. Now this is starting to look like a different bug report and I should think about closing this one out. Since SVN r399516 has only been in for 4 days I'll leave it open for a while longer in case the segfault does come back. By: Matt Jordan (mjordan) 2013-09-24 14:41:35.409-0500 Sounds good; just update the ticket if the original "really bad" mem leak crops back up. By: David Brillert (aragon) 2013-09-26 12:14:17.445-0500 Haven't seen the leak or crash since updating to svn r399516 |