Summary: ASTERISK-26516: pjsip: Memory corruption with possible memory leak.
Reporter: Richard Mudgett (rmudgett)
Labels:
Date Opened: 2016-10-28 11:18:20
Date Closed: 2016-11-16 23:20:39.000-0600
Priority: Major
Regression?:
Status: Closed/Complete
Components: Channels/chan_pjsip, Resources/res_pjsip
Versions: 13.11.2
Frequency of Occurrence: Occasional
Related Issues:
Environment:
Attachments:
( 0) 0007_jira_asterisk_26516_v13_free_mem_fix.patch
( 1) 2016-10-28.tgz
( 2) cli-and-gdb-2016-10-27.tgz
( 3) jira_asterisk_26344_v13_committed_fix.patch
( 4) jira_asterisk_26344_v13_debuging.patch
( 5) jira_asterisk_26516_test_fix_v13.patch
( 6) jira_asterisk_26516_v13_debugging_v2.patch
( 7) jira_asterisk_26516_v13_debugging.patch
( 8) screenlog-2016-11-01.tgz
( 9) screenlog-2016-11-02.tgz
(10) screenlog-2016-11-03.tgz
(11) screenlog-2016-11-10.tgz
(12) screenlog-2016-11-14.tgz
Description: This is a continuation of ASTERISK-26344 and ASTERISK-26387, since the original crashes are fixed with patches added to pjproject. For the setup description, see ASTERISK-26344.
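The corruption investigated below is a pool-lifetime hazard: an asynchronous DNS callback is handed a query object allocated from a tdata's memory pool, and the callback can fire after that tdata (and therefore its pool) has been destroyed, as Richard identifies in the first comment. A minimal sketch of the hazard, with hypothetical names (start_async_resolve() and on_dns_answer() are illustrative stand-ins, not pjproject's actual resolver API; the real path involved dns_a_callback() in pjsip's sip_resolve.c):

{noformat}
#include <pjlib.h>

/* Hypothetical async resolver API, for illustration only. */
typedef void (*dns_cb)(void *token);
void start_async_resolve(const char *name, dns_cb cb, void *token);

struct dns_query {              /* allocated from the tdata's pool */
    char name[128];
};

static void on_dns_answer(void *token)
{
    struct dns_query *q = token;

    /* BUG: if the tdata was destroyed while the query was in flight,
     * q points into a released pool block.  Touching q->name here
     * reads (or, on the write path, corrupts) memory that may already
     * belong to someone else. */
    PJ_LOG(3, ("dns", "answer for %s", q->name));
}

static void send_request(pj_pool_t *tdata_pool)
{
    /* Query state is carved out of the tdata's pool... */
    struct dns_query *q = pj_pool_zalloc(tdata_pool, sizeof(*q));
    pj_ansi_strcpy(q->name, "example.invalid");

    /* ...but nothing here keeps the tdata alive until the callback
     * fires, which is exactly the gap the pjproject patches close. */
    start_async_resolve(q->name, on_dns_answer, q);
}
{noformat}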
Comments:

By: Richard Mudgett (rmudgett) 2016-10-28 18:18:35.269-0500

[~tuxian] - The valgrind logs for runs 2 and 3 show what is corrupting memory. dns_a_callback() is using the passed-in query pointer, which was allocated from a destroyed tdata's memory pool. Now I just need to figure out why the tdata object is getting destroyed before dns_a_callback() is called to complete the tdata's DNS query.

By: Richard Mudgett (rmudgett) 2016-10-31 18:41:29

[^jira_asterisk_26344_v13_committed_fix.patch] - This is the actual patch that got committed for ASTERISK-26344. Apply with {{patch -p1 -i <patch-file-name>}}

[^jira_asterisk_26516_v13_debugging.patch] - This is an updated debug message patch that replaces [^jira_asterisk_26344_v13_debuging.patch]. Apply with {{patch -p1 -i <patch-file-name>}}

[^0007_jira_asterisk_26516_v13_free_mem_fix.patch] - Copy this file into the third-party/pjproject/patches directory.

[~tuxian] I'm fairly certain I found the primary cause of the memory corruption. Ironically, it turns out to be a hole in the original fix patch on ASTERISK-26344 that was closed in the patch that got committed. However, I think there is another potential for corruption, which is fixed by [^0007_jira_asterisk_26516_v13_free_mem_fix.patch].

# From a clean workspace, apply the three patches.
# Run your DNS test with MALLOC_DEBUG. It should be able to go overnight.

I expect the memory leak was another symptom of the bug that was also causing the primary memory corruption. Please test and report back. Thanks.

By: Ian Gilmour (tuxian) 2016-11-01 10:06:23.782-0500

Hi Richard,

The patches supplied didn't apply cleanly, so I updated to the current origin/13 branch, which appears to already have jira_asterisk_26344_v13_committed_fix.patch. The other patches then applied cleanly.

Leaving res_pjsip_outbound_registration.c with:

{noformat}
schedule_registration(state->client_state, 3 + 1);
{noformat}

gives me a system that still appears to be leaking memory.
mmlog simply has:

{noformat}
1478001617 - New session
{noformat}

But grepping the CLI output after running "memory show summary" & "memory show allocations" occasionally through the test run gives me:

{noformat}
grep -e "bytes in all allocations and deferred free allocations" -e "allocations in file.*pool_policy_malloc.c" ~/screenlog.0
6656448 bytes in 682 allocations in file ../src/pj/pool_policy_malloc.c
13437987 bytes in all allocations and deferred free allocations
6697408 bytes in 722 allocations in file ../src/pj/pool_policy_malloc.c
13571471 bytes in all allocations and deferred free allocations
6728128 bytes in 752 allocations in file ../src/pj/pool_policy_malloc.c
13577326 bytes in all allocations and deferred free allocations
6733248 bytes in 757 allocations in file ../src/pj/pool_policy_malloc.c
13605876 bytes in all allocations and deferred free allocations
6738368 bytes in 762 allocations in file ../src/pj/pool_policy_malloc.c
13613843 bytes in all allocations and deferred free allocations
6755776 bytes in 776 allocations in file ../src/pj/pool_policy_malloc.c
13573691 bytes in all allocations and deferred free allocations
6760896 bytes in 781 allocations in file ../src/pj/pool_policy_malloc.c
13624265 bytes in all allocations and deferred free allocations
13607107 bytes in all allocations and deferred free allocations
13650533 bytes in all allocations and deferred free allocations
6791616 bytes in 811 allocations in file ../src/pj/pool_policy_malloc.c
13670895 bytes in all allocations and deferred free allocations
13716032 bytes in all allocations and deferred free allocations
13673257 bytes in all allocations and deferred free allocations
13698627 bytes in all allocations and deferred free allocations
6853056 bytes in 871 allocations in file ../src/pj/pool_policy_malloc.c
13726484 bytes in all allocations and deferred free allocations
13720457 bytes in all allocations and deferred free allocations
6991296 bytes in 1006 allocations in file ../src/pj/pool_policy_malloc.c
13851289 bytes in all allocations and deferred free allocations
13851545 bytes in all allocations and deferred free allocations
13934849 bytes in all allocations and deferred free allocations
13933169 bytes in all allocations and deferred free allocations
{noformat}

I highlighted the pool_policy_malloc.c lines above, as this seems to be the file that shows a steady increase in usage. I've attached the CLI output as [^screenlog-2016-11-01.tgz].

By: Friendly Automation (friendly-automation) 2016-11-01 14:36:46.118-0500

Change 4270 had a related patch set uploaded by Richard Mudgett: bundled pjproject: Fix DNS write to freed memory. [https://gerrit.asterisk.org/4270|https://gerrit.asterisk.org/4270]

By: Friendly Automation (friendly-automation) 2016-11-01 14:37:05.533-0500

Change 4271 had a related patch set uploaded by Richard Mudgett: bundled pjproject: Fix DNS write to freed memory. [https://gerrit.asterisk.org/4271|https://gerrit.asterisk.org/4271]

By: Friendly Automation (friendly-automation) 2016-11-01 14:37:16.457-0500

Change 4272 had a related patch set uploaded by Richard Mudgett: bundled pjproject: Fix DNS write to freed memory. [https://gerrit.asterisk.org/4272|https://gerrit.asterisk.org/4272]

By: Richard Mudgett (rmudgett) 2016-11-01 18:24:40.105-0500

Teluu has accepted [^0007_jira_asterisk_26516_v13_free_mem_fix.patch] and it will be merged into the v13+ branches in a few hours.

The increasing memory may be a red herring if it levels out after a while.
Pjproject caches released memory pools to be reused the next time a pool is needed. You should run a longer-duration test to see if the memory keeps growing.

By: Friendly Automation (friendly-automation) 2016-11-02 01:22:13.339-0500

Change 4272 merged by zuul: bundled pjproject: Fix DNS write to freed memory. [https://gerrit.asterisk.org/4272|https://gerrit.asterisk.org/4272]

By: Friendly Automation (friendly-automation) 2016-11-02 01:22:16.075-0500

Change 4271 merged by zuul: bundled pjproject: Fix DNS write to freed memory. [https://gerrit.asterisk.org/4271|https://gerrit.asterisk.org/4271]

By: Friendly Automation (friendly-automation) 2016-11-02 05:24:59.974-0500

Change 4270 merged by Joshua Colp: bundled pjproject: Fix DNS write to freed memory. [https://gerrit.asterisk.org/4270|https://gerrit.asterisk.org/4270]

By: Ian Gilmour (tuxian) 2016-11-02 11:58:53.312-0500

Hi Richard, yes, I realise pjproject caches released memory pools to be reused the next time a pool is needed, but this test only re-registers 5 TLS users every couple of minutes, so I would have thought it would reach a stable state reasonably quickly, and well before a couple of hours - no?

Running the same system I built yesterday for 9 hours today (see [^screenlog-2016-11-02.tgz]), I see the cache pools growing:

{noformat}
grep -e "bytes in all allocations and deferred free allocations" -e "allocations in file.*pool_policy_malloc.c" -e "cachpool.*Total" screenlog.0
6640576 bytes in 673 allocations in file ../src/pj/pool_policy_malloc.c
13189225 bytes in all allocations and deferred free allocations
13189497 bytes in all allocations and deferred free allocations
6813632 bytes in 839 allocations in file ../src/pj/pool_policy_malloc.c
13656403 bytes in all allocations and deferred free allocations
13656395 bytes in all allocations and deferred free allocations
13755992 bytes in all allocations and deferred free allocations
6926272 bytes in 928 allocations in file ../src/pj/pool_policy_malloc.c
13755980 bytes in all allocations and deferred free allocations
7065248 bytes in 1015 allocations in file ../src/pj/pool_policy_malloc.c
13927279 bytes in all allocations and deferred free allocations
13927293 bytes in all allocations and deferred free allocations
7298496 bytes in 1227 allocations in file ../src/pj/pool_policy_malloc.c
14128893 bytes in all allocations and deferred free allocations
14168989 bytes in all allocations and deferred free allocations
7393728 bytes in 1320 allocations in file ../src/pj/pool_policy_malloc.c
14251689 bytes in all allocations and deferred free allocations
14250852 bytes in all allocations and deferred free allocations
7788992 bytes in 1641 allocations in file ../src/pj/pool_policy_malloc.c
14654274 bytes in all allocations and deferred free allocations
14654285 bytes in all allocations and deferred free allocations
8027072 bytes in 1806 allocations in file ../src/pj/pool_policy_malloc.c
14876070 bytes in all allocations and deferred free allocations
14876084 bytes in all allocations and deferred free allocations
8444352 bytes in 2084 allocations in file ../src/pj/pool_policy_malloc.c
15328742 bytes in all allocations and deferred free allocations
15331609 bytes in all allocations and deferred free allocations
cachpool Total 7300064 of 8496320 (85 %) used!
8728000 bytes in 2271 allocations in file ../src/pj/pool_policy_malloc.c
15614381 bytes in all allocations and deferred free allocations
15614392 bytes in all allocations and deferred free allocations
cachpool Total 7405112 of 8688320 (85 %) used!
cachpool Total 7405496 of 8690368 (85 %) used!
8789440 bytes in 2331 allocations in file ../src/pj/pool_policy_malloc.c
15668738 bytes in all allocations and deferred free allocations
15668722 bytes in all allocations and deferred free allocations
cachpool Total 7416632 of 8749760 (84 %) used!
8852928 bytes in 2393 allocations in file ../src/pj/pool_policy_malloc.c
15732482 bytes in all allocations and deferred free allocations
15732493 bytes in all allocations and deferred free allocations
cachpool Total 7428536 of 8813248 (84 %) used!
{noformat}

and

{noformat}
grep -e ":13:0" -A 8 screenlog.2 | sed 's/[[:space:]]*$//' | grep -e "top -" -e "21121"
top - 07:13:07 up 20 days, 23:12, 6 users, load average: 0.00, 0.00, 0.00
21121 iang 20 0 2777m 53m 13m S 18.4 0.7 0:03.80 asterisk
top - 08:13:02 up 21 days, 12 min, 6 users, load average: 0.00, 0.00, 0.00
21121 iang 20 0 2777m 58m 13m S 0.4 0.7 0:20.59 asterisk
top - 09:13:02 up 21 days, 1:12, 0 users, load average: 0.00, 0.00, 0.00
21121 iang 20 0 2777m 59m 13m S 0.4 0.8 0:38.15 asterisk
top - 10:13:05 up 21 days, 2:12, 0 users, load average: 0.00, 0.00, 0.00
21121 iang 20 0 2777m 61m 13m S 0.4 0.8 0:54.69 asterisk
top - 11:13:00 up 21 days, 3:12, 6 users, load average: 0.00, 0.00, 0.00
21121 iang 20 0 2777m 62m 13m S 0.4 0.8 1:12.00 asterisk
top - 11:13:09 up 21 days, 3:12, 6 users, load average: 0.00, 0.00, 0.00
21121 iang 20 0 2777m 62m 13m S 0.3 0.8 1:12.03 asterisk
top - 12:13:03 up 21 days, 4:12, 6 users, load average: 0.00, 0.00, 0.00
21121 iang 20 0 2777m 63m 13m S 0.2 0.8 1:28.45 asterisk
top - 13:13:06 up 21 days, 5:12, 0 users, load average: 0.00, 0.01, 0.00
21121 iang 20 0 2777m 63m 13m S 0.4 0.8 1:46.21 asterisk
top - 14:13:02 up 21 days, 6:12, 6 users, load average: 0.00, 0.09, 0.13
21121 iang 20 0 2777m 64m 13m S 0.2 0.8 2:03.18 asterisk
top - 15:13:06 up 21 days, 7:12, 6 users, load average: 0.00, 0.00, 0.00
21121 iang 20 0 2777m 65m 13m S 0.6 0.8 2:19.99 asterisk
top - 16:13:07 up 21 days, 8:12, 6 users, load average: 0.05, 0.01, 0.00
21121 iang 20 0 2777m 65m 13m S 0.6 0.8 2:38.45 asterisk
{noformat}

I'll leave it running overnight.
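Richard's caching point can be seen in pjlib's pool API itself: pj_pool_release() returns a pool's blocks to the owning pj_caching_pool rather than to the heap, so repeated create/release cycles should settle at a plateau once the cache covers the working set. A minimal standalone sketch of that lifecycle (plain pjlib, not Asterisk's wiring; the sizes are arbitrary):

{noformat}
#include <pjlib.h>

int main(void)
{
    pj_caching_pool cp;
    int i;

    pj_init();

    /* Keep up to ~1 MB of released blocks cached for reuse. */
    pj_caching_pool_init(&cp, &pj_pool_factory_default_policy, 1024 * 1024);

    for (i = 0; i < 100; ++i) {
        /* 4 KB initial block, growing in 4 KB increments ("%p" is
         * expanded to the pool address, as in pjproject's own names). */
        pj_pool_t *pool = pj_pool_create(&cp.factory, "demo%p",
                                         4096, 4096, NULL);

        pj_pool_alloc(pool, 1024);

        /* Returns the blocks to the cache, not to free(); process
         * memory should level out once the cache matches demand. */
        pj_pool_release(pool);
    }

    pj_caching_pool_destroy(&cp);
    pj_shutdown();
    return 0;
}
{noformat}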
By: Ian Gilmour (tuxian) 2016-11-03 03:16:09.340-0500

24+ hours later and the pjproject cache pool is still growing:

{noformat}
6640576 bytes in 673 allocations in file ../src/pj/pool_policy_malloc.c
13189225 bytes in all allocations and deferred free allocations
13189497 bytes in all allocations and deferred free allocations
6813632 bytes in 839 allocations in file ../src/pj/pool_policy_malloc.c
13656403 bytes in all allocations and deferred free allocations
13656395 bytes in all allocations and deferred free allocations
13755992 bytes in all allocations and deferred free allocations
6926272 bytes in 928 allocations in file ../src/pj/pool_policy_malloc.c
13755980 bytes in all allocations and deferred free allocations
7065248 bytes in 1015 allocations in file ../src/pj/pool_policy_malloc.c
13927279 bytes in all allocations and deferred free allocations
13927293 bytes in all allocations and deferred free allocations
7298496 bytes in 1227 allocations in file ../src/pj/pool_policy_malloc.c
14128893 bytes in all allocations and deferred free allocations
14168989 bytes in all allocations and deferred free allocations
7393728 bytes in 1320 allocations in file ../src/pj/pool_policy_malloc.c
14251689 bytes in all allocations and deferred free allocations
14250852 bytes in all allocations and deferred free allocations
7788992 bytes in 1641 allocations in file ../src/pj/pool_policy_malloc.c
14654274 bytes in all allocations and deferred free allocations
14654285 bytes in all allocations and deferred free allocations
8027072 bytes in 1806 allocations in file ../src/pj/pool_policy_malloc.c
14876070 bytes in all allocations and deferred free allocations
14876084 bytes in all allocations and deferred free allocations
8444352 bytes in 2084 allocations in file ../src/pj/pool_policy_malloc.c
15328742 bytes in all allocations and deferred free allocations
15331609 bytes in all allocations and deferred free allocations
cachpool Total 7300064 of 8496320 (85 %) used!
8728000 bytes in 2271 allocations in file ../src/pj/pool_policy_malloc.c
15614381 bytes in all allocations and deferred free allocations
15614392 bytes in all allocations and deferred free allocations
cachpool Total 7405112 of 8688320 (85 %) used!
cachpool Total 7405496 of 8690368 (85 %) used!
8789440 bytes in 2331 allocations in file ../src/pj/pool_policy_malloc.c
15668738 bytes in all allocations and deferred free allocations
15668722 bytes in all allocations and deferred free allocations
cachpool Total 7416632 of 8749760 (84 %) used!
8852928 bytes in 2393 allocations in file ../src/pj/pool_policy_malloc.c
15732482 bytes in all allocations and deferred free allocations
15732493 bytes in all allocations and deferred free allocations
cachpool Total 7428536 of 8813248 (84 %) used!
9724864 bytes in 3017 allocations in file ../src/pj/pool_policy_malloc.c
16593522 bytes in all allocations and deferred free allocations
16593514 bytes in all allocations and deferred free allocations
cachpool Total 7821800 of 9681600 (80 %) used!
10305984 bytes in 3427 allocations in file ../src/pj/pool_policy_malloc.c
17144668 bytes in all allocations and deferred free allocations
17179679 bytes in all allocations and deferred free allocations
cachpool Total 8092448 of 10264768 (78 %) used!
13430720 bytes in 5626 allocations in file ../src/pj/pool_policy_malloc.c
20263597 bytes in all allocations and deferred free allocations
20284056 bytes in all allocations and deferred free allocations
cachpool Total 9554320 of 13394112 (71 %) used!
{noformat}

top confirms it:

{noformat}
top - 07:13:07 up 20 days, 23:12, 6 users, load average: 0.00, 0.00, 0.00
21121 iang 20 0 2777m 53m 13m S 18.4 0.7 0:03.80 asterisk
top - 08:13:02 up 21 days, 12 min, 6 users, load average: 0.00, 0.00, 0.00
21121 iang 20 0 2777m 58m 13m S 0.4 0.7 0:20.59 asterisk
top - 09:13:02 up 21 days, 1:12, 0 users, load average: 0.00, 0.00, 0.00
21121 iang 20 0 2777m 59m 13m S 0.4 0.8 0:38.15 asterisk
top - 10:13:05 up 21 days, 2:12, 0 users, load average: 0.00, 0.00, 0.00
21121 iang 20 0 2777m 61m 13m S 0.4 0.8 0:54.69 asterisk
top - 11:13:00 up 21 days, 3:12, 6 users, load average: 0.00, 0.00, 0.00
21121 iang 20 0 2777m 62m 13m S 0.4 0.8 1:12.00 asterisk
top - 11:13:09 up 21 days, 3:12, 6 users, load average: 0.00, 0.00, 0.00
21121 iang 20 0 2777m 62m 13m S 0.3 0.8 1:12.03 asterisk
top - 12:13:03 up 21 days, 4:12, 6 users, load average: 0.00, 0.00, 0.00
21121 iang 20 0 2777m 63m 13m S 0.2 0.8 1:28.45 asterisk
top - 13:13:06 up 21 days, 5:12, 0 users, load average: 0.00, 0.01, 0.00
21121 iang 20 0 2777m 63m 13m S 0.4 0.8 1:46.21 asterisk
top - 14:13:02 up 21 days, 6:12, 6 users, load average: 0.00, 0.09, 0.13
21121 iang 20 0 2777m 64m 13m S 0.2 0.8 2:03.18 asterisk
top - 15:13:06 up 21 days, 7:12, 6 users, load average: 0.00, 0.00, 0.00
21121 iang 20 0 2777m 65m 13m S 0.6 0.8 2:19.99 asterisk
top - 16:13:07 up 21 days, 8:12, 6 users, load average: 0.05, 0.01, 0.00
21121 iang 20 0 2777m 65m 13m S 0.6 0.8 2:38.45 asterisk
top - 17:13:02 up 21 days, 9:12, 6 users, load average: 0.00, 0.00, 0.00
21121 iang 20 0 2777m 65m 13m S 0.6 0.8 2:56.43 asterisk
top - 18:13:05 up 21 days, 10:12, 0 users, load average: 0.06, 0.02, 0.00
21121 iang 20 0 2777m 65m 13m S 0.2 0.8 3:13.08 asterisk
top - 19:13:03 up 21 days, 11:12, 6 users, load average: 0.01, 0.04, 0.00
21121 iang 20 0 2777m 67m 13m S 0.4 0.9 3:29.53 asterisk
top - 20:13:06 up 21 days, 12:12, 6 users, load average: 0.00, 0.00, 0.00
21121 iang 20 0 2777m 68m 13m S 0.4 0.9 3:46.76 asterisk
top - 21:13:01 up 21 days, 13:12, 6 users, load average: 0.00, 0.00, 0.00
21121 iang 20 0 2777m 70m 13m S 0.6 0.9 4:04.42 asterisk
top - 22:13:04 up 21 days, 14:12, 1 user, load average: 0.00, 0.00, 0.00
21121 iang 20 0 2777m 70m 13m S 0.6 0.9 4:22.45 asterisk
top - 23:13:08 up 21 days, 15:12, 1 user, load average: 0.00, 0.01, 0.00
21121 iang 20 0 2777m 73m 13m S 0.4 0.9 4:39.52 asterisk
top - 00:13:02 up 21 days, 16:12, 1 user, load average: 0.00, 0.03, 0.01
21121 iang 20 0 2777m 75m 13m S 0.6 1.0 4:56.93 asterisk
top - 01:13:05 up 21 days, 17:12, 1 user, load average: 0.02, 0.06, 0.03
21121 iang 20 0 2777m 75m 13m S 0.4 1.0 5:14.31 asterisk
top - 02:13:08 up 21 days, 18:12, 1 user, load average: 0.04, 0.11, 0.05
21121 iang 20 0 2777m 78m 13m S 0.6 1.0 5:30.85 asterisk
top - 03:13:03 up 21 days, 19:12, 1 user, load average: 0.00, 0.00, 0.00
21121 iang 20 0 2777m 81m 13m S 0.7 1.0 5:47.84 asterisk
top - 04:13:06 up 21 days, 20:12, 1 user, load average: 0.01, 0.01, 0.00
21121 iang 20 0 2777m 81m 13m S 0.7 1.0 6:05.10 asterisk
top - 05:13:01 up 21 days, 21:12, 1 user, load average: 0.00, 0.00, 0.00
21121 iang 20 0 2777m 83m 13m S 0.3 1.1 6:22.42 asterisk
top - 06:13:05 up 21 days, 22:12, 1 user, load average: 0.00, 0.00, 0.00
21121 iang 20 0 2777m 83m 13m S 0.4 1.1 6:39.28 asterisk
top - 07:13:08 up 21 days, 23:12, 1 user, load average: 0.00, 0.00, 0.00
21121 iang 20 0 2777m 85m 13m S 0.6 1.1 6:57.48 asterisk
{noformat}

Full CLI log in [^screenlog-2016-11-03.tgz] (and mmlog has no entries other than the initial timestamp).
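The "cachpool ... used!" lines Ian is grepping (and the later "Capacity=..., used_cnt=..." form) are the caching pool's own status dump, which pjlib can also emit on demand. A minimal sketch, assuming a pjlib build with logging at level 3 or higher and a pj_caching_pool like the one in the sketch above:

{noformat}
/* Ask pjlib to log the caching pool's status: total bytes cached vs.
 * in use, plus (with detail enabled) a line per outstanding pool.
 * This produces the same "cachpool ... used!" summary grepped above. */
static void dump_pool_usage(pj_caching_pool *cp)
{
    pj_pool_factory_dump(&cp->factory, PJ_TRUE);
}
{noformat}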
By: Richard Mudgett (rmudgett) 2016-11-09 13:07:57.134-0600

[^jira_asterisk_26516_v13_debugging_v2.patch] - Replaces the earlier [^jira_asterisk_26516_v13_debugging.patch]. The new version removes the grp_lock logging and adds logging for memory pools. Hopefully, this will give enough information to determine which pool is leaking.

From an updated and clean workspace, install [^jira_asterisk_26516_v13_debugging_v2.patch] and your sip_outbound_registration_perform() {{schedule_registration(state->client_state, (ast_random() % 10) + 1);}} change. A 30-minute log capture should accumulate enough leaked memory to identify which pool is leaking.

There is a recent fix that is likely to affect the memory leak, which should be merged by the time you are able to test: https://gerrit.asterisk.org/#/c/4351/

By: Ian Gilmour (tuxian) 2016-11-10 10:15:20.730-0600

[^screenlog-2016-11-10.tgz] was generated using the current branch 13 trunk + your v2 patch. Memory usage still looks to be creeping up...

{noformat}
grep -e "bytes in all allocations and deferred free allocations" -e "allocations in file.*pool_policy_malloc.c" -e "cachpool.*used_cnt" ~/screenlog.0
6643136 bytes in 676 allocations in file ../src/pj/pool_policy_malloc.c
13473977 bytes in all allocations and deferred free allocations
13474266 bytes in all allocations and deferred free allocations
cachpool Capacity=52480, max_capacity=1048576, used_cnt=40
6669248 bytes in 695 allocations in file ../src/pj/pool_policy_malloc.c
13509783 bytes in all allocations and deferred free allocations
13509767 bytes in all allocations and deferred free allocations
cachpool Capacity=63232, max_capacity=1048576, used_cnt=50
6710208 bytes in 735 allocations in file ../src/pj/pool_policy_malloc.c
13571270 bytes in all allocations and deferred free allocations
13572012 bytes in all allocations and deferred free allocations
cachpool Capacity=63232, max_capacity=1048576, used_cnt=90
6725568 bytes in 750 allocations in file ../src/pj/pool_policy_malloc.c
13572492 bytes in all allocations and deferred free allocations
13572506 bytes in all allocations and deferred free allocations
cachpool Capacity=63232, max_capacity=1048576, used_cnt=105
6740928 bytes in 765 allocations in file ../src/pj/pool_policy_malloc.c
13627074 bytes in all allocations and deferred free allocations
13627085 bytes in all allocations and deferred free allocations
cachpool Capacity=63232, max_capacity=1048576, used_cnt=120
6797248 bytes in 820 allocations in file ../src/pj/pool_policy_malloc.c
13685842 bytes in all allocations and deferred free allocations
13684196 bytes in all allocations and deferred free allocations
cachpool Capacity=63232, max_capacity=1048576, used_cnt=175
6822848 bytes in 845 allocations in file ../src/pj/pool_policy_malloc.c
13676524 bytes in all allocations and deferred free allocations
13674824 bytes in all allocations and deferred free allocations
cachpool Capacity=63232, max_capacity=1048576, used_cnt=200
{noformat}

and...
{noformat}
grep -e ":.0:0" -A 8 ~/screenlog.2 | sed 's/[[:space:]]*$//' | grep -e "top -" -e 18993
top - 15:00:02 up 29 days, 6:59, 6 users, load average: 0.02, 0.18, 0.22
18993 iang 20 0 2776m 53m 13m S 0.3 0.7 0:04.30 asterisk
top - 15:10:05 up 29 days, 7:09, 6 users, load average: 0.00, 0.02, 0.10
18993 iang 20 0 2776m 57m 13m S 0.4 0.7 0:07.62 asterisk
top - 15:20:00 up 29 days, 7:19, 6 users, load average: 0.00, 0.00, 0.04
18993 iang 20 0 2776m 57m 13m S 0.4 0.7 0:10.38 asterisk
top - 15:20:09 up 29 days, 7:19, 6 users, load average: 0.00, 0.00, 0.03
18993 iang 20 0 2776m 57m 13m S 0.4 0.7 0:10.42 asterisk
top - 15:30:04 up 29 days, 7:29, 6 users, load average: 0.00, 0.00, 0.00
18993 iang 20 0 2776m 58m 13m S 0.6 0.7 0:13.56 asterisk
top - 15:40:07 up 29 days, 7:39, 6 users, load average: 0.00, 0.00, 0.00
18993 iang 20 0 2776m 58m 13m S 0.6 0.7 0:16.52 asterisk
top - 15:50:02 up 29 days, 7:49, 6 users, load average: 0.00, 0.00, 0.00
18993 iang 20 0 2776m 58m 13m S 0.6 0.7 0:19.34 asterisk
top - 16:00:05 up 29 days, 7:59, 6 users, load average: 0.00, 0.00, 0.00
18993 iang 20 0 2776m 59m 13m S 0.4 0.8 0:22.53 asterisk
{noformat}

By: Richard Mudgett (rmudgett) 2016-11-10 19:15:51.199-0600

The 'auth_cli0x7fff' pool appears to be leaked: it is repeatedly created but never released.

By: Alexei Gradinari (alexei gradinari) 2016-11-11 08:20:48.177-0600

The 'pept0xf3cf10' pool increased rapidly. This pool is used for the ast_pjsip_endpoint, which is never released.

{noformat}
[2016-11-10 14:58:30.049] DEBUG[18993]: pjproject:0 <?>: pept0xf3cf10 Pool: created, size=4096
...
[2016-11-10 14:58:30.553] DEBUG[18993]: pjproject:0 <?>: pept0xf3cf10 Pool: 584 bytes requested, resizing pool by 4000 bytes (used=5413984, cap=5416096)
{noformat}

By: Richard Mudgett (rmudgett) 2016-11-11 18:23:41.518-0600

[~alexei gradinari] The 'pept0xf3cf10' pool is used to configure pjproject for operation, grows only on Asterisk startup, and must remain for the life of the Asterisk execution; the log does not capture Asterisk being shut down. The problem is that memory usage is growing *after* startup. That pool is not causing the leak.

[~tuxian] The 'auth_cli0x????' pools appear to be leaking. A new pool is created in the authentication cache for each different authentication realm. The MALLOC_DEBUG log showed a growth of 179,712 bytes allocated to pjproject pools between the first summary and the last in the log. That memory corresponds to about 175 auth_cli pools and matches closely with the 184 remaining 'auth_cli0x7fff' pools. No other remaining pools created after startup represent close to that much memory. There are also 180 of these lines:

{noformat}
[2016-11-10 16:02:54.489] DEBUG[19397]: pjproject:0 <?>: sip_auth_clien ...Unable to set auth for tdta0x7fffd4024740: can not find credential for dev.dev.acme.com/Digest
{noformat}

I'd like to see a SIP message log of the registrations (pjsip set logger on) to see what the realm strings are and verify whether the realms are changing. I'll likely need to create another debugging patch to track why these pools are created. I do appreciate your patience and ability to obtain the needed information.

By: Ian Gilmour (tuxian) 2016-11-14 05:50:28.250-0600

[^screenlog-2016-11-14.tgz] has pjsip logging enabled. It also has a breakpoint added on the return from pjsip_endpt_create_pool() in sip_endpoint.c (line:675). I added the following commands to be run when the breakpoint is hit:

{noformat}
(gdb) comm 6
Type commands for breakpoint(s) 6, one per line.
End with a line saying just "end".
>print *pool
>bt
>cont
>end
(gdb)
{noformat}

So now we can see the full pool name of all these 'auth_cli0x...' pools. Hopefully this, together with the pjsip logger enabled, gives a bit more info as to why these pools are being created and never released.

By: Richard Mudgett (rmudgett) 2016-11-14 15:37:12.697-0600

[^jira_asterisk_26516_test_fix_v13.patch] - This is an Asterisk patch that should fix the memory pool leak. PJPROJECT 2.5.5 introduced an API change that resulted in Asterisk leaking the memory pools. Please test to make sure there aren't any more issues. (A sketch of the init/deinit pairing involved appears at the end of this thread.)

By: Ian Gilmour (tuxian) 2016-11-16 04:04:16.895-0600

Hi Richard, that last patch appears to do the job. Memory usage peaks and then remains constant for the next 24 hours.

{noformat}
grep -e ":42:0" -A 8 ~/screenlog.2 | sed 's/[[:space:]]*$//' | grep -e "top -" -e 2033
top - 08:42:00 up 34 days, 41 min, 6 users, load average: 0.12, 0.12, 0.16
2033 iang 20 0 2787m 54m 13m S 0.4 0.7 0:04.19 asterisk
top - 08:42:09 up 34 days, 41 min, 6 users, load average: 0.10, 0.11, 0.16
2033 iang 20 0 2787m 54m 13m S 0.3 0.7 0:04.22 asterisk
top - 09:42:04 up 34 days, 1:41, 0 users, load average: 0.00, 0.00, 0.00
2033 iang 20 0 2787m 61m 13m S 0.6 0.8 0:21.79 asterisk
top - 10:42:05 up 34 days, 2:41, 6 users, load average: 0.00, 0.00, 0.00
2033 iang 20 0 2787m 61m 13m S 0.4 0.8 0:39.74 asterisk
top - 11:42:00 up 34 days, 3:41, 6 users, load average: 0.00, 0.00, 0.00
2033 iang 20 0 2787m 61m 13m S 0.3 0.8 0:58.17 asterisk
top - 11:42:09 up 34 days, 3:41, 6 users, load average: 0.00, 0.00, 0.00
2033 iang 20 0 2787m 61m 13m S 0.6 0.8 0:58.22 asterisk
top - 12:42:03 up 34 days, 4:41, 6 users, load average: 0.00, 0.00, 0.00
2033 iang 20 0 2787m 62m 13m S 0.6 0.8 1:16.65 asterisk
top - 13:42:07 up 34 days, 5:41, 6 users, load average: 0.00, 0.00, 0.00
2033 iang 20 0 2787m 62m 13m S 0.6 0.8 1:35.97 asterisk
top - 14:42:01 up 34 days, 6:41, 6 users, load average: 0.02, 0.11, 0.09
2033 iang 20 0 2787m 62m 13m S 0.4 0.8 1:54.32 asterisk
top - 15:42:05 up 34 days, 7:41, 6 users, load average: 0.11, 0.08, 0.02
2033 iang 20 0 2787m 62m 13m S 0.9 0.8 2:13.25 asterisk
top - 16:42:00 up 34 days, 8:41, 6 users, load average: 0.04, 0.02, 0.00
2033 iang 20 0 2787m 61m 13m S 0.6 0.8 2:31.44 asterisk
top - 16:42:09 up 34 days, 8:41, 6 users, load average: 0.03, 0.02, 0.00
2033 iang 20 0 2787m 61m 13m S 0.3 0.8 2:31.47 asterisk
top - 17:42:03 up 34 days, 9:41, 6 users, load average: 0.04, 0.01, 0.00
2033 iang 20 0 2787m 61m 13m S 0.3 0.8 2:50.06 asterisk
top - 18:42:06 up 34 days, 10:41, 0 users, load average: 0.00, 0.00, 0.00
2033 iang 20 0 2787m 62m 13m S 0.6 0.8 3:07.97 asterisk
top - 19:42:00 up 34 days, 11:41, 0 users, load average: 0.00, 0.00, 0.00
2033 iang 20 0 2787m 62m 13m S 0.7 0.8 3:25.87 asterisk
top - 19:42:09 up 34 days, 11:41, 0 users, load average: 0.00, 0.00, 0.00
2033 iang 20 0 2787m 62m 13m S 0.4 0.8 3:25.91 asterisk
top - 20:42:04 up 34 days, 12:41, 0 users, load average: 0.00, 0.00, 0.00
2033 iang 20 0 2787m 62m 13m S 0.6 0.8 3:43.51 asterisk
top - 21:42:07 up 34 days, 13:41, 6 users, load average: 0.00, 0.00, 0.00
2033 iang 20 0 2787m 61m 13m S 0.7 0.8 4:01.59 asterisk
top - 22:42:01 up 34 days, 14:41, 0 users, load average: 0.01, 0.02, 0.00
2033 iang 20 0 2787m 61m 13m S 0.7 0.8 4:19.05 asterisk
top - 23:42:05 up 34 days, 15:41, 0 users, load average: 0.06, 0.03, 0.00
2033 iang 20 0 2787m 61m 13m S 0.4 0.8 4:36.24 asterisk
top - 00:42:00 up 34 days, 16:41, 0 users, load average: 0.31, 0.17, 0.11
2033 iang 20 0 2787m 61m 13m S 0.7 0.8 4:53.69 asterisk
top - 00:42:09 up 34 days, 16:41, 0 users, load average: 0.34, 0.18, 0.11
2033 iang 20 0 2787m 61m 13m S 0.4 0.8 4:53.73 asterisk
top - 01:42:03 up 34 days, 17:41, 0 users, load average: 0.00, 0.03, 0.02
2033 iang 20 0 2787m 61m 13m S 0.4 0.8 5:11.26 asterisk
top - 02:42:07 up 34 days, 18:41, 0 users, load average: 0.00, 0.00, 0.00
2033 iang 20 0 2787m 61m 13m S 0.3 0.8 5:28.71 asterisk
top - 03:42:01 up 34 days, 19:41, 0 users, load average: 0.02, 0.03, 0.00
2033 iang 20 0 2787m 61m 13m S 0.4 0.8 5:46.11 asterisk
top - 04:42:05 up 34 days, 20:41, 0 users, load average: 0.00, 0.02, 0.00
2033 iang 20 0 2787m 61m 13m S 0.4 0.8 6:03.62 asterisk
top - 05:42:00 up 34 days, 21:41, 0 users, load average: 0.00, 0.00, 0.00
2033 iang 20 0 2787m 61m 13m S 0.6 0.8 6:20.73 asterisk
top - 05:42:09 up 34 days, 21:41, 0 users, load average: 0.00, 0.00, 0.00
2033 iang 20 0 2787m 61m 13m S 0.6 0.8 6:20.78 asterisk
top - 06:42:03 up 34 days, 22:41, 0 users, load average: 0.07, 0.03, 0.00
2033 iang 20 0 2787m 61m 13m S 0.6 0.8 6:38.54 asterisk
top - 07:42:06 up 34 days, 23:41, 0 users, load average: 0.03, 0.01, 0.00
2033 iang 20 0 2787m 61m 13m S 0.4 0.8 6:55.97 asterisk
top - 08:42:00 up 35 days, 41 min, 0 users, load average: 0.00, 0.00, 0.00
2033 iang 20 0 2787m 61m 13m S 0.6 0.8 7:13.20 asterisk
top - 08:42:09 up 35 days, 41 min, 0 users, load average: 0.00, 0.00, 0.00
2033 iang 20 0 2787m 61m 13m T 0.6 0.8 7:13.25 asterisk
top - 09:42:05 up 35 days, 1:41, 6 users, load average: 1.00, 0.97, 0.67
2033 iang 20 0 2787m 61m 13m S 0.6 0.8 7:32.41 asterisk
{noformat}

Many thanks.

By: Friendly Automation (friendly-automation) 2016-11-16 23:20:40.496-0600

Change 4442 merged by zuul: res_pjsip_outbound_authenticator_digest.c: Fix memory pool leak. [https://gerrit.asterisk.org/4442|https://gerrit.asterisk.org/4442]

By: Friendly Automation (friendly-automation) 2016-11-16 23:39:36.164-0600

Change 4443 merged by zuul: res_pjsip_outbound_authenticator_digest.c: Fix memory pool leak. [https://gerrit.asterisk.org/4443|https://gerrit.asterisk.org/4443]

By: Friendly Automation (friendly-automation) 2016-11-17 04:57:03.675-0600

Change 4444 merged by Joshua Colp: res_pjsip_outbound_authenticator_digest.c: Fix memory pool leak. [https://gerrit.asterisk.org/4444|https://gerrit.asterisk.org/4444]
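The mechanics behind the merged fix: pjproject 2.5.5 added pjsip_auth_clt_deinit() as the required counterpart to pjsip_auth_clt_init(), because the client auth session caches per-realm state in 'auth_cli' pools; a session that is initialized and then abandoned leaks those pools, one per realm, matching the growth Richard traced. A minimal sketch of the pairing the Asterisk patch enforces in res_pjsip_outbound_authenticator_digest.c (simplified, not the verbatim patch; the wrapper function and its parameter set are illustrative):

{noformat}
#include <pjsip.h>

/* Answer a 401/407 challenge by building a re-authenticated request,
 * making sure the temporary client auth session is torn down again. */
static pj_status_t resend_with_auth(pjsip_endpoint *endpt,
                                    pj_pool_t *pool,
                                    const pjsip_cred_info *cred,
                                    pjsip_rx_data *challenge,
                                    pjsip_tx_data *old_request,
                                    pjsip_tx_data **new_request)
{
    pjsip_auth_clt_sess auth_sess;
    pj_status_t status;

    status = pjsip_auth_clt_init(&auth_sess, endpt, pool, 0);
    if (status != PJ_SUCCESS)
        return status;

    status = pjsip_auth_clt_set_credentials(&auth_sess, 1, cred);
    if (status == PJ_SUCCESS) {
        status = pjsip_auth_clt_reinit_req(&auth_sess, challenge,
                                           old_request, new_request);
    }

    /* The call the leak fix adds: releases the session's cached
     * per-realm 'auth_cli' pools.  It only exists from pjproject
     * 2.5.5 on, which is why the leak appeared with that upgrade. */
    pjsip_auth_clt_deinit(&auth_sess);

    return status;
}
{noformat}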