[Home]

Summary:ASTERISK-16005: [patch] Memory leak in manager.c
Reporter:Florent Chauveau (mav3rick)Labels:
Date Opened:2010-04-23 06:02:23Date Closed:2010-06-08 07:41:04
Priority:MajorRegression?No
Status:Closed/CompleteComponents:Core/ManagerInterface
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) 1.summary.txt
( 1) 2.summary.txt
( 2) 20100510__issue17234.diff.txt
( 3) 20100517__issue17234__trunk.diff.txt
( 4) 3.summary.txt
( 5) bt.txt
( 6) valgrind.txt
Description:I detected a memory leak in manager.c, on latest Asterisk 1.6.2.7-rc2.
Leak is reproductible in 1.6.2.6.

  6622144 bytes in 30993 allocations in file 'manager.c'
  7100873 bytes in 33065 allocations in file 'manager.c'
  7352505 bytes in 34160 allocations in file 'manager.c'
  7636791 bytes in 35405 allocations in file 'manager.c'
  8039968 bytes in 37165 allocations in file 'manager.c'
  9430346 bytes in 43322 allocations in file 'manager.c'

I'm sending Originate (async) AMI commands using the built-in HTTP server.

Attached: valgrind run with --leak-check=full --show-reachable=yes, and "memory show allocations". Asterisk was compiled with MALLOC_DEBUG and DONT_OPTIMIZE.

Comments:By: Paul Belanger (pabelanger) 2010-04-23 08:07:05

Changed severity

By: Tilghman Lesher (tilghman) 2010-05-07 14:48:49

Valgrind output doesn't actually show any real leaks; they're all part of the startup and registration.

By: Florent Chauveau (mav3rick) 2010-05-07 14:51:55

Agreed. But the leak is real, Asterisk is taking more than 100mb per hour with regular HTTP/AMI calls.

By: Tilghman Lesher (tilghman) 2010-05-07 15:14:50

Are you sending AMI events over the regular HTTP server or the HTTPS server?

By: Florent Chauveau (mav3rick) 2010-05-07 15:18:09

I'm sending async AMI Originate commands over the HTTP server (port 8088).

There are no AMI events involved.

By: Tilghman Lesher (tilghman) 2010-05-07 15:21:13

Do you have any output for 'manager show eventq'?  Also, the output for 'memory show allocations manager.c' would be helpful, as it's not clear what is leaking memory.

By: Florent Chauveau (mav3rick) 2010-05-07 15:27:40

I'll send you those output next week, when I'll be back at work. Thanks :)

By: Tilghman Lesher (tilghman) 2010-05-10 16:44:24

I've now learned that this is not actually a bug but rather a failure on the HTTP client part to query for events.  The HTTP client may not only send actions, but must periodically query for any events which may have accumulated on the server side with the Action: WaitEvent action (optional Timeout header, specified in seconds).  The response to this action will be a list of Events.

These accumulated events are what are consuming memory.  Once all connected clients have received the cached events, the events are freed within 5 seconds.

This is also why valgrind does not show these in its list of leaked memory.  When Asterisk shuts down, all HTTP clients are destroyed and the list of outstanding events is freed.



By: Florent Chauveau (mav3rick) 2010-05-14 09:26:34

This is what we tried so far:
- we tried to call the WaitEvent action, events are successfully returned to the HTTP client but not freed in the event queue,
- setting "Events=off" in the Login action does not work,
- removing read rights in the AMI user does not work.

"manager show eventq" shows a LOT OF events with "usecount: 0".

Aren't they supposed to be freed by "purge_event" ? If I correctly read the function comment :

"" Purge unused events. Remove elements from the head as long as their usecount is 0 and there is a next element. ""

I made sure the function is called by adding ast_log, it is every few seconds.

But the queue is not freed.

Maybe there is a bug after all ?

asterisk -rx 'memory show summary manager.c' :

     3420 bytes in 3 allocations in function 'action_originate' of 'manager.c'
      136 bytes in 4 allocations in function '__init_manager' of 'manager.c'
     1608 bytes in 67 allocations in function 'ast_manager_register2' of 'manager.c'
     1313 bytes in 21 allocations in function 'astman_get_variables' of 'manager.c'
   558400 bytes in 200 allocations in function 'generic_http_callback' of 'manager.c'
  6700264 bytes in 31041 allocations in function 'append_event' of 'manager.c'
7265141 bytes allocated in 31336 allocations


The two "growing" allocations are (from "memory show allocations manager.c") :

     2792 bytes allocated in generic_http_callback at line  3833 of manager.c
      166 bytes allocated in         append_event at line  3277 of manager.c

both lines appear a few thousand times.

The fastest growing allocations are the event queue.

If there is no bug, how can we "flush" the manager queue ?

By: Florent Chauveau (mav3rick) 2010-05-14 09:29:14

Regarding the last comment, line numbers are from Asterisk 1.6.2.7 (release).

By: Tilghman Lesher (tilghman) 2010-05-17 14:11:30

No, the current implementation does not delete events unless they are at the BEGINNING of the list AND have a usecount of 0.

This IS working as expected, and the way to flush the queue is to do an 'Action: WaitEvent' every few seconds from each AMI HTTP connection.  You have to advance the 'seen' events on each connection before Asterisk will delete the queued events.

That said, I agree that needing to do this is not optimal, so I've created a patch that will set up a maximum time that events will be cached, which is set to twice the HTTP timeout.  This ensures that if you don't query for the events in the time that you need to make another request to keep your connection open, the events thusly not queried will expire.



By: David Woolley (davidw) 2010-05-21 10:03:12

There appear to be two functions, LOCK_EVENTS and UNLOCK_EVENTS that still use the old list class.  Both of them appear to be unreachable, even in the unpatched code.

By: Tilghman Lesher (tilghman) 2010-05-21 10:20:07

davidw:  that's because they're commented out.

By: Florent Chauveau (mav3rick) 2010-06-01 08:48:57

Hello. We just tested the patch on Asterisk 1.6.2.7-rc2 and 1.6.2.7 (release).

So far, so good.

Memory is successfully freed. Events are flushed out.

The only issue left is that while we send http ami requests (30 requests per minute), memory is growing faster than freed. We have to stop sending ami requests to see the memory being freed.

I guess a way to prevent this is to lower the maximum time events will be cached ?

By: Tilghman Lesher (tilghman) 2010-06-01 11:27:37

The way to prevent this is actually for you to do an Action: WaitEvent on each HTTP connection, at regular intervals.  Failing that, that we cache events for 2 minutes maximum, by default, does not seem like a problem.

By: Digium Subversion (svnbot) 2010-06-01 11:41:00

Repository: asterisk
Revision: 266682

U   trunk/main/manager.c

------------------------------------------------------------------------
r266682 | tilghman | 2010-06-01 11:41:00 -0500 (Tue, 01 Jun 2010) | 16 lines

Eliminate stale manager events after a set interval, even if AMI clients don't query for them.

Actions (or failures to act) by external clients should not cause memory leaks
in Asterisk, especially when those continued leaks could cause Asterisk to
misbehave later.

(closes issue ASTERISK-16005)
Reported by: mav3rick
Patches:
      20100510__issue17234.diff.txt uploaded by tilghman (license 14)
      20100517__issue17234__trunk.diff.txt uploaded by tilghman (license 14)
Tested by: mav3rick, davidw

(closes issue ASTERISK-16120)
Reported by: davidw

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=266682

By: Digium Subversion (svnbot) 2010-06-01 11:42:03

Repository: asterisk
Revision: 266683

_U  branches/1.6.2/
U   branches/1.6.2/main/manager.c

------------------------------------------------------------------------
r266683 | tilghman | 2010-06-01 11:42:02 -0500 (Tue, 01 Jun 2010) | 23 lines

Merged revisions 266682 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

........
 r266682 | tilghman | 2010-06-01 11:41:00 -0500 (Tue, 01 Jun 2010) | 16 lines
 
 Eliminate stale manager events after a set interval, even if AMI clients don't query for them.
 
 Actions (or failures to act) by external clients should not cause memory leaks
 in Asterisk, especially when those continued leaks could cause Asterisk to
 misbehave later.
 
 (closes issue ASTERISK-16005)
  Reported by: mav3rick
  Patches:
        20100510__issue17234.diff.txt uploaded by tilghman (license 14)
        20100517__issue17234__trunk.diff.txt uploaded by tilghman (license 14)
  Tested by: mav3rick, davidw
 
 (closes issue ASTERISK-16120)
  Reported by: davidw
........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=266683

By: Florent Chauveau (mav3rick) 2010-06-03 04:21:10

After more testing, Asterisk keeps crashing (segfault). Sometimes immediately after launch, sometimes after a few hours.

Tested on Asterisk 1.6.2.7-rc2 and 1.6.2.7 (release).

Crash can happen even if Asterisk is unused (0 calls). There might be some AMI http commands, I'm not sure.



By: Florent Chauveau (mav3rick) 2010-06-03 04:22:10

core dump is at http://dl.free.fr/iVl8aa9rG/core.19263

By: David Woolley (davidw) 2010-06-03 05:40:21

The raw core dump is useless without the matching Asterisk binaries (fully unstripped and including all modules) and system library binaries.  You've given us a choice of two versions of Asterisk, and haven't told us which options you selected or what OS you are using!

That's why the bug reporting guidelines request the gdb backtraces.

By: Paul Belanger (pabelanger) 2010-06-03 08:07:13

Correct, we will need a backtrace (see below), then upload it to this issue.
--
Thank you for your bug report. In order to move your issue forward, we require a backtrace from the core file produced after the crash. Please see the doc/backtrace.txt file in your Asterisk source directory.

Also, be sure you have DONT_OPTIMIZE enabled in menuselect within the Compiler Flags section, then:

make install

after enabling, reproduce the crash, and then execute the instructions in doc/backtrace.txt.

When complete, attach that file to this issue report. Thanks!

By: Florent Chauveau (mav3rick) 2010-06-05 11:11:09

Hello,

I just uploaded "bt.txt" that contains backtrace as expected.

Asterisk version that crashed was 1.6.2.7-rc2 on a Debian GNU/Linux stable.

Compiled with DONT_OPTIMIZED and MALLOC_DEBUG.

By: Digium Subversion (svnbot) 2010-06-05 12:27:12

Repository: asterisk
Revision: 268454

U   trunk/main/manager.c

------------------------------------------------------------------------
r268454 | tilghman | 2010-06-05 12:27:11 -0500 (Sat, 05 Jun 2010) | 5 lines

Verify event is not NULL before attempting to lower its usecount.

(closes issue ASTERISK-16005)
Reported by: mav3rick

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=268454

By: Digium Subversion (svnbot) 2010-06-05 12:28:23

Repository: asterisk
Revision: 268455

_U  branches/1.6.2/
U   branches/1.6.2/main/manager.c

------------------------------------------------------------------------
r268455 | tilghman | 2010-06-05 12:28:22 -0500 (Sat, 05 Jun 2010) | 12 lines

Merged revisions 268454 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

........
 r268454 | tilghman | 2010-06-05 12:27:12 -0500 (Sat, 05 Jun 2010) | 5 lines
 
 Verify event is not NULL before attempting to lower its usecount.
 
 (closes issue ASTERISK-16005)
  Reported by: mav3rick
........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=268455

By: Digium Subversion (svnbot) 2010-06-07 13:59:27

Repository: asterisk
Revision: 268731

U   trunk/main/manager.c

------------------------------------------------------------------------
r268731 | tilghman | 2010-06-07 13:59:27 -0500 (Mon, 07 Jun 2010) | 4 lines

Event well was going dry.

(issue ASTERISK-16005)

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=268731

By: Digium Subversion (svnbot) 2010-06-07 14:00:37

Repository: asterisk
Revision: 268732

_U  branches/1.6.2/
U   branches/1.6.2/main/manager.c
U   branches/1.6.2/res/res_smdi.c

------------------------------------------------------------------------
r268732 | tilghman | 2010-06-07 14:00:37 -0500 (Mon, 07 Jun 2010) | 15 lines

Merged revisions 268653,268731 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

........
 r268653 | tilghman | 2010-06-07 12:14:40 -0500 (Mon, 07 Jun 2010) | 2 lines
 
 Avoid unloading res_smdi twice.
........
 r268731 | tilghman | 2010-06-07 13:59:27 -0500 (Mon, 07 Jun 2010) | 4 lines
 
 Event well was going dry.
 
 (issue ASTERISK-16005)
........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=268732

By: Digium Subversion (svnbot) 2010-06-08 07:41:04

Repository: asterisk
Revision: 268732

_U  branches/1.6.2/
U   branches/1.6.2/main/manager.c
U   branches/1.6.2/res/res_smdi.c

------------------------------------------------------------------------
r268732 | tilghman | 2010-06-07 14:00:37 -0500 (Mon, 07 Jun 2010) | 18 lines

Merged revisions 268653,268731 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

........
 r268653 | tilghman | 2010-06-07 12:14:40 -0500 (Mon, 07 Jun 2010) | 2 lines
 
 Avoid unloading res_smdi twice.

 (closes issue ASTERISK-16007)
 Reported by: pabelanger
........
 r268731 | tilghman | 2010-06-07 13:59:27 -0500 (Mon, 07 Jun 2010) | 4 lines
 
 Event well was going dry.
 
 (issue ASTERISK-16005)
........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=268732