Summary: ASTERISK-17964: Possible memory leak in chan_sip.c
Reporter: daren ferreira (daren)
Labels:
Date Opened: 2011-06-04 06:30:32
Date Closed: 2011-07-26 09:47:45
Priority: Major
Regression?
Status: Closed/Complete
Components: Channels/chan_sip/General
Versions: 1.6.2.18 1.8.4
Frequency of Occurrence:
Related Issues:
Environment:
Attachments:
( 0) dbg-sip-alloc.txt.gz
( 1) dbg-sip-alloc-1.8.4.2.txt
( 2) dbg-sip-sum.txt
( 3) dbg-sip-sum-1.8.4.2.txt
( 4) memory-2011-06-04_06:00:01
( 5) memory-2011-06-04_07:00:01
( 6) memory-2011-06-04_08:00:01
( 7) memory-2011-06-04_09:00:01
( 8) memory-2011-06-04_10:00:01
( 9) memory-2011-06-04_11:00:01
(10) memory-2011-06-04_12:00:01
(11) memory-2011-06-04_13:00:01
Description: After several crashes I investigated and saw memory usage growing day after day. I updated from 1.6.2.15 to 1.6.2.18 and enabled memory allocation debugging, and I can now say that chan_sip takes around 5 more megabytes every hour.

I'll try to attach "memory show summary" to this report...

Please let me know if you require any more information.

****** STEPS TO REPRODUCE ******

Just wait, with or without calls.

I see some registration attempts from unknown peers...
Comments:

By: Walter Doekes (wdoekes) 2011-06-04 07:01:53

Please attach 'memory show allocations chan_sip.c' and 'memory show summary chan_sip.c'

That could provide valuable clues.
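
For anyone collecting the same data repeatedly, here is a minimal sketch of how the two CLI commands could be captured into timestamped files, in the style of the memory-2011-06-04_06:00:01 attachments. It assumes the asterisk binary is on the PATH, that the build has memory allocation debugging enabled, and that the helper names snapshot_name and capture are purely illustrative.

```python
import subprocess
from datetime import datetime


def snapshot_name(prefix, when):
    """Build a file name such as memory-2011-06-04_06:00:01."""
    return f"{prefix}-{when.strftime('%Y-%m-%d_%H:%M:%S')}"


def capture(cli_command, path):
    """Run one Asterisk CLI command through 'asterisk -rx' and save its output.

    Assumption: the caller is allowed to talk to the running Asterisk
    instance (typically root or the asterisk user).
    """
    result = subprocess.run(
        ["asterisk", "-rx", cli_command],
        capture_output=True,
        text=True,
        check=True,
    )
    with open(path, "w") as fh:
        fh.write(result.stdout)
```

Called once an hour (from cron, for example) as capture("memory show summary chan_sip.c", snapshot_name("memory", datetime.now())), this would give a series of snapshots that can be compared afterwards.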

By: David Woolley (davidw) 2011-06-04 08:22:18

You also need to reproduce it on 1.8.4 as the 1.6 series is no longer supported for non-security issues.

By: daren ferreira (daren) 2011-06-06 04:46:11.135-0500

Thank you for your reply.

In JIRA I found another report of a similar problem that may have been fixed (ASTERISK-17510), but other people seem to have the same problem, so I'll run tests on 1.8.

I will attach 'memory show allocations chan_sip.c' and 'memory show summary chan_sip.c'

By: daren ferreira (daren) 2011-06-06 04:49:03.178-0500

These files are the results of 'memory show summary chan_sip.c' and 'memory show allocations chan_sip.c'.

By: daren ferreira (daren) 2011-06-07 04:29:13.433-0500

I just upgraded to 1.8.4.2; the leak is still there. You'll find 'memory show allocations chan_sip.c' and 'memory show summary chan_sip.c' attached to this issue.

By: daren ferreira (daren) 2011-06-07 04:29:55.305-0500

results of 'memory show allocations chan_sip.c' and 'memory show summary chan_sip.c'

By: Walter Doekes (wdoekes) 2011-06-07 15:35:19.402-0500

Are you using realtime peers? And are they cached?

Can you reduce your configuration to a bare minimum and still get the leaks? (And then post the config.)

Is it REGISTER requests that cause the leaks? You can hammer your own server with a tool like sipp. https://code.osso.nl/projects/sipp/browser/scenario/register.xml

By: daren ferreira (daren) 2011-06-08 19:28:30.168-0500

I'm using realtime peers, but they're not cached.

I just ran tests with the scenario you gave me.

Memory usage seems to grow continuously while sipp is running, but the memory seems to be freed when sipp is terminated, with both correct and incorrect login and password.

Unfortunately I can't reduce my configuration (production server).

About SIP: I'm using realtime ODBC/SQL authentication, and my sip.conf parameters are:

----------------------------

[general]
context=default
allowguest=no
bindport=5060
bindaddr=0.0.0.0
srvlookup=no
videosupport=no
disallow=all
allow=g722
allow=alaw
allow=ulaw
musicclass=default
language=fr
dtmfmode=rfc2833
alwaysauthreject=yes
allowexternalinvites=no
autodomain=no
vmexten=550
promiscredir=no
canreinvite=no
t38pt_udptl=yes
ignoreregexpire=yes

tos_sip=cs3
tos_audio=ef
tos_video=af41

rtsavesysname=yes
allowsubscribe=no

counteronpeer=yes
callcounter=yes

register => xxxx:yyyy@sip.voiptraffic.net

--------------------------

I can give you more information about my config; just let me know what you need.


By: Walter Doekes (wdoekes) 2011-06-09 06:54:44.974-0500

Ok. I believe rtcachefriends=no by default. So uncached friends it is.

Memory usage does increase in the first moments, but it should stabilize after a while (32 secs?). Certain objects are garbage collected only after some time because they may be needed for retransmits. But if you're saying the register hammering does not speed up the excess memory usage, then that's probably not it.

Can you try removing the register=> line?
Is there any qualify=yes going on? What happens when you remove that?
Other leaky candidates might be alwaysauthreject, ignoreregexpire, t38pt_udptl or perhaps the callcounter. Does that counter work for your uncached friends?

By: daren ferreira (daren) 2011-06-09 17:51:54.906-0500

Thank you for your reply.

You're right, callcounter is no longer useful, so I disabled it, without any real change.
I verified that qualify is set to no (anyway, if I remember correctly, it is not compatible with realtime when caching is disabled).
I disabled t38pt_udptl; no change, memory is still growing.

Unfortunately ignoreregexpire and alwaysauthreject can't be disabled because:
ignoreregexpire is mandatory because of users who forget to register regularly (or don't set up their devices properly)
alwaysauthreject is mandatory for security reasons.

I also wonder why "subversion" closed this issue, because I can't find any mention of a patch for it. Is it an error related to the move from Mantis to JIRA?

One last idea: memory is growing even with no visible activity (neither registrations nor calls), yet with SIP debug enabled I see a lot of activity that seems to be related to NAT (to work around NAT problems, nat=yes is set on the SIP peers). Do you think NAT could be the source of such leaks?

I looked at 'memory show allocations chan_sip.c' and counted the allocations per source line:
occurrences | chan_sip.c line
  1477 26057
  1477 26060
  1477 26065
     1 13547
    12 11105
     1 1057
     1 29016
     1 29017
     1 29018
     1 29019
     1 29116
     1 26795
     1 26796
     1 26797
     1 26798
     1 26799
    66 7247
    66 7250
    66 7255
     1 860
     1 6690
     1 6700
     1 6831
     1 6839
    66 7228

The lines whose counts keep growing are:

  1477 26057
  1477 26060
  1477 26065
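
The per-line counts above can be reproduced with a short script. This is only a sketch: it assumes each row of the 'memory show allocations' output contains the text "at line N of FILE", which should be checked against the real output before relying on it, and the function names are illustrative.

```python
import re
from collections import Counter

# Assumed row shape: "... at line 26057 of chan_sip.c"
ALLOC_RE = re.compile(r"at line\s+(\d+)\s+of\s+(\S+)")


def count_by_line(text, source_file="chan_sip.c"):
    """Count outstanding allocations per source line of one file."""
    counts = Counter()
    for row in text.splitlines():
        match = ALLOC_RE.search(row)
        if match and match.group(2) == source_file:
            counts[int(match.group(1))] += 1
    return counts


def growing_lines(before, after):
    """Return {line: (old_count, new_count)} for lines whose count increased."""
    return {
        line: (before.get(line, 0), count)
        for line, count in after.items()
        if count > before.get(line, 0)
    }
```

Running count_by_line on two dumps taken some time apart and passing both results to growing_lines leaves only the source lines whose allocation count went up between the two dumps.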

I had a look at these lines in chan_sip.c; they are all in the "else" part of "if (peer)",
where peer is set by

-------

if (!realtime || ast_test_flag(&global_flags[1], SIP_PAGE2_RTCACHEFRIENDS)) {
    /* Note we do NOT use find_peer here, to avoid realtime recursion */
    /* We also use a case-sensitive comparison (unlike find_peer) so
       that case changes made to the peer name will be properly handled
       during reload
    */
    ast_copy_string(tmp_peer.name, name, sizeof(tmp_peer.name));
    peer = ao2_t_find(peers, &tmp_peer, OBJ_POINTER | OBJ_UNLINK, "find and unlink peer from peers table");
}

------

but because the peers are realtime and not cached, peer is never found there and gets recreated indefinitely.

It seems that peers that are already connected are not recognized and so are recreated.

I have no idea why... If I understand the code correctly, rtcachefriends would be a way to stop this problem, or else a fix in the code to recognize already created peers...

I remember rtcachefriends being a problem (but I don't remember why), so I hesitate to re-enable it...

I just ran tests with rtcachefriends enabled and it seems to stop the leak...

Can somebody give me reasons to keep rtcachefriends enabled? Or better, find a fix for chan_sip.c.
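
To make the suspected mechanism concrete, here is a toy model of it. It is not chan_sip code: the classes are hypothetical, the lookup is simplified (the real code also unlinks the peer it finds), and it assumes that something keeps a reference to every peer object that gets allocated, which is what the growing allocation counts suggest.

```python
class Peer:
    def __init__(self, name):
        self.name = name


class ToyPeerTable:
    """Hypothetical stand-in for the peer bookkeeping; not the real module."""

    def __init__(self, realtime, cache_friends):
        self.realtime = realtime
        self.cache_friends = cache_friends
        self.by_name = {}    # peers that a lookup by name can find again
        self.allocated = []  # every peer object that is still referenced

    def build_peer(self, name):
        peer = None
        # Same condition as the quoted snippet: the lookup is skipped for
        # realtime peers when caching is off.
        if not self.realtime or self.cache_friends:
            peer = self.by_name.get(name)
        if peer is None:
            # The "else" part of "if (peer)": allocate a fresh object.
            peer = Peer(name)
            # Assumption: a reference to the new object is kept somewhere.
            self.allocated.append(peer)
            if not self.realtime or self.cache_friends:
                self.by_name[name] = peer
        return peer
```

With realtime=True and cache_friends=False, every call to build_peer("alice") adds another object to allocated; with cache_friends=True the first object is found again and reused, which matches the observation that enabling rtcachefriends stops the growth.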

By: David Woolley (davidw) 2011-06-17 05:48:25.850-0500

It looks like this was closed by a bug in the repository automation/JIRA interface.  The repository seems to have closed it as the result of committing a completely different issue.

By: daren ferreira (daren) 2011-06-17 06:54:24.528-0500

How can this issue be reopened? It is a real issue.

Normally you shouldn't have to enable rtcachefriends to avoid a memory leak...

By: daren ferreira (daren) 2011-07-01 07:17:49.416-0500

Does anybody have an idea how to solve the problem? Is there a developer working on this issue?

Or maybe this is considered normal behaviour, although I doubt that rtcachefriends is mandatory with realtime just to avoid a memory leak...

By: Walter Doekes (wdoekes) 2011-07-01 18:20:01.493-0500

Ok. So you've concluded that setting rtcachefriends=yes solves the leak. That's valuable.

I fixed that exact same problem in https://issues.asterisk.org/jira/browse/ASTERISK-17510, which was first released on the 1.8 branch in 1.8.5-rc1 (look for "Don't link non-cached realtime peers" in the ChangeLog).

So.. try that version and see if it helps.

By: daren ferreira (daren) 2011-07-28 09:42:08.839-0500

It seems to work with 1.8.5.0, but 1.8.5.0 has a new and worse problem, so I'll have to roll back and open a new issue.

Thank you for your help !