[Home]

Summary:ASTERISK-15079: [patch] Thousands of Invites never discarded in sip channels
Reporter:Private Name (falves11)Labels:
Date Opened:2009-11-11 14:27:58.000-0600Date Closed:2009-12-21 10:41:21.000-0600
Priority:MajorRegression?Yes
Status:Closed/CompleteComponents:Channels/chan_sip/General
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) asterisk-rescheduled-destruction.patch
Description:The scenario is very simple
We receive an INVITE
we respond Hangup(34)

after a few hours, if you type "ship show channels" you get thousand of those failed invites, all identica, llike this:
208.X.X.X   7134239089  65745230093  00102/00000  0x0 (nothing)    No       Init: INVITE              
208.X.X.X   8633996336  41250481070  00102/00000  0x0 (nothing)    No       Init: INVITE              
208.X.X.X   4046670409  0403812114e  00102/00000  0x0 (nothing)    No       Init: INVITE              
208.X.X.X   7134239089  06060dc345a  00102/00000  0x0 (nothing)    No       Init: INVITE

****** ADDITIONAL INFORMATION ******

This happens in all 1.4 versions. Afer a few hours, the Linux box is dead. If you type "lsof -i", you can see one line per each of the failed SIP INVITES. It never stops growing.

I can let a Bug marshall log into my box and look at what happens. It is a production box, so I cannot do traces or debugs, etc.
Comments:By: Private Name (falves11) 2009-12-01 20:07:32.000-0600

The problem can be detected also in the business edition. I have an open ticket (EEX-278218), and one Digium engineer just captured all the traces that he can possible need. Maybe one open source marshal can request access to the case, or to the box, since the issue is present in all versions. I believe that this is the last major bug in Asterisk. Under some circumstances, Asterisk starts to accumulate sockets and never releases them, until the machine needs to be restarted.

By: Vadim Sherbakov (vinsik) 2009-12-02 07:01:01.000-0600

Hey,

I seem to have similar problems, but they are not as persistent.

I was using ast-1.4.23.1 before and normal amount of open
sip channels in my system is 30-80, but
after upgrading to 1.4.26.1 version, it has gone up from 200 and over.

Almost all of them are INVITE and OPTIONS messages.

Also after couple of days my asterisk died/stalled with errors like:

[Dec  2 13:52:22] ERROR[26901] chan_sip.c: Unable to build sip pvt data for 'XXXXXX' (Out of memory or socket error)
[Dec  2 13:53:09] ERROR[2873] rtp.c: Unable to allocate socket: Too many open files
[Dec  2 13:57:30] ERROR[2873] rtp.c: Unable to allocate socket: Too many open files
[Dec  2 14:00:34] ERROR[2863] asterisk.c: Unable to create pipe: Too many open files

So now i need to restart asterisk once a day.

Oh, and i have set ulimit to 65535, no affect.

I would gladly supply more info, but this is also a live system, so i have my
limitations.

------------
Later today i noticed that asterisk had stalled again.
This time sip show channels had over 8000 OPTIONS and INVITE messages waiting.
I had to revert back to asterisk 1.4.23.1.


Best regards, Vadim



By: Ilya Savin (ili4) 2009-12-02 14:43:02.000-0600

Hi.

I have same problem on version 1.6.1.6, 1.6.2 and latest trunk. When asterisk does not revieve any reply from remode side on SIP dial (ex: dialing non existing host), sip channel becomes "frozen". This channels cleared only after asterisk restarts.

In sip.conf RTP timers and SIP Session-Timers are enabled.

In debug log I saw messages "Re-scheduled destruction of SIP call <call-id>". Messages with same call-id repeat every 30 seconds. So I think "Re-scheduled destruction of SIP call" mechanism does not work.

I fixed this by modifying chan_sip.c (for version 1.6.1.6). Code in function __sip_autodestruct:
--------------
       if (p->packets) {
               if (!p->needdestroy) {
                       char method_str[31];
                       ast_debug(3, "Re-scheduled destruction of SIP call %s\n", p->callid ? p->callid : "<unknown>");
                       append_history(p, "ReliableXmit", "timeout");
                       if (sscanf(p->lastmsg, "Tx: %30s", method_str) == 1 || sscanf(p->lastmsg, "Rx: %30s", method_str) == 1) {
                               if (method_match(SIP_CANCEL, method_str) || method_match(SIP_BYE, method_str)) {
                                       p->needdestroy = 1;
                               }
                       }
                       return 10000;
               } else {
                       /* They've had their chance to respond. Time to bail */
                       __sip_pretend_ack(p);
               }
       }
----------------
I changed to:
----------------
       if (p->packets) {
               if (!p->needdestroy) {
                       ast_debug(3, "Re-scheduled destruction of SIP call %s\n", p->callid ? p->callid : "<unknown>");
                       append_history(p, "ReliableXmit", "timeout");
                       p->needdestroy = 1;
                       return 10000;
               } else {
                       /* They've had their chance to respond. Time to bail */
                       __sip_pretend_ack(p);
               }
       }
-------------
So I removed check for SIP_CANCEL method from "Re-scheduled destruction of SIP call" mechanism. And now all frozen channels cleared in few minutes.

I've uploaded patch named asterisk-rescheduled-destruction.patch. Please, try it, and report, how its works.

Alternative link to the patch: http://portal.orn.ru/files/asterisk-rescheduled-destruction.patch

But this is all for version 1.6.1.6. For version 1.4.27.1 it seems to be look like this:
----------------------
       if (p->packets && !ast_test_flag(&p->flags[0], SIP_NEEDDESTROY)) {
               char method_str[31];
               if (option_debug > 2)
                       ast_log(LOG_DEBUG, "Re-scheduled destruction of SIP call %s\n", p->callid ? p->callid : "<unknown>");
               append_history(p, "ReliableXmit", "timeout");
               if (sscanf(p->lastmsg, "Tx: %30s", method_str) == 1 || sscanf(p->lastmsg, "Rx: %30s", method_str) == 1) {
                       if (method_match(SIP_CANCEL, method_str) || method_match(SIP_BYE, method_str)) {
                               ast_set_flag(&p->flags[0], SIP_NEEDDESTROY);
                       }
               }
               return 10000;
       }
-----------------
Change to:
-----------------
       if (p->packets && !ast_test_flag(&p->flags[0], SIP_NEEDDESTROY)) {
               char method_str[31];
               if (option_debug > 2)
                       ast_log(LOG_DEBUG, "Re-scheduled destruction of SIP call %s\n", p->callid ? p->callid : "<unknown>");
               append_history(p, "ReliableXmit", "timeout");
               ast_set_flag(&p->flags[0], SIP_NEEDDESTROY);
               return 10000;
       }
---------------------

But I did not test this on version 1.4.

p.s.
I first time using this bug tracker. Sorry, if I do something wrong :)



By: Private Name (falves11) 2009-12-02 15:00:32.000-0600

Can somebody from Digium please create this patch in a traditional patch format?
Please look in other bugs
for instance
wget 'https://issues.asterisk.org/file_download.php?file_id=24561&type=bug' -O - | patch -p0

By: Private Name (falves11) 2009-12-02 15:19:58.000-0600

I need this for version 1.4 SVN.
Urgently, if I may.

By: Ilya Savin (ili4) 2009-12-02 15:45:48.000-0600

Please, try this patch (for version 1.4):
http://portal.orn.ru/files/asterisk-1.4-rescheduled-destruction.patch

I hope its in right format.

By: Private Name (falves11) 2009-12-02 15:58:42.000-0600

It does not compile for me
make[1]: Entering directory `/usr/src/asterisk/channels'
  [CC] chan_sip.c -> chan_sip.o
chan_sip.c: In function ‘__sip_autodestruct’:
chan_sip.c:2156: warning: unused variable ‘method_str’
chan_sip.c: At top level:
chan_sip.c:2166: error: expected identifier or ‘(’ before ‘if’
chan_sip.c:2167: error: expected identifier or ‘(’ before ‘while’
chan_sip.c:2170: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘->’ token
chan_sip.c:2172: error: expected identifier or ‘(’ before ‘if’
chan_sip.c:2174: error: expected ‘)’ before string constant
chan_sip.c:2175: error: expected identifier or ‘(’ before ‘if’
chan_sip.c:2178: error: expected identifier or ‘(’ before ‘else’
chan_sip.c:2178: error: expected identifier or ‘(’ before ‘)’ token
chan_sip.c:2183: error: expected identifier or ‘(’ before ‘else’
chan_sip.c:2185: error: expected identifier or ‘(’ before ‘return’
chan_sip.c:2186: error: expected identifier or ‘(’ before ‘}’ token
make[1]: *** [chan_sip.o] Error 1
make[1]: Leaving directory `/usr/src/asterisk/channels'
make: *** [channels] Error 2

By: Private Name (falves11) 2009-12-02 16:00:05.000-0600

I copied the patch to /usr/src, then I entered /usr/src/asterisk and typed
patch -p0 < ../asterisk-1.4-rescheduled-destruction.patch
having the patch in /usr/src.

By: Charles Moye (chazzam) 2009-12-02 16:55:01.000-0600

looks like in your patch ili4 that you left in a trailing '}' that still need to be removed. falves11, if you just modify the patch to also remove the '}' just after the "ast_set_flag(&p->flags[0], SIP_NEEDDESTROY);" line that gets added, then it will probably compile.

By: Private Name (falves11) 2009-12-02 17:37:19.000-0600

I tried that but if I remove it, the patch fails to apply. kindly rewrite and test it.
Many thanks

By: Private Name (falves11) 2009-12-02 18:02:07.000-0600

I made it compile with your note. many thanks. I am testing it now.

By: Private Name (falves11) 2009-12-02 21:15:38.000-0600

The patch does work. Could somebody fix it and repost it?

By: dant (dant) 2009-12-02 21:40:07.000-0600

This is the same bug report as ASTERISK-14653 ASTERISK-14576 and ASTERISK-14337 - two of which raised by the same reporter... The patch provided is very similar to the one in those reports...

By: Private Name (falves11) 2009-12-02 21:45:47.000-0600

I think that this is a different issue. My older bugs resulted in patches, which are already in SVN, and the problem continues. In any case, this patch fixes the issue. There is a message about an unused variable when compiling, so probably the bug marshals will modify it.  In fact, this issue is also in the Business Edition and they are working on it right now. It only happens under some conditions, not in a "normal" sip dialog.

By: Ilya Savin (ili4) 2009-12-03 00:54:06.000-0600

Yes, I miss a trailing '}'. I've updated patch at link http://portal.orn.ru/files/asterisk-1.4-rescheduled-destruction.patch .

falves11, did You compiled chan_sip.so with my patch/changes?

If recompiled chan_sip.so causes same problems, try to set in sip.conf this settings:

rtptimeout=60
session-timers=originate
session-expires=600
session-minse=90
session-refresher=uas


Look at /var/log/asterisk/debug. Is there a "Re-scheduled destruction of SIP call" mesages? In my debug log I found many those messages with same call-id. Do you have same?

By: Private Name (falves11) 2009-12-03 05:14:29.000-0600

I need a confirmation: Do session-timers work on 1.4? In any case, my problem is gone. Whatever this patch achieves deals with the issue. Many thanks.

By: Private Name (falves11) 2009-12-08 16:48:20.000-0600

"This is the same bug report as 0015716 0015627 and 0015356 - two of which raised by the same reporter... The patch provided is very similar to the one in those reports..."
I realize that the patches provided before for some reason never make it into the SVN, and I upgrade, and suddenly I get back where I was, losing ports. I want to beg the Bug Marshals to gather all the cases and commit one patch that works. The BE has the same issue, and I have a case open there too. Maybe we should solve this issue once and for all.

By: Private Name (falves11) 2009-12-08 17:09:21.000-0600

I cannot compile SVN chan_sip today at 6:00 PM EST
 [CC] chan_phone.c -> chan_phone.o
  [LD] chan_phone.o -> chan_phone.so
  [CC] chan_sip.c -> chan_sip.o
chan_sip.c: In function â??__sip_autodestructâ??:
chan_sip.c:2156: warning: unused variable â??method_strâ?? (<-------this is for this patch)
chan_sip.c: At top level:
chan_sip.c:2166: error: expected identifier or â??(â?? before â??ifâ??
chan_sip.c:2167: error: expected identifier or â??(â?? before â??whileâ??
chan_sip.c:2170: error: expected â??=â??, â??,â??, â??;â??, â??asmâ?? or â??__attribute__â?? before â??->â?? token
chan_sip.c:2172: error: expected identifier or â??(â?? before â??ifâ??
chan_sip.c:2174: error: expected â??)â?? before string constant
chan_sip.c:2175: error: expected identifier or â??(â?? before â??ifâ??
chan_sip.c:2178: error: expected identifier or â??(â?? before â??elseâ??
chan_sip.c:2178: error: expected identifier or â??(â?? before â??)â?? token
chan_sip.c:2183: error: expected identifier or â??(â?? before â??elseâ??
chan_sip.c:2185: error: expected identifier or â??(â?? before â??returnâ??
chan_sip.c:2186: error: expected identifier or â??(â?? before â??}â?? token
make[1]: *** [chan_sip.o] Error 1
make[1]: Leaving directory `/usr/src/asterisk/channels'
make: *** [channels] Error 2

By: Leif Madsen (lmadsen) 2009-12-15 11:14:29.000-0600

Is this still an issue after commit 234095 in the 1.4 branch? This looks similar to all the issues closed in that commit.

By: Private Name (falves11) 2009-12-15 11:17:18.000-0600

I would close the case. The issue is gone with the wind.

By: Leif Madsen (lmadsen) 2009-12-15 12:11:27.000-0600

Likely fixed with commit 234095. Closed per reporter.