Summary: | ASTERISK-15079: [patch] Thousands of Invites never discarded in sip channels | ||
Reporter: | Private Name (falves11) | Labels: | |
Date Opened: | 2009-11-11 14:27:58.000-0600 | Date Closed: | 2009-12-21 10:41:21.000-0600 |
Priority: | Major | Regression? | Yes |
Status: | Closed/Complete | Components: | Channels/chan_sip/General |
Versions: | Frequency of Occurrence | ||
Related Issues: | |||
Environment: | Attachments: | ( 0) asterisk-rescheduled-destruction.patch | |
Description: | The scenario is very simple We receive an INVITE we respond Hangup(34) after a few hours, if you type "ship show channels" you get thousand of those failed invites, all identica, llike this: 208.X.X.X 7134239089 65745230093 00102/00000 0x0 (nothing) No Init: INVITE 208.X.X.X 8633996336 41250481070 00102/00000 0x0 (nothing) No Init: INVITE 208.X.X.X 4046670409 0403812114e 00102/00000 0x0 (nothing) No Init: INVITE 208.X.X.X 7134239089 06060dc345a 00102/00000 0x0 (nothing) No Init: INVITE ****** ADDITIONAL INFORMATION ****** This happens in all 1.4 versions. Afer a few hours, the Linux box is dead. If you type "lsof -i", you can see one line per each of the failed SIP INVITES. It never stops growing. I can let a Bug marshall log into my box and look at what happens. It is a production box, so I cannot do traces or debugs, etc. | ||
Comments: | By: Private Name (falves11) 2009-12-01 20:07:32.000-0600 The problem can be detected also in the business edition. I have an open ticket (EEX-278218), and one Digium engineer just captured all the traces that he can possible need. Maybe one open source marshal can request access to the case, or to the box, since the issue is present in all versions. I believe that this is the last major bug in Asterisk. Under some circumstances, Asterisk starts to accumulate sockets and never releases them, until the machine needs to be restarted. By: Vadim Sherbakov (vinsik) 2009-12-02 07:01:01.000-0600 Hey, I seem to have similar problems, but they are not as persistent. I was using ast-1.4.23.1 before and normal amount of open sip channels in my system is 30-80, but after upgrading to 1.4.26.1 version, it has gone up from 200 and over. Almost all of them are INVITE and OPTIONS messages. Also after couple of days my asterisk died/stalled with errors like: [Dec 2 13:52:22] ERROR[26901] chan_sip.c: Unable to build sip pvt data for 'XXXXXX' (Out of memory or socket error) [Dec 2 13:53:09] ERROR[2873] rtp.c: Unable to allocate socket: Too many open files [Dec 2 13:57:30] ERROR[2873] rtp.c: Unable to allocate socket: Too many open files [Dec 2 14:00:34] ERROR[2863] asterisk.c: Unable to create pipe: Too many open files So now i need to restart asterisk once a day. Oh, and i have set ulimit to 65535, no affect. I would gladly supply more info, but this is also a live system, so i have my limitations. ------------ Later today i noticed that asterisk had stalled again. This time sip show channels had over 8000 OPTIONS and INVITE messages waiting. I had to revert back to asterisk 1.4.23.1. Best regards, Vadim By: Ilya Savin (ili4) 2009-12-02 14:43:02.000-0600 Hi. I have same problem on version 1.6.1.6, 1.6.2 and latest trunk. When asterisk does not revieve any reply from remode side on SIP dial (ex: dialing non existing host), sip channel becomes "frozen". This channels cleared only after asterisk restarts. In sip.conf RTP timers and SIP Session-Timers are enabled. In debug log I saw messages "Re-scheduled destruction of SIP call <call-id>". Messages with same call-id repeat every 30 seconds. So I think "Re-scheduled destruction of SIP call" mechanism does not work. I fixed this by modifying chan_sip.c (for version 1.6.1.6). Code in function __sip_autodestruct: -------------- if (p->packets) { if (!p->needdestroy) { char method_str[31]; ast_debug(3, "Re-scheduled destruction of SIP call %s\n", p->callid ? p->callid : "<unknown>"); append_history(p, "ReliableXmit", "timeout"); if (sscanf(p->lastmsg, "Tx: %30s", method_str) == 1 || sscanf(p->lastmsg, "Rx: %30s", method_str) == 1) { if (method_match(SIP_CANCEL, method_str) || method_match(SIP_BYE, method_str)) { p->needdestroy = 1; } } return 10000; } else { /* They've had their chance to respond. Time to bail */ __sip_pretend_ack(p); } } ---------------- I changed to: ---------------- if (p->packets) { if (!p->needdestroy) { ast_debug(3, "Re-scheduled destruction of SIP call %s\n", p->callid ? p->callid : "<unknown>"); append_history(p, "ReliableXmit", "timeout"); p->needdestroy = 1; return 10000; } else { /* They've had their chance to respond. Time to bail */ __sip_pretend_ack(p); } } ------------- So I removed check for SIP_CANCEL method from "Re-scheduled destruction of SIP call" mechanism. And now all frozen channels cleared in few minutes. I've uploaded patch named asterisk-rescheduled-destruction.patch. Please, try it, and report, how its works. Alternative link to the patch: http://portal.orn.ru/files/asterisk-rescheduled-destruction.patch But this is all for version 1.6.1.6. For version 1.4.27.1 it seems to be look like this: ---------------------- if (p->packets && !ast_test_flag(&p->flags[0], SIP_NEEDDESTROY)) { char method_str[31]; if (option_debug > 2) ast_log(LOG_DEBUG, "Re-scheduled destruction of SIP call %s\n", p->callid ? p->callid : "<unknown>"); append_history(p, "ReliableXmit", "timeout"); if (sscanf(p->lastmsg, "Tx: %30s", method_str) == 1 || sscanf(p->lastmsg, "Rx: %30s", method_str) == 1) { if (method_match(SIP_CANCEL, method_str) || method_match(SIP_BYE, method_str)) { ast_set_flag(&p->flags[0], SIP_NEEDDESTROY); } } return 10000; } ----------------- Change to: ----------------- if (p->packets && !ast_test_flag(&p->flags[0], SIP_NEEDDESTROY)) { char method_str[31]; if (option_debug > 2) ast_log(LOG_DEBUG, "Re-scheduled destruction of SIP call %s\n", p->callid ? p->callid : "<unknown>"); append_history(p, "ReliableXmit", "timeout"); ast_set_flag(&p->flags[0], SIP_NEEDDESTROY); return 10000; } --------------------- But I did not test this on version 1.4. p.s. I first time using this bug tracker. Sorry, if I do something wrong :) By: Private Name (falves11) 2009-12-02 15:00:32.000-0600 Can somebody from Digium please create this patch in a traditional patch format? Please look in other bugs for instance wget 'https://issues.asterisk.org/file_download.php?file_id=24561&type=bug' -O - | patch -p0 By: Private Name (falves11) 2009-12-02 15:19:58.000-0600 I need this for version 1.4 SVN. Urgently, if I may. By: Ilya Savin (ili4) 2009-12-02 15:45:48.000-0600 Please, try this patch (for version 1.4): http://portal.orn.ru/files/asterisk-1.4-rescheduled-destruction.patch I hope its in right format. By: Private Name (falves11) 2009-12-02 15:58:42.000-0600 It does not compile for me make[1]: Entering directory `/usr/src/asterisk/channels' [CC] chan_sip.c -> chan_sip.o chan_sip.c: In function ‘__sip_autodestruct’: chan_sip.c:2156: warning: unused variable ‘method_str’ chan_sip.c: At top level: chan_sip.c:2166: error: expected identifier or ‘(’ before ‘if’ chan_sip.c:2167: error: expected identifier or ‘(’ before ‘while’ chan_sip.c:2170: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘->’ token chan_sip.c:2172: error: expected identifier or ‘(’ before ‘if’ chan_sip.c:2174: error: expected ‘)’ before string constant chan_sip.c:2175: error: expected identifier or ‘(’ before ‘if’ chan_sip.c:2178: error: expected identifier or ‘(’ before ‘else’ chan_sip.c:2178: error: expected identifier or ‘(’ before ‘)’ token chan_sip.c:2183: error: expected identifier or ‘(’ before ‘else’ chan_sip.c:2185: error: expected identifier or ‘(’ before ‘return’ chan_sip.c:2186: error: expected identifier or ‘(’ before ‘}’ token make[1]: *** [chan_sip.o] Error 1 make[1]: Leaving directory `/usr/src/asterisk/channels' make: *** [channels] Error 2 By: Private Name (falves11) 2009-12-02 16:00:05.000-0600 I copied the patch to /usr/src, then I entered /usr/src/asterisk and typed patch -p0 < ../asterisk-1.4-rescheduled-destruction.patch having the patch in /usr/src. By: Charles Moye (chazzam) 2009-12-02 16:55:01.000-0600 looks like in your patch ili4 that you left in a trailing '}' that still need to be removed. falves11, if you just modify the patch to also remove the '}' just after the "ast_set_flag(&p->flags[0], SIP_NEEDDESTROY);" line that gets added, then it will probably compile. By: Private Name (falves11) 2009-12-02 17:37:19.000-0600 I tried that but if I remove it, the patch fails to apply. kindly rewrite and test it. Many thanks By: Private Name (falves11) 2009-12-02 18:02:07.000-0600 I made it compile with your note. many thanks. I am testing it now. By: Private Name (falves11) 2009-12-02 21:15:38.000-0600 The patch does work. Could somebody fix it and repost it? By: dant (dant) 2009-12-02 21:40:07.000-0600 This is the same bug report as ASTERISK-14653 ASTERISK-14576 and ASTERISK-14337 - two of which raised by the same reporter... The patch provided is very similar to the one in those reports... By: Private Name (falves11) 2009-12-02 21:45:47.000-0600 I think that this is a different issue. My older bugs resulted in patches, which are already in SVN, and the problem continues. In any case, this patch fixes the issue. There is a message about an unused variable when compiling, so probably the bug marshals will modify it. In fact, this issue is also in the Business Edition and they are working on it right now. It only happens under some conditions, not in a "normal" sip dialog. By: Ilya Savin (ili4) 2009-12-03 00:54:06.000-0600 Yes, I miss a trailing '}'. I've updated patch at link http://portal.orn.ru/files/asterisk-1.4-rescheduled-destruction.patch . falves11, did You compiled chan_sip.so with my patch/changes? If recompiled chan_sip.so causes same problems, try to set in sip.conf this settings: rtptimeout=60 session-timers=originate session-expires=600 session-minse=90 session-refresher=uas Look at /var/log/asterisk/debug. Is there a "Re-scheduled destruction of SIP call" mesages? In my debug log I found many those messages with same call-id. Do you have same? By: Private Name (falves11) 2009-12-03 05:14:29.000-0600 I need a confirmation: Do session-timers work on 1.4? In any case, my problem is gone. Whatever this patch achieves deals with the issue. Many thanks. By: Private Name (falves11) 2009-12-08 16:48:20.000-0600 "This is the same bug report as 0015716 0015627 and 0015356 - two of which raised by the same reporter... The patch provided is very similar to the one in those reports..." I realize that the patches provided before for some reason never make it into the SVN, and I upgrade, and suddenly I get back where I was, losing ports. I want to beg the Bug Marshals to gather all the cases and commit one patch that works. The BE has the same issue, and I have a case open there too. Maybe we should solve this issue once and for all. By: Private Name (falves11) 2009-12-08 17:09:21.000-0600 I cannot compile SVN chan_sip today at 6:00 PM EST [CC] chan_phone.c -> chan_phone.o [LD] chan_phone.o -> chan_phone.so [CC] chan_sip.c -> chan_sip.o chan_sip.c: In function â??__sip_autodestructâ??: chan_sip.c:2156: warning: unused variable â??method_strâ?? (<-------this is for this patch) chan_sip.c: At top level: chan_sip.c:2166: error: expected identifier or â??(â?? before â??ifâ?? chan_sip.c:2167: error: expected identifier or â??(â?? before â??whileâ?? chan_sip.c:2170: error: expected â??=â??, â??,â??, â??;â??, â??asmâ?? or â??__attribute__â?? before â??->â?? token chan_sip.c:2172: error: expected identifier or â??(â?? before â??ifâ?? chan_sip.c:2174: error: expected â??)â?? before string constant chan_sip.c:2175: error: expected identifier or â??(â?? before â??ifâ?? chan_sip.c:2178: error: expected identifier or â??(â?? before â??elseâ?? chan_sip.c:2178: error: expected identifier or â??(â?? before â??)â?? token chan_sip.c:2183: error: expected identifier or â??(â?? before â??elseâ?? chan_sip.c:2185: error: expected identifier or â??(â?? before â??returnâ?? chan_sip.c:2186: error: expected identifier or â??(â?? before â??}â?? token make[1]: *** [chan_sip.o] Error 1 make[1]: Leaving directory `/usr/src/asterisk/channels' make: *** [channels] Error 2 By: Leif Madsen (lmadsen) 2009-12-15 11:14:29.000-0600 Is this still an issue after commit 234095 in the 1.4 branch? This looks similar to all the issues closed in that commit. By: Private Name (falves11) 2009-12-15 11:17:18.000-0600 I would close the case. The issue is gone with the wind. By: Leif Madsen (lmadsen) 2009-12-15 12:11:27.000-0600 Likely fixed with commit 234095. Closed per reporter. |