Summary: | ASTERISK-17487: [patch] Large number of active sip dialogs PUBLISH in the output "sip show channels". | ||||
Reporter: | obi van (obi van) | Labels: | |||
Date Opened: | 2011-02-28 07:55:40.000-0600 | Date Closed: | 2011-06-08 01:32:22 | ||
Priority: | Major | Regression? | |||
Status: | Closed/Complete | Components: | Channels/chan_sip/General | ||
Versions: | 1.8.2 | Frequency of Occurrence | |||
Related Issues: |
| ||||
Environment: | Attachments: | ( 0) svn-320708.diff ( 1) svn-320715.diff ( 2) svn-320715-bad-event.diff | |||
Description: | On Debian 5.0 and Asterisk 1.8.2.4 (also 1.8.2.3) in the output "sip show channels" I see the following (IP addresses is fake): 123.45.678.910 (None) OTMxYmI3YWRjOGN 0x0 (nothing) No Rx: PUBLISH <guest> 123.45.678.910 (None) ZmNmNmRhMTUyNDQ 0x0 (nothing) No Rx: PUBLISH <guest> 123.45.678.910 (None) NTZiYjhlNGRlNzY 0x0 (nothing) No Rx: PUBLISH <guest> To these addresses are registered softphones clients. Execute a command "sip show channel" on any of the PUBLISH dialogues gives the following results (123.45.678.900 - is address Asterisk): *CLI>sip show channel NTZiYjhlNGRlNzY * SIP Call Curr. trans. direction: Incoming Call-ID: NTZiYjhlNGRlNzY1NGIxMTBhMzFiMTgxNTlkNGNjNmU. Owner channel ID: <none> Our Codec Capability: 0x10d (g723|ulaw|alaw|g729) Non-Codec Capability (DTMF): 1 Their Codec Capability: 0x0 (nothing) Joint Codec Capability: 0x0 (nothing) Format: 0x0 (nothing) T.38 support No Video support No MaxCallBR: 0 kbps Theoretical Address: 123.45.678.910:5060 Received Address: 123.45.678.910:5060 SIP Transfer mode: open Force rport: Yes Audio IP: 123.45.678.900 (local) Our Tag: as2b4a5359 Their Tag: 27384736 SIP User agent: Zoiper rev.6313 Need Destroy: No Last Message: Rx: PUBLISH Promiscuous Redir: No Route: N/A DTMF Mode: rfc2833 SIP Options: (none) Session-Timer: Inactive Number of such dialogues can reach up to 100 or more! CLI command "sip reload" does not help. Only helps "core stop now". I noticed that Session-Timer: Inactive With the execution of commands for any active dialogue, for example ACK, I get the following: Session-Timer: Active S-Timer Interval: 600 S-Timer Refresher: uas S-Timer Expirys: 0 S-Timer Sched Id: 162202 S-Timer Peer Sts: Inactive S-Timer Cached Min-SE: 0 S-Timer Cached SE: 600 S-Timer Cached Ref: auto S-Timer Cached Mode: Originate While the output is consistent with the settings in the file sip.conf. It is seen that: Session-Timer: Active It seems to me that the dialogue PIBLISH does not work Session-Timer. | ||||
Comments: | By: Leif Madsen (lmadsen) 2011-02-28 13:55:40.000-0600 Here is a request for additional information after discussing this with another developer. Find his responses below: So the issue is apparently that incoming PUBLISH messages create a sip_pvt structure that is not going away. Well, I assume the sip_pvt does not go away based on how the issue description is worded. Now, in past Asterisk versions, Asterisk had no support at all for SIP PUBLISH, so those packets would be dropped immediately and such a problem couldn't occur. In 1.8, PUBLISH support was added because call completion required it. So it's likely a problem that only affects 1.8. What would be helpful to find out is 1) Why are these PUBLISHes being sent to Asterisk? 2) Do the PUBLISH dialogs go away eventually or do they stick around in the system forever? I'm assuming they're sticking around, but it's not 100% clear in that regard. But the other thing you can note for the reporter is that if the PUBLISHes do not relate to call completion (which I am 99.99% sure they are not related), find the source of the PUBLISHes and stop sending them. Because if the PUBLISHes are for presence or something like that, Asterisk is just going to completely ignore them anyway. By: isrl (isrl) 2011-02-28 18:01:37.000-0600 i had them on snoms and turned off publish presence which made them go away just wondering if that affects native call completion of the snoms By: obi van (obi van) 2011-03-01 00:12:27.000-0600 I have asked enable a softphone, which generated these PUBLISH dialogs, and the next day came 8 such dialogues. SIP client is Zoiper penultimate version. I will look softphone settings to understand why he sends these dialogs. For example, I watched the console a week and saw that the dialogues have not gone away and there they have accumulated about a hundred. By: Amilcar S Silvestre (amilcar) 2011-03-01 09:58:02.000-0600 We have the exact same problem here. In one or two days, we are more than 1000 active SIP dialogs. All stuck incoming PUBLISH dialogs. By: Amilcar S Silvestre (amilcar) 2011-03-01 13:36:42.000-0600 About the developer questions (yes, it only affects 1.8); 1) Mostly of the PUBLISHs are sent by clients for presence. I have snoms, xlites, polycoms, linksys and grandstreams phones as sip clients. The PUBLISH problem occurs with all of them. 2) The PUBLISH dialogs stick around in the system forever. If asterisk completely ignores the PUBLISH messages for presence, why the sip dialogs are not killed or something like this? By: Amilcar S Silvestre (amilcar) 2011-03-01 13:41:23.000-0600 Folow is a dialog that stick around in the system forever. Asterisk sends 489 "Bad Event", but the sip dialog remains in the system: (the client, as you can see is X-Lite release 1100l stamp 47546 in this case, and i've changed the domain name for privacy) <--- SIP read from UDP:189.39.17.2:33313 ---> PUBLISH sip:3004@pbx.domain.com SIP/2.0 Via: SIP/2.0/UDP 189.39.17.2:33870;branch=z9hG4bK-d8754z-bd588b0167259706-1---d8754z-;rport Max-Forwards: 70 Contact: <sip:3004@189.39.17.2:33313> To: "Position #04"<sip:3004@pbx.domain.com> From: "Position #04"<sip:3004@pbx.domain.com>;tag=4347bb3e Call-ID: Zjg3YzU4ZTM5MWM4OGQ1OTgzNTZhMzk3OWI3ZGY5MjM. CSeq: 1 PUBLISH Expires: 3600 Allow: INVITE, ACK, CANCEL, OPTIONS, BYE, REFER, NOTIFY, MESSAGE, SUBSCRIBE, INFO Content-Type: application/pidf+xml User-Agent: X-Lite release 1100l stamp 47546 Event: presence Content-Length: 425 <?xml version='1.0' encoding='UTF-8'?><presence xmlns='urn:ietf:params:xml:ns:pidf' xmlns:dm='urn:ietf:params:xml:ns:pidf:data-model' xmlns:rpid='urn:ietf:params:xml:ns:pidf:rpid' xmlns:c='urn:ietf:params:xml:ns:pidf:cipid' entity='sip:3004@pbx.domain.com'><tuple id='t290fce49'><status><basic>open</basic></status></tuple><dm:person id='t290fce49'><rpid:activities><rpid:unknown/></rpid:activities></dm:person></presence> <-------------> --- (14 headers 1 lines) --- <--- Transmitting (no NAT) to 189.39.17.2:33313 ---> SIP/2.0 489 Bad Event Via: SIP/2.0/UDP 189.39.17.2:33870;branch=z9hG4bK-d8754z-bd588b0167259706-1---d8754z-;rport;received=189.39.17.2 From: "Position #04"<sip:3004@pbx.domain.com>;tag=4347bb3e To: "Position #04"<sip:3004@pbx.domain.com>;tag=as2c86fa71 Call-ID: Zjg3YzU4ZTM5MWM4OGQ1OTgzNTZhMzk3OWI3ZGY5MjM. CSeq: 1 PUBLISH Server: Vonix PBX Allow: INVITE, ACK, CANCEL, OPTIONS, BYE, REFER, SUBSCRIBE, NOTIFY, INFO, PUBLISH Supported: replaces, timer Content-Length: 0 <------------> After this, sip channel is stuck forever: pbx-cpe*CLI> sip show channel Zjg3YzU4ZTM5MWM4OGQ1OTgzNTZhMzk3OWI3ZGY5MjM * SIP Call Curr. trans. direction: Incoming Call-ID: Zjg3YzU4ZTM5MWM4OGQ1OTgzNTZhMzk3OWI3ZGY5MjM. Owner channel ID: <none> Our Codec Capability: 0x80000008000e (gsm|ulaw|alaw|h263|testlaw) Non-Codec Capability (DTMF): 1 Their Codec Capability: 0x0 (nothing) Joint Codec Capability: 0x0 (nothing) Format: 0x0 (nothing) T.38 support No Video support No MaxCallBR: 0 kbps Theoretical Address: 189.39.17.2:33313 Received Address: 189.39.17.2:33313 SIP Transfer mode: open Force rport: No Audio IP: 123.45.67.8 (local) Our Tag: as2c86fa71 Their Tag: 4347bb3e SIP User agent: X-Lite release 1100l stamp 47546 Need Destroy: No Last Message: Rx: PUBLISH Promiscuous Redir: No Route: N/A DTMF Mode: rfc2833 SIP Options: (none) Session-Timer: Inactive By: Amilcar S Silvestre (amilcar) 2011-03-02 08:24:35.000-0600 The issue remains in FEEDBACK status. What else do you need? Thanks. By: Amilcar S Silvestre (amilcar) 2011-03-02 08:27:21.000-0600 An example of another endpoint (this time a snom IP phone, useragent snom190/3.60x): <--- SIP read from UDP:192.168.0.245:2051 ---> PUBLISH sip:0702@192.168.0.1 SIP/2.0 Via: SIP/2.0/UDP 192.168.0.245:2051;branch=z9hG4bK-g5t37zvtlwta;rport From: "Lauriane" <sip:0702@192.168.0.1>;tag=wslu8n69as To: "Lauriane" <sip:0702@192.168.0.1> Call-ID: 3c2e08c6a875-104w99db1fdq@snom190 CSeq: 1 PUBLISH Max-Forwards: 70 Event: number-guessing Content-Type: application/text Content-Length: 27 Number: 0715 Max-Hits: 3 <-------------> --- (10 headers 2 lines) --- <--- Transmitting (no NAT) to 192.168.0.245:2051 ---> SIP/2.0 489 Bad Event Via: SIP/2.0/UDP 192.168.0.245:2051;branch=z9hG4bK-g5t37zvtlwta;rport;received=192.168.0.245 From: "Lauriane" <sip:0702@192.168.0.1>;tag=wslu8n69as To: "Lauriane" <sip:0702@192.168.0.1>;tag=as0023ecea Call-ID: 3c2e08c6a875-104w99db1fdq@snom190 CSeq: 1 PUBLISH Server: Vonix PBX Allow: INVITE, ACK, CANCEL, OPTIONS, BYE, REFER, SUBSCRIBE, NOTIFY, INFO, PUBLISH Supported: replaces, timer Content-Length: 0 <------------> This sip dialog get stuck too. By: Birger "WIMPy" Harzenetter (wimpy) 2011-03-02 09:55:47.000-0600 I have an installation that suffers quite badly from this issue. However I have absolutely no idea what is causing this. I tried disabling PUBLISH in the Snoms but that made no difference. OTOH I'm also using Snom here and I get none of these stuck dialogs. Even though PUBLISH is enabled here (default). I think it might happen each time, a call is transferred. But still I can't reproduce it here. By: Amilcar S Silvestre (amilcar) 2011-03-03 09:46:29.000-0600 For us, this is always reproducible. Using xlite, all you have to do is configure on sip account setting, tab Presence, the PRESENCE AGENT option. Do some calls using the softphone and you will end having lots of stuck PUBLISH dialogs. By: Amilcar S Silvestre (amilcar) 2011-03-28 15:19:33 Any information needed to push this forward? It's a pretty big issue. By: Birger "WIMPy" Harzenetter (wimpy) 2011-03-28 19:09:35 Yes, I get 1000-3000 stuck dialogs at the end of the day. Unlike what I thought to have observed, the transport type does not make a difference. I'm pretty sure, however, that it only happens on outgoing calls. And there is a difference. I have overlap dialing enabled here, whereas I had to disable that on the affected system. By: Tony Vroon (chainsaw) 2011-05-18 08:13:41 This also affects the Ekiga client; Asterisk 1.8.4 To confirm, these channels never clear. There is no command available to clear them, only a full core restart does so. Call routing remains functional until all slots are stuck. pink*CLI> sip show channel 0eae63b7-b77f-e011-976e-001742895bf3@paulh-laptop2 * SIP Call Curr. trans. direction: Incoming Call-ID: 0eae63b7-b77f-e011-976e-001742895bf3@paulh-laptop2 Owner channel ID: <none> Our Codec Capability: 0xc7f000c (ulaw|alaw|jpeg|png|h261|h263|h263p|h264|mpeg4|red|t140) Non-Codec Capability (DTMF): 1 Their Codec Capability: 0x0 (nothing) Joint Codec Capability: 0x0 (nothing) Format: 0x0 (nothing) T.38 support No Video support No MaxCallBR: 0 kbps Theoretical Address: 1.2.3.4:5060 Received Address: 1.2.3.4:5060 SIP Transfer mode: open Force rport: No Audio IP: 4.3.2.1 (local) Our Tag: as2b82314b Their Tag: fe326cb7-b77f-e011-976e-001742895bf3 SIP User agent: Ekiga/3.2.7 Need Destroy: No Last Message: Rx: PUBLISH Promiscuous Redir: No Route: N/A DTMF Mode: rfc2833 SIP Options: (none) Session-Timer: Inactive By: obi van (obi van) 2011-05-19 00:31:30 While I do not use softphones type Zoiper, such dialogues are not observed. With other phones as I have not seen this problem. By: Douglas Jensen (djensen99) 2011-05-20 11:14:39 Seeing same thing with Zoiper 2.05 and Asterisk 1.8.4. Physical SIP phones in the environment do not seem to have this issue. I've had the client shut down for 2-3 days now to test and the dialog persists. --- * SIP Call Curr. trans. direction: Incoming Call-ID: YzQ1ZmIzYzQ3N2Q3NGZiMDc5YzMzZDQ0MTJmMzFjYTg. Owner channel ID: Our Codec Capability: 0xc (ulaw|alaw) Non-Codec Capability (DTMF): 1 Their Codec Capability: 0x0 (nothing) Joint Codec Capability: 0x0 (nothing) Format: 0x0 (nothing) T.38 support No Video support No MaxCallBR: 0 kbps Theoretical Address: 192.168.1.Client:5060 Received Address: 192.168.1.Client:5060 SIP Transfer mode: open Force rport: Yes Audio IP: 192.168.1.Server (local) Our Tag: as089b9f75 Their Tag: 053dcc5f SIP User agent: Zoiper Communicator 2.05.11136 rev.11135 Need Destroy: No Last Message: Rx: PUBLISH Promiscuous Redir: No Route: N/A DTMF Mode: rfc2833 SIP Options: (none) Session-Timer: Inactive By: Gareth Palmer (gareth) 2011-05-23 21:19:03 Attached a patch (svn-320708.diff) that looks like it should fix it. handle_request_publish was not scheduling dialog destruction. By: Ronald Chan (loloski) 2011-05-24 07:15:56 I can fairly reproduce this at will at 3CXPHONE will test the patch and report. By: Tony Vroon (chainsaw) 2011-05-24 08:33:15 Gareth, I have this patch applied but an RX: Publish session from Ekiga has been open for 3 hours now. What is the expected kill/expiry time for these please? Details follow, IPs obfuscated: pink*CLI> sip show channel 1cde0af7-5b84-e011-91df-001742895bf3@paulh-laptop2 * SIP Call Curr. trans. direction: Incoming Call-ID: 1cde0af7-5b84-e011-91df-001742895bf3@paulh-laptop2 Owner channel ID: <none> Our Codec Capability: 0xc7f000c (ulaw|alaw|jpeg|png|h261|h263|h263p|h264|mpeg4|red|t140) Non-Codec Capability (DTMF): 1 Their Codec Capability: 0x0 (nothing) Joint Codec Capability: 0x0 (nothing) Format: 0x0 (nothing) T.38 support No Video support No MaxCallBR: 0 kbps Theoretical Address: 1.2.3.4:5060 Received Address: 1.2.3.4:5060 SIP Transfer mode: open Force rport: No Audio IP: 4.3.2.1 (local) Our Tag: as5a7efd47 Their Tag: f88d0bf7-5b84-e011-91df-001742895bf3 SIP User agent: Ekiga/3.2.7 Need Destroy: No Last Message: Rx: PUBLISH Promiscuous Redir: No Route: N/A DTMF Mode: rfc2833 SIP Options: (none) Session-Timer: Inactive I was expecting it to be: Sess-Expires : 1800 secs By: Gareth Palmer (gareth) 2011-05-24 17:39:38 Chainsaw: maximum expiry time is whatever maxexpiry is set to but could be less depending on what the client sends in the Expires header. The session timer settings do not have any effect on publish requests. There was a bug in the patch, it didn't handle invalid/unknown publish types, new patch (svn-320715-bad-event.diff) should fix that. By: obi van (obi van) 2011-05-25 00:10:20 I'm testing a patch first day. I will try softphone CSipSimple on Android 2.2 (HTC Desire). By: Tony Vroon (chainsaw) 2011-05-25 04:43:27 Gareth: svn-320715-bad-event.diff works a treat, no publish sessions accumulate. By: Ronald Chan (loloski) 2011-05-25 12:03:46 On my limited testing this appears to work, thanks! By: Birger "WIMPy" Harzenetter (wimpy) 2011-05-27 19:41:24 My testing is also limited, as there isn't much traffic on the affected system ATM. But since I applied the bad-event patch there haven't been any stuck dialogs. So that seems to work fine as far as I can tell. By: Ronald Chan (loloski) 2011-05-29 05:35:45 ok, will now move the patch to at least 70 locals extension with lots of BLF or asterisk hints, let see how it perform. kudos to gareth for this patch. By: Ronald Chan (loloski) 2011-06-01 11:05:41 the patch works great on my end on production server, guys could you please confirm if this also holds true on your end? thank you!!! By: Birger "WIMPy" Harzenetter (wimpy) 2011-06-01 12:47:23 I have disabled the cron job that did a daily restart and the result looks good. *CLI> core show uptime System uptime: 5 days, 13 hours, 33 minutes, 14 seconds Last reload: 5 days, 13 hours, 33 minutes, 14 seconds *CLI> core show calls 1 active call 1123 calls processed *CLI> sip show channels Peer User/ANR Call ID Format Hold Last Message Expiry Peer 192.168.12.145 kab1 363ff8a402b3cfc (alaw) No Tx: ACK kab1 1 active SIP dialog That single ACK is stuck, but the PUBLISH problem is definitely gone. By: se (se) 2011-06-01 12:58:20 Our production server has been running for just over 24 hours now.. no stuck PUBLISH channels.. usually we would have at least 100 per day.. *CLI> core show version Asterisk 1.8.4.1 built by diptel @ bbobv2 on a i686 running Linux on 2011-05-30 12:09:32 UTC *CLI> core show uptime System uptime: 1 day, 2 hours, 40 minutes, 6 seconds Last reload: 1 day, 2 hours, 40 minutes, 6 seconds *CLI> sip show channels Peer User/ANR Call ID Format Hold Last Message Expiry Peer XXX.90.21.230 (None) c05d65c40301139 0x0 (nothing) No Rx: OPTIONS <guest> 10.4.10.188 (None) 6645c8cb-51cb99 0x0 (nothing) No Rx: NOTIFY <guest> 2 active SIP dialogs NOTIFY channel might be stuck (Linksys/SPA942), PUBLISH problem seems gone here as well.. By: Ronald Chan (loloski) 2011-06-01 13:12:20 thanks for confirming guys, i guess it's time to commit this now, and put this to end. By: Gareth Palmer (gareth) 2011-06-01 22:09:34 I have put the patch up on reviewboard: https://reviewboard.asterisk.org/r/1253/ By: Gregory Hinton Nietsky (irroot) 2011-06-08 01:32:22.207-0500 r322322 | irroot | 2011-06-08 08:18:38 +0200 (Wed, 08 Jun 2011) | 18 lines Make handle_request_publish do dialog expiration and destruction. This patch fixes handle_request_publish so that it does dialog expiration and destruction. Without this patch the incoming PUBLISH requests will get stuck in the dialog list. Restarting asterisk is the only way to remove them. Personal observation on one system the server hung up while looping through the channels rendering asterisk unusable and all sip phones unregisterd when they try reregister more requests are added. (closes issue #18898) Reported by: gareth Tested by: loloski, Chainsaw, wimpy, se, kuj, irroot Jira: https://issues.asterisk.org/jira/browse/ASTERISK-17915 Review: https://reviewboard.asterisk.org/r/1253 By: Gregory Hinton Nietsky (irroot) 2011-06-08 02:13:11.440-0500 Trunk r322323 | irroot | 2011-06-08 08:45:55 +0200 (Wed, 08 Jun 2011) | 25 lines Merged revisions 322322 via svnmerge from https://origsvn.digium.com/svn/asterisk/branches/1.8 ........ |