Summary: | ASTERISK-08969: [patch] Asterisk leaves lingering UDP ports until no more ports available | ||
Reporter: | Bluce Ree (tasker) | Labels: | |
Date Opened: | 2007-03-08 10:27:37.000-0600 | Date Closed: | 2010-03-15 11:16:21 |
Priority: | Major | Regression? | No |
Status: | Closed/Complete | Components: | Core/General |
Versions: | Frequency of Occurrence | ||
Related Issues: | |||
Environment: | Attachments: | ( 0) history_bad_call ( 1) history_good_call ( 2) hung_bye.diff.txt ( 3) hung_cancel_log.txt ( 4) lingering_udp_chan_sip.patch8-against-1.2.18.txt ( 5) lingering_udp_chan_sip.patch8-against-1.4.4.txt ( 6) sip_show_channels ( 7) sip_show_channels2 | |
Description: | After a couple of days with lots of traffic (moderate load), running "netstat -a" shows hundreds of lingering UDP ports, which eventually saturates the available handles and asterisk stops accepting media connections. This occurs in all 1.4.x versions and versions after 1.2.14. This does not occur in 1.2.13 and prior versions. | ||
Comments: | By: Serge Vecher (serge-v) 2007-03-08 11:00:31.000-0600 which channel driver(s) do you use? Do you think you can narrow down the 'breakage' to a specific revision between 1.2.13 and .14 releases? By: Bluce Ree (tasker) 2007-03-08 12:15:35.000-0600 Started at 1.2.14 and 1.4.0. I have not tried any trunk versions prior to 1.4 but current svn trunk also has the problem. This is using the SIP channel driver. I have no means to test H.323 or IAX. By: Serge Vecher (serge-v) 2007-03-08 12:20:58.000-0600 do you have any patches on your system, or these installations are straight from the tarball? By: Bluce Ree (tasker) 2007-03-08 13:40:16.000-0600 Straight from tarball. No patches. By: Bluce Ree (tasker) 2007-03-10 20:06:42.000-0600 I just queried netstat -a on a system running 1.2.13 and 1.4.1 after running both in parallel for last 24 hours. Both systems are identical blades running FC4. Here's the status taken right now with 0 calls online. The first (1.2.13) has no lingering UDP ports. The second (1.4.1) has 398 UDP ports lingering. By: Serge Vecher (serge-v) 2007-03-12 09:40:20 see if you notice any differences in the sip show channels output. By: Bluce Ree (tasker) 2007-03-13 01:07:21 That's what I did in my last note. "Here's the status taken right now with 0 calls online. The first (1.2.13) has no lingering UDP ports. The second (1.4.1) has 398 UDP ports lingering." The same 1.4.1 box right now: bld-6*CLI> show uptime System uptime: 3 days, 11 hours, 55 minutes, 44 seconds Last reload: 3 days, 5 hours, 47 minutes, 3 seconds A "show channels" shows 2 active calls and 4 active channels. A "netstat -a" shows 854 lingering UDP ports. If I restart Asterisk at this point, the lingering gradually ports disappear. By: Bluce Ree (tasker) 2007-03-15 13:21:39 *bump* By: Joshua C. Colp (jcolp) 2007-03-15 14:30:43 What does sip show channels show? It is possible for an Asterisk channel to not be around but a SIP dialog to still be up occupying an RTP and RTCP port. The question would be, why did they not get destroyed. By: Serge Vecher (serge-v) 2007-03-15 14:34:20 ok, since you've narrowed down the breakage to between the 1.2.13 and .13 releases, the next step is to examine specific revisions between these releases. The releases were done for the following revisions of the 1.2 branch: 1.2.13/ 45413 1.2.14/ 48468 If you check out and test specific revisions (start with changes in chan_sip.c and rtp.c first http://svn.digium.com/view/asterisk/branches/1.2/channels/chan_sip.c?rev=58847&view=log) that should hopefully provide the revision that "broke it". By: Bluce Ree (tasker) 2007-03-16 22:21:48 A "sip show channels" shows 35 open dialogs, "netstat -a" shows around 100 open UDP ports and "show channels" shows 0 calls. By: Anthony LaMantia-2 (anthonyl) 2007-03-17 17:50:59 can you provide any infomation about your current sip setup. do you have many peers? are you sending out many MWI notifcations.. By: Joshua C. Colp (jcolp) 2007-03-19 11:36:01 The actual output and content of sip show channels was what I was looking at... to see what state the SIP dialogs are in, and a sip show channel <name> of one would also be useful. Add as an attachment. Thanks! By: Anthony LaMantia-2 (anthonyl) 2007-03-19 12:51:16 hi file, quick question.. is sip_alloc() the only place a new udp sockets (it looks like 2 via ast_rtp_new_bindaddr) are bound? .. i was scanning over calls to sip_alloc() to make sure they were either destoryed after the function which called it destoryed the session or sechd'ed it for dectruction and most all of the cases pretty much seem to do so.. By: Bluce Ree (tasker) 2007-03-19 18:58:14 Issuing "sip show channels" currently shows 1164 sip channels. Issuing "show channels" shows only 138 open channels. The system has no peers. It accepts inbound calls and checks the originating IP in real-time against a database in an AGI application. Here's partial output: Peer User/ANR Call ID Seq (Tx/Rx) Form Hold Last Message 216.xx.xx.69 9739681582 12b07824096 00103/00102 unkn No Tx: ACK 216.xx.xx.69 9739254877 3fa55777561 00103/00102 unkn No Tx: ACK 216.xx.xx.69 6315258681 7e45151d37e 00103/00102 unkn No Tx: ACK 216.xx.xx.69 9739311309 7b36bed0537 00103/00103 unkn No Tx: ACK 216.xx.xx.69 9144885200 0cd2bef4774 00103/00102 unkn No Tx: ACK 216.xx.xx.69 6319682316 4af8fb995be 00103/00102 unkn No Tx: ACK 216.xx.xx.69 8626868904 047779d5406 00103/00103 unkn No Tx: ACK Shouldn't these open dialogs time out and get destroyed? By: Serge Vecher (serge-v) 2007-03-20 08:37:27 Alright, we'll need to see some logs here. Pick one IP and in the instruction below, replace "sip debug" with "sip set debug ip aaa.bbb.ccc.ddd" 1) Prepare test environment (reduce the amount of unrelated traffic on the server); 2) Make sure your logger.conf has the following line: console => notice,warning,error,debug 3) restart Asterisk with the following command: 'asterisk -Tvvvvvdddddngc | tee /tmp/verbosedebug.txt' 4) Enable SIP transaction logging with the following CLI commands (1.4/trunk commands in parenthesis): set debug 4 (core set debug 4) set verbose 4 (core set verbose 4) sip debug (sip set debug) 5) Reproduce the problem 6) Trim startup information and attach verbosedebug.txt to the issue. By: Serge Vecher (serge-v) 2007-03-20 08:50:47 linking to bug 8036 based on note 0061087 there. tasker: where you able to work on narrowing down a specific revision as per my note 0060940. By: Bluce Ree (tasker) 2007-03-20 19:57:22 Unfortunately, this is happening in a production environment. I have both time and resource constraints. By: Serge Vecher (serge-v) 2007-03-26 12:07:49 tasker: still need to see those logs ... By: cmaj (cmaj) 2007-03-26 13:48:14 I'm seeing the exact same problems in all 1.4 releases, including 1.4.2. "sip show channels" shows hundreds of channels with last message as "Init: INVITE" I tried turning on more intense SIP logging, but it crashed * and this is a production server, so I can't turn the logging back on right now. Also, I'm running this on a server behind another server running the iptables firewall with SIP connection tracking enabled. By: Bluce Ree (tasker) 2007-04-07 14:22:07 After 4 days: "sip show channels - 4153 active sip channels" Screen also full of these: "[Apr 7 15:07:55] ERROR[15759]: rtp.c:1913 ast_rtp_new_with_bindaddr: No RTP ports remaining. Can't setup media stream for this call. [Apr 7 15:07:55] WARNING[15759]: udptl.c:819 ast_udptl_new_with_bindaddr: No UDPTL ports remaining" Asterisk rejects calls at this point and required a reboot. I maintain my recommendation that this be bumped to critical. By: Badalian Vyacheslav (slavon) 2007-04-09 08:53:10 SVN-trunk-r60763 - bug still here By: cmaj (cmaj) 2007-04-09 09:32:30 I second the move to critical. I've disabled SIP on my production server because of this problem. I want to rule out my firewall setup. To others experiencing this problem -- are you behind a firewall ? Using SIP connection tracking with iptables ? Thanks. By: Serge Vecher (serge-v) 2007-04-09 12:14:46 Guys, if this is an urgent production issue, I would suggest putting a bounty offer together and contacting one of the Asterisk consultants (oej could be one) to resolve this issue, otherwise things will proceed until one of sip developers has a chance to do that. By: Joel Jacobson (joelj) 2007-04-10 02:26:51 I am having the same problem. I am willing to contribute with 500 USD to the bounty if this problem can be solved within 5 days. Thanks. By: Clod Patry (junky) 2007-04-10 10:07:14 tasker: could ya attach a snip of ur sip show channels (with all totals), same for core show channels please. Thanks. By: cmaj (cmaj) 2007-04-10 11:54:19 I've been running 5-10 concurrent SIP calls thru one provider for several hours today after applying the attached file lingering_udp_chan_sip.patch.txt and not seen this problem. Can somebody try it under a little more load ? (Patch is against 1.4.2 release tarball). Thanks. By: cmaj (cmaj) 2007-04-10 17:00:55 It looks like this problem only crops up when you reach more than 10 concurrent SIP calls. My patch is not solving it. By: cmaj (cmaj) 2007-04-11 18:04:47 My bad on that patch yesterday. The 'patch2' is holding up much better today. I really sat down and cranked thru some logs to get at it. There's two cases: one with calls stuck in INVITE and the other in an ACK. Both are the result of retransmission failures. When the server finally comes back with a response, the call is already hungup. Problem is in the first case that the 100 TRYING received from the server cancels the destruction of the SIP dialog. patch2 attempts to fix that. In the second case, a 487 received from the server fails to schedule an auto-destruction of the SIP dialog. patch2 also attempts to fix that. To help test this patch, besides 'netstat -an' output, do 'core set verbose 4' on the CLI then after an hour of heavy use 'cat /var/log/asterisk/messages | grep Retransmitting'. If you see a lot of retransmit counts of 5, then you are probably going to run into this bug and should try patch2 above. By: Octavio Ruiz (tacvbo) 2007-04-12 14:00:13 You will obtain a more accurate metric with: # lsof | grep asterisk | grep UDP By: cmaj (cmaj) 2007-04-12 17:15:38 Can anybody else test patch2 ? This is my second day of running it in production. But it's not on a very busy server. Only 10-20 concurrent calls. Still, no signs of hung UDP. By: daniel g (revolution) 2007-04-13 11:27:12 i can test it on both of our production systems here... one normally only has between 10 - 20 concurrent calls; while the other usually gets to around 40 - 50 during peak calling time. I'll apply the patch tonight and let you know how it's working over the weekend. By: Joel Jacobson (joelj) 2007-04-13 17:39:54 Why is the patch against 1.4.2? Wouldn't it be better to fix this bug for the latest version? I tried to find the piece of code to patch and apply the patch manually for 1.4.12, but the block of relevant code is totally missing in 1.4.12? No special handler for response 101 exists in 1.4.12, it is threated like 183. /* RFC3261 says we must treat every 1xx response (but not 100) that we don't recognize as if it was 183. */ By: cmaj (cmaj) 2007-04-13 18:19:08 1.4.12 ??? Where did you get that ? The last tarball I see is 1.4.2 That said, this was broke between 1.2.13 and 1.2.14, and for all 1.4.x I took a diff between 1.2.13 and 1.2.14 -- the patch2 should apply with a little fuzz on 1.2.x, but I haven't tried. About that comment in the code you cited, the 2 lines right below that comment are somewhat confusing: if (resp > 100 && resp < 200 && resp!=101 && resp != 180 && resp != 183) resp = 183; But they are the same as: if (resp > 101 && resp < 200 && resp != 180 && resp != 183) resp = 183; So 100 and 101 were not handled as 183. However, both 100 and 101 are handled the same way in the switch statement below that. I'm attaching patch3 which tries to make it a little clearer and only handles the case of 100, which is a little more honest because I only saw the bug with 100 and never with 101. Please let me know if that clears things up. By: cmaj (cmaj) 2007-04-13 18:22:43 Okay, I see now, you pulled 1.2.17 My patch doesn't work against that. Let me fix it. By: cmaj (cmaj) 2007-04-13 18:31:47 Alright 2 versions of patch3 are up now, one each for 1.2.17 and 1.4.12, clearly marked. I haven't tested with 1.2.17, but it compiles. (The first 3 patches can be deleted by a bug marshal.) By: Joel Jacobson (joelj) 2007-04-14 07:13:33 OK, thank you. I have upgraded to 1.4.2 and applied the patch for it now. This is the result after only a couple of minutes of uptime. Will retest later today and let you know the result. (Data such as IP addresses and phone numbers have been removed below.) voip*CLI> sip show channels Call ID Seq (Tx/Rx) Form Hold 3435fa7b08d 00102/00000 g729 No Init: INVITE 8D30164E-E9 00101/00101 g729 No Rx: INVITE 3e2fdf392fa 00102/00000 g729 No Tx: ACK 88A1BC4A-E9 00101/00101 g729 No Rx: ACK 427754d3717 00102/00000 g729 No Tx: ACK 7B8003E1-E9 00101/00101 g729 No Rx: ACK 3cba3fbe0da 00102/00000 unkn No Tx: ACK 34f9302e699 00102/00000 unkn No Tx: ACK 584740b81cf 00102/00000 g729 No Tx: ACK 4943A8A0-E9 00101/00101 g729 No Rx: ACK 2527a63c469 00102/00000 g729 No Tx: ACK 4875FA95-E9 00101/00101 g729 No Rx: ACK 39863fb6144 00102/00000 g729 No Tx: ACK 6DCA1A76-E9 00101/00101 g729 No Rx: ACK 7020851f15d 00102/00000 g729 No Tx: ACK 67CBB8D0-E9 00101/00101 g729 No Rx: ACK 5128fd8c15f 00102/00000 g729 No Tx: ACK 635108-3385 00101/00001 g729 No Rx: ACK 636295e8581 00102/00000 g729 No Tx: ACK 4B22480A-E9 00101/00101 g729 No Rx: ACK 20 active SIP channels voip*CLI> show channels State Application Ringing AppDial Ring Dial Up Bridged Up Dial Up Bridged Up Dial Up Bridged Up Dial Up Bridged Up Dial Up Bridged Up Dial Up Bridged Up Dial Up Bridged Up Dial Up Bridged Up Dial 18 active channels 9 active calls root@voip:~# lsof | grep asterisk | grep UDP | wc -l 44 By: cmaj (cmaj) 2007-04-14 09:38:24 Are you bridging SIP-to-SIP ? If so, then that looks pretty good ! From what I can tell, the SIP dialogs (from "sip show channels") should auto-destruct in at most, 32 seconds after the last message was received. So even after a call is hungup (not seen in "core show channels"), the open UDP ports will still linger for half a minute. By: daniel g (revolution) 2007-04-14 13:52:23 OK... after running over night... some DEFINITE improvements are seen... however it seems to still be lingering on Invites... 03eb6fc5215 00102/00000 unkn No Init: INVITE 3a8403fe412 00102/00000 unkn No Init: INVITE 0f61724419a 00102/00000 unkn No Init: INVITE 0451ad5f34c 00102/00000 unkn No Init: INVITE 765355af622 00102/00000 unkn No Init: INVITE 7160affa282 00102/00000 unkn No Init: INVITE 6dbdef3c375 00102/00000 unkn No Init: INVITE 7e23a040752 00102/00000 unkn No Init: INVITE 7914bfaf6ce 00102/00000 unkn No Init: INVITE 456e67c87e1 00102/00000 unkn No Init: INVITE 2dcce62b63a 00102/00000 unkn No Init: INVITE 1f3e34341f7 00102/00000 unkn No Init: INVITE 0678a58e225 00102/00000 unkn No Init: INVITE 3893ca636e2 00102/00000 unkn No Init: INVITE 284bf2a106e 00102/00000 unkn No Init: INVITE These have remained in tehre for a few minutes at least -- suggestions? ideas? THAT said... The number of channels reported in "sip show channels" has dropped from around 500+ after a night of usage to closer to 200. Any additional information you need? By: Joel Jacobson (joelj) 2007-04-14 13:57:55 revolution: What patch did you use? I am using the patch3 for 1.4.2. I have been running Asterisk for 7 hours now. Results below. Look good, I think? voip*CLI> show channels State Application Down AppDial Ring Dial Ringing AppDial Ring Dial Up Bridged Up Dial Up Bridged Up Dial Up Bridged Up Dial Up Bridged Up Bridged Up Dial Up Dial Up Bridged Up Dial Up Bridged Up Dial Up Bridged Up Dial Up Bridged Up Dial Up Bridged Up Dial Up Bridged Up Dial 26 active channels 13 active calls voip*CLI> sip show channels Call ID Seq (Tx/Rx) Form Hold 198cc12f032 00102/00000 g729 No Init: INVITE DC27D0DF-E9 00101/00101 g729 No Rx: INVITE 2ccf5965331 00102/00000 unkn No Tx: ACK 2bc546c55cd 00295/00000 unkn No 04b1ccf1079 00295/00000 unkn No 0e22c84a14a 00102/00000 g729 No Tx: ACK CEF946CD-E9 00101/00101 g729 No Rx: ACK 7fd2b2b82d3 00102/00000 g729 No Tx: ACK 62666c29304 00102/00000 unkn No Tx: ACK CD80B3CA-E9 00101/00101 g729 No Rx: ACK 461ed7e5380 00102/00000 g729 No Tx: ACK 8E1DBA79-E9 00101/00101 g729 No Rx: ACK 2a65fa3800a 00102/00000 g729 No Tx: ACK 661238-3385 00101/00001 g729 No Rx: ACK 0e66e3c4467 00102/00000 g729 No Tx: ACK 1c10f7fe440 00102/00000 g729 No Tx: ACK 6316697B-E9 00101/00101 g729 No Rx: ACK 6307C35A-E9 00101/00101 g729 No Rx: ACK 0c49310b72f 00102/00000 g729 No Tx: ACK 62673ED2-E9 00101/00101 g729 No Rx: ACK 36af22f937b 00102/00000 g729 No Tx: ACK 46BA7D48-E9 00101/00101 g729 No Rx: ACK 229cec5876e 00102/00000 g729 No Tx: ACK 6A9AECF3-E9 00101/00101 g729 No Rx: ACK 6f8093137ea 00102/00000 g729 No Tx: ACK 2790EC0B-E9 00101/00101 g729 No Rx: ACK 0b7bd73407f 00102/00000 g729 No Tx: ACK 660480-3385 00101/00001 g729 No Rx: ACK 5995122e55f 00102/00000 g729 No Tx: ACK 3DF6BF92-E9 00101/00101 g729 No Rx: ACK 26 active SIP channels root@voip:~# lsof | grep asterisk | grep UDP | wc -l 58 By: cmaj (cmaj) 2007-04-14 14:16:45 revolution: Can you post output similar to joelj, to compare "core show channels", "sip show channels", and "lsof | grep asterisk | grep UDP | wc -l" ? Also, some output from "sip show channel XXX" would be great. That will reveal more state information on those stuck channels. By: daniel g (revolution) 2007-04-14 16:17:12 not being during peak times makes logs easier :) keep in mind that this is active so it might be a slight change between logs... lsof | grep asterisk | grep UDP | wc -l 79 core show channels Up Bridged Call Up Dial Up Bridged Call Up Dial Up Bridged Call Up Dial Up Bridged Call Up Dial Up Bridged Call Up Dial 10 active channels 5 active calls sip show channels 740d343e260 00102/00000 ulaw No Init: INVITE 52ae679c00c 00101/22215 g729 No Rx: INVITE 6190c022069 00102/00000 g729 No Init: INVITE HOUMGC01200 00101/00001 g729 No Rx: INVITE 1c939d11180 00102/00000 ulaw No Init: INVITE c9db0f64ac2 00101/51104 g729 No Rx: INVITE 00102/00000 g729 No Tx: ACK 00101/00001 g729 No Rx: ACK 00102/00000 ulaw No Tx: ACK da84f7d617e 00101/37565 g729 No Rx: ACK 7f9d111918e 00102/00000 ulaw No Tx: ACK 4e75ca45bfb 00101/61239 ulaw No Rx: ACK 1f3e2d326dd 00102/00000 ulaw No Tx: ACK ORLMGC01200 00101/00001 g729 No Rx: ACK 7a8fdd36453 00102/00000 ulaw No Tx: ACK 3134758561@ 00101/04803 g729 No Rx: ACK 4bd432a92f2 00102/00000 ulaw No Tx: ACK a9aba496ff2 00101/10089 g729 No Rx: ACK 269c95ff595 00102/00000 unkn No Init: INVITE 38c9fa9148d 00102/00000 unkn No Init: INVITE 03eb6fc5215 00102/00000 unkn No Init: INVITE 3a8403fe412 00102/00000 unkn No Init: INVITE 0f61724419a 00102/00000 unkn No Init: INVITE 0451ad5f34c 00102/00000 unkn No Init: INVITE 765355af622 00102/00000 unkn No Init: INVITE 7160affa282 00102/00000 unkn No Init: INVITE 6dbdef3c375 00102/00000 unkn No Init: INVITE 7e23a040752 00102/00000 unkn No Init: INVITE 7914bfaf6ce 00102/00000 unkn No Init: INVITE 456e67c87e1 00102/00000 unkn No Init: INVITE 2dcce62b63a 00102/00000 unkn No Init: INVITE 1f3e34341f7 00102/00000 unkn No Init: INVITE 0678a58e225 00102/00000 unkn No Init: INVITE 3893ca636e2 00102/00000 unkn No Init: INVITE 284bf2a106e 00102/00000 unkn No Init: INVITE sip show channel 284bf2a106e * SIP Call Curr. trans. direction: Outgoing Call-ID: 284bf2a106ebdd176632dd9863c24cc3 Owner channel ID: <none> Our Codec Capability: 256 Non-Codec Capability (DTMF): 1 Their Codec Capability: 0 Joint Codec Capability: 256 Format: 0x0 (nothing) MaxCallBR: 384 kbps Theoretical Address: Public IP:5060 Received Address: Public IP:5060 SIP Transfer mode: closed NAT Support: Always Audio IP: Local IP Our Tag: as1040ea60 Their Tag: SIP User agent: Username: Peername: Original uri: sip: @192.168.0.12 Need Destroy: 0 Last Message: Init: INVITE Promiscuous Redir: No Route: N/A DTMF Mode: rfc2833 SIP Options: (none) any thing else you need -- feel free to ask. So far this patch has cut down on a LOT of the lingering channels... in one of the servers (the one running between 10 - 20 concurrent calls typically) -- it seems to have cut them out entirely. These come from a much larger box... typically running around 40 - 50 (and sometimes *rarely* as high as 60) concurrent calls Patch being used is patch3-against-1.4.2 and it has been running since about 9 P.M. EST last night (which makes it around 20 hours). Progress is definitely being made... thanks for that cmaj. By: Olle Johansson (oej) 2007-04-16 09:31:49 cmaj: Please confirm that you have a disclaimer on file, so I can check your patches. Thanks. By: Olle Johansson (oej) 2007-04-16 09:38:35 Ok guys. To debug these channels, we need sip history on them. Turn on siphistory and dumphistory in the configuration file, restart. Then show history on the hung channels so we know what's going on. Thanks. By: cmaj (cmaj) 2007-04-16 10:05:16 Disclaimer on file. I have no rights when I submit patches :) oej: Do you feel that dialogs should be destroyed if they are call-related but have no channel owner ? That is the idea behind my patches. These additional cases of hung channels can be solved by applying that idea in more places. By: daniel g (revolution) 2007-04-16 12:00:55 I can enable *history later tonight (when I don't have 44 active calls running) and will submit either tomorrow or wednesday (whichever will be most useful to you all). thanks olle/cmaj! (p.s. my active SIP channels are STILL lower than they were prior to cmaj's patches) By: cmaj (cmaj) 2007-04-16 14:07:34 patch4 expounds on the idea that you can not cancel SIP dialogs related to a channel if said channel no longer exists. By: cmaj (cmaj) 2007-04-16 14:15:56 joelj & revolution: Are y'all registering with a SIP server in sip.conf ? If so, are those stuck channels only with that server and no other SIP endpoints ? By: Badalian Vyacheslav (slavon) 2007-04-16 14:35:01 - /*! XXX: we should also check here does the other side supports t38 at all !!! XXX */ + /*! XXX: we should also check here does the other side supports t39 at all !!! XXX */ Intresting change ) By: daniel g (revolution) 2007-04-16 15:28:39 Negative. In my case, these are calls going to off-site ATAs. I will apply patch4 asap and let you know the results tomorrow when I post the sip history. Running sip show history <channel> shows me that * attempted (five times) to send an Invite to an ATA which was behind a router (in the case of this channel). In the case of some other channels, the ATA has been on a public IP but may have gone offline while Asterisk was attempting to send the Invite (or may not have -- just tested a call to an ATA I thought was offline and found out it was not!). By: Joel Jacobson (joelj) 2007-04-16 16:00:21 I don't register with any SIP peers. sip.conf: [general] port=5060 bindaddr=0.0.0.0 dtmfmode=rfc2833 canreinvite=no deny=all allow=g729 nat=yes [outbound] fromuser=voip type=peer host=x.x.x.x port=5060 qualify=no [inbound] type=peer accountcode=xxxx context=xxxx insecure=very host=x.x.x.x extensions.conf: [general] writeprotect=yes [globals] [default] exten => s,1,Answer() exten => s,2,Playback(demo-thanks) exten => s,3,Hangup() [xxxx] exten => _XXXXXXXXXXX,1,AGI(route.agi) exten => h,1,DeadAGI(hangup.agi|${DIALSTATUS}|${ANSWEREDTIME}) By: Joel Jacobson (joelj) 2007-04-16 16:17:18 After having applied the new patch4, is it sufficient to recompile and just reload asterisk and the sip module? Like this: rm -rf asterisk-1.4.2 tar xvzf asterisk-1.4.2.tar.gz cd asterisk-1.4.2 patch -p0 < ../lingering_udp_chan_sip.patch4-against-1.4.2.txt ./configure --prefix=/usr make make install asterisk -r voip*CLI> reload voip*CLI> module reload chan_sip.so Or do I need to do "stop now" and then restart asterisk? By: cmaj (cmaj) 2007-04-16 16:28:28 In theory, the "reload module XXX" should work, but I always use "restart now". By: Joel Jacobson (joelj) 2007-04-16 17:00:48 Will active calls be killed if I do a restart now? By: cmaj (cmaj) 2007-04-16 17:13:26 Yes. You could do "restart gracefully" to wait for the last call to hangup. During that time, no new calls will be accepted by the server. By: daniel g (revolution) 2007-04-17 08:46:24 some history... it seems that the channels on only hanging now on certain peers... but this is an ATA with a public IP that is currently accessible. The other hung channels look fairly similar to this (with only the obvious differences -- IP/Chan ID/etc). Is this what you wanted oej? sip show history 3f5c72a645dcd822250ea4eb47f204cf@sip._.com 1. NewChan Channel SIP/0-0829aef0 - from 3f5c72a645dcd822250ea4eb4 2. TxReqRel INVITE / 102 INVITE - -UNKNOWN- 3. ReTx 1000 INVITE sip:0@aaa.bbb.ccc.ddd:5060 SIP/2.0 4. ReTx 2000 INVITE sip:0@aaa.bbb.ccc.ddd:5060 SIP/2.0 5. ReTx 4000 INVITE sip:0@aaa.bbb.ccc.ddd:5060 SIP/2.0 6. ReTx 8000 INVITE sip:0@aaa.bbb.ccc.ddd:5060 SIP/2.0 7. ReTx 16000 INVITE sip:0@aaa.bbb.ccc.ddd:5060 SIP/2.0 By: cmaj (cmaj) 2007-04-17 09:36:33 I didn't know about "sip history" until this bug. That is extremely helpful. That call you posted is never scheduling a destruction of the SIP dialog. My patch only cancels the cancel of a pending destruction. It doesn't set one up in the first place. Back to the drawing board for a bit... By: daniel g (revolution) 2007-04-17 12:36:13 Understandable. I've found with some peers... an Invite isn't responded to in a timely manner (the peer may've gone off-line mid Invite... or a poorly-designed/too-restrictive firewall may be blocking or any number of things) and this causes a hang in the sip channels. Progress has DEFINITELY been made as I said before cmaj... I don't really have to worry about restarting * every couple of days now ;) By: Olle Johansson (oej) 2007-04-18 04:54:13 Are all of these calls *outbound* calls, initiated from the PBX where the INVITE failed and we did not cancel them and clean up? By: Joel Jacobson (joelj) 2007-04-18 05:58:00 >Are all of these calls *outbound* calls, initiated from the PBX where the INVITE failed and we did not cancel them and clean up? Yes, all of the "unkn No Init: INVITE" sip channels are outgoing, that is, the peer IP is the IP of the SIP server receiving the call. I have not observed this problem for incoming channels. By: Olle Johansson (oej) 2007-04-18 06:37:33 I've been able to reproduce this issue by just placing a call to a server that has nothing on port 5060 with dial(sip/a.b.c.com) - no timeouts or anything. Autocongest kicks in, but the calling channel (inbound) doesn't seem to issue a hangup. Wonder if different inbound channels will behave differently - I can only test SIP here. Will continue to dig. By: daniel g (revolution) 2007-04-18 11:18:13 From my end -- yes these are all outbound. I've not noticed this problem with inbound either... By: Joel Jacobson (joelj) 2007-04-18 13:43:08 Once again, I see a lot of "unkn No Init: INVITE", but this time I noticed something I didn't think of before. I have two different groups of SIP peers that I connect two, located at two different colocation sites. The problems only happens to one of them. Same hardware on both sites. The difference is that one of them is on a quite unreliable ADSL connection, while the other one is on a dedicated T1. All errors occurs to the SIP peers on the ADSL connection. I did a ping on the IPs, and they sometimes are offline. The Asterisk have probably sent a lot of SIP INVITEs to the SIP peers on that connection that sometimes fail because of the unreliable Internet connection. By: daniel g (revolution) 2007-04-18 15:35:03 I would agree with you joelj... in most cases -- it would seem that the peer has lost connectivity mid-invite (how's that for poor timing?) -- but asterisk should still destroy the channel instead of just leaving it hanging after the 5 invites fail... right? By: cmaj (cmaj) 2007-04-18 16:15:40 It should totally do that. By calling an IP that is up but not listening for SIP, I am able to reproduce the error reliably now. This should speed things up. Do you see an auto-congest at the end of "sip show history xxx" ? I am scheduling a dialog destruction in the auto_congest function for my next patch, amongst other places. By: daniel g (revolution) 2007-04-18 16:33:20 negative... from sip show history xxxx you get exactly what I posted earlier (with the peers on my system at least). It sends an Invite... retransmits a few times... then gives up and leaves the channel hung. By: cmaj (cmaj) 2007-04-18 17:50:35 patch5 is a drastic change, but it's much simpler than earlier patches. If it doesn't fix this bug, then I think combining some elements from patch4 will help. Thanks for the good feed back on those connectivity issues with your endpoints. By: cmaj (cmaj) 2007-04-18 17:58:20 Hold on with patch 5. It was too simple. I see a problem with it. Just a few minutes... By: cmaj (cmaj) 2007-04-18 18:18:11 I don't understand how something as simple as patch 6 is fixing this problem, but it is. Please test with just patch 6. If it doesn't solve everything with patch 6, but it seems to solve the other half of the problems not solved with patch 4, then apply patch 4 on top of patch 6. By: daniel g (revolution) 2007-04-18 19:02:25 patch applied... I actually saw the active channels get down to around 100 (some of our original ATA had been set to register WAY too frequently thus result in hung registration channels... your patch seems to be clearing most of those out as well...) I'll let you know the results later tonight or tomorrow morning... so far so good though!! By: daniel g (revolution) 2007-04-19 07:02:50 OK. Applied *JUST* patch 6 last night... still have some hung channels (but not outbound as far as I can tell -- though I admit to not having checked every single channel... just a few), so I'm going to apply patch 4 right now and test and post the results later today. edit -- OK, some testing has been done with both patches 4 and 6 applied (only about two hours of uptime thus far) and I see no hung channels. I've tried to recreate random failures (rather than doing it programmatically) by taking the power cord out of the phone... taking the cat5 cable out of the phone... disabling the DS3... etc. and so far I still have no hung sip channels. Will continue to run with patches 4 and 6 until close of business today and report back (if there are hung channels). If there are no hung channels by C-O-B, i'll check after 24 full hours have elapsed and post then. (p.s. in case you were wondering, we've already had 30 concurrent calls running this morning... ) By: daniel g (revolution) 2007-04-19 10:27:29 ok the story behind this one... is irrelevent (although still sorta amusing) 1. NewChan Channel SIP/0a-08288e18 - from 77770544077a33693a9b2abf2 2. TxReqRel INVITE / 102 INVITE - -UNKNOWN- 3. Rx SIP/2.0 / 102 INVITE / 100 Trying 4. Rx SIP/2.0 / 102 INVITE / 180 Ringing 5. Rx SIP/2.0 / 102 INVITE / 200 OK 6. TxReq ACK / 102 ACK - -UNKNOWN- 7. Rx INVITE / 26194 INVITE / sip:+0b@aaa.bbb.ccc.ddd:5060 8. TxResp SIP/2.0 / 26194 INVITE - 488 Not acceptable here 9. Rx ACK / 26194 ACK / sip:+0b@aaa.bbb.ccc.ddd:5060 10. Rx INVITE / 26195 INVITE / sip:+0b@aaa.bbb.ccc.ddd:5060 11. TxResp SIP/2.0 / 26195 INVITE - 488 Not acceptable here 12. SchedDestroy 32000 ms 13. CancelDestroy In this case it seems as though a destruction was scheduled until * got the message to cancel it. Now I can fix the situation that caused this (essentially a person was trying to dial their number at another location and the box was automatically switching the codec which was a disallowed codec on the receiving end... thus the 488)... but it still brings up a point of interest as to whether any other situations could possibly cause this same thing. By: Bluce Ree (tasker) 2007-04-19 14:55:39 How about just eliminating CancelDestroy? Unless there is an active voice stream on a channel I don't see a reason to cancel a destroy. By: daniel g (revolution) 2007-04-19 15:18:48 I could be wrong and it may not ACTUALLY mean Cancel the Destroy. It could be Cancel/Destroy... but from looking at the chan_sip.c file, the comment is Cancel destruction of SIP Dialog. I'm assuming it's there for a reason -- so who can shed some light on this? By: cmaj (cmaj) 2007-04-19 15:36:59 I can't speak for the original authors, and I think this is my first patch in chan_sip, but SIP is not just for voice. That seems to be part of the concern in the code. You might be sending a text message via SIP (SIMPLE) with no voice at all. Registrations, notifications, etc., never make a sound. That said, I think the CancelDestroy's -- cancel the destruction of a SIP dialog -- are sprinkled too liberally in the code. My patches beat those CancelDestroy's with a large stick. Perhaps it is too large, but for the most part, I agree that the code should err on the side of the voice phone call. Worst case scenario, my final patches for this bug are modified to only operate if some option in sip.conf like "babysitdeadIPaddresses=yes" is set. I will add patches 4 & 6, and try to fix the funky voice codec situation in upcoming patch 7, but if there are more corner cases it would be good to find them ASAP. By: daniel g (revolution) 2007-04-19 16:26:34 Well cmaj I can say this much for your patches -- the ONLY lingering channel thus far is the one I posted earlier. All others have cleared out. edit : just so you know... I restarted * last night (to clear out the one lingering channel after I changed the codecs to both be ulaw -- which is what they were supposed to be anyways)... and I have NO lingering channels at this point. * has been running for 18 hours now... and all SIP channels have cleared properly. I feel like working out the CancelDestroy may be the solution to all the rare unique cases. :) Patch7 (which I suppose would be a combo of patches 4 & 6 and maybe something to address the canceldestroy) should hopefully be your last one cmaj. By: Ronald Chan (loloski) 2007-04-24 09:37:35 cmaj: could you create a backport for 1.2 branch for your patch 7 also :) thanks By: Joel Jacobson (joelj) 2007-04-24 13:06:00 I have run Asterisk with patch4+patch6 since the 19th of April, with peak 24 concurrent calls. This is the result: root@voip:~# lsof | grep asterisk | grep UDP | wc -l 46 root@voip:~# asterisk -r Asterisk 1.4.2, Copyright (C) 1999 - 2006 Digium, Inc. and others. Created by Mark Spencer <markster@digium.com> Asterisk comes with ABSOLUTELY NO WARRANTY; type 'core show warranty' for details. This is free software, with components licensed under the GNU General Public License version 2 and other licenses; you are welcome to redistribute it under certain conditions. Type 'core show license' for details. ========================================================================= Connected to Asterisk 1.4.2 currently running on voip (pid = 25828) voip*CLI> show channels Channel Location State Application(Data) SIP/xxxxxxxxx xxxxxxx@xxxx Down AppDial((Outgoing Line)) SIP/xxxxxxxxx xxxxxxx@xxxx Ring Dial(SIP/xxxxxxxxx SIP/xxxxxxxxx (None) Up Bridged Call(OOH323/x.x.x.x OOH323/x.x.x.x xxxxxxx@xxxx Up Dial(SIP/xxxxxxxxx SIP/xxxxxxxxx (None) Up Bridged Call(SIP/xxxxxxxxx SIP/xxxxxxxxx xxxxxxx@xxxx Up Dial(SIP/xxxxxxxxx SIP/xxxxxxxxx (None) Up Bridged Call(SIP/xxxxxxxxx SIP/xxxxxxxxx xxxxxxx@xxxx Up Dial(SIP/xxxxxxxxx SIP/xxxxxxxxx (None) Up Bridged Call(SIP/xxxxxxxxx SIP/xxxxxxxxx xxxxxxx@xxxx Up Dial(SIP/xxxxxxxxx SIP/xxxxxxxxx (None) Up Bridged Call(SIP/xxxxxxxxx SIP/xxxxxxxxx xxxxxxx@xxxx Up Dial(SIP/xxxxxxxxx SIP/xxxxxxxxx (None) Up Bridged Call(OOH323/x.x.x.x OOH323/x.x.x.x xxxxxxx@xxxx Up Dial(SIP/xxxxxxxxx SIP/xxxxxxxxx (None) Up Bridged Call(OOH323/x.x.x.x OOH323/x.x.x.x xxxxxxx@xxxx Up Dial(SIP/xxxxxxxxx SIP/xxxxxxxxx (None) Up Bridged Call(SIP/xxxxxxxxx SIP/xxxxxxxxx xxxxxxx@xxxx Up Dial(SIP/xxxxxxxxx 18 active channels 9 active calls voip*CLI> sip show channels Peer User/ANR Call ID Seq (Tx/Rx) Form Hold Last Message x.x.x.x xxxxx 0d6379b3780 00102/00000 g729 No Init: INVITE x.x.x.x (None) 4CD78C74-F1 00101/00101 g729 No Rx: INVITE x.x.x.x xxxxx 1124692724e 00102/00000 g729 No Tx: ACK x.x.x.x (None) 47D49F1B-F1 00101/00101 g729 No Rx: ACK x.x.x.x xxxxx 280ce54f593 00102/00000 unkn No Tx: ACK x.x.x.x xxxxxxxxx 11988b5106b 00102/00000 g729 No Tx: ACK x.x.x.x xxxxx 2ab967a202d 00102/00000 unkn No Tx: ACK x.x.x.x xxxxxxxxx 3c1464c7765 00102/00000 unkn No Tx: ACK x.x.x.x xxxxx 19bb31190cd 00102/00000 g729 No Tx: ACK x.x.x.x 6153158959 810748-3386 00101/00001 g729 No Rx: ACK x.x.x.x xxxxxxxxx 1a7e6d425c7 00102/00000 g729 No Tx: ACK x.x.x.x (None) E0154D5B-F1 00101/00101 g729 No Rx: ACK x.x.x.x xxxxxxxxx 671ab04d587 00102/00000 g729 No Tx: ACK x.x.x.x (None) 848875CF-F1 00101/00101 g729 No Rx: ACK x.x.x.x xxxxxxxxx 07301e04081 00102/00000 g729 No Tx: ACK x.x.x.x (None) 723F30BB-F1 00101/00101 g729 No Rx: ACK x.x.x.x xxxxxxxxx 6d7238376c1 00102/00000 g729 No Tx: ACK x.x.x.x xxxxxxxxx 6f1c1d7c579 00102/00000 g729 No Tx: ACK x.x.x.x xxxxxxxxx 65bd428e781 00102/00000 g729 No Tx: ACK x.x.x.x (None) 17389D67-F1 00101/00101 g729 No Rx: ACK 20 active SIP channels voip*CLI> The problem seems to have disappeared. By: Joel Jacobson (joelj) 2007-04-24 14:18:58 oej: Btw, would it be possible for you to take a look at my suggested solution to another problem? I have waited many days now, but noone has commented on my idea. Noticed you were involved in the bug/feature. Bug # 4825 (I know this is probably a bad place to discuss other bugs, but I don't know of any way to contact a specific user in the forum, is there a way?) By: Olle Johansson (oej) 2007-04-27 03:37:01 You never ever auth a CANCEL message, so the latest patch is very bad. By: cmaj (cmaj) 2007-04-30 10:06:38 I put together two versions of patch 8; against 1.4.4 & 1.2.18. The 1.4.4 version should apply to 1.4.2 with a little fuzz. I would guess the 1.2.18 version would apply with similar fuzz for recent 1.2.x. I have not tested it yet. It's just an addition of patches 4 & 6 (patch 7 has the t39 typo; ignore it). With no significant problems reported from the combo of 4 & 6, other than the one odd codec issue experienced by revolution, I didn't see a need to do much else other than back port to 1.2.18 oej: Were you looking at the first patch by accident ? That is old and should be ignored. serge-v: Can you delete all patches *except the two patches labeled "patch 8"* ? joelj & revolution: Any change in status with patches 4 & 6 applied ? Any other weird SIP issues cropping up that you think might be related to the patches ? Is it possible to upgrade to 1.4.4 and try that out with patch 8 ? loloski: Please let me know how the 1.2.18 patch works and/or if it works on another revision of 1.2.x. I've only just gotten it to compile against 1.2.18 and have no idea if it works in production, nor do I have an environment to test it. The differences between 1.2.x and 1.4.x are significant, and thus so are the differences between the patches. It will need thorough testing. By: daniel g (revolution) 2007-04-30 10:58:13 cmaj -- moved to 1.4.4 on one of the systems earlier, I'll do some testing with that and see how it goes. Other than the odd codec issue, I didn't see any other invites hanging. Give me a couple of days and I'll let you know the results. By: Ronald Chan (loloski) 2007-05-01 03:18:53 cmaj, Ok i will do that, i'll let you know how is it going, i experience this mostly with reinvites turn on with Queues without having chan_agent involve via incoming SIP/ZAP call with 20+ members/queue, thanks for the patch anyway i will report my progress here. By: Olle Johansson (oej) 2007-05-02 01:27:59 cmaj: I think you are on the right track, but change way too many places. The big question is where in the code we cause a CancelDestroy on the sip history posted here. We send a 488 then for some unknown reason cancels destruction after a scheduled destruction - with no incoming data. Do we have more examples on "sip history" on calls that hang or do they all look like this one? By: Badalian Vyacheslav (slavon) 2007-05-02 08:08:40 asterisk*CLI> sip show history 0de646f20c113ad364f43f2a75c3b878@87.255.0.218 asterisk*CLI> * SIP Call 1. ReTx 1000 INVITE sip:a74956444456-4@10.10.134.6:5061 SIP/2.0 2. ReTx 2000 INVITE sip:a74956444456-4@10.10.134.6:5061 SIP/2.0 3. ReTx 4000 INVITE sip:a74956444456-4@10.10.134.6:5061 SIP/2.0 4. ReTx 8000 INVITE sip:a74956444456-4@10.10.134.6:5061 SIP/2.0 asterisk*CLI> sip show history 7a96d228339b547b3e3cea24102b03fa@87.255.0.218 asterisk*CLI> * SIP Call> 1. ReTx 1000 INVITE sip:a74956444456-4@10.10.134.6:5061 SIP/2.0 2. ReTx 2000 INVITE sip:a74956444456-4@10.10.134.6:5061 SIP/2.0 3. ReTx 4000 INVITE sip:a74956444456-4@10.10.134.6:5061 SIP/2.0 4. ReTx 8000 INVITE sip:a74956444456-4@10.10.134.6:5061 SIP/2.0 5. ReTx 16000 INVITE sip:a74956444456-4@10.10.134.6:5061 SIP/2.0 asterisk*CLI> core show version Asterisk SVN-trunk-r62242 built by root @ asterisk on a i686 running Linux on 2007-04-28 05:52:04 UTC No patch. Added file sip_show_channels. Maybe help for u work. Thanks By: cmaj (cmaj) 2007-05-02 08:48:06 oej: Looking at slavon's and revolution's "sip show history" logs, I don't think the Destroy is ever scheduled in a lot of those hung channel situations. So there's no CancelDestroy to worry about. But that is not the only reason for the hung channels. You might be right that this patch is overkill. The improvements in patch 8 vs. patch 2/3 might be all that is required. But considering the number of bug reports filed related to this one, and the reports of this patch fixing the problem for all those whom have tried it, why not just apply it now and let future generations worry about re-factoring it to reduce complexity ? By: Ronald Chan (loloski) 2007-05-03 03:58:01 cmaj: Thanks your patch is working great, even reinvites turn on i don't see any channel now stuck in INVITE,ACK state. thanks By: Olle Johansson (oej) 2007-05-03 06:04:09 slavon: What's the dial string in the dial plan for these calls? Do you match a peer at all or just dial out? cmaj: I would really like to find the error in the code, the calls that cause this, not just apply a blind fix that makes me worried that something else might change in this messy code. By: Badalian Vyacheslav (slavon) 2007-05-03 06:11:23 oej exten => 6444456,1,ChanIsAvail(SIP/a74956444456-1|j) exten => 6444456,2,GotoIf($["${SIPPEER(a74956444456-1:status):0:2}" != "OK"]?102:3) exten => 6444456,3,Dial(SIP/a74956444456-1,9,) exten => 6444456,4,Goto(102) exten => 6444456,102,ChanIsAvail(SIP/a74956444456-2|j) exten => 6444456,103,GotoIf($["${SIPPEER(a74956444456-2:status):0:2}" != "OK"]?203:104) exten => 6444456,104,Dial(SIP/a74956444456-2,9,) exten => 6444456,105,Goto(203) exten => 6444456,203,ChanIsAvail(SIP/a74956444456-3|j) exten => 6444456,204,GotoIf($["${SIPPEER(a74956444456-3:status):0:2}" != "OK"]?304:205) exten => 6444456,205,Dial(SIP/a74956444456-3,9,) exten => 6444456,206,Goto(304) exten => 6444456,304,ChanIsAvail(SIP/a74956444456-4|j) exten => 6444456,305,GotoIf($["${SIPPEER(a74956444456-4:status):0:2}" != "OK"]?405:306) exten => 6444456,306,Dial(SIP/a74956444456-4,9,) exten => 6444456,307,Goto(405) exten => 6444456,405,Dial(SIP/a74956444456-1&SIP/a74956444456-2&SIP/a74956444456-3&SIP/a74956444456-4,60,) exten => 6444456,406,Busy exten => 6444456,407,Hangup By: cmaj (cmaj) 2007-05-03 10:19:32 oej: Did you really mean to call my code a "blind fix" ? Disregarding the fact I hired a team of recently unemployed circus monkeys who squatted for weeks on my keyboard to create the patch: it works, as verified by myself and others. I moved 10k minutes of SIP yesterday on 1.4.4 + patch8 and woke up to 0 hung channels when previously I would have crashed under that load. I don't see how I should bear responsibility for all the "messy code" that is chan_sip -- last time I checked, you were the gatekeeper in this area, and I didn't see any notices of pending re-writes. Considering that this bug is in both the most recent 1.2.x and 1.4.x "stable" versions of Asterisk, but not in the original 1.2.x releases, then it would appear a previous bug fix is at fault. I did not work on that previous fix. But I'm assuming that owing to your authority in this area, you probably did, or at least you are more aware of it than I. I think you should consider that the previous bug fix was perhaps only half of what was necessary. My point is this: when I see comments in the code like "Do we need a timer here if we don't hear from them at all?" in chan_sip, and I put a timer in, and it works, what is the problem with that ? I guess this is just another one of those patches I carry around in my store of "Asterisk Survival Supplies". By: Olle Johansson (oej) 2007-05-03 10:52:27 cmaj: My apologies if that remark hurted. But my way of reasoning, especially for released code, is that we need to know what type of call that causes this before we fix. I haven't seen anything in this bug report specific enough to tell me what types of calls that cause this issue so far, so for myself, I'm working in the blind. If someone provides me access to a system where it happens frequently or provides me with more detailed information, I will look into it and evaluate all patches here. There is an issue here that needs to be fixed, definitely. By: Leif Madsen (lmadsen) 2007-05-03 12:12:53 I can reproduce this for you oej if you wanna login to one of our miami boxes (just co-ordinate a time with me). I'm doing this simply by setting up a bunch of channels via SIPp --> Asterisk (hung channels) --> Asterisk (media generator). After about 500 simultaneous calls are setup (maybe around 3k-4k worth of calls) you'll see at least 10+ hung channels that don't tear down. By: Edoardo Serra - WeBRainstorm S.r.l. (webrainstorm) 2007-05-04 09:10:01 oej, if you want I have a machine on which it occurs regurarly show channels doesn't show up channels but lsof and netstat show a lot of RTP ports opened. By: Edoardo Serra - WeBRainstorm S.r.l. (webrainstorm) 2007-05-05 01:49:42 Just to inform you that the attached patch works for me. My 2 cents By: drrt (drrt) 2007-05-05 07:04:03 actually i can provide with log files which can display behavior of the system and details about a calls which froze sip channels. i ve been noticed the channels get stucked when any of SIP-clients lost it`s connectivity with asterisk. By: drrt (drrt) 2007-05-10 02:33:53 2 oej any updates about the issue? here is CDRs of calls that hangs sip channels. There are 3 strange callswith "no answer" state in the example below and ordinary one. I tried to snif on client side and found nothing special. The calls was initiated by Asterisk. 59101 | 2007-05-10 10:26:50+03 | | | s | outcome | SIP/phone86-0841bb88 | | | | 1 | 0 | NO ANSWER | 3 | | 1178782010.7943 | 59099 | 2007-05-10 10:26:32+03 | | | s | outcome | SIP/phone93-084877d0 | | | | 5 | 0 | NO ANSWER | 3 | | 1178781992.7940 | 59098 | 2007-05-10 10:26:32+03 | | | s | outcome | SIP/phone92-08490c70 | | | | 5 | 0 | NO ANSWER | 3 | | 1178781992.7939 | 59102 | 2007-05-10 10:26:25+03 | 4012703704 | 4012703704 | 6678 | techqueued | SIP/**.**.**.**-08441b40 | SIP/phone91-084470d0 | Queue | tech|t|||120 | 59 | 59 | ANSWERED | 3 | | 1178781985.7937 | By: Olle Johansson (oej) 2007-05-11 02:01:45 I would really like access to the systems. I could repeat this yesterday on one of my systems. What I see is that chan_sip send a congestion control frame to APP_DIAL, but app_dial does not react to this. I added a timer so that the inbound channel was hung up, but APP_DIAL did not react to that either. If that's the case on your systems too, this is something that goes deeper than the channel driver or something is wrong in the way the channel driver communicates with the PBX. By: drrt (drrt) 2007-05-11 02:18:04 Will help you with pleasure. Would you to communicate with my by email(junior_removethis_kanet.ru), jabber(junior_removethis_kanet.ru) or icq(171985179). What is your actual timezone by the way? By: Edoardo Serra - WeBRainstorm S.r.l. (webrainstorm) 2007-05-11 02:20:37 hi oej, send me an email (edoardo.serra at webrainstorm.it) and we can arrange that By: Olle Johansson (oej) 2007-05-11 16:34:39 Will mail you during the weekend. I'm in Sweden - GMT+1 :-) By: drrt (drrt) 2007-05-12 10:14:04 i will look forward for your message. My actual timezone is GMT+3. It makes much easy to cooperate. By: Edoardo Serra - WeBRainstorm S.r.l. (webrainstorm) 2007-05-12 13:51:05 I'm in GMT+1 too (Italy) ;) By: Olle Johansson (oej) 2007-05-14 08:09:34 Ok, the weekend was spent on other private issues... Find me on IRC in the #asterisk-bugs channel or send me email so we can localize and fix this bug. THanks. E-mail: oej@edvina.net By: Olle Johansson (oej) 2007-05-18 07:52:53 Still haven't got access to any system that has this problem. Thanks drrt for repeated contacts and providing me with information. Yet another set of questions: - Do you do any processing in the "h" extension? - What CDR backend do you use? Do you have CDR batch support on in cdr.conf? By: Olle Johansson (oej) 2007-05-18 07:55:13 Logs from drrt indicate what I've seen. We do send a hangup from the SIP channel, but after that the call is still hanging around for some reason, we're not getting a call to sip_hangup from the core. By: Olle Johansson (oej) 2007-05-18 08:02:32 On tuesday, I will look into this with other developers at the dev meeting. This is a highly prioritized bug that I need to get my hands on to analyze. Anything you can add to this bug report is welcome. By: Olle Johansson (oej) 2007-05-18 08:25:10 Ok, we have two cases with call queues involved. Anyone else? It might not be app_dial, but app_queue being the culprit here. By: Olle Johansson (oej) 2007-05-18 10:21:13 Tested on one of the systems that where using Queue. Obviously calls that we sent CANCEL on without any owner where hanging. Fixed this in svn 1.2 rev 65075, 1.4 rev 65076 and trunk 65077. Please test these new versions on your systems - without any other patches. Thank you, webrainstorm, for giving me access to your system. By: Leif Madsen (lmadsen) 2007-05-18 11:07:24 FYI: I was able to reproduce this by using the following topology: SIPp --> Asterisk A --> Asterisk B SIPp calls into Asterisk A. Asterisk A performs a Dial() over SIP to Asterisk B. Asterisk B plays back some audio. Hangup. oej: you asked if we were doing any post call processing with 'h', and I am. No queuing involved in the above topology. By: daniel g (revolution) 2007-05-18 12:21:09 oej -- In my case I have one system using Queue and one not using Queue. I will update the system using Queue to svn later tonight (when people aren't using the phone) and post the results in a few days. By: Olle Johansson (oej) 2007-05-18 14:28:45 Ok, committed a second patch for the case where we are sending 200 OK and not receive an ACK. Please try latest svn on a system with no other patches applied to chan_sip. Thanks. If you still have calls hanging, make sure you upload sip history for those calls and information how the call was places (dial or queue) and if you called a local peer registered with Asterisk or something else ( a service provider, a SIP proxy, a pair of red boots). Thanks. By: Olle Johansson (oej) 2007-05-18 14:28:46 Ok, committed a second patch for the case where we are sending 200 OK and not receive an ACK. Please try latest svn on a system with no other patches applied to chan_sip. Thanks. If you still have calls hanging, make sure you upload sip history for those calls and information how the call was places (dial or queue) and if you called a local peer registered with Asterisk or something else ( a service provider, a SIP proxy, a pair of red boots). Thanks. By: daniel g (revolution) 2007-05-22 08:22:24 1. NewChan Channel SIP/0-081ec3f8 - from 451829f215a6b4ca3ec7f42e0 2. TxReqRel INVITE / 102 INVITE - -UNKNOWN- 3. ReTx 1000 INVITE sip:0@aaa.bbb.ccc.ddd:5060 SIP/2.0 4. ReTx 2000 INVITE sip:0@aaa.bbb.ccc.ddd:5060 SIP/2.0 5. ReTx 4000 INVITE sip:0@aaa.bbb.ccc.ddd:5060 SIP/2.0 6. ReTx 8000 INVITE sip:0@aaa.bbb.ccc.ddd:5060 SIP/2.0 Placed using Dial and to a remote peer registered with *. It seems that this peer may have gone offline during the Invite and thus never responded. By: daniel g (revolution) 2007-05-22 14:28:42 1. NewChan Channel SIP/0-081e5bf0 - from 4c10fc8c03acbce668e680c451bc18 2. TxReqRel INVITE / 102 INVITE - -UNKNOWN- 3. ReTx 1000 INVITE sip:0@aaa.bbb.ccc.ddd:19186;rinstance=c9c6c6ac2c62b 4. ReTx 2000 INVITE sip:0@aaa.bbb.ccc.ddd:19186;rinstance=c9c6c6ac2c62b 5. ReTx 4000 INVITE sip:0@aaa.bbb.ccc.ddd:19186;rinstance=c9c6c6ac2c62b 6. ReTx 8000 INVITE sip:0@aaa.bbb.ccc.ddd:19186;rinstance=c9c6c6ac2c62b 7. ReTx 16000 INVITE sip:0@aaa.bbb.ccc.ddd:19186;rinstance=c9c6c6ac2c62 8. ReTx 32000 INVITE sip:0@aaa.bbb.ccc.ddd:19186;rinstance=c9c6c6ac2c62 9. Cong Auto-congesting (timer) sip peer is local to *. The network did not go down in this case. (however it IS a softphone). Updated to SVN. Call Definitely came from Dial (in this case, the term/orig provider initated call to Asterisk and customer chose to dial by extension) By: drrt (drrt) 2007-05-22 16:37:50 2 oej. i dont use hangup extension in the dialplan. pgsql is my only cdr`s backend. By: bcoppens (bcoppens) 2007-05-23 05:34:08 Olle - I did sent you traces and logs by email (to 'oej@edvina.net') The following is still a problem case SIP INVITE from A -> ASTERISK SIP INVITE from ASTERISK -> B B not responding..... CANCEL from A -> ASTERISK By: Badalian Vyacheslav (slavon) 2007-05-23 08:57:46 New info. maybe intresting SVN-trunk-r65395 Subsctibe: asterisk*CLI> sip show history NTQxNGJiYWIyZjRhMzk2NzA2NjI0YWJlZmYyZjBmNDg. asterisk*CLI> * SIP Call 1. AuthChal Auth challenge sent for - nc 0 2. AuthChal Auth challenge sent for - nc 0 3. AuthChal Auth challenge sent for - nc 0 4. AuthChal Auth challenge sent for - nc 0 5. AuthChal Auth challenge sent for - nc 0 6. AuthChal Auth challenge sent for - nc 0 7. AuthChal Auth challenge sent for - nc 0 8. AuthChal Auth challenge sent for - nc 0 9. AuthChal Auth challenge sent for - nc 0 10. AuthChal Auth challenge sent for - nc 0 11. AuthChal Auth challenge sent for - nc 0 asterisk*CLI> sip show history 0f6a4b097bb812366839c8053483c0f3@87.255.0.218 asterisk*CLI> * SIP Call 1. ReTx 1000 INVITE sip:a74957976827@10.10.17.13:5060 SIP/2.0 2. ReTx 2000 INVITE sip:a74957976827@10.10.17.13:5060 SIP/2.0 3. ReTx 4000 INVITE sip:a74957976827@10.10.17.13:5060 SIP/2.0 4. ReTx 8000 INVITE sip:a74957976827@10.10.17.13:5060 SIP/2.0 5. ReTx 16000 INVITE sip:a74957976827@10.10.17.13:5060 SIP/2.0 Added file sip_show_channels2 By: drrt (drrt) 2007-05-25 04:04:06 the problem is still here for me. i wonder how can i show my debug files(about 200mb) to developers? By: Olle Johansson (oej) 2007-05-29 03:15:02 slavon: That's the weirdest call. We do challenge something repeatedly but get no incoming request? What started this? Can you try to debug a bit more to find out what's going on? By: Badalian Vyacheslav (slavon) 2007-05-29 03:36:33 oej how i can put debug info into file in realtime? I have produce system and can't stop service and do 'asterisk -Tvvvvvdddddngc | tee /tmp/verbosedebug.txt'. Can i setup "logger" to generate full debug file and send 1 day full log to you? I can give to you access to asterisk, but system must work 24/7 (access without experements;) ) By: Olle Johansson (oej) 2007-05-29 04:27:14 slavon: You can't give me any more information on that? Maybe access could help. Please contact me on IRC or via e-mail. Thanks. By: Olle Johansson (oej) 2007-05-29 04:28:26 slavon: Please confirm exactly which version of Asterisk you are using now. Also, if you are using any modules or patches outside of the general distribution, like asterisk-addons. Thanks. By: Leif Madsen (lmadsen) 2007-05-29 08:12:18 Here is a pastebin of a bunch of stuck calls: http://www.pastebin.ca/518836 Those channels hang around (looks like in a BYE state), and are never destroyed. This was done on revision 66244 (1.4 branch). FYI: 62005 works perfect for me, so whatever changed after that branch breaks things badly for me. By: Olle Johansson (oej) 2007-05-29 11:22:13 Ok, please test with latest 1.4 or trunk from the svn repositories, I've added a change that should fix an issue I saw on slavo's system and some extra logging. Again, make sure you enable sip history and show me the history of the calls that hang. Hopefully this fixed and issue where the CALLER sent a CANCEL and we sent a AST_HANGUP to the outbound INVITE, which failed to happen since we did not even have 100 trying on that channel. That means that the channel hung with a delayed BYE, waiting for at least 100 trying. We have also fixed an issue on jcmoore's system where we did resend BYE's forever since we did not get any replies. By: Olle Johansson (oej) 2007-05-29 14:16:48 Added another patch to not issue a BYE on a failed BYE transaction By: Curt Moore (jcmoore) 2007-05-29 21:50:13 oej: I'm still seeing the occasional hung BYE using SVN-1.4-r66538. If I use my patch, hung_bye.diff.txt, I do not get hung BYEs. I will concede that this patch is pretty much a brute force way of tearing down the BYEs and there is probably something else going on which is causing us to queue up a BYE to a BYE. Any thoughts on where I should look next? By: Olle Johansson (oej) 2007-05-30 01:25:59 jcmoore: I tried looking at the code for more similar hangups on bye failures. Please show me the SIP history, as always in this bug report. slavon: Please update to latest version and restart so we can see if I've fixed the issue I noticed on your system. Note: Will be travelling for a week, hopefully getting Internet access back early next week latest, if not in conference centers and hotels. Can't promise immediate reply during this period. This bug has consisted of several unrelated issues so far. Let's try to track down the rest of them. Thank you for your cooperation and a special thank you to all of you that have given me access to systems, even production systems. By: daniel g (revolution) 2007-05-30 07:28:28 Still seeing some hung channels here... though not nearly as many. This is actually a hung ACK... sip show history 1. NewChan Channel SIP/0-08327cc0 - from 129170fb24c30c5948d89b201 2. TxReqRel INVITE / 102 INVITE - -UNKNOWN- 3. Rx SIP/2.0 / 102 INVITE / 100 Trying 4. Rx SIP/2.0 / 102 INVITE / 180 Ringing 5. Rx SIP/2.0 / 102 INVITE / 200 OK 6. TxReq ACK / 102 ACK - -UNKNOWN- 7. Rx INVITE / 1 INVITE / sip:+0@aaa.bbb.ccc.ddd 8. Hold INVITE 9. ReInv Re-invite received 10. TxResp SIP/2.0 / 1 INVITE - 200 OK 11. Rx ACK / 1 ACK / sip:+0@aaa.bbb.ccc.ddd 12. Rx INVITE / 2 INVITE / sip:+0@aaa.bbb.ccc.ddd 13. Unhold INVITE 14. ReInv Re-invite received 15. TxResp SIP/2.0 / 2 INVITE - 200 OK 16. Rx INVITE / 3 INVITE / sip:+0@@aaa.bbb.ccc.ddd 17. TxResp SIP/2.0 / 3 INVITE - 491 Request Pending 18. Rx ACK / 3 ACK / sip:+0@aaa.bbb.ccc.ddd 19. Rx ACK / 3 ACK / sip:+0@aaa.bbb.ccc.ddd 20. SchedDestroy 32000 ms 21. CancelDestroy sip show channel * SIP Call Curr. trans. direction: Incoming Call-ID: 129170fb24c30c5948d89b2014181f8d@aaa.bbb.ccc.ddd Owner channel ID: <none> Our Codec Capability: 256 Non-Codec Capability (DTMF): 1 Their Codec Capability: 256 Joint Codec Capability: 256 Format: 0x0 (nothing) MaxCallBR: 384 kbps Theoretical Address: aaa.bbb.ccc.ddd:5060 Received Address: aaa.bbb.ccc.ddd:5060 SIP Transfer mode: open NAT Support: RFC3581 Audio IP: aaa.bbb.ccc.ddd (local) Our Tag: as6aca1e14 Their Tag: 1413123795 SIP User agent: Username: 0 Peername: 0 Original uri: sip:0@aaa.bbb.ccc.ddd:5060 Need Destroy: 0 Last Message: Rx: ACK Promiscuous Redir: No Route: sip:0@aaa.bbb.ccc.ddd:5060 DTMF Mode: rfc2833 SIP Options: (none) The only other hung channels are all in regards to the same peer (3 channels hung) and all look similar (i.e. same cause) 1. NewChan Channel SIP/0-083a1ea8 - from 4febe71f2f0d4b07080fe7646 2. TxReqRel INVITE / 102 INVITE - -UNKNOWN- 3. Rx SIP/2.0 / 102 INVITE / 100 Trying 4. Rx SIP/2.0 / 102 INVITE / 180 Ringing 5. Rx SIP/2.0 / 102 INVITE / 200 OK 6. TxReq ACK / 102 ACK - -UNKNOWN- 7. Rx INVITE / 65454 INVITE / sip:restricted@aaa.bbb.ccc.ddd 8. TxResp SIP/2.0 / 65454 INVITE - 488 Not acceptable here 9. Rx ACK / 65454 ACK / sip:restricted@aaa.bbb.ccc.ddd 10. Rx INVITE / 65455 INVITE / sip:restricted@aaa.bbb.ccc.ddd 11. TxResp SIP/2.0 / 65455 INVITE - 488 Not acceptable here 12. SchedDestroy 32000 ms 13. CancelDestroy Curr. trans. direction: Incoming Call-ID: 7d3b95ea27992d470fc5d82e738064e0@aaa.bbb.ccc.ddd Owner channel ID: <none> Our Codec Capability: 256 Non-Codec Capability (DTMF): 1 Their Codec Capability: 256 Joint Codec Capability: 256 Format: 0x0 (nothing) MaxCallBR: 384 kbps Theoretical Address: aaa.bbb.ccc.ddd:5060 Received Address: aaa.bbb.ccc.ddd:5060 SIP Transfer mode: open NAT Support: RFC3581 Audio IP: aaa.bbb.ccc.ddd (local) Our Tag: as1b8e6fd1 Their Tag: ed7814519c165226 SIP User agent: Username: 0 Peername: 0 Original uri: sip:0@aaa.bbb.ccc.ddd Need Destroy: 0 Last Message: Rx: INVITE Promiscuous Redir: No Route: sip:0@aaa.bbb.ccc.ddd DTMF Mode: rfc2833 SIP Options: replaces replace Interesting side-note: While I was getting all of this information together, I noticed two other hung ACKs; however, when I went to view the history on both of those channels -- they had already been cleared out. So obviously progress has been made and we're not too far from a resolution (imo). p.s. Asterisk SVN-branch-1.4-r66414 By: Leif Madsen (lmadsen) 2007-05-30 23:04:09 oej: some more history for you (this is what I believe to be a hung channel on revision 62005) http://www.pastebin.ca/524416 By: Olle Johansson (oej) 2007-05-31 05:04:09 revolution: these histories indicate an open call - there's no bye and a proper call setup. In order to have asterisk to destroy these, someone needs to send a bye or you need an absolute timeout or rtptimers - Asterisk can't destroy properly setup calls randomly By: Olle Johansson (oej) 2007-05-31 05:10:02 blitzrage: where's the SIP history for these calls? By: daniel g (revolution) 2007-05-31 07:45:07 oej -- in this case, a call was setup and kept open for almost 19 minutes, then it was terminated and cdr was written. Does Asterisk not wait to write CDR until after a BYE is received? SIP/0-08327cc0,Dial,SIP/0|20,1132,1125,ANSWERED,3,acctcode,0,0,0 sip show history 1. NewChan Channel SIP/0-08327cc0 - from 129170fb24c30c5948d89b201 2. TxReqRel INVITE / 102 INVITE - -UNKNOWN- 3. Rx SIP/2.0 / 102 INVITE / 100 Trying 4. Rx SIP/2.0 / 102 INVITE / 180 Ringing 5. Rx SIP/2.0 / 102 INVITE / 200 OK 6. TxReq ACK / 102 ACK - -UNKNOWN- 7. Rx INVITE / 1 INVITE / sip:+0@aaa.bbb.ccc.ddd 8. Hold INVITE 9. ReInv Re-invite received 10. TxResp SIP/2.0 / 1 INVITE - 200 OK 11. Rx ACK / 1 ACK / sip:+0@aaa.bbb.ccc.ddd 12. Rx INVITE / 2 INVITE / sip:+0@aaa.bbb.ccc.ddd 13. Unhold INVITE 14. ReInv Re-invite received 15. TxResp SIP/2.0 / 2 INVITE - 200 OK 16. Rx INVITE / 3 INVITE / sip:+0@@aaa.bbb.ccc.ddd 17. TxResp SIP/2.0 / 3 INVITE - 491 Request Pending 18. Rx ACK / 3 ACK / sip:+0@aaa.bbb.ccc.ddd 19. Rx ACK / 3 ACK / sip:+0@aaa.bbb.ccc.ddd 20. SchedDestroy 32000 ms 21. CancelDestroy Is it a possibility that * received the 'BYE' and acknowledged it to write CDR -- but not to close the channel? I will attempt to reproduce these errors and capture the relevant information if I can. By: Leif Madsen (lmadsen) 2007-05-31 08:24:07 Crap, sorry oej... it was really late, and I was talking to Josh, and those were the things he wanted. Oh ya! I did make a bad pastebin, and it happens to have the SIP history in it as well: http://www.pastebin.ca/524402 By: daniel g (revolution) 2007-05-31 10:51:58 oej... I have a capture of the cause of one of these hung channels (specifically it is the "488 Not acceptable here" which is caused by the ATA attempting to switch from 729 to ulaw in order to receive a fax and Asterisk having been set to disallow ulaw on this account). When the ATA is set to disallow ulaw and allow g729... all involved (Provider, ATA, Test Caller) request a BYE; however, * doesn't send BYE to the ATA (thus resulting in the hung channel I'd suppose?) -- could this be due to the "Not Acceptable Here" msg? When the ATA is set to allow ulaw and disallow g729... everything functions as expected (i.e. BYE is requested by and sent to all)... faxing is moderately successful. When ATA is set to allow both ulaw and g729... same results as a: ulaw and d: g729 (BYE requested and sent to all). So my only question is this -- why is * not sending a BYE after a "488 Not Acceptable Here" oej if you would like to see the captures... please let me know... By: Lucas Martinez (martlucas_gtt) 2007-06-13 09:27:05 I have the same problem sending call to a host not listening in 5060 port, is this solved in the svn trunk right now? By: Leif Madsen (lmadsen) 2007-06-13 09:42:08 martlucas_gtt: no, this is not resolved -- please attach as much debugging information as possible as requested from oej in the bug notes. Thanks! By: Russell Bryant (russell) 2007-06-13 11:20:13 Changing from "assigned" back to "new" since it's not actually assigned to anyone. I'm looking at this one now, too, though ... By: Russell Bryant (russell) 2007-06-13 15:06:43 jcmoore's hung_bye.diff.txt has been merged into 1.4 and trunk in revisions 69183 and 69187. The commit message: Move the logic for destroying a call when no response is received to a BYE outside of the block that checks for FLAG_FATAL to be set. This flag is only set when the packet is transmitted with the reliability set to XMIT_CRITICAL when the original packet is transmitted. A BYE is always sent with it set to XMIT_RELIABLE, meaning this code could never be encountered. This resulted in seeing some SIP channels that would never go away with the last packet sent being a BYE. (part of issue ASTERISK-8969, patch from jcmoore) By: Lucas Martinez (martlucas_gtt) 2007-06-13 15:26:11 Could we apply the "hung_bye.diff.txt" patch or we need to do other stuff more? This is talkin about the 1.4.4 stable version By: Leif Madsen (lmadsen) 2007-06-13 17:27:48 martlucas_gtt: uhhhhh... sure, go ahead and apply to 1.4.4. It may or may not apply depending on what has changed. By: Curt Moore (jcmoore) 2007-06-14 01:31:32 I've uploaded hung_cancel_log.txt which shows Asterisk sending a CANCEL after an INVITE times out; all is normal there. The odd thing is that the far end responds with a "200 canceling" and then sends a "180 Ringing"!?! This confuses Asterisk since upon the receipt of the "180 Ringing" we cancel the destroy of the channel. Since the destroy has been canceled and there are no further dialogs received, the SIP channel is hung forever after sending a CANCEL to which it received a "180 Ringing". Is there any scenario in which we would care about anything other than an ACK after we issue a CANCEL? I haven't quite figured out the best way to go about checking for this so I need some input. It would appear that in my hung_cancel_log.txt example that we sent a CANCEL in step 5 as a result of the scheduled destroy time expiring in step 4. This would seem to indicate that we were in sip_hangup(). It appears that we sent the CANCEL in the section of code with the comment "Send a new request: CANCEL". It would seem that correct thing would be to set SIP_PKT_IGNORE on the 180 which comes in after we've sent the CANCEL. This way we would not cancel the destroy in the 180 case of handle_response_invite() which was scheduled as a result of our sending of the CANCEL as happens in step 9. It would seem the proper place to set this flag would be in handle_request() but I'm not exactly sure of the logic to correctly do this. I hope this makes sense... By: Leif Madsen (lmadsen) 2007-06-14 16:30:37 I am able to reproduce a hung channel on my test system. I make a call to a remote Asterisk box via a DID number that goes through our softswitch. First call answers fine (DTMF doesn't work... but that's a different bug). Second call gets no audio, and when I hang up, gets hung. Topology: Polycom IP501 --> SER_1 (tampa) --> Asterisk (1.4, r69258) --> SER_1 (tampa) --> softswitch --> SER_2 (miami) --> Asterisk (1.4, r62005) --> MeetMe() I've attached 2 SIP history files: history (good call), and history (bad call). If you need more information, or access to the box, let me know as I can reproduce this now. By: Russell Bryant (russell) 2007-06-15 15:24:30 I just fixed the hung channel that blitzrage just reported in 1.4 in revision 69583. It was not a problem anymore in trunk. By: Russell Bryant (russell) 2007-06-15 18:08:19 1.4.5 has been released which contains fixes for a lot of the issues reported here. Everyone that is having any problems with this, please update to (at least) 1.4.5, or later code from svn and report back if you are still having problems. Then, we can address the remaining issues from there. Thanks! By: daniel g (revolution) 2007-06-18 00:05:42 All involved: I've updated to Ast 1.4.5 as of an hour ago on two boxes (one of which uses Queues to direct calls and the other of which uses Dial). I'll let both of these run until next Saturday before checking for lingering channels and updating everyone. Thanks to everyone for your work/help on this. By: drrt (drrt) 2007-06-21 16:42:35 moved my system to latest release(1.4.5). there is no hanging sip channels for about three days. the problem is solved at my point. many thanks to developers who fixed the bug. By: Leif Madsen (lmadsen) 2007-06-21 18:15:33 Upgraded to 1.4.5 on the day it was released in one of our production systems, and have yet to see a hung channel. So far so good. Will report back again if I see something. By: Ronald Chan (loloski) 2007-06-22 11:45:43 I will upgrade my system from SVN branch 1.2 to SVN 1.4 i hope and pray that this issue was finally rest to end :), since this is a production server with 30-40 mixed in/out bound SIP calls during peak time with queues. I will surely report whatever happen for the next couple of days. Huge thanks to everyone By: Bluce Ree (tasker) 2007-06-22 19:02:14 Ditto for me on 1.4.5. Systems are loaded here and no hung channels yet. By this time, i'd be forced to restart due to resource saturation. By: Andrey Solovyev (corruptor) 2007-06-23 04:35:02 After upgrade to 1.4.5 I don't have hung channels too. By: Ronald Chan (loloski) 2007-06-23 20:40:09 To All, Great! so far so good, after passing our peak time there are no more hung channel on ACK and INVITE :) Thanks again oej and the rest. By: Russell Bryant (russell) 2007-06-23 23:53:14 Thanks for the feedback, everyone. I am going to close this one out as fixed. If you have any further problems, please just create a new report. Thanks! |