[Home]

Summary:ASTERISK-06637: SIP channels hang in semi-active state
Reporter:Roy Sigurd Karlsbakk (rkarlsba)Labels:
Date Opened:2006-03-28 08:26:16.000-0600Date Closed:2007-02-07 10:59:46.000-0600
Priority:MinorRegression?No
Status:Closed/CompleteComponents:Core/General
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) oej-sip-2006-06-20.patch
( 1) res_agi_patch.txt
( 2) sipshowdownstuff.txt
Description:Hi

With a setup with ATAs <-sip-> asterisk sip/sip gw <-sip-> sip/pstn gws, we keep seeing hanging SIP channels on the sip/sip gw. Measuring the total channels on the pstn gateways and the total sip channels shows this 'leaks', just a few, 5-10 channels a day or so.

I'm unsure of how to debug this...

roy
Comments:By: Andrey S Pankov (casper) 2006-03-28 10:06:59.000-0600

Can you show at least 'sip show channels'?

By: Roy Sigurd Karlsbakk (rkarlsba) 2006-03-28 10:33:24.000-0600

May I email you that? I don't know what to look for, and I do not want to disclose all that. sorry.

By: Andrey S Pankov (casper) 2006-03-28 11:06:11.000-0600

Wipe out IP addresses from the 1st column and past the output here (attach if the output is too large).

By: Olle Johansson (oej) 2006-03-28 19:03:30.000-0600

...and a "sip show channel <name>" for the hanging channel, possibly also sip history. Thanks.

By: Roy Sigurd Karlsbakk (rkarlsba) 2006-03-31 05:22:05.000-0600

These 'hanging channels' show up as these

SIP/pstngw2-311d     (None)               Down    AppDial((Outgoing Line))      
SIP/1005781-5a5f     62424300@default:1   Ring    Dial(SIP/pstngw2/62424300)

See attached file sipshowdownstuff.txt for output from 'show channel ...' for these two

By: Olle Johansson (oej) 2006-03-31 17:54:46.000-0600

Hmm. I asked for "sip show channel" and got "show channel"....

:-)

By: Olle Johansson (oej) 2006-03-31 17:55:01.000-0600

Both would be excellent!

By: Roy Sigurd Karlsbakk (rkarlsba) 2006-04-02 06:24:56

I can't see how sip show channels can be relevant when this one shows one hanging channel's details, that being AFAICS the channel pair is hanging on ast_waitfor_nandfds

By: Olle Johansson (oej) 2006-04-03 15:15:08

Roy, that it hangs there does not tell me anything on what happened before or how the SIp channel was configured...

By: Roy Sigurd Karlsbakk (rkarlsba) 2006-04-11 05:10:42

anymore news here, olle?

By: Roy Sigurd Karlsbakk (rkarlsba) 2006-04-13 11:04:05

we just had a crash outside working hours (bug 6936), and it seems this no longer happens after the crash. after the crash, however, there has been no app_queue activity since it's been night and national norwegian holiday, so it _could_ be something only happening when sip channels are queued up...

By: Roy Sigurd Karlsbakk (rkarlsba) 2006-04-24 09:51:00

This is still happening at all times of the day
Are anyone woring on this?

By: Andrey S Pankov (casper) 2006-04-24 11:05:45

Can you upload your extensions.conf file please?

By: Janahan (jv) 2006-04-25 18:37:43

I'm seeing similar behavior. We're using fastAGI to answer a call and record voice. Once in a while the fastAGI request never returns even though the remote user has hung up.

I just applied the patch for 6647, I'm wondering if the root cause of this bug is related to 6647. Unfortunately because of the infrequency of this problem on our system I can't verify for sure.

Roy have you applied 6647 to your system?



By: Roy Sigurd Karlsbakk (rkarlsba) 2006-04-26 07:31:01

casper: extensions.conf shouldn't really be of any relevance. It's a rather massive AGI script that does it all.

However - we had some problems with a database server and the problem is quite less visible after upgrading the db server, so it looks like it's related to db timeouts etc.

By: Andrey S Pankov (casper) 2006-04-26 12:08:42

I mean there may be problems if you do not call Hangup() or use Busy() or Congestion() without a timeout.

By: Andrey S Pankov (casper) 2006-04-26 12:11:53

jv: 6647 is one of my fixes. I'm happy it helped you.. ;)

rkarlsba: it would be nice if when reporting a bug you describe the environment. It may be AGI related, not SIP...

By: Janahan (jv) 2006-04-27 16:48:29

I'm adding this note here because it might be the cause of rkarlsba's issue as well (he's using AGI).

In res/reg_agi.c, the function 'handle_recordfile' invokes 'ast_waitfor' with '-1' as the timeout even if a valid timeout value is passed in. Any ideas why this is?

In my setup, sometimes the SIP 'BYE' doesn't get delievered to me(poor interconnection). At this point the remote end stops sending RTP packets as well. This causes the channel to block forever because there's nothing to kick it out of 'ast_waitfor_nandfds'. While I agree that the root cause is really a problem on the remote end, I don't think it should cause the channel to hang.

One option would be to set the timeout in to ast_waitfor to what ever is passed in by the caller. But I'm not sure if there are any side effects. I tried this on my system and things work well.

P.S. thanks casper....:-)



By: Serge Vecher (serge-v) 2006-05-04 13:40:47

jv: can you please produce a patch with your change to be tried?

By: Joshua C. Colp (jcolp) 2006-05-08 10:44:13

You could also use rtptimeout so that the channel will go away if no RTP traffic is received.

By: Janahan (jv) 2006-05-08 14:14:15

sorry for the delay. I just uploaded the file. plese let me know if the format is incorrect.

joshnet: does a non-zero 'rtptimeout' override the the timeout value used inside of ast_waitfor_nandfds?

By: Andrey S Pankov (casper) 2006-05-08 14:27:12

vechers: if this is an AGI issue, could you please ask junky to comment on the issue (I can't post a reminder from here). He is a big AGI specialist...

By: Serge Vecher (serge-v) 2006-05-08 14:36:30

junky: please take a look at this patch here. Thanks.

By: Roy Sigurd Karlsbakk (rkarlsba) 2006-05-09 03:07:12

joshnet: i'm using rtptimeout, but the leak's still there....
i'll try to see if the patch fixes the problem

By: Roy Sigurd Karlsbakk (rkarlsba) 2006-05-10 07:31:18

* still leaks after applying the res_agi patch....

By: Andrey S Pankov (casper) 2006-05-15 14:52:25

Please add relation to ASTERISK-6911 and change category to AGI in both. Thanks!

By: Serge Vecher (serge-v) 2006-05-19 14:51:22

roy: what is the revision of 1.2 you have tried and did you narrow down the module that leaks memory?

By: Andrey S Pankov (casper) 2006-05-19 16:44:44

> did you narrow down the module that leaks memory?
nothing "leaks memory" here... there are hang channels in semi-active state.

By: Andrey S Pankov (casper) 2006-05-23 19:47:44

It seems like it is not SIP related at all... rather AGI related...
or related to a specific AGI script

By: Roy Sigurd Karlsbakk (rkarlsba) 2006-05-24 08:26:02

In any case, this never happened on 1.0

By: Serge Vecher (serge-v) 2006-05-24 08:30:19

>In any case, this never happened on 1.0
Well, I don't think anyone at this point will try to narrow down the exact "breakage" point by examining daily cvs versions between 1.0 and 1.2. Unless you will.

By: Serge Vecher (serge-v) 2006-06-02 10:29:19

royk: can you please turn sip history on and post the log here. Also, please specify the latest version this is exhitibited on. Thanks.

By: Roy Sigurd Karlsbakk (rkarlsba) 2006-06-19 04:20:22

Currently running 1.2.9.1
This channel is hung

sipgw1 *CLI> show channel SIP/pstngw2-8aba
-- General --
          Name: SIP/pstngw2-8aba
          Type: SIP
      UniqueID: 1150453510.22513
     Caller ID: 9593185
Caller ID Name: (N/A)
   DNID Digits: (N/A)
         State: Down (0)
         Rings: 0
  NativeFormat: 8
   WriteFormat: 8
    ReadFormat: 8
1st File Descriptor: 412
     Frames in: 9817
    Frames out: 100
Time to Hangup: 0
  Elapsed Time: N/A
 Direct Bridge: <none>
Indirect Bridge: <none>
--   PBX   --
       Context: default
     Extension:
      Priority: 1
   Call Group: 0
  Pickup Group: 0
   Application: AppDial
          Data: (Outgoing Line)
   Blocking in: ast_waitfor_nandfds
     Variables:
DIALEDPEERNUMBER=pstngw2/9593185
SIPCALLID=12f9df86240f26263b47cf355de0665f@213.160.242.5

sipgw1 *CLI> sip show history 12f9df86240f26263b47cf355de0665f@213.160.242.5
sipgw1 *CLI>
 * SIP Call
1. TxReqRel        INVITE / 102 INVITE
2. ReTx 2          INVITE sip:9593185@213.160.242.21 SIP/2.0
3. ReTx 4          INVITE sip:9593185@213.160.242.21 SIP/2.0
4. ReTx 8          INVITE sip:9593185@213.160.242.21 SIP/2.0
5. ReTx 16         INVITE sip:9593185@213.160.242.21 SIP/2.0
6. ReTx 32         INVITE sip:9593185@213.160.242.21 SIP/2.0
7. Rx              SIP/2.0 / 102 INVITE
8. CancelDestroy  
9. Rx              SIP/2.0 / 102 INVITE
10. CancelDestroy  
11. Rx              SIP/2.0 / 102 INVITE
12. CancelDestroy  
13. Rx              SIP/2.0 / 102 INVITE
14. CancelDestroy  
15. Rx              SIP/2.0 / 102 INVITE
16. CancelDestroy  
17. Unhold          SIP/2.0

By: Olle Johansson (oej) 2006-06-19 05:15:50

Why is it saying "Unhold" but not "hold"... Weird...

By: Olle Johansson (oej) 2006-06-20 05:35:13

Patched 1.2 and svn trunk in different ways. Please confirm whether this fixes your issue or not.

By: Roy Sigurd Karlsbakk (rkarlsba) 2006-06-20 07:09:21

hi. after patching up with  oej-sip-2006-06-20.patch (attached), I still see hanging channels......

sipgw1 *CLI>
 * SIP Call
1. TxReqRel        INVITE / 102 INVITE
2. Rx              SIP/2.0 / 102 INVITE /100 Trying
3. CancelDestroy  
4. Rx              SIP/2.0 / 102 INVITE /183 Session Progress
5. CancelDestroy  
6. Unhold          SIP/2.0
7. Rx              SIP/2.0 / 102 INVITE /200 OK
8. CancelDestroy  
9. Unhold          SIP/2.0
10. TxReq           ACK / 102 ACK
11. Rx              SIP/2.0 / 102 INVITE /200 OK
12. CancelDestroy  
13. Unhold          SIP/2.0
14. TxReq           ACK / 102 ACK
15. Rx              SIP/2.0 / 102 INVITE /200 OK
16. CancelDestroy  
17. Unhold          SIP/2.0
18. TxReq           ACK / 102 ACK

By: Roy Sigurd Karlsbakk (rkarlsba) 2006-06-22 11:31:33

Are these the same patches you sent me?
If not, may I have this as a separate patch, since I'd rather not upgrade production rig to something in svn....

thanks

roy

By: Olle Johansson (oej) 2006-06-22 12:47:53

The stuff I committed is the same as what I sent you for testing, apart from the patch that added a lot of history that only you have.

By: Roy Sigurd Karlsbakk (rkarlsba) 2006-06-22 15:34:01

Then the effect is negative, or null

No change

Sorry about that

Same sort of sip history from the dead channels

By: Roy Sigurd Karlsbakk (rkarlsba) 2006-07-07 09:12:14

What more can be done here to debug this further? As I've said, the system's in production, and a full SIP DEBUG will quite probably be impossible.

By: Serge Vecher (serge-v) 2006-08-01 08:15:11

does 7459 fix this issue as well?

By: Roy Sigurd Karlsbakk (rkarlsba) 2006-08-01 09:43:40

Is there a patch available? I really don't want to run svn in production if I can help it, not even 1.2

roy

By: Serge Vecher (serge-v) 2006-08-01 16:19:21

the patch was committed directly to 1.2 and trunk. You can make your own patch from the following information http://lists.digium.com/pipermail/svn-commits/2006-July/015544.html

By: Roy Sigurd Karlsbakk (rkarlsba) 2006-08-04 04:47:56

Patch applied yesterday. The bug usually takes a few days showing, so we'll have to wait a little.

roy

By: Roy Sigurd Karlsbakk (rkarlsba) 2006-08-14 05:36:55

It might look like this is fixed now, so just close this for now. I'll reopen it if the error reappears.

thanks

roy

By: Serge Vecher (serge-v) 2006-08-14 09:54:42

presumably fixed by r38611 in 1.2. Please reopen if not the case.

By: Roy Sigurd Karlsbakk (rkarlsba) 2006-08-30 08:56:26

This still seems to be a problem, guys

By: Serge Vecher (serge-v) 2006-09-05 11:05:09

interesting, because 7664 was fixed by 1.2.11 ...

By: Serge Vecher (serge-v) 2006-09-27 13:38:00

rkarlsba: what's the deal with this bug. An issue in 1.2.12.1?

By: Roy Sigurd Karlsbakk (rkarlsba) 2006-09-27 19:35:10

Give me a week or so, and I'll report back. This bug takes a little time tracking

By: Olle Johansson (oej) 2006-10-29 10:40:55.000-0600

...and a week turned into a month.

REMINDER!

By: Olle Johansson (oej) 2006-10-29 10:41:20.000-0600

And please test with latest 1.2 in subversion. Thanks.

By: Roy Sigurd Karlsbakk (rkarlsba) 2006-10-29 13:30:27.000-0600

reproduced with 1.2.12.1, yes, but it's a little bit harder now due to ASTERISK-7869 crashing the system regularly. Also, I really do not want to run svn on systems in production.

By: Olle Johansson (oej) 2006-11-24 02:09:18.000-0600

Any updates? Do you still see this on latest 1.2 trunk ?

By: Roy Sigurd Karlsbakk (rkarlsba) 2006-11-24 05:34:23.000-0600

I've seen it with the latest release, although not lately, since asterisk has been crashing regularly, and this takes some time to reproduce. Asterisk now looks stable, so I may see it again in a week's time or so

roy

By: Serge Vecher (serge-v) 2007-01-09 13:07:59.000-0600

please reopen if you are able to reproduce this with 1.2.14 without backported patches.

By: Roy Sigurd Karlsbakk (rkarlsba) 2007-01-09 15:26:46.000-0600

The backported patch (the jitterbuffer submitted to 1.4) does not relate to the issue - how could it?

Also, I obviously have no possibility to remove the jitterbuffer, as that involves losing customers

Have anyone else running a high-volume asterisk server tried monitoring over days/weeks with mrtg as described here? http://karlsbakk.net/asterisk/

By: Serge Vecher (serge-v) 2007-01-09 15:48:51.000-0600

Via a deductive analysis: since you are the only one seeing the issue and since you run a system with complex backported patches, the problem must be caused by the patches. On of the ways to verify that the patches are not the cause of the stuck channels would be to reproduce the problem on a system without the patches, but since you refuse to do that, this report itself is 'stuck' in the cycle of opening and reopening.

By: Roy Sigurd Karlsbakk (rkarlsba) 2007-01-10 13:19:24.000-0600

well, then just close it, and i'll just restart asterisk every now and then. we really cannot afford removing the jitterbuffer, and at the moment we can't upgrade to 1.4, not before it's been more thoroughly tested.

best regards

roy

By: Serge Vecher (serge-v) 2007-01-10 13:28:07.000-0600

Thanks. Please reopen the report if new information is available as per note 0057453.

By: Leif Madsen (lmadsen) 2007-02-01 07:42:26.000-0600

Reopened per request of 'burnproof' in IRC.

By: Ronald Chan (loloski) 2007-02-01 07:53:19.000-0600

Guys,

This is reproducible to me anytime, i'm using SVN-branch-1.2-r52954 with no external or unsupported patches just a plain asterisk, i will post a new data today or tommorow as soon as the call center start operating.

what's weird with this bugs was there's an RTP activity is going on here, even the softphone or hardware phone was offline, the tech guy on the provider is also wondering why we have 1 1/2 call duration on their CDR's and i said the maximum call should be last for 5 to 10 minutes / call.

As soon this bug emerge i will post new data here. for evaluation

Ronald

By: Ronald Chan (loloski) 2007-02-06 05:57:54.000-0600

Please close this for now, the problem doesn't resurface since i ask to reopen this, the only significant change i made from my end is changing our codec from gsm to ulaw since bandwidth is not a problem according to client.


sorry for wasting your time :)