[Home]

Summary:ASTERISK-05826: Channels being apparently locked by chan_h323
Reporter:Isamar Maia (isamar)Labels:
Date Opened:2005-12-12 23:26:55.000-0600Date Closed:2011-06-07 14:10:14
Priority:MinorRegression?No
Status:Closed/CompleteComponents:Channels/chan_h323
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) asterisk-3.JPG
( 1) DEADLOCKS_H323_1.txt
( 2) h322_debug.txt
( 3) mvts-cap.JPG
Description:Channels are being apparently locked by chan_h323.
We have this Asterisk server as PSTN outbound gateway and MVTS server delivering h323 calls to it.
Please, check the image files attached and see what is happening.
There 7 active channels for 13 calls

****** ADDITIONAL INFORMATION ******

Using:
* Asterisk 1.2.0
* Pwlib and Openh323 version as indicated by README

If more information is necessary, please just let me know.
Comments:By: Jason Parker (jparker) 2005-12-12 23:40:08.000-0600

I don't know where to start with this one.

1) That is only a warning - it avoided a deadlock.
2) I don't believe this is major (see 1) - please read the bug guidelines at http://www.digium.com/index.php?menu=bugguidelines
3) Please don't post images when text logs will do (asterisk-3.JPG could have easily been copied and pasted from the terminal) - it is very difficult for some browsers to view them from the tracker.

By: lesnet (lesnet) 2005-12-16 00:28:39.000-0600

I also experience this issue.

I have 48 E&M Wink trunks (two T1s) on a Digium 4 Port T1 Card.

Setup: Calls arrive H323, and go out E&M trunks.
H323 -> Asterisk -> E&M Trunks

After anywhere from 1 hour to 1 day, I get 1 channel "locked".  This happens after a H323 call hangs up.  The H323 call shows in CDR as completed, but the Zap/ Channel stays active (off-hook)

After... 4 days, I had 9 "locked" channels.. (ie: Zap/G1 started dialing out on Zap/39-1 instead of Zap/48-1)

A "stop now" will not shut down the box (but it will drop any active calls)

I have to kill -9 asterisk and restart it pretty much daily.

Asterisk CVS head from 2005-09-01 with intermediate updates right to Asterisk 1.2,
using openh323_v1_17_2,
using pwlib_v1_9_1

Build was totally fresh, removed all old libraries/.so etc, did make clean on everything, make opt on pwlib,h323,chan_h323.

By: Sean (c0w) 2005-12-16 02:52:07.000-0600

I don;t think this channel driver is very active. if you want stability as of now i would recommend you check out ooh323c out of the asterisk-addons. this was the reason why i changed to the ooh323c, also the load of the box was reduced massively due to the fact its no used openh323

By: Isamar Maia (isamar) 2005-12-17 04:00:43.000-0600

Lesnet,
Thanks a lot for you fantastic description.
It is exactly what is happening with us too.

Cow,
The fact is that this channel AFAIK is the official
H323 channel bundled with the main Asterisk Package.

Anyway, I have tried also chan_ooh323 and it gave me
one way audio issues.

By: Tilghman Lesher (tilghman) 2005-12-17 20:33:37.000-0600

When you do a 'show channels', it appears that all of the channels are shown and summary information appears.  Only afterwards is there logging information.  What that tells me is that during the traversal of the channel list, there was no lock on any of the channels (otherwise, the list would be truncated and the logging messages would appear before the summary information).

I'd have to see your extensions.conf, zapata.conf, and h323.conf to know what's going on.  Please upload those 3 text files so we can make a determination of the problem.

By: lesnet (lesnet) 2005-12-17 23:37:48.000-0600

Since I am experiencing what appears to be the exact same problem, I've uploaded those files for you to review.

It is a very simple config, h323 peer, and a simple 1 line dial based on the number that the peer sends.

I did try the asterisk-addons ooh323, however the channel deadlocks appeared instantly after 1 call, versus 12-24 hours later as they usually do, so I have reverted back to the stock h323 driver.

One thing to note.. is that I am running this same setup on a H323->SIP box, and I have never had a channel deadlock in a non-Zaptel environment.

By: Cesar (otulio) 2006-01-01 19:42:00.000-0600

I am using ooh323 and I am presenting the same issue. I haven’t figure out what triggers the problem, everything seems to be working fine and then I get channel locks.
I have a PRI with 23 channels and when the problem appears even thought I have free pstn channels the call cannot be completed. If I don’t reset asterisk my 23 pstn channels will become busy as people try to call and hang-up



By: Grigoriy Puzankin (boroda) 2006-01-10 05:25:56.000-0600

The same issue, occurs several times per day. I'm using Asterisk in the similar way:
E1 -> Cisco5300 -> H323 -> CiscoCM -> H323 -> Asterisk 1.2.1 -> Zap TE110P (E1)

I did full logging and figured out that sometimes Asterisk does nothing upon  receiving H.323 RELEASE (according to h.323 originator), even logger does not show it. No CDR records are logged for such calls. Later (when "terminator" hangs up) Zap channel reports "got hangup request" and "got hangup". PRI debug shows DISCONNECT, RELEASE and RELEASE COMPLETE (with case 102 [timer expired after RELEASE] and protocol error 6) from Zap channel, but even now the channel is not free. Doing "show channel" for Zap channel displays "no blocking" info (otherwise it is "ast_waitfor_nandfds").

Unfortunately with "h.323 debug" Asterisk doesn't run for a long time, so I don't have h.323 logs for such calls.

h323: openh323_v1_17_2, pwlib_v1_9_1

Since this morning I switched to SIP instead of H.323. Let's see if it helps. But using H.323 is more preferable since it routes voice directly to another end point (Cisco5300) without MTP on CiscoCM.

Let me know, if you need detailed information.



By: Grigoriy Puzankin (boroda) 2006-03-09 01:53:25.000-0600

Any news?

I recompiled Asterisk with DETECT_DEADLOCKS option. As soon as the channel get locked the following pair of messages appear in log. They are repeated about 100 times for every fifth second.

Mar  8 23:54:05 ERROR[14275] ../include/asterisk/lock.h: chan_h323.c line 620 (oh323_write): Deadlock? waited 5 sec for mutex '&pvt->lock'?                                                                      
Mar  8 23:54:05 ERROR[14275] ../include/asterisk/lock.h: chan_h323.c line 857 (find_call_locked): '&pvt->lock' was locked here.                                                                                  

Function find_call_locked is called from 11 different functions in chan_h323.c. How to find the function called find_call_locked and not released the lock?

By: Grigoriy Puzankin (boroda) 2006-03-09 03:46:35.000-0600

There're also another locks:

Mar  8 23:54:05 ERROR[14276] include/asterisk/lock.h: channel.c line 624 (ast_queue_frame): Deadlock? waited 5 sec for mutex '&chan->lock'?                                                                      
Mar  8 23:54:05 ERROR[14276] include/asterisk/lock.h: channel.c line 2193 (ast_write): '&chan->lock' was locked here.

By: jerjer (jerjer) 2006-04-17 13:29:59

I just spent 2 hours eyeballing chan_h323.c.  The bug and thus the solution is not obvious to me at all.

I need to know more about exactly what asterisk is doing when this happens.

GDB Attach to a running, but deadlocked asterisk system.

gdb asterisk <pid>

(gdb) info thread apply all bt

By: jerjer (jerjer) 2006-04-17 13:31:01

maybe just  thread apply all bt

By: Objective Systems (objsys) 2006-04-17 15:27:49

Hi otulio,
What version of ooh323c/asterisk channel driver you are using?

If you are using older version, Can you try with 1.2.2 version, it contains many deadlock fixes.

Regards,
Avin Patel

By: Bluce Ree (tasker) 2006-04-24 02:17:58

Under heavy load, chan_h323 will stop responding altogether after a short period even though only a small number of locked channels are reported to the screen (sometimes as little as 1 or 2). Processing about 5-10 calls per second with an average load of about 150+ channels will kill this driver in about 5-10 minutes.

chan_ooh323 version 0.05 from Objective's website does not show any locked channels but it, too, will completely stop responding under heavy load. Processing about 5-10 calls per second with an average load of about 150+ channels will kill this driver in about 10 minutes. Turning on debug reveals nothing and /var/log/asterisk/h323_log only shows normal messages until it stops responding. I had to revert back to version 0.04 from asterisk-addons-1.2.2 to get better reliability.

By: Objective Systems (objsys) 2006-04-24 08:28:21

Hi Tasker,
If you think you are facing deadlocks for asterisk-ooh323c-0.5, than you will have to recompile the asterisk & asterisk-addons.

For asterisk:
--------------
Goto asterisk directory, than run:
make clean
make "DEBUG_THREADS=-DDUMP_SCHEDULER -DDEBUG_SCHEDULER -DDEBUG_THREADS -DDETECT_DEADLOCKS"
make install


For asterisk-addons:
--------------------
Compile the with following option
make clean
make debugthread
su
make install

Run the scenario again, Run asterisk with "-vvvc" option, On deadlock or jam condition, send us the following 3 datas:
/var/log/asterisk/h323_log
/var/log/asterisk/messages
And messages printed on CLI prompt.

Regards,
Avin Patel

By: Bluce Ree (tasker) 2006-04-24 14:16:51

Unfortunately, this is happening in a production environment where it can't be done at the moment.

To reproduce the problem, send a high volume of continuous calls (5-10 per second) into Asterisk via H.323 and play a 10 second voice file, during which you send a few DTMF tones, then hang up. It will stop responding within 10 minutes.

Asterisk is still very functional during this period, before and after ooh323 stops responding and the SIP calls continue to be serviced without a hitch. If I don't reset the box, SIP keeps going. Unloading and reloading chan_ooh323.so causes the whole system to stop taking calls (SIP included).

As soon as I get a chance I will try to set up a box for testing in the same fashion but it may not be for some time.

By: Objective Systems (objsys) 2006-04-27 08:53:43

Hi tasker,
I couldn't create this situtation. So I will have to wait till you provide us this   logs.

Regards,
Avin Patel

By: Claus Holt Futtrup (clausf) 2006-04-27 09:41:35

I have exactely the same problem, not sure when or why it happens..

By: Bluce Ree (tasker) 2006-04-28 10:45:20

Looks like clausf may have found the problem in bug ASTERISK-6987041. I will try his modification and see what happens.

By: Bluce Ree (tasker) 2006-04-29 11:08:48

clausf's patch didn't fix the issue. Still dies after 10-20 minutes.

By: Claus Holt Futtrup (clausf) 2006-05-04 04:30:55

I was wondering if the locking occurs when asterisk tries to disconnect the channels due to lack of rtp? Has anyone looked into this?

By: Objective Systems (objsys) 2006-05-05 14:15:39

Hi,
I am trying to look at it, But I am waiting for logs from Tasker.

Without logs, I can't do anything.

Regards,
Avin Patel
Objective Systems, Inc

By: Serge Vecher (serge-v) 2006-05-05 14:27:10

ok, I've reread the bug notes here and observe two unrelated bugs:

1) a problem with chan_h323 as reported by isamar, lesnet and boroda.
2) a problem with chan_ooh323 as reported by tasker and clausf.

I suggest that tasker, clausf or avin open a new bug for chan_ooh323 problems separately.

By: Serge Vecher (serge-v) 2006-05-07 16:04:15

is the original problem with chan_h323 still there?

By: Claus Holt Futtrup (clausf) 2006-05-08 00:19:54

Actually I have problems with both :-). The things I have written here are concerning chan_h323

By: Objective Systems (objsys) 2006-05-08 13:04:19

Hi Tasker, Clausf,
We can create a new bug with log informations, instead of posting data here.

Regards,
Avin Patel
Objective Systems, Inc.

By: Serge Vecher (serge-v) 2006-05-08 13:09:44

clausf: since you have a problem with chan_h323, can you produce some relevant debug information illustrating it?

By: Claus Holt Futtrup (clausf) 2006-05-08 15:06:10

I'll try..

I've had a test system running with asterisk 1.2.7.1 without any problems and a production system running excately the same configuration but with locked channels. The reason I asked about the RTP timeout issue is that my logs report the following:

May  8 21:58:07 NOTICE[5523] chan_sip.c: Disconnecting call 'SIP/88389809-7145' for lack of RTP activity in 84452 seconds
May  8 21:58:07 NOTICE[5523] chan_sip.c: Disconnecting call 'SIP/88389809-69dd' for lack of RTP activity in 85946 seconds
May  8 21:58:07 NOTICE[5523] chan_sip.c: Disconnecting call 'SIP/88389809-be57' for lack of RTP activity in 87321 seconds
May  8 21:58:07 NOTICE[5523] chan_sip.c: Disconnecting call 'SIP/88389809-8b4d' for lack of RTP activity in 91395 seconds
May  8 21:58:07 NOTICE[5523] chan_sip.c: Disconnecting call 'SIP/88389809-5dd7' for lack of RTP activity in 92541 seconds

All of these calls are created from a client connecting with
h323 -> some scripts -> sip channel

In CLI -> show channels report the following:

May  8 22:01:01 WARNING[22641]: channel.c:790 channel_find_locked: Avoided deadlock for '0x1584c74c', 10 retries!

Now the reason asterisk is not able to disconnect the calls could be one of two things..

1) Asterisk is not able to terminate it and this results in locks (strange but could be).
2) The H323 channels are locked and asterisk is not able to traverse the list of channels and is therefore not able to terminate them.

Please tell me more about the information you need.... This problem occurs within an a few hours after restart and the amount of locked channels continue to grow.. (No acutally it doesn't grow, it seems to be only one channels that's locked)

39 active channels
43 of 255 max active calls (16.86% of capacity)

39 channels to 43 calls... Hmmmm



By: Max Litvinov (mthawk) 2006-05-16 02:00:48

Hi all!
There is the same problem!

I have Asterisk 1.2.5, and the same warning appear after a few hours of * work by "show channesl" command.
For example
asterisk*CLI> show channels
Channel Location State Application(Data)
0 active channels
1 active calls
Apr 17 17:30:40 WARNING[10930]: channel.c:787 channel_find_locked: Avoided deadlock for '0x82e9060', 10 retries!

System ASP Linux 9.0 (RedHat based)
The same warning was at asterisk 1.2.4 and 1.2.6.
But 1.0.9 at this machine with the same system has no problems at all.

The packets i used:
Asterisk 1.2.4; pwlib 1.9.2; openh323 1.17.3
Asterisk 1.2.4; pwlib 1.10.0; openh323 1.18.0
The same with 1.2.6 and 1.2.5
Now i have * 1.2.5 and pwlib 1.9.0 and openh323 1.17.1

kernel 2.4.20-9asp
glibc 2.3.2-27.9.1asp

As i monitored - this issue appears whith calls h323 -> * -> SIP; h323 -> * -> h323...



By: Serge Vecher (serge-v) 2006-05-16 08:39:43

mthawk, can you please review the deadlock section of http://www.voip-info.org/tiki-index.php?page=Asterisk%20debugging and produce some debugging information?

By: Andrey S Pankov (casper) 2006-05-16 09:36:40

vechers: I doubt a regular user is able to "produce some debugging information".
Do you imagine the complexity of pwlib/openh323?

By: Serge Vecher (serge-v) 2006-05-16 09:52:48

casper: a person working with Asterisk is not a "regular user," in my opinion. Can you judge a person's technical aptitude by looking at a single post? I can't -- that is why I ask questions.

I have to remind you that you have been asked several times to keep the discussion within the bug tracker to that of bug only, which includes not discussing activities of a bug marshall. If you can help with diagnosis or patches, like you do with many other bugs -- wonderful, please keep doing it -- everyone apreciates that. If you have a question about what I am doing or have comments or feedback, please use direct email -- mine is svecher@ant1spam.gmail.com.

Thanks for your understanding and contributing to Asterisk!

By: Claus Holt Futtrup (clausf) 2006-05-17 01:40:10

I'll try and produce some debugging information for you. I can get the locks in an instant and actually kill asterisk as well after 4-5 minutes.

By: Claus Holt Futtrup (clausf) 2006-05-25 17:12:37

I need some help to generate the information you need.. Please tell me what I should do..

By: Serge Vecher (serge-v) 2006-05-26 15:23:46

clausf: I'm no expert myself, but I think the following webpage should get you started. Please review the deadlock section:
http://www.voip-info.org/tiki-index.php?page=Asterisk%20debugging [^]

Or join the #asterisk-dev channel and request help there from a developer, who can access your machine and do diagnosis remotely.

Thanks and good luck.

By: Serge Vecher (serge-v) 2006-06-12 19:32:42

Suspending this issue until somebody can figure out how to debug this problem in the latest stable or provide a patch to fix it. Please find a bug marshall on asterisk-bugs if you are ready to reopen this bug. Thanks