
Summary: ASTERISK-01326: Asterisk Core dump with chan_h323
Reporter: nix (nix)
Labels:
Date Opened: 2004-04-01 17:26:23.000-0600
Date Closed: 2004-09-25 02:01:39
Priority: Critical
Regression?: No
Status: Closed/Complete
Components: Core/General
Versions:
Frequency of Occurrence:
Related Issues:
Environment:
Attachments: (0) coredump.txt
Description: I can reliably crash chan_h323 and asterisk within 300-500 calls. A sample backtrace is below. Anyone who wishes is welcome to have access to the box to test it. I have set up an h323 call generator so I can reproduce the crash at will.
The box is SuSE 9.0 + updates with Asterisk CVS + chan_h323 + G.729 with a Digium E1 card.

****** ADDITIONAL INFORMATION ******

#0  0x40026052 in pthread_mutex_lock () from /lib/libpthread.so.0
#1  0x080599c7 in ast_read (chan=0x0) at channel.c:1048
#2  0x08062ab2 in ast_waitstream (c=0x0, breakon=0x403fd1fd "") at file.c:876
#3  0x403fcf5b in playback_exec (chan=0x96250b8, data=0xa49ff76c) at app_playback.c:81
#4  0x0806764b in pbx_exec (c=0x96250b8, app=0x8100200, data=0xa49ff76c, newstack=0) at pbx.c:396
#5  0x0806fdb4 in pbx_extension_helper (c=0x96250b8, context=0x9625210 "default", exten=0x9625304 "s", priority=1,
   callerid=0x91394b0 "XxxxxxxOriginationSrv <>", action=135266816) at pbx.c:1176
#6  0x08069694 in ast_pbx_run (c=0x96250b8) at pbx.c:1660
#7  0x08070271 in pbx_thread (data=0x3e8) at pbx.c:1885
#8  0x400250f0 in pthread_start_thread () from /lib/libpthread.so.0
#9  0x401cbc77 in clone () from /lib/libc.so.6
Comments:

By: Mark Spencer (markster) 2004-04-02 02:48:14.000-0600

You cannot even remotely run 300 - 500 calls with G.729 on Asterisk on any machine I know of.  How are you doing that if you only have one E1 anyway?

By: Mark Spencer (markster) 2004-04-04 15:07:22

Can you provide any more information or allow us to log in or similar?

By: darkthorn (darkthorn) 2004-04-07 19:03:56

I have the same bug. I accidentally filed a duplicate, though. Information from my system is available at http://bugs.digium.com/bug_view_page.php?bug_id=0001381.

By: jerjer (jerjer) 2004-04-07 20:13:51

Craig Southeren informs me that the seg fault attached to bug 1381 must be due to garbage data in the PDU.  His suggestion is to upgrade to the latest CVS of Open H.323 and PWLib, but this is going to cause problems with the build process of the H.323 stuff in the h323 directory.

I will see if I can find time to update the Makefile, but I'm not sure when that will happen.

By: nix (nix) 2004-04-08 11:39:47

Subject: h323 + asterisk.
From: Derek Smithies <derek@indranet.co.nz>
To: asterisk-dev@lists.digium.com
Date: 2004-04-08 14:00:53

Hi,
working with Peter Nixon, I have been endeavouring to make h323 calls to
asterisk. Sorry about this long bug report, but, well, it is long.

Following the instructions, asterisk was compiled with the correct version
of openh323.

On a second computer, ohphone (a simple H323 command line gui) ran.
With experimentation, I found that ohphone had to be set with fast start
off, h245 tunnelling off. It seemed sensible to add --gsmframes 1, so
ohphone would only put one gsm audio frame in each ethernet packet. This
is the same packing style as used by iax/asterisk.

Following Peter's advice (Thanks) I set asterisk up to receive the incoming
h323 call, and play some demo audio. It was cool, you could make a call to
asterisk, enter 500 on the dtmf keypad, and get put through to digium.

Now, the problem I was looking at was reported earlier by Peter, of
Asterisk crashing after 200 calls. The very first warning messages were
from lines 537 and 873 of file.c, current cvs code.

Some more work.
With just one ohphone running
ohphone -n -q0 -Tf --gsmframes 1 --autorepeat 200000 --autodisconnect 1 ast
Which means,
  no h323 gk search,
  use quicknet card
  No h245 tunnelling, no fast start
  1 gsm audio frame per ethernet packet
  do the call 200000 times
  disconnect the call 0.1 seconds after the connection is made
  make a call to the box with ip address as found by dns lookup of ast

On average, it makes a call every 4 seconds.
There is a very short burst of sound heard at the ohphone end.
This sound is recognisable as the start of the demonstration message.

Leave running for a couple of hours.
On the box which is running asterisk, run other programs (compiler etc) to
load it. After a while, (10 minutes or so), the first warning
messages are generated. The first warning messages are described above.
At about this point, no audio is heard at the ohphone end.

Some time later, ohphone dies, and asterisk is no longer taking h323 calls.

=====
Now, I am not running any asterisk hardware on the box with asterisk.
===========================================
To the logs..
h.323 trace 8
h.323 debug
With logging on, the error described above does not happen as quickly.

However, it seems to be that the chan_h323 code uses
  *openh323 to do all the h323 signalling.
  *asterisk to send, receive and process the h323 rtp packets
         (which each contain one gsm frame)
The h.323 side of things seems to be shutting down the call (and
associated data structures) before the rtp side of things has finished.
When this happens, things are not good.
When the rtp side shuts down first, all is fine..

Going back to the above comment, logging slows down the onset of problems.
Logging is primarily in the signalling code, so it will take longer for
the signalling code to execute. Thus, there is more time available for the
rtp side to exit at the end of the call. Yes, next test. Put logging in
the rtp side, slow rtp down, remove logging from signalling code, and look
for onset of error.

Comment?
Any one prepared to explain what actually happens, and then it can be
fixed/proven etc???
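
The ordering problem described above (signalling tearing down the call state while the RTP side is still using it) can be illustrated with a small model. This is only a sketch in Python, not chan_h323 code: the Call class and the names media_read, teardown and run_one_call are hypothetical, and a single lock stands in for whatever synchronisation a real fix would need. The idea shown is that the media side re-checks, under the lock, whether the call still exists before touching it, so it no longer matters which side finishes first.

```python
import threading


class Call:
    """Hypothetical stand-in for per-call state shared by two threads."""

    def __init__(self):
        self._lock = threading.Lock()
        self._rtp = {"frames": 0}  # stands in for the RTP session

    def media_read(self):
        """Called by the media thread; returns False once the call is gone."""
        with self._lock:
            if self._rtp is None:
                return False  # torn down: stop instead of touching dead state
            self._rtp["frames"] += 1
            return True

    def teardown(self):
        """Called by the signalling thread when the call ends."""
        with self._lock:
            self._rtp = None


def run_one_call(max_reads=1000):
    """Run a media thread against a call that signalling tears down."""
    call = Call()
    errors = []

    def media():
        try:
            for _ in range(max_reads):
                if not call.media_read():
                    break
        except Exception as exc:  # the model's analogue of a crash
            errors.append(exc)

    worker = threading.Thread(target=media)
    worker.start()
    call.teardown()  # signalling finishes whenever it likes
    worker.join()
    return errors, call.media_read()


errors, alive = run_one_call()
print(errors, alive)  # prints: [] False
```

If the None check in media_read is removed, the read raises as soon as teardown has run, which is the model's version of the failure seen when the h.323 side shuts down first.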

==
Oh, when the call starts up, you can get messages about the frame being in
the future, (sorry, did not make a note of this).

Derek.
--
Derek Smithies Ph.D.                           This PC runs pine on linux for email
IndraNet Technologies Ltd.                     If you find a virus apparently from me, it has
Email: derek@indranet.co.nz                    forged  the e-mail headers on someone else's machine
ph +64 3 365 6485                              Please do not notify me when (apparently) receiving a
Web: http://www.indranet-technologies.com/     windows virus from me......

By: nix (nix) 2004-04-08 11:42:13

Mark: Your tech support guys already have login access to both the machines in question. I spent 3 hours on an international call with them trying to work through this problem and when I didn't get anywhere I called Derek in.

By: nix (nix) 2004-04-08 11:45:46

Mark: To answer the first bugnote, I am only ever running 20 calls simultaneously. As you will see from Derek's work above, it is possible to crash asterisk with only ONE simultaneous call if you send enough calls in a row.

By: Mark Spencer (markster) 2004-04-12 11:06:49

Is it possible to run with valgrind?

By: Mark Spencer (markster) 2004-04-20 12:02:50

Can you try this test again with latest CVS head?

By: darkthorn (darkthorn) 2004-04-20 15:35:22

Don't bother testing this yet. There's some stuff broken right now. JerJer's working on it though.

By: casey0999 (casey0999) 2004-04-21 10:23:02

NEW INFORMATION:  I'm posting this because my BT is different, and consistent.  I realize that Jeremy's post-Janus s/w is broken too, so on his advice I went back, and I still have these crashes.  I thought the different BT might be helpful. PS: why isn't this bug under the H.323 section?
--------------------------------------
Any load test of H.323 with more than 15 simultaneous calls crashes asterisk very quickly (within 60 seconds!!). Per Jeremy's advice, am running pre-Janus code (PW 1.5.2, H.323 1.12.2, asterisk CVS 4/17.)
Load test scenario: Asterisk System A generates voice calls on a PRI to a Cisco AS5300 which in turn routes calls to Asterisk system B using H.323 (G.711 ALaw). System B answers each H.323 call, simply plays an aLaw prompt of about 5 seconds, then hangs up. Works fine with a low number of simultaneous calls. Over 15 at the same time, crashes within 60 seconds! Same BT each time - see below.
------------------------
Last Error Was:
Apr 20 21:04:02 ERROR[1388100528]: chan_h323.c:1155 connection_made: Something is wrong: connection

Back Trace:
(gdb) bt
#0 connection_made (call_reference=5590275) at chan_h323.c:1158
#1 0x0054d8b5 in MyH323EndPoint::OnConnectionEstablished(H323Connection&, PString const&) (
this=0x87c03f8, connection=@0x894fe58, estCallToken=@0x894fe74) at ast_h323.cpp:335
#2 0x054df4bf in H323Connection::OnEstablished() ()
from /usr/src/openh323/lib/libh323_linux_x86_r.so.1.12.2
#3 0x0550228a in H323Connection::InternalEstablishedConnectionCheck() ()
from /usr/src/openh323/lib/libh323_linux_x86_r.so.1.12.2
#4 0x055006a4 in H323Connection::HandleControlData(PPER_Stream&) ()
from /usr/src/openh323/lib/libh323_linux_x86_r.so.1.12.2
#5 0x054dc082 in H323Connection::HandleTunnelPDU(H323SignalPDU*) ()
from /usr/src/openh323/lib/libh323_linux_x86_r.so.1.12.2
#6 0x054dbd83 in H323Connection::HandleSignalPDU(H323SignalPDU&) ()
from /usr/src/openh323/lib/libh323_linux_x86_r.so.1.12.2
#7 0x054d8e2f in H323Connection::HandleSignallingChannel() ()
from /usr/src/openh323/lib/libh323_linux_x86_r.so.1.12.2
#8 0x05565eb0 in H323Transport::HandleFirstSignallingChannelPDU() ()
from /usr/src/openh323/lib/libh323_linux_x86_r.so.1.12.2
#9 0x0555d28b in H225TransportThread::Main() ()
from /usr/src/openh323/lib/libh323_linux_x86_r.so.1.12.2
#10 0x012f1178 in PThread::PX_ThreadStart(void*) ()
from /usr/src/pwlib/lib/libpt_linux_x86_r.so.1.5.2
#11 0x00b1079c in start_thread () from /lib/tls/libpthread.so.0
#12 0x0095d27a in clone () from /lib/tls/libc.so.6

By: casey0999 (casey0999) 2004-04-21 11:20:29

Suggest changing the view status on this report to "Public" so people (like me) don't waste time re-reporting.  thanks

By: Mark Spencer (markster) 2004-04-28 02:31:56

Just to confirm -- the recent poll vs. select changes, they didn't help any did they?

By: twisted (twisted) 2004-05-02 01:31:06

Reminder sent to nix

Need a status update as per markster's request.

By: nix (nix) 2004-05-02 07:51:16

I was waiting for Jeremy to finish his rewrite before I tested again. In any case I will be in Africa for the next 1 1/2 weeks and will not have a chance. I will test it again when I return. Darkthorn and numerous others also have the same problem, so it should not be a problem to find other people to test it in the meantime...

By: zoa (zoa) 2004-05-16 06:04:22

according to kram, i also have a similar problem. (cvs-head 15 may).

backtrace posted.

By: zoa (zoa) 2004-05-18 07:33:45

i'm currently trying todays new cvs version with around 30 simultaneous calls, i'll increase the amount of calls until it breaks or i run out of calls :)

By: zoa (zoa) 2004-05-18 09:44:33

the backtraces i posted were pre janus releases, i'm now running on janus patch 2, will see what happens

By: zoa (zoa) 2004-05-18 10:10:03

my asterisk crashes with the first incoming h323 call when i'm using janus 2

By: zoa (zoa) 2004-05-18 12:32:33

fixed in cvs by the almighty jj.

Be sure to use the exact openh323 version as found in the readme.

By: zoa (zoa) 2004-05-19 07:46:31

i just tried again with 40 persons, it locked again and only the h323 part, all sip calls went through just fine.

restart now and stop on CLI no longer worked.

I'm using debian, gcc-2.95, same pwlib and oh323 as LL cool JJ.

I'm using real traffic, tried both g729 and gsm, calls are coming from openphone, and are routed through a gatekeeper (gnugk).

By: darkthrn (darkthrn) 2004-05-20 16:02:48

Prelim testing looks good. First time I've run for an hour w/o core dump. I'll update as testing progresses.

By: zoa (zoa) 2004-05-21 03:10:17

i tried with 20 simultaneous calls yesterday, it hung again, several times.

It's a very strange deadlock, as all iax2 and sip calls go through as normal, the cli doesn't show any new calls being made, and show channels doesn't show any channels that stay open.

The cli still reports calls that got hung up.

I saw jj made some new changes today, will try again.

By: Mark Spencer (markster) 2004-05-24 15:57:18

well?

By: zoa (zoa) 2004-05-25 04:22:15

didnt have any time yet to try again, is this solving it for you, nix ?

By: nix (nix) 2004-05-25 04:26:54

Unfortunately I was forced to pull out the Asterisk box and replace it with a Cisco due to the instability. I will try to test the code tonight, but I am quite busy right now and haven't had time for non-essential testing :-(

By: zoa (zoa) 2004-05-25 08:05:56

meanwhile i tried to test it again, but had to stop it due to one way audio.
(new bug opened).

By: jerjer (jerjer) 2004-05-25 11:52:31

I cannot duplicate any one-way audio problems

By: darkthorn (darkthorn) 2004-05-25 13:22:21

Everything seems great. However, I've experienced the hung console bug that zoa experienced on only one of the seven servers that are running this code. As of yet, I haven't been near a console when it occurred, and these are in production, so it was reset. The next time it happens, I should be able to get a backtrace.

By: darkthorn (darkthorn) 2004-05-25 22:26:29

Got another crash. This is 5/20 CVS-HEAD. Slight difference, dsp.c is from 3/30. This crash is separate from the console lock.

#0  ast_rtp_get_us (rtp=0x0, us=0x50399d00) at rtp.c:862
862             memcpy(us, &rtp->us, sizeof(rtp->us));
(gdb)
(gdb)
(gdb)
(gdb) bt
#0  ast_rtp_get_us (rtp=0x0, us=0x50399d00) at rtp.c:862
#1  0x41c941fc in create_connection (call_reference=1345953008)
   at chan_h323.c:952
#2  0x41c9aea7 in MyH323Connection::CreateRealTimeLogicalChannel(H323Capability const&, H323Channel::Directions, unsigned, H245_H2250LogicalChannelParameters const*) (this=0x85fd8e8, capability=@0x84378d0, dir=IsReceiver, sessionID=1)
   at ast_h323.cpp:679
#3  0x456f3e83 in H323RealTimeCapability::CreateChannel(H323Connection&, H323Channel::Directions, unsigned, H245_H2250LogicalChannelParameters const*) const ()
  from /usr/local/lib/libh323_linux_x86_r.so.1.12.2
#4  0x456d9509 in H323Connection::CreateLogicalChannel(H245_OpenLogicalChannel const&, int, unsigned&) () from /usr/local/lib/libh323_linux_x86_r.so.1.12.2
#5  0x456d1882 in H323Connection::OnReceivedSignalSetup(H323SignalPDU const&)
   () from /usr/local/lib/libh323_linux_x86_r.so.1.12.2
#6  0x41c9a656 in MyH323Connection::OnReceivedSignalSetup(H323SignalPDU const&)
   (this=0x85fd8e8, setupPDU=@0x5039b0b0) at ast_h323.cpp:557
#7  0x456d031b in H323Connection::HandleSignalPDU(H323SignalPDU&) ()
  from /usr/local/lib/libh323_linux_x86_r.so.1.12.2
#8  0x45716dc0 in H323Transport::HandleFirstSignallingChannelPDU() ()
  from /usr/local/lib/libh323_linux_x86_r.so.1.12.2
#9  0x45712f58 in H225TransportThread::Main() ()
  from /usr/local/lib/libh323_linux_x86_r.so.1.12.2
#10 0x41e5b3a0 in PThread::PX_ThreadStart(void*) ()
---Type <return> to continue, or q <return> to quit---
  from /usr/local/lib/libpt_linux_x86_r.so.1.5.2
#11 0x40024484 in start_thread () from /lib/tls/libpthread.so.0

By: Tilghman Lesher (tilghman) 2004-05-26 15:55:56

Another backtrace (two separate coredumps, exactly the same):

(gdb) bt
#0  0x43c4711f in PASN_OctetString::operator=(PBYTEArray const&) ()
  from /usr/src/pwlib/lib/libpt_linux_x86_r.so.1.5.2
#1  0x43c4f32e in PASN_OctetString::EncodeSubType(PASN_Object const&) ()
  from /usr/src/pwlib/lib/libpt_linux_x86_r.so.1.5.2
#2  0x443b34f1 in H323Connection::HandleTunnelPDU(H323SignalPDU*) ()
  from /usr/src/openh323/lib/libh323_linux_x86_r.so.1.12.2
#3  0x443b79fe in H323Connection::SendSignalSetup(PString const&, H323TransportAddress const&) ()
  from /usr/src/openh323/lib/libh323_linux_x86_r.so.1.12.2
#4  0x443bfc98 in H225CallThread::Main() () from /usr/src/openh323/lib/libh323_linux_x86_r.so.1.12.2
#5  0x43d6c4a8 in PThread::PX_ThreadStart(void*) () from /usr/src/pwlib/lib/libpt_linux_x86_r.so.1.5.2
#6  0x40025811 in pthread_start_thread () from /lib/i686/libpthread.so.0

I'll try to get debug output on the next one.

By: Tilghman Lesher (tilghman) 2004-05-26 16:26:55

(gdb) bt
#0  0x43c4711f in PASN_OctetString::operator=(PBYTEArray const&) ()
  from /usr/src/pwlib/lib/libpt_linux_x86_r.so.1.5.2
#1  0x43c4f32e in PASN_OctetString::EncodeSubType(PASN_Object const&) ()
  from /usr/src/pwlib/lib/libpt_linux_x86_r.so.1.5.2
#2  0x443b34f1 in H323Connection::HandleTunnelPDU(H323SignalPDU*) ()
  from /usr/src/openh323/lib/libh323_linux_x86_r.so.1.12.2
#3  0x443b79fe in H323Connection::SendSignalSetup(PString const&, H323TransportAddress const&) ()
  from /usr/src/openh323/lib/libh323_linux_x86_r.so.1.12.2
#4  0x443bfc98 in H225CallThread::Main() () from /usr/src/openh323/lib/libh323_linux_x86_r.so.1.12.2
#5  0x43d6c4a8 in PThread::PX_ThreadStart(void*) () from /usr/src/pwlib/lib/libpt_linux_x86_r.so.1.5.2
#6  0x40025811 in pthread_start_thread () from /lib/i686/libpthread.so.0

Debug output:

   -- Executing Dial("Zap/74-1", "H323/xxxx01152646154xxxx@67.107.nnn.nnn") in new stack
May 26 20:11:19 DEBUG[2523159]: chan_h323.c:820 oh323_request: type=H323, format=4, data=xxx01152646154xxxx@67.107.nnn.nnn.
May 26 20:11:19 DEBUG[2523159]: chan_h323.c:869 oh323_request: Host: 67.107.nnn.nnn      Username: xxxx01152646154xxxx
May 26 20:11:19 DEBUG[2523159]: chan_h323.c:400 oh323_call: dest=xxxx01152646154xxxx@67.107.nnn.nnn, timeout=0.
   -- Called xxxx01152646154xxxx@67.107.nnn.nnn
CLI>
Disconnected from Asterisk server

By: nix (nix) 2004-06-11 12:58:00

I have just spent some time with the latest asterisk cvs and this bug is still very much present.. My h323 call generator can still kill chan_h323 in less than 20 calls :-(

edited on: 06-11-04 12:45

By: zoa (zoa) 2004-06-15 06:43:42

What generator are you using Nix?
Could all of you please tell us what distro you are using, + GCC version + kernel ?

I'm on 2.6.5 gcc 2.95 debian.

By: jerjer (jerjer) 2004-06-15 12:04:38

I'm on RedHat 9 (2.4.20-20.9smp) and gcc 3.2.2 and have run chan_h323 through 323Sim for 2 days straight doing inbound H.323 calls and another 2 days straight doing outbound H.323 calls and haven't had so much as a hiccup.  I'm about ready to set up another test using a 5400 and real PRIs this time; I don't expect any trouble.

Some quick details I pulled from 323Sim:

Total calls processed:  35,502 (inbound) 36,821 (outbound)
Average call setup time:  1.58 sec (inbound)  1.32 sec (outbound)
Average call duration: 30 sec (inbound) 30 sec (outbound)

By: nix (nix) 2004-06-15 14:32:11

I wrote the call generator myself (actually it was initially designed and used to send voice alerts and/or advertising to a farm of Cisco 5300 gateways). You can find the message program at http://sourceforge.net/projects/h323ac/
It places a call to a specified number and plays a wav file specified on the command line, then hangs up. I simply wrapped that in some Perl code to spawn X processes (where X is configurable and currently set to 10) and call a list of 10,000 numbers from my CDR database. chan_h323 ALWAYS crashes within 1500 calls and often in the first 20.
A crash can also be caused with only ONE call at a time (although it takes more calls) by using "ohphone -n -q0 -Tf --gsmframes 1 --autorepeat 200000 --autodisconnect 1 ast", as Derek's email states above.
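
For anyone wanting to reproduce this style of load, the wrapper described above has roughly the following shape. This is a hedged Python sketch, not the original Perl: place_call is a hypothetical stub (a real harness would launch the H.323 caller there), the number list is made up, and worker threads are used here where the original spawns separate processes.

```python
from concurrent.futures import ThreadPoolExecutor

WORKERS = 10  # the comment above mentions 10 parallel callers


def place_call(number):
    """Hypothetical stub: a real harness would start the H.323 caller here."""
    return (number, "ok")


def run_load(numbers, workers=WORKERS):
    """Dial every number, keeping at most `workers` calls in flight."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(place_call, numbers))


# Made-up test numbers standing in for the list pulled from a CDR database.
results = run_load([f"555{n:04d}" for n in range(100)])
print(len(results), results[0])  # prints: 100 ('5550000', 'ok')
```

Capping the pool at 10 workers keeps the number of simultaneous calls bounded while the total call count keeps growing, which is the pattern that triggers the crash in the reports above.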

For reference I have the same problem on both SuSE 9.0 and 9.1 (both stock + security updates) and I believe Derek uses a couple of different versions of RedHat. (My GCC on SuSE 9.1 is gcc-3.3.3-41, and kernel is 2.6.4)

Jeremy, I will be happy to take you up on your offer from a few months ago to crash your machine. Just give me the IP address and tell me what number(s) to call. I apologise, but I was in Africa when you offered. I will be in another city all day tomorrow, but back tomorrow night around 11pm (I am GMT+03), so I would be available late afternoon...

By: jerjer (jerjer) 2004-06-18 13:34:42

198.22.67.67

You can call toll-free numbers and extension 500 is milliwatt and 600 is echo.

By: jerjer (jerjer) 2004-06-18 13:36:19

and 999 is a meetme

By: jerjer (jerjer) 2004-06-18 13:38:52

700 dials out to the PSTN via TE410P to my switch's milliwatt

By: jerjer (jerjer) 2004-06-18 13:39:57

maximum 46 channels on exten 700.... i'm done now :)


The ball is in your hands, nix

By: jerjer (jerjer) 2004-06-25 17:49:30

status?  Nix?

By: casey0999 (casey0999) 2004-06-25 19:16:25

Hi Jeremy-  Lots of people are reporting no audio now (see asterisk-users list). What information would be useful to you to debug this, ie: what level "trace" etc.  This used to work for me a month or two ago, but now I have the same problem when connecting via a Cisco 5300.

Thanks, Scott Stingel

By: jerjer (jerjer) 2004-06-25 20:44:45

I cannot duplicate any problems..I've tried 5300, Quintum, ohphone, Hammer H.323 call Generator, an H.323 call generator from RadCom and Nix's h323ac application. All worked 100% successfully for as long as I decided to let them run.

Even running on my mini-itx box, so it's not just my dual xeons that work.

Then at least the one guy on the list responded to himself saying he fixed his own problem.

By: casey0999 (casey0999) 2004-06-25 20:55:46

I think that guy reported fixing it by going back to the "Stable" branch version, which doesn't work for most of us.  I noticed that the highest level of trace provides huge amounts of data - just wanted to make sure that you need this level before I post it (as an attachment).  Note that handshaking seems to be working - there's just no audio when you listen - ie doesn't seem to be load related - happens on the very first (and all) calls.  Thanks, Scott

By: jerjer (jerjer) 2004-06-25 21:09:33

Think again:

"I recently had a problem with h.323 using an MVP110.  I found out the problem had to do with quickstart signalling.  I switched it off and everything worked as expected." -Daniel Freysinger

By: casey0999 (casey0999) 2004-06-25 22:55:57

Maybe that's it.  Is there no support for Quick Start?  Unfortunately, I have limited control on how my customer generates the calls from the Cisco 5300.  Thanks

By: Konstantin Prokazoff (oryx) 2004-06-26 05:54:50

Thanx casey & Jer. It's true that different equipment (like the DEFINITY PBX, maybe Ciscos, and so on) does not support (or only partially supports) FastStart signalling.
Now all works fine; we'd be glad if no more coredumps are produced by chan_h323. Audio works fine if, in MyH323EndPoint::CreateConnection in ast_h323.cpp, options is set to H323Connection::FastStartOptionDisable. Could you implement a configuration option for this, something like noFastStart=yes for type=user/peer and type=alias? Still testing; I have traffic of 10-30K calls/day.

edited on: 06-26-04 05:41

By: Brian West (bkw918) 2004-07-01 22:57:43

If this is still a problem comment on it please.

By: zoa (zoa) 2004-07-05 14:14:53

i found some time to test this again, i get no more deadlocks, on a show channels it now shows avoiding h323 deadlock.

However, after a while doing 1 call a second on average, 40 simultaneous calls (gsm passthru) i get one way audio after approx 15 minutes.

h323 -> iax2 -> te410p.

The h323 end can hear the called person, the called person doesnt hear a thing.

I increased the ulimit, no luck.

The weird thing is, when i stop making new calls i still get one way audio for a while, even with only 1 simultaneous call, but this automatically resolves itself after some minutes and everything works fine again. (without a restart!!)

Any suggestions on what this could be ?

Load on the server is 0,09 when doing 40 simultaneous calls btw.

By: zoa (zoa) 2004-07-07 07:25:03

looks like i was a bit too early with the hurray, the one way audio was only a 1 time problem.

It still locks up for me (at least the h323 part does).

By: zoa (zoa) 2004-07-07 09:54:41

jerjer, i have another idea,

Maybe you could post some binaries to be sure that this is not compiler related ?

Joachim.

By: twisted (twisted) 2004-07-23 20:46:54

I'm gonna go out on a limb here and say this is probably still a problem.. Anyone confirm?

*Before* you confirm, make sure that:

1) you are running latest cvs
2) you are following the instructions for h323 EXACTLY
3) you are not trying to overload the box you are testing on.

Also, Zoa, since this is assigned to you, you are responsible for keeping this activity going.  Thanks!

By: zoa (zoa) 2004-07-25 10:29:35

i didnt check the latest changes to chan_h323 yet, but before that i was always running the latest version of chan_h323 + the correct version of openh323.

I am running on kernel 2.6.7 though, but i don't think this should affect anything...

I have no idea what else to do to try to resolve this bug... and everytime i try to deadlock the server with real traffic the callcenter managers want to kill me.

I only have one server available in this country for testing (new server arrived broken and now i have to ship it back to belgium for repair :/)

I'll try to test the latest changes to chan_h323 later this week.

By: Mark Spencer (markster) 2004-07-30 16:18:06

There's been a recent fix in CVS about h.323, can you guys see if it resolves your issues?

By: zoa (zoa) 2004-07-31 05:02:45

most certainly !

I was hoping all night yesterdays patch would fix this.
Will try to test it this afternoon.

Joachim.

By: zoa (zoa) 2004-07-31 08:00:24

i just tried with around 15 simultaneous connections, didnt deadlock.

but

After 1,5 hours, i had 52 iax2 connections, but only 15 people were calling with h323.

also i got some warnings like these:

Jul 31 15:40:09 WARNING[2310193]: channel.c:489 ast_channel_walk_locked: Avoided deadlock for 'H323/ip$10.5.0.4:52580/14670', 10 retries!


My setup is openphone -> * -> iax2 -> * -> E1

By: zoa (zoa) 2004-07-31 08:09:39

Oké, it just deadlocked anyway...

Maybe this is something related:

I get this warning every time i do a "show channels" now:
Jul 31 15:50:51 WARNING[2510871]: channel.c:489 ast_channel_walk_locked: Avoided deadlock for 'H323/ip$10.5.0.4:52580/14670', 10 retries!

By: jerjer (jerjer) 2004-08-01 16:11:47

Zoa, I think you have bigger problems.  

I acquired a single-processor 2 GHz P4 machine for another project, which has stalled, so I decided to test chan_h323 with it. This box has been constantly processing a maximum of 50 simultaneous inbound and outbound H.323 calls, with random setup times and call durations, as well as doing transcoding, for 5 days straight and counting.

I have not had a single deadlock, strange behavior, leaked file handle or warning message like you reported.  I simply cannot break it using a reasonable load for the box.

test call flow:

hardware call generator --(T-1 PRI)--> 5300 ---(H.323)--> Asterisk  
and Asterisk ---(H.323)--> 5300 -> (T-1 PRI) Calling a test number on our switch


Then just now, for kicks, I used SecureCRT's scripting abilities to send 'show channels' to the asterisk cli once every 5 seconds and so far I have not received that warning message (yes, warnings are enabled in logger.conf). I tried asterisk -rx 'show channels' in a quick command-line shell script, but more often than not I received short results, which is for another bug :)

By: zoa (zoa) 2004-08-02 02:51:01

hehe, yeah i know about that -rx patch :)

i'm still puzzled what could cause this :(
(BTW: those 52 open connections were not caused by chan_h323 but were due to a problem on the receiving asterisk server... i saw conferencing errors, although i didn't use conferencing.)

I will also try to connect to it with a callgenerator.

Jerjer, could you try to use your call generator to test call generator -> as5300 -> * -> iax2 -> * ?

Meanwhile i'm still waiting for a replacement raid card for my new server... geeh its fun to work in a country where you even have to order a power cable :/

By: Brian West (bkw918) 2004-08-22 23:17:54

/me runs and SCREAMS h.323 NO HOPE NO HOPE NO HOPE.. god save us from it!!!  :P

</sillyness>

is this still an issue?

By: zoa (zoa) 2004-08-24 13:05:13

Yeah this still happens and is quite easy to duplicate,

Looks like we narrowed it down to multiple call setups within a very short period of time.

No idea how this could get resolved.

When using 50 simultaneous channels with average call time around 20 seconds or so, the statistical distribution of the calls might cause several call setups in 1 second, then no calls for 20 seconds...
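
A rough back-of-the-envelope check of that claim, as a Python sketch under stated assumptions: all 50 channels are busy on average (so Little's law gives the setup rate), and call setups arrive as a Poisson process. Neither assumption comes from measurements in this report; the variable names are illustrative only.

```python
import math

channels = 50          # simultaneous calls, from the comment above
avg_call_secs = 20.0   # average call duration, from the comment above

# Little's law: arrival rate = calls in progress / average call duration.
rate = channels / avg_call_secs  # call setups per second


def poisson_pmf(k, lam):
    """Probability of exactly k arrivals in one interval with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


# Chance that any given one-second window contains two or more setups.
p_two_or_more = 1.0 - poisson_pmf(0, rate) - poisson_pmf(1, rate)
print(f"{rate:.2f} setups/s, P(>=2 in one second) = {p_two_or_more:.3f}")
# prints: 2.50 setups/s, P(>=2 in one second) = 0.713
```

Under these assumptions, overlapping call setups are the common case rather than a rare coincidence, which fits a bug that needs closely spaced setups to show up under real traffic.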

By: jerjer (jerjer) 2004-09-21 14:43:48

I am marking this bug as fixed.  Please test latest changes and re-open this bug, if necessary.