Summary: ASTERISK-11258: iax2 deadlock?
Reporter: Mostyn Bramley-Moore (mostyn)
Labels:
Date Opened: 2008-01-17 19:23:45.000-0600
Date Closed: 2008-02-15 16:33:11.000-0600
Versions:Frequency of
Environment:
Attachments: ( 0) debug.lock1.txt
Description: We have an asterisk 1.4.17 machine that appears to trigger an IAX deadlock several times per day. We recently increased the number of clients using the machine; the issue did not appear when the load was lower. At the moment we have about 160 IAX peer/user pairs (of which about 130 are online) and 1000 SIP peer/user pairs (of which about 575 are online). We are attempting to migrate from an asterisk 1.2 installation, which handled this load OK.

When the issue occurs, "iax2 show peers" gradually shows more and more peers as UNKNOWN.  At the same time, SIP user/peers also become more lagged.  CPU load jumps from ~20% to above 100% (SMP machine). The asterisk console had a message saying the following:

NOTICE[10020]: chan_iax2.c:6699 socket_read: Out of idle IAX2 threads for I/O, pausing!

If we issue an "iax2 reload", it does not seem to help and "iax2 show peers" does not show any output at that point.  We have to restart asterisk to get going again.  

We attempted to increase iaxthreadcount and iaxmaxthreadcount in iax.conf, but they appear to max out at 256 even if the iax.conf settings are higher.  

Based on suggestions by russellb on the irc channel, we changed MAX_PEER_BUCKET from 1 to 563 in chan_iax2.c and recompiled, with thread debugging enabled. We are still experiencing the issue. I will attach the output of "core show locks" from when the issue is occurring.
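
For context on why the bucket count matters, here is a minimal C sketch. It is not taken from chan_iax2.c: the hash function (`hash_name`), the helper `longest_chain`, and the synthetic peer names are all assumptions made for illustration. It only shows the general effect of hashing named entries into 1 bucket versus 563 buckets: with a single bucket, every lookup walks one chain that holds all peers.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical string hash (djb2); NOT the hash used by chan_iax2.c. */
static unsigned long hash_name(const char *s)
{
    unsigned long h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h;
}

/*
 * Distribute npeers synthetic peer names ("peer0", "peer1", ...) over
 * nbuckets hash buckets and return the length of the longest chain,
 * i.e. the worst-case number of entries a single lookup has to walk.
 * Returns -1 if memory allocation fails.
 */
static int longest_chain(int npeers, int nbuckets)
{
    assert(npeers >= 0 && nbuckets > 0);

    int *count = calloc((size_t)nbuckets, sizeof(*count));
    if (!count)
        return -1;

    int longest = 0;
    for (int i = 0; i < npeers; i++) {
        char name[32];
        snprintf(name, sizeof(name), "peer%d", i);

        /* Pick the bucket for this name and track the fullest bucket. */
        int b = (int)(hash_name(name) % (unsigned long)nbuckets);
        if (++count[b] > longest)
            longest = count[b];
    }

    free(count);
    return longest;
}
```

Calling `longest_chain(160, 1)` puts all 160 names in one chain, while `longest_chain(160, 563)` spreads them over many short chains. Whether this has any bearing on the deadlock reported here is not established; the issue persisted after the change.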

CPU: dual intel xeon 5160 3GHz
OS: debian 4.0r2 amd64
Comments:
By: pj (pj) 2008-01-18 01:16:11.000-0600

I had a similar problem ("Out of idle IAX2 threads for I/O, pausing!"); see my bug report: http://bugs.digium.com/view.php?id=11550#76392
But because nobody was interested in the issue, I migrated all my trunks from IAX to SIP and completely removed the IAX module from asterisk. Now it is working fine, except for some memory leak issue (I'm using asterisk trunk).

By: Russell Bryant (russell) 2008-01-19 15:49:25.000-0600

In an email, you indicated that you had moved your production systems to Asterisk 1.2.  Please let me know if you are able to reproduce this error on a test system.  It may be helpful for me to log in and look at the situation with gdb.

Another thing you can do to provide potentially useful debug information is to run Asterisk under valgrind.  Only do this in a test environment, though, as valgrind slows things down a lot.

See doc/valgrind.txt for more information.

By: Mostyn Bramley-Moore (mostyn) 2008-01-19 18:01:25.000-0600

I rolled back to asterisk 1.2 on a different machine yesterday, so the asterisk 1.4.17 machine is free for stress testing. However, I'm not sure how to generate enough test clients/calls to reproduce the error. Do you know of any good IAX call generators, or should I just set up another asterisk machine?

By: Russell Bryant (russell) 2008-02-15 16:33:11.000-0600

I'm going to close this out as "unable to reproduce".  If you are able to replicate this on a test server and would like to pursue it, just let me know.