
Summary: ASTERISK-30313: threadpool: Control taskprocessor is blocked waiting for idle thread to terminate
Reporter: Mark Murawski (kobaz)
Labels:
Date Opened: 2022-11-14 12:19:37.000-0600
Date Closed:
Priority: Major
Regression?:
Status: Open/New
Components: Core/General
Versions: 18.14.0
Frequency of Occurrence: Frequent
Related Issues:
Environment:
Attachments:
( 0) core-asterisk-2022-11-19T15-12-39Z-brief.txt
( 1) core-asterisk-2022-11-19T15-12-39Z-full.txt
( 2) core-asterisk-2022-11-19T15-12-39Z-info.txt
( 3) core-asterisk-2022-11-19T15-12-39Z-locks.txt
( 4) core-asterisk-2022-11-19T15-12-39Z-thread1.txt
( 5) full-core-debug.log
( 6) pjsip_wizard.conf
( 7) pjsip.conf
Description: Unfortunately, this is very random.
It can happen when there are:
0 calls
1 call
20 calls
100 calls

I can't find a pattern to this yet.

Attached core dumps
Comments:
By: Asterisk Team (asteriskteam) 2022-11-14 12:19:43.392-0600

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution. Please note that log messages and other files should not be sent to the Sangoma Asterisk Team unless explicitly asked for. All files should be placed on this issue in a sanitized fashion as needed.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

Please note that once your issue enters an open state it has been accepted. As Asterisk is an open source project there is no guarantee or timeframe on when your issue will be looked into. If you need expedient resolution you will need to find and pay a suitable developer. Asking for an update on your issue will not yield any progress on it and will not result in a response. All updates are posted to the issue when they occur.

Please note that by submitting data, code, or documentation to Sangoma through JIRA, you accept the Terms of Use present at [https://www.asterisk.org/terms-of-use/|https://www.asterisk.org/terms-of-use/].

By: Joshua C. Colp (jcolp) 2022-11-14 12:22:26.365-0600

And how is configuration done?

By: Mark Murawski (kobaz) 2022-11-14 12:25:19.232-0600

Using static pjsip configs

Attached

By: Mark Murawski (kobaz) 2022-11-14 12:29:32.395-0600

About 2000 endpoints.  Examples attached


By: Joshua C. Colp (jcolp) 2022-11-14 12:34:47.348-0600

The threadpool manager is blocked waiting on a thread to exit. Which thread that is I don't know, because it's optimized out. You can work around this by setting a fixed threadpool size for PJSIP.

By: Mark Murawski (kobaz) 2022-11-14 12:35:42.064-0600

The core dump is generated from a SIGQUIT when one of my health checks fails.
The health check tries to make a PJSIP call; if the call fails to start after 5 attempts, Asterisk gets a SIGQUIT and is restarted.


By: Mark Murawski (kobaz) 2022-11-14 12:36:40.784-0600

Thanks, I'll rebuild without optimize and have core debug going as well.

By: Mark Murawski (kobaz) 2022-11-14 12:42:52.463-0600

core debug 5

By: Mark Murawski (kobaz) 2022-11-14 14:08:39.804-0600

Attached a backtrace built with DONT_OPTIMIZE.

By: Joshua C. Colp (jcolp) 2022-11-16 05:01:33.197-0600

The attached full log is missing messages from "threadpool.c" to indicate what was precisely going on with the threadpool and its state changes including idle threads being destroyed. This is needed in order to understand the specific circumstances leading to the threadpool blocking on a thread.

By: Mark Murawski (kobaz) 2022-11-23 19:54:36.844-0600

Start to finish on core debug, plus new matching core

By: Mark Murawski (kobaz) 2022-11-23 19:57:55.289-0600

Also, this is running with:

[system]
threadpool_initial_size = 25

I'm not sure if this is what you were referring to with setting the threadpool size.



By: Joshua C. Colp (jcolp) 2022-11-24 03:21:11.578-0600

It wasn't, though it's part of it. By default the threadpool terminates idle threads, which is what is happening in your issue. Setting the idle timeout to 0 should disable that behavior, so the threads won't go away and the issue won't occur.
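
For illustration, a minimal sketch of what that workaround might look like in pjsip.conf. It assumes the options belong in the [system] section shown earlier in this thread; the `type=system` line and the choice to keep the initial size at 25 are assumptions, not something confirmed here, so check the sample config shipped with your Asterisk version before relying on it.

```
[system]
type = system
; Start with 25 threads, as already configured in this report.
threadpool_initial_size = 25
; 0 should disable termination of idle threads, per the comment above.
threadpool_idle_timeout = 0
```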

By: Mark Murawski (kobaz) 2022-11-24 08:08:09.065-0600

Okay, so I'm going by your update to the title and the idle timeout setting (threadpool_idle_timeout).

If one were to go about fixing this issue: it sounds like there are one or more threads that are idle but are not ending. Are they really idle? Maybe they were marked idle incorrectly and are still doing something?

Probably more logging needs to be added to resolve this?  There's only information regarding idle threads and destroying them. We don't have startups and pending waits.  It doesn't seem obvious which thread is being waited on that's blocking pjsip (maybe DEBUG_THREADS?)

I'll do some poking here.

P.S. Also, should it really block waiting for an idle thread when doing something as critical as responding to and sending PJSIP traffic? Is there a drawback to just allocating a new thread when needed? I suppose it could hit max threads, but then at some point the truly idle threads will eventually end?

By: Joshua C. Colp (jcolp) 2022-11-24 08:59:20.109-0600

PJSIP is not DIRECTLY blocked on a thread. It is blocked waiting for synchronous tasks to complete. These aren't completing because the thread which manages all of this is blocked waiting on an idle thread to go away.
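
The chain described above can be illustrated with a toy model. This is not Asterisk's threadpool code, only a hedged Python sketch under stated assumptions: a single control thread runs queued tasks one at a time, one task joins an "idle" worker that never exits, and a synchronous task queued behind it cannot complete until the worker goes away. All names (`run_demo`, `control_loop`, `release`) are hypothetical.

```python
import queue
import threading


def run_demo(timeout=0.2):
    """Return (blocked, recovered) for a toy control-thread stall."""
    tasks = queue.Queue()          # serialized control queue: one task at a time
    release = threading.Event()    # lets the "idle" worker finally exit
    done = threading.Event()       # set once the synchronous task has run

    # Hypothetical idle worker that was asked to terminate but does not exit.
    worker = threading.Thread(target=release.wait)
    worker.start()

    def control_loop():
        # Run queued tasks in order; None is the shutdown marker.
        while True:
            task = tasks.get()
            if task is None:
                return
            task()

    control = threading.Thread(target=control_loop)
    control.start()

    tasks.put(worker.join)  # control thread blocks here until the worker exits
    tasks.put(done.set)     # synchronous task stuck behind the blocked join

    # The caller waits for its synchronous task, which cannot run yet.
    blocked = not done.wait(timeout)

    # Once the worker exits, the join returns and the queued task runs.
    release.set()
    recovered = done.wait(5)

    tasks.put(None)
    control.join()
    worker.join()
    return blocked, recovered


if __name__ == "__main__":
    print(run_demo())
```

In this model the caller is never waiting on the worker directly; it is waiting on a task that sits behind the blocked control thread, which mirrors the distinction made in the comment above.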

You need to look at and learn the threadpool implementation before you can make suggestions or come up with ideas.

By: Mark Murawski (kobaz) 2022-11-24 09:04:12.572-0600

Prior to diving in I was looking for some overview, and I just got it from your last comment. Obviously, yeah, before any specific changes can be made the design needs to be understood, but since this is my first rodeo with the threadpool, this is a good starting position.

Thanks.