Summary: ASTERISK-19799: Apparent deadlock between ODBC Queue log and ODBC CDR
Reporter: anonymouz666 (anonymouz666)
Labels:
Date Opened: 2012-04-26 13:40:48
Date Closed: 2018-01-02 08:30:32.000-0600
Priority: Major
Regression?: No
Status: Closed/Complete
Components: Applications/app_queue CDR/cdr_odbc
Versions: 1.8.11.0
Frequency of Occurrence: Occasional
Related Issues:
Environment: CentOS 5.5 - 64-bit
Attachments:
( 0) backends-output.txt
( 1) backtrace-iax2-deadlock-26-04-2012.txt
( 2) cdr_adaptive_odbc.txt
( 3) cdr.txt
( 4) CLI-output.txt
( 5) core-show-channels-26-04-2012.txt
( 6) extconfig.txt
( 7) iax2-show-channels-26-04-2012.txt
( 8) queue.txt
( 9) res_odbc.txt
(10) sip-show-channels-26-04-2012.txt
Description: The scenario is simple but under high load (about 200 active calls).
The call arrives in Queue(). app_queue calls a Local channel to check whether a member is registered locally; if so, it runs Dial(SIP/agent). If the member is registered remotely, the call goes through IAX2 to another box (Dial(IAX2/user:topsecret@10.10.10.70/agent)).

Occasionally, we have seen this machine deadlocking.

Looking at the CLI logs, this time the issue happened in this order:

The local members (SIP) started to print this message:

[Apr 26 08:34:12] VERBOSE[5062] pbx.c:     -- Executing [7306@MemberConnector:5] ExecIf("Local/7306@MemberConnector-34c9;2", "1?Dial(SIP/7306)") in new stack
[Apr 26 08:34:12] NOTICE[5062] chan_sip.c: Call to peer '7306' rejected due to usage limit of 1
[Apr 26 08:34:12] VERBOSE[5062] app_dial.c:     -- Couldn't call SIP/7306
[Apr 26 08:34:12] VERBOSE[5062] app_dial.c:   == Everyone is busy/congested at this time (0:0/0/0)

We use call-limit=1 on all peers, but as soon as the SIP channels got stuck (listed in "sip show channels"), app_queue was no longer able to deliver calls to devices that were still "active".
We saw a lot of stuck channels using the command "sip show channels".

Then, after running longer with the problem above, the remote (IAX2) members started to print these messages:

ExecIf("Local/7424@MemberConnector-f367;2", "1?Dial(IAX2/user:topsecret@10.10.10.70/7424,,tTwW)") in new stack
[Apr 26 10:53:13] ERROR[2407]: chan_iax2.c:2384 peercnt_add: maxcallnumber limit of 2048 for 10.10.10.70 has been reached!
[Apr 26 10:53:13] WARNING[2407]: chan_iax2.c:12127 iax2_request: Unable to create call
[Apr 26 10:53:13] WARNING[2407]: app_dial.c:2218 dial_exec_full: Unable to create channel of type 'IAX2' (cause 34 - Circuit/channel congestion)

Lots of IAX2 channels were also stuck in "iax2 show channels", which is why the limit of 2048 was reached.
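For triage, the two symptoms above can be tallied from a saved CLI log to see when each one first appeared. The following is a minimal sketch, assuming the log lines have the same shape as the snippets quoted in this report; the regular expression, the signature names and the `tally` helper are illustrative and not part of Asterisk.

```python
import re
from collections import Counter

# Matches the timestamped lines quoted in this report, with or without the
# "file.c:LINE function:" form (both shapes appear in the snippets above).
LINE_RE = re.compile(
    r"^\[(?P<ts>[A-Z][a-z]{2} +\d+ \d\d:\d\d:\d\d)\] "
    r"(?P<level>[A-Z]+)\[(?P<tid>\d+)\]:? "
    r"(?P<src>[\w.]+\.c)(?::\d+ \w+)?: +(?P<msg>.*)$"
)

# Hypothetical labels mapped to substrings taken from the quoted messages.
SIGNATURES = {
    "sip_usage_limit": "rejected due to usage limit",
    "iax2_maxcallnumber": "maxcallnumber limit",
    "dial_no_channel": "Unable to create channel",
}

SAMPLE = """\
[Apr 26 08:34:12] NOTICE[5062] chan_sip.c: Call to peer '7306' rejected due to usage limit of 1
[Apr 26 08:34:12] VERBOSE[5062] app_dial.c:     -- Couldn't call SIP/7306
[Apr 26 10:53:13] ERROR[2407]: chan_iax2.c:2384 peercnt_add: maxcallnumber limit of 2048 for 10.10.10.70 has been reached!
[Apr 26 10:53:13] WARNING[2407]: chan_iax2.c:12127 iax2_request: Unable to create call
[Apr 26 10:53:13] WARNING[2407]: app_dial.c:2218 dial_exec_full: Unable to create channel of type 'IAX2' (cause 34 - Circuit/channel congestion)
"""


def tally(lines):
    """Count each signature and record the timestamp of its first occurrence."""
    counts = Counter()
    first_seen = {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip lines that are not in the expected format
        for name, needle in SIGNATURES.items():
            if needle in m["msg"]:
                counts[name] += 1
                first_seen.setdefault(name, m["ts"])
    return counts, first_seen


if __name__ == "__main__":
    counts, first_seen = tally(SAMPLE.splitlines())
    for name in SIGNATURES:
        print(name, counts[name], first_seen.get(name))
```

Running it over the full CLI-output.txt attachment instead of `SAMPLE` would show how far apart the SIP and IAX2 symptoms started.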

After that, the app_queue completely stopped delivering calls.

Attached are a backtrace (we can't run in debug_threads mode), core show channels, sip show channels and iax2 show channels.

Comments:By: anonymouz666 (anonymouz666) 2012-04-26 13:48:39.827-0500

Attached files.

By: anonymouz666 (anonymouz666) 2012-04-26 13:52:11.609-0500

Since it happened with both IAX2 and SIP, I don't think it is related to these channel drivers but to something common to both.
Kobaz also commented on #asterisk-dev that "looks like maybe a deadlock in bridge and app_queue both using the database"
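
The kind of deadlock suggested here is typically a lock-order inversion: two code paths take the same pair of locks in opposite order. The sketch below illustrates the pattern in general terms only. The root cause of this issue was never confirmed, and `queue_log_lock` and `cdr_lock` are hypothetical names, not Asterisk's actual locks. Timeouts and barriers are used so the demonstration reports the failure instead of hanging.

```python
import threading


def inverted_worker(first, second, holding, tried, results, name):
    """Take `first`, then try `second` while the peer holds it."""
    with first:
        holding.wait()                     # both threads now hold their first lock
        got = second.acquire(timeout=0.2)  # a real deadlock would block forever here
        results[name] = got
        if got:
            second.release()
        tried.wait()                       # keep `first` until the peer has tried too


def run_inverted():
    queue_log_lock = threading.Lock()      # hypothetical lock names
    cdr_lock = threading.Lock()
    holding, tried = threading.Barrier(2), threading.Barrier(2)
    results = {}
    threads = [
        threading.Thread(target=inverted_worker,
                         args=(queue_log_lock, cdr_lock, holding, tried, results, "queue_log")),
        threading.Thread(target=inverted_worker,
                         args=(cdr_lock, queue_log_lock, holding, tried, results, "cdr")),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def ordered_worker(locks, results, name):
    """Take every lock in one agreed order, so no cycle can form."""
    for lock in locks:
        lock.acquire()
    results[name] = True
    for lock in reversed(locks):
        lock.release()


def run_ordered():
    locks = [threading.Lock(), threading.Lock()]
    results = {}
    threads = [threading.Thread(target=ordered_worker, args=(locks, results, n))
               for n in ("queue_log", "cdr")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print("inverted:", run_inverted())  # neither thread gets its second lock
    print("ordered: ", run_ordered())   # both threads complete
```

A "core show locks" dump is what would show whether such a cycle actually exists between the threads in the attached backtrace.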

By: Matt Jordan (mjordan) 2012-05-01 08:40:19.497-0500

Can you attach the log that the snippets above were taken from?

Without a core show locks this may be difficult to resolve.  I understand if you can't get that due to the performance hit - but if you can, it would help a lot.

By: anonymouz666 (anonymouz666) 2012-05-02 13:13:03.409-0500

Attached the CLI output. I tried to make debugging easier for you by adding comments throughout the CLI log. Looking at the SIP issue, I had the impression that it has something to do with creating and cancelling SIP transactions (DND button), but this also seems to affect IAX2, which makes me think about something common to both channel drivers. Let me know if it helps - I know very well that "core show locks" helps a lot, but when I tried to enable it on this system it stopped the whole PBX.

By: Matt Jordan (mjordan) 2012-05-21 13:55:56.321-0500

Do you think you can supply your realtime, app_queue, and cdr configuration files?

By: anonymouz666 (anonymouz666) 2012-05-21 14:48:03.927-0500

No problem. The confs are attached. If I missed something, please let me know.

By: anonymouz666 (anonymouz666) 2012-06-04 13:47:43.004-0500

I know I didn't provide any reliable information like 'core show locks' output, but if there's a patch to try, I would be glad to test it. Thanks.

By: Joshua C. Colp (jcolp) 2017-12-19 05:57:30.323-0600

Have you experienced this problem under a recent supported version of Asterisk? There's been quite a lot of changes to all of these modules involved, including from a locking perspective.

By: Asterisk Team (asteriskteam) 2018-01-02 08:30:32.769-0600

Suspended due to lack of activity. This issue will be automatically re-opened if the reporter posts a comment. If you are not the reporter and would like this re-opened please create a new issue instead. If the new issue is related to this one a link will be created during the triage process. Further information on issue tracker usage can be found in the Asterisk Issue Guidelines [1].
[1] https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines