Summary: | ASTERISK-27909: cdr: Deadlock with submit_scheduled_batch and submit_unscheduled_batch | ||||
Reporter: | Denis Lebedev (coredumped) | Labels: | |||
Date Opened: | 2018-06-08 07:39:36 | Date Closed: | 2018-07-02 06:41:36 | ||
Priority: | Minor | Regression? | |||
Status: | Closed/Complete | Components: | CDR/General | ||
Versions: | 15.4.0 | Frequency of Occurrence | Occasional | ||
Related Issues: |
| ||||
Environment: | CentOS Linux 7 (Core) Linux *** 3.10.0-862.2.3.el7.x86_64 #1 SMP Wed May 9 18:05:47 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux Asterisk versions: 15.4.0 | Attachments: | gdb.txt | ||
Description: | We encountered a deadlock in cdr.c between the functions:
{noformat}
static int submit_scheduled_batch(const void *data)
static void submit_unscheduled_batch(void)
{noformat}
An earlier deadlock was fixed in ASTERISK-21162. That fix added the very mutex, {{cdr_sched_lock}}, on which Asterisk now deadlocks in later versions. The problem is rare, so it is almost impossible to reproduce under artificial conditions.

Symptoms:
* Asterisk stops flushing CDR records to the database.
* Pings to the CDR taskprocessor take 5 seconds (as far as I understand, they time out):
{noformat}
*CLI> core ping taskprocessor subm:cdr_engine-00000003
pinging subm:cdr_engine-00000003 ...
subm:cdr_engine-00000003 ping time: 5.000129 sec
{noformat}
* Asterisk's memory consumption on the host keeps growing under load.
* Asterisk nevertheless continues to serve incoming call traffic.

Also, Asterisk cannot be restarted from the CLI. | ||||
Comments: | By: Asterisk Team (asteriskteam) 2018-06-08 07:39:38.325-0500

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution. A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report. Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

By: Denis Lebedev (coredumped) 2018-06-08 07:43:25.291-0500

Thread states are attached in gdb.txt.

By: Matthew Fredrickson (mattf) 2018-06-21 13:13:27.814-0500

Hey Denis, I just pushed up a review for this issue that I think should resolve the deadlock. Do you think you can try it out? It's at https://gerrit.asterisk.org/#/c/9270/ Thanks, Matthew Fredrickson

By: Denis Lebedev (coredumped) 2018-06-26 10:10:10.155-0500

Matthew, hi! Thanks for the fix! Unfortunately, we don't have a suitable environment for testing with call traffic. As I understand it, you may make some changes after review by @Richard Mudgett. Could you please clarify which version (tag name) will contain this fix (a rough estimate is enough)?

By: Richard Mudgett (rmudgett) 2018-06-26 11:05:47.773-0500

The fix will go into the 13, 15, and master branches. The next 15 release will be 15.5.0 which is due to be cut in a couple weeks. If the patch is merged before the next release is cut then it will be in that release. Otherwise it will be in the one following.

By: Denis Lebedev (coredumped) 2018-06-26 15:31:44.294-0500

Thanks a lot, guys! Well done!
We are waiting for the fix in 15.5.0 :)

By: Friendly Automation (friendly-automation) 2018-07-02 06:41:38.354-0500

Change 9316 merged by Jenkins2: main/cdr.c: Alleviate CDR deadlock [https://gerrit.asterisk.org/9316|https://gerrit.asterisk.org/9316]

By: Friendly Automation (friendly-automation) 2018-07-02 06:49:54.428-0500

Change 9270 merged by Jenkins2: main/cdr.c: Alleviate CDR deadlock [https://gerrit.asterisk.org/9270|https://gerrit.asterisk.org/9270]

By: Friendly Automation (friendly-automation) 2018-07-02 06:55:32.147-0500

Change 9317 merged by Joshua Colp: main/cdr.c: Alleviate CDR deadlock [https://gerrit.asterisk.org/9317|https://gerrit.asterisk.org/9317] |