[Home]

Summary:ASTERISK-29893: deadlock during bridge
Reporter:Pascal Cadotte Michaud (pc-m)Labels:
Date Opened:2022-02-01 14:06:05.000-0600Date Closed:2022-02-01 14:29:28.000-0600
Priority:MinorRegression?
Status:Closed/CompleteComponents:Bridges/bridge_builtin_features
Versions:18.9.0 Frequency of
Occurrence
Occasional
Related
Issues:
duplicatesASTERISK-29821 Deadlock in bridge_channel_internal_join() on local channels.
Environment:Attachments:
Description:With Asterisk 18.9.0 we have a dead lock that happened twice in 2 days.

Given an incoming call from the operator
Given a phone with a call forward
When Dial the user's phone
Then there's a deadlock

After some debugging I've found that 2 threads were waiting in bridge_channel_internal_join. Each of the has a bridge_channel variable with a chan and it's peer referencing each other.

After locking it's bridge and it ->chan both thread try to lock the peer which is already locked by the other thread.

I think this bug has been introduced here https://gerrit.asterisk.org/c/asterisk/+/17543/2/main/bridge_channel.c#2861
Comments:By: Asterisk Team (asteriskteam) 2022-02-01 14:06:06.368-0600

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution. Please note that log messages and other files should not be sent to the Sangoma Asterisk Team unless explicitly asked for. All files should be placed on this issue in a sanitized fashion as needed.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

Please note that once your issue enters an open state it has been accepted. As Asterisk is an open source project there is no guarantee or timeframe on when your issue will be looked into. If you need expedient resolution you will need to find and pay a suitable developer. Asking for an update on your issue will not yield any progress on it and will not result in a response. All updates are posted to the issue when they occur.

Please note that by submitting data, code, or documentation to Sangoma through JIRA, you accept the Terms of Use present at [https://www.asterisk.org/terms-of-use/|https://www.asterisk.org/terms-of-use/].

By: Pascal Cadotte Michaud (pc-m) 2022-02-01 14:13:56.782-0600

I have some back traces. The line numbers might be off since I did not reproduce without patches before reporting the issue.
```
[Switching to thread 115 (Thread 0x7f1beb5a0700 (LWP 2404533))]
#0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:103
103 in ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S
(gdb) bt
#0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:103
#1  0x00007f1bedfd87d1 in __GI___pthread_mutex_lock (mutex=mutex@entry=0x7f1bd48616b0)
   at ../nptl/pthread_mutex_lock.c:115
#2  0x000056105271b298 in __ast_pthread_mutex_lock (filename=filename@entry=0x56105285aff9 "bridge_channel.c",
   lineno=lineno@entry=2885,
   func=func@entry=0x56105285c0f0 <__PRETTY_FUNCTION__.19010> "bridge_channel_internal_join",
   mutex_name=mutex_name@entry=0x561052870b51 "peer", t=t@entry=0x7f1bd48616b0) at lock.c:326
#3  0x00005610526715da in __ao2_lock (user_data=user_data@entry=0x7f1bd48616f0,
   lock_how=lock_how@entry=AO2_LOCK_REQ_MUTEX, file=file@entry=0x56105285aff9 "bridge_channel.c",
   func=func@entry=0x56105285c0f0 <__PRETTY_FUNCTION__.19010> "bridge_channel_internal_join",
   line=line@entry=2885, var=var@entry=0x561052870b51 "peer") at astobj2.c:241
#4  0x000056105269a251 in bridge_channel_internal_join (bridge_channel=bridge_channel@entry=0x7f1bd4694f90)
   at bridge_channel.c:2885
#5  0x00005610526877ad in bridge_channel_ind_thread (data=data@entry=0x7f1bd4694f90) at bridge.c:1823
#6  0x00005610527c58bf in dummy_start (data=<optimized out>) at utils.c:1572
#7  0x00007f1bedfd5fa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
#8  0x00007f1bedbac4cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

[Switching to thread 113 (Thread 0x7f1b61d54700 (LWP 2404526))]
#0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:103
103 in ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S
(gdb) bt
#0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:103
#1  0x00007f1bedfd87d1 in __GI___pthread_mutex_lock (mutex=mutex@entry=0x7f1bd43ac350)
   at ../nptl/pthread_mutex_lock.c:115
#2  0x000056105271b298 in __ast_pthread_mutex_lock (filename=filename@entry=0x56105285aff9 "bridge_channel.c",
   lineno=lineno@entry=2885,
   func=func@entry=0x56105285c0f0 <__PRETTY_FUNCTION__.19010> "bridge_channel_internal_join",
   mutex_name=mutex_name@entry=0x561052870b51 "peer", t=t@entry=0x7f1bd43ac350) at lock.c:326
#3  0x00005610526715da in __ao2_lock (user_data=user_data@entry=0x7f1bd43ac390,
   lock_how=lock_how@entry=AO2_LOCK_REQ_MUTEX, file=file@entry=0x56105285aff9 "bridge_channel.c",
   func=func@entry=0x56105285c0f0 <__PRETTY_FUNCTION__.19010> "bridge_channel_internal_join",
   line=line@entry=2885, var=var@entry=0x561052870b51 "peer") at astobj2.c:241
#4  0x000056105269a251 in bridge_channel_internal_join (bridge_channel=bridge_channel@entry=0x7f1bd802a980)
   at bridge_channel.c:2885
#5  0x0000561052683795 in ast_bridge_join (bridge=bridge@entry=0x7f1bd804e170, chan=chan@entry=0x7f1bd48616f0,
   swap=swap@entry=0x0, features=features@entry=0x7f1b61d4fcd0, tech_args=tech_args@entry=0x0,
   flags=flags@entry=(AST_BRIDGE_JOIN_PASS_REFERENCE | AST_BRIDGE_JOIN_INHIBIT_JOIN_COLP)) at bridge.c:1751
#6  0x00005610528080d2 in ast_bridge_call_with_flags (chan=chan@entry=0x7f1bd48616f0,
   peer=peer@entry=0x7f1bd8034430, config=config@entry=0x7f1b61d50020, flags=flags@entry=0) at features.c:687
#7  0x0000561052808277 in ast_bridge_call (chan=chan@entry=0x7f1bd48616f0, peer=peer@entry=0x7f1bd8034430,
   config=config@entry=0x7f1b61d50020) at features.c:733
#8  0x00007f1be9c38269 in dial_exec_full (chan=0x7f1bd48616f0, data=<optimized out>,
   peerflags=peerflags@entry=0x7f1b61d50ab0, continue_exec=continue_exec@entry=0x0) at app_dial.c:3331
#9  0x00007f1be9c39066 in dial_exec (chan=<optimized out>, data=<optimized out>) at app_dial.c:3399
#10 0x0000561052749262 in pbx_exec (c=c@entry=0x7f1bd48616f0, app=app@entry=0x5610567bed60,
   data=data@entry=0x7f1b61d51c00 "PJSIP/<NUMBER>@<OPERATOR>_trunk_c5545e66-d51b-431b-912d-584bf5ab13d7_49851,,o(<NUMBER>)b(wazo-pre-dial-hooks^s^1)") at ./include/asterisk/strings.h:67
#11 0x000056105273da7f in pbx_extension_helper (c=c@entry=0x7f1bd48616f0, context=0x7f1bd48620b0 "outcall",
   exten=exten@entry=0x7f1bd4862100 "dial", priority=priority@entry=8, label=label@entry=0x0,
   callerid=callerid@entry=0x7f1bd42ce2c0 "<CID>", action=E_SPAWN, found=0x7f1b61d53cac,
   combined_find_spawn=1, con=0x0) at pbx.c:2947
#12 0x000056105273f733 in ast_spawn_extension (combined_find_spawn=1, found=0x7f1b61d53cac,
   callerid=0x7f1bd42ce2c0 "<CID>", priority=8, exten=0x7f1bd4862100 "dial", context=<optimized out>,
   c=0x7f1bd48616f0) at pbx.c:4206
#13 __ast_pbx_run (c=c@entry=0x7f1bd48616f0, args=args@entry=0x0) at pbx.c:4380
#14 0x0000561052740e0b in pbx_thread (data=data@entry=0x7f1bd48616f0) at pbx.c:4704
#15 0x00005610527c58bf in dummy_start (data=<optimized out>) at utils.c:1572
#16 0x00007f1bedfd5fa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
#17 0x00007f1bedbac4cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
```

By: Joshua C. Colp (jcolp) 2022-02-01 14:29:29.043-0600

This is a duplicate of ASTERISK-29821 which has already been fixed, release candidates will be coming soon.