|Summary:||ASTERISK-21040: Deadlock involving chan_sip.c, pbx.c and autoservice.c, locking on chan and &conclock|
|Reporter:||Andrew Nowrot (andrutto)||Labels:|
|Date Opened:||2013-02-06 04:13:05.000-0600||Date Closed:||2013-11-10 20:30:57.000-0600|
|Status:||Closed/Complete||Components:||Channels/chan_sip/General, Core/PBX, PBX/pbx_realtime|
|Versions:||11.2.0 11.2.1||Frequency of|
|Environment:||Linux Debian wheezy, kernel 3.2.34, x86_64 GNU/Linux||Attachments:||( 0) backtrace-threads.txt
( 1) core-show-locks.txt
|Description:||Occasionally Asterisk deadlocks (no response to calls, and no INVITEs or REGISTERs are processed over SIP). Sometimes it runs for a week before deadlocking, sometimes only for several hours. System load, CPU, and RAM usage are all normal.|
|Comments:||By: Rusty Newton (rnewton) 2013-02-14 19:39:45.506-0600|
Thanks. What else can you tell us about the system? What channel types are being used? Outbound/Inbound only? Etc.
Is this a really high volume scenario?
Could you provide an Asterisk full log excerpt with VERBOSE and DEBUG at level 5 showing right when the deadlock occurs?
By: Andrew Nowrot (andrutto) 2013-02-15 04:54:02.445-0600
The system is Debian Linux running on an Intel platform. The average load is between 0.02 and 0.1, with a maximum of 5 simultaneous calls, so it is not a high-volume system. I am using only SIP channels, in both directions (inbound/outbound). The next time the deadlock occurs I will send some logs. The system has now been running for six days, so my guess is that it can happen at any time.
By: Modulus (modulus) 2013-02-25 10:41:08.652-0600
We ran into a similar situation (deadlock with no response to calls, and no INVITEs or REGISTERs processed over SIP) on an Asterisk 10.12.1 system running Debian Linux wheezy.
Since this happened on a production system we had to restart immediately, without taking any backtraces.
This happened only once (after 8 hours of Asterisk running) and until now (2-3 days since the event) it has not occurred again.
We have now written a script that checks the logs for SIP registrations; if none appear for some time, it runs gdb to take backtraces and then restarts Asterisk.
We are using SIP channels, as well as Local channels.
At the time of the event there were some active calls, which began to hang up after a while, except for one that remained stuck in the system indefinitely (until the restart). However, according to the CDRs of our upstream provider, that particular call should have ended 16 seconds after being answered, which is exactly when SIP dialogs stopped being processed.
When we reproduced that particular call after the restart, 'core show channels' showed two SIP channels and four Local channels, as expected. Just before the restart, 'core show channels' showed only two SIP channels and one Local channel (the other three Local channels were missing).
We think a deadlock occurred at the moment the stuck call should have hung up, and that it may be related to the Local channels associated with that call. It would be useful to know whether Local channels are also used in Andrew Nowrot's case.
We will come back with backtraces when it happens again.
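The watchdog described in this comment could be sketched roughly as follows. This is only an illustration, not the script that was actually used: the log line shape (a `[YYYY-MM-DD HH:MM:SS]` timestamp followed by a message containing `Registered SIP`), the ten-minute threshold, the `/usr/sbin/asterisk` path, and all function names are assumptions that would need adjusting to the local setup.

```python
import re
from datetime import datetime, timedelta

# Assumed log line shape: "[2013-03-02 06:00:00] ... Registered SIP 'peer' ..."
# The timestamp format depends on logger.conf; adjust the pattern as needed.
REGISTERED = re.compile(
    r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\].*Registered SIP"
)


def last_registration(lines):
    """Return the timestamp of the newest registration line, or None."""
    latest = None
    for line in lines:
        match = REGISTERED.match(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            if latest is None or stamp > latest:
                latest = stamp
    return latest


def looks_stalled(lines, now, max_silence=timedelta(minutes=10)):
    """True if no registration was logged within max_silence before now.

    A log excerpt with no registration lines at all is treated as stalled.
    """
    last = last_registration(lines)
    return last is None or now - last > max_silence


def gdb_backtrace_command(pid, binary="/usr/sbin/asterisk"):
    """Build a gdb command that dumps backtraces of all threads and exits."""
    return ["gdb", "-batch", "-ex", "thread apply all bt", binary, str(pid)]
```

A cron job could call `looks_stalled` on the tail of the full log and, when it returns true, run the command from `gdb_backtrace_command` with `subprocess.run`, save the output to a file, and only then restart Asterisk.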
By: Rusty Newton (rnewton) 2013-02-25 18:23:13.508-0600
Are either of you using Asterisk Realtime and if so, in what way, and with what backend?
Are you using realtime for dialplan?
If you are using realtime, do you have a way to attempt reproduction without realtime in the mix?
By: Modulus (modulus) 2013-02-25 18:48:42.375-0600
We are using realtime for SIP users and peers with a MySQL backend:
sipusers => mysql,general,sip_buddies
sippeers => mysql,general,sip_buddies
We do not use realtime for the dialplan (extensions.conf is a static file).
We also currently have the rtcachefriends=yes option set in sip.conf.
By: Andrew Nowrot (andrutto) 2013-02-26 02:56:24.478-0600
We are using realtime for SIP and for extensions with a PostgreSQL backend:
sipusers => pgsql,asteriskdb,sip
sippeers => pgsql,asteriskdb,sip
extensions => pgsql,asteriskdb,extensions
The rtcachefriends=yes option is set in sip.conf.
The system has now been running for 17 days without any problems.
By: Modulus (modulus) 2013-03-02 06:11:06.286-0600
After one week, our installation (Asterisk 10.12.1) finally deadlocked again.
We have attached the backtraces.
If our deadlock is unrelated to this issue, please feel free to split it off.
By: Modulus (modulus) 2013-03-07 03:09:29.677-0600
After analyzing our backtrace, it appears to be a fax gateway problem.
I have removed our backtrace from this issue to avoid confusion.
We will open a new issue for the bug we found.
By: Dare Awktane (awktane) 2013-04-22 00:33:22.077-0500
Is this related to ASTERISK-21228?
By: Matt Jordan (mjordan) 2013-11-10 20:30:57.175-0600
Closing out as a duplicate of ASTERISK-21228. Since that issue has received more traffic, this issue will be tracked there. Thanks!