Summary: ASTERISK-11791: Asterisk becomes unresponsive after locking 1.4.19
Reporter: Doug (doug)
Labels:
Date Opened: 2008-04-07 09:26:41
Date Closed: 2008-05-27 11:42:50
Priority: Critical
Regression? No
Status: Closed/Complete
Components: General
Versions:
Frequency of Occurrence:
Related Issues:
Environment:
Attachments:
( 0) 12376_chanspy_uid.diff
( 1) backtrace.txt
( 2) backtrace1.txt
( 3) core_show_locks.txt
( 4) core_show_locks-1.txt
( 5) core_show_locks-2.txt
( 6) LockinfoCN.rtf
Description: We are running a high-load system on 1.4.19, and at times Asterisk becomes completely unresponsive and will not allow any incoming or outgoing calls to or from the system. After a "core show channels" there is no end channel count and the CLI becomes completely unresponsive. I have attached "core show locks" output.

Comments:

By: Joshua C. Colp (jcolp) 2008-04-07 11:56:09

What are you testing exactly? What applications are being executed? I saw audiohook usage, are you recording calls or spying? How are you initiating calls? What type of calls?

By: Mark Michelson (mmichelson) 2008-04-07 12:29:19

I'm not 100% sure based on the core show locks output, but this may be the same deadlock that was reported in issue 12307, for which I just committed a fix this morning. The commit was made to 1.4 in revision 113065. If you could, please upgrade to that revision and see if this happens again. Thanks!

By: Doug (doug) 2008-04-08 01:32:42

putnopvut: Thanks, I will give it a try and get back to you.

By: destiny6628 (destiny6628) 2008-04-08 06:56:33

I am seeing the same problem: when I run the "show channels" command on asterisk-1.4.19, it becomes unresponsive.

By: destiny6628 (destiny6628) 2008-04-08 06:56:56

I am using asterisk-1.4.19

By: Jason Parker (jparker) 2008-05-01 15:10:08

Has anybody been able to test with revision 113065?

By: destiny6628 (destiny6628) 2008-05-17 06:05:43

I am using asterisk-1.4.19 and am facing the same problem: "show channels" becomes unresponsive, and after that I have to stop Asterisk and start it again before calls work.

Does the latest svn include revision 113065?

By: destiny6628 (destiny6628) 2008-05-17 06:08:13

I have not tested with exactly revision 116038, but I have patched chan_local.c with the change russell described in:

Repository: asterisk
Revision: 116038

U branches/1.4/channels/chan_local.c

------------------------------------------------------------------------
r116038 | russell | 2008-05-13 16:12:25 -0500 (Tue, 13 May 2008) | 24 lines

Fix a deadlock involving channel autoservice and chan_local that was debugged
and fixed by mmichelson and me.

We observed a system that had a bunch of threads stuck in ast_autoservice_stop().
The reason these threads were waiting around is because this function waits to
ensure that the channel list in the autoservice thread gets rebuilt before the
stop() function returns. However, the autoservice thread was also locked, so
the autoservice channel list was never getting rebuilt.

The autoservice thread was stuck waiting for the channel lock on a local channel.
However, the local channel was locked by a thread that was stuck in the autoservice
stop function.

It turned out that the issue came down to the local_queue_frame() function in
chan_local. This function assumed that one of the channels passed in as an
argument was locked when called. However, that was not always the case. There
were multiple cases in which this channel was not locked when the function was
called. We fixed up chan_local to indicate to this function whether this channel
was locked or not. The previous assumption had caused local_queue_frame() to
improperly return with the channel locked, where it would then never get unlocked.


But the problem remains the same as before.

Please advise.
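
The commit message quoted above says the fix was to tell local_queue_frame() whether the caller already holds the channel lock. The following is a minimal Python sketch of that idea, not the chan_local code: the names `queue_frame_with_backoff`, `us`, `other`, and `us_locked` are hypothetical, and `threading.Lock` stands in for the channel locks. While backing off from a contended lock, the function releases and re-acquires the caller's lock only if the caller said it holds it, so the lock is returned in the state the caller left it in.

```python
import threading
import time


def queue_frame_with_backoff(us, other, queue, frame, us_locked):
    """Append frame to queue while holding other.

    us        -- lock of the calling channel (hypothetical stand-in)
    other     -- lock of the channel that receives the frame
    us_locked -- True only if the caller already holds us
    """
    # Try to take the other lock without blocking; back off on contention.
    while not other.acquire(blocking=False):
        if us_locked:
            # Let whoever needs `us` make progress, then take it back so the
            # caller still holds it when we return.
            us.release()
            time.sleep(0.001)
            us.acquire()
        else:
            # The caller does not hold `us`, so leave it alone. Acquiring it
            # here would return to the caller with a lock nobody releases.
            time.sleep(0.001)
    try:
        queue.append(frame)
    finally:
        other.release()


if __name__ == "__main__":
    us, other, frames = threading.Lock(), threading.Lock(), []
    # Hold `other` briefly to force the back-off path.
    other.acquire()
    threading.Timer(0.05, other.release).start()
    queue_frame_with_backoff(us, other, frames, "frame-1", us_locked=False)
    print(frames, us.locked())
```

Passing the flag explicitly keeps the back-off logic in one place while letting each call site state what it actually holds, which is the assumption the commit message says was previously wrong.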

By: destiny6628 (destiny6628) 2008-05-17 06:09:18

DougUDI: Has your issue been resolved? I am facing the same problem as you.

By: destiny6628 (destiny6628) 2008-05-19 01:07:53

Please reply, I am still waiting.

It is a bit critical for me; it happens about once a day.

By: destiny6628 (destiny6628) 2008-05-19 15:38:19

Hi, I am eagerly waiting for a reply. Please reply or suggest a solution. Thanks.

By: destiny6628 (destiny6628) 2008-05-20 07:56:44

Any hope of getting a reply on this?

By: Mark Michelson (mmichelson) 2008-05-20 10:17:06

destiny6628: What svn revision are you using now? The next time this happens, could you please issue the "core show locks" command and attach the output to this issue? Thanks.

By: Doug (doug) 2008-05-21 01:36:05

Destiny - our problem had to do with using the Local channel for queues and would only occur under high call volume. We had 2 servers with this problem, 1 of which we changed to direct channel queues. The second needs to stay on the Local channel, and we have patched that server. I have not had any problems since, but this server would only lock up every 2 weeks or so. Because of this I cannot say for sure, but so far so good.

By: destiny6628 (destiny6628) 2008-05-21 02:49:54

Thanks for the reply.

I was previously using asterisk-1.4.19 with the patch committed by Russell in revision 116038.

Today I updated to asterisk-1.4.20; I will observe it today and let you know the results.

By: destiny6628 (destiny6628) 2008-05-22 01:20:20

Hi

After upgrading to asterisk-1.4.20, the same problem occurred again. Today I will upload the core show locks output when the problem arises.

Thanks

By: destiny6628 (destiny6628) 2008-05-22 05:29:45

putnopvut: Thanks a lot for solving the earlier issue related to frame.h. Once that fix was applied, I had no further issues with Asterisk crashing or anything else.

But as I mentioned earlier, I am having an issue where inbound and outbound calling both stop, and the only way to restart them is to stop Asterisk and start it again.

The same thing happened today, and I have taken the output of core show locks, which I am attaching as core_show_locks.txt

I am using asterisk-1.4.20
zaptel-1.4.10.1
libpri-1.4.4

Please reply asap if possible, because dialing is frequently hampered like this.

By: destiny6628 (destiny6628) 2008-05-22 05:31:21

If you want, I can open a new bug.

Thanks in advance.

By: destiny6628 (destiny6628) 2008-05-22 10:59:07

I will wait for suggestions, because dialing is not working properly at all.

Thanks a lot.

By: Mark Michelson (mmichelson) 2008-05-22 11:23:15

Thanks for the output. In both scenarios, the channel list lock is held by a thread calling the function ast_channel_free and is blocking access to that lock by other threads. That thread, however, does not appear to be waiting to acquire other locks.

This indicates that some operation or set of operations attempted by ast_channel_free is taking an excessive amount of time or may even be blocking indefinitely. In order to pinpoint where the problem is occurring, I'll need to see a backtrace of that thread when Asterisk deadlocks. The best way to do this is, when the process deadlocks, to open a terminal and use gdb to attach to the running process. Then get the output of "thread apply all bt full". If you can attach that output here, that would be helpful. Thanks.
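
The backtrace request above can be scripted so it is ready when the hang occurs. This is a minimal sketch, assuming gdb is installed and the process ID of the hung asterisk process is known; the helper names `gdb_backtrace_command` and `capture_backtrace` are hypothetical. It uses gdb's `-p`, `-batch`, and `-ex` options to attach, run `thread apply all bt full`, and exit.

```python
import subprocess


def gdb_backtrace_command(pid):
    """Build a gdb command line that dumps every thread's full backtrace."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt full"]


def capture_backtrace(pid, out_path="backtrace.txt"):
    """Attach to the process, write the backtraces to out_path, then exit gdb.

    Attaching usually needs the same privileges as the target process
    (often root).
    """
    with open(out_path, "w") as out:
        subprocess.run(gdb_backtrace_command(pid), stdout=out,
                       stderr=subprocess.STDOUT, check=True)
    return out_path
```

Calling `capture_backtrace(pid)` while the process is hung would produce a file suitable for attaching to the issue; the process stays stopped only while gdb is attached.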

By: Ronald Chan (loloski) 2008-05-22 12:09:04

destiny6628,

I'm just curious, since I can't reproduce this behavior on my side: how many concurrent connections do you have at any given time, especially at peak times?

Is this bug reproducible at any time? Thanks.

By: destiny6628 (destiny6628) 2008-05-22 12:32:56

Thanks for the prompt reply.

I will certainly attach the backtrace when it happens, and also upload as much output as will be helpful.

Hi loloski,

The strange part is that there is not much traffic on the server on which I am having the problem.

I will keep you posted as soon as the problem recurs, because it is occurring many times during calling.

By: destiny6628 (destiny6628) 2008-05-23 07:09:36

Today I had the same problem: the show channels command stopped working and both incoming and outgoing calls stopped.

1) At that time I took the output of core show locks and uploaded it as core_show_locks-1.txt.

2) I also took the gdb output of the asterisk process using thread apply all bt full.

The output is attached as backtrace.txt.

I hope this helps in getting to the cause of the problem.

Thanks

By: destiny6628 (destiny6628) 2008-05-23 07:22:44

Immediately after I started Asterisk again, the same problem occurred within 10 minutes.

I have attached the backtrace as backtrace1.txt and the core show locks output as core_show_locks-2.txt.

Thanks

By: Mark Michelson (mmichelson) 2008-05-23 09:38:45

destiny6628: Thanks for the output. I may be incorrect, but it appears that the two deadlocks you experienced manifested themselves in two different ways, but they both seem to be related to locking used in app_chanspy. I will take a further look and see if I can make the necessary corrections.

By: Mark Michelson (mmichelson) 2008-05-23 10:02:25

It would appear that the deadlock caused here is due to the fact that there are multiple spies listening to the same channel and that a portion of app_chanspy was written under the assumption that there would be only one spy listening to a channel. I am working on a patch which should address this.

By: Mark Michelson (mmichelson) 2008-05-23 10:42:17

I have uploaded 12376_chanspy_uid.diff which will give each datastore from app_chanspy a unique id (the address of the chanspy_ds structure) so that the correct one will be retrieved when searching. This should at least solve the second deadlock you reported recently (represented by backtrace1.txt and core_show_locks-2.txt). I think it may also solve the first of the two deadlocks you reported. Please test and let me know if this solves the problem.
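
To illustrate why a per-spy unique id matters, here is a minimal Python sketch of a datastore lookup keyed by type and uid. It is a simplified model, not the Asterisk API: `Datastore`, `find_datastore`, `Spy`, and the "chanspy" type string are hypothetical, and `hex(id(...))` stands in for the address of the chanspy_ds structure. The assumption modeled here is that a lookup without a uid returns the first datastore of the matching type, so a second spy on the same channel would get the first spy's datastore.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Datastore:
    info: str            # datastore type, e.g. "chanspy"
    uid: Optional[str]   # unique id, or None if the owner set none
    data: object         # payload owned by whoever attached the datastore


def find_datastore(stores: List[Datastore], info: str,
                   uid: Optional[str] = None) -> Optional[Datastore]:
    """Return the first datastore matching info and, when given, uid."""
    for store in stores:
        if store.info != info:
            continue
        if uid is None or store.uid == uid:
            return store
    return None


class Spy:
    """Hypothetical stand-in for one spy's chanspy_ds structure."""

    def __init__(self, name: str):
        self.name = name
        # Address-like value that is unique per live object.
        self.uid = hex(id(self))


if __name__ == "__main__":
    spy_a, spy_b = Spy("a"), Spy("b")
    channel_stores = [
        Datastore("chanspy", spy_a.uid, spy_a),
        Datastore("chanspy", spy_b.uid, spy_b),
    ]
    # Lookup by type alone always yields the first spy's datastore.
    print(find_datastore(channel_stores, "chanspy").data.name)
    # Lookup by type and uid yields the datastore of the spy that asked.
    print(find_datastore(channel_stores, "chanspy", spy_b.uid).data.name)
```

Using the structure's own address as the uid needs no extra bookkeeping, since two live structures cannot share an address.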

By: destiny6628 (destiny6628) 2008-05-24 01:41:02

Hi

Thanks for the patch.

I have applied the patch and will observe until Monday for the problem and keep you updated.

Thanks once again for the prompt response and patch.

By: destiny6628 (destiny6628) 2008-05-26 23:01:43

Dear putnopvut

Thanks once again for solving the problem, which was causing trouble throughout almost the whole day of calling.

I applied the patch on the very same day you uploaded it, and since then there have been no issues whatsoever.

Once again, thank you very much; it's great fun working with a product that has such tremendous support.

contaque*CLI> core show uptime
System uptime: 3 days, 11 hours, 15 minutes, 15 seconds
contaque*CLI>

Thanks

By: Digium Subversion (svnbot) 2008-05-27 11:32:20

Repository: asterisk
Revision: 118365

U   branches/1.4/apps/app_chanspy.c

------------------------------------------------------------------------
r118365 | mmichelson | 2008-05-27 11:32:19 -0500 (Tue, 27 May 2008) | 14 lines

Add a unique id to the datastore allocated in app_chanspy since
it is possible that multiple spies may be listening to the same
channel.

(closes issue ASTERISK-11791)
Reported by: DougUDI
Patches:
     12376_chanspy_uid.diff uploaded by putnopvut (license 60)
Tested by: destiny6628

(closes issue ASTERISK-11668)
Reported by: atis


------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=118365

By: Digium Subversion (svnbot) 2008-05-27 11:37:25

Repository: asterisk
Revision: 118371

_U  trunk/
U   trunk/apps/app_chanspy.c

------------------------------------------------------------------------
r118371 | mmichelson | 2008-05-27 11:37:24 -0500 (Tue, 27 May 2008) | 22 lines

Merged revisions 118365 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r118365 | mmichelson | 2008-05-27 11:38:38 -0500 (Tue, 27 May 2008) | 14 lines

Add a unique id to the datastore allocated in app_chanspy since
it is possible that multiple spies may be listening to the same
channel.

(closes issue ASTERISK-11791)
Reported by: DougUDI
Patches:
     12376_chanspy_uid.diff uploaded by putnopvut (license 60)
Tested by: destiny6628

(closes issue ASTERISK-11668)
Reported by: atis


........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=118371

By: Digium Subversion (svnbot) 2008-05-27 11:42:50

Repository: asterisk
Revision: 118382

_U  branches/1.6.0/
U   branches/1.6.0/apps/app_chanspy.c

------------------------------------------------------------------------
r118382 | mmichelson | 2008-05-27 11:42:49 -0500 (Tue, 27 May 2008) | 30 lines

Merged revisions 118371 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

................
r118371 | mmichelson | 2008-05-27 11:43:36 -0500 (Tue, 27 May 2008) | 22 lines

Merged revisions 118365 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r118365 | mmichelson | 2008-05-27 11:38:38 -0500 (Tue, 27 May 2008) | 14 lines

Add a unique id to the datastore allocated in app_chanspy since
it is possible that multiple spies may be listening to the same
channel.

(closes issue ASTERISK-11791)
Reported by: DougUDI
Patches:
     12376_chanspy_uid.diff uploaded by putnopvut (license 60)
Tested by: destiny6628

(closes issue ASTERISK-11668)
Reported by: atis


........

................

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=118382