Summary: ASTERISK-03958: deadlocks when manager connection dies without sending disconnect
Reporter: dviper01 (dviper01)
Labels:
Date Opened: 2005-04-20 06:05:49
Date Closed: 2008-01-15 15:35:13.000-0600
Versions:
Frequency of
Environment:
Attachments: ( 0) manager_carefulwrite_rev1.diff.txt
Description: When a manager client is connected and the connection just dies (the client machine freezes, the network is disconnected, etc.), Asterisk seems to block while trying to dispatch its events to the client until a timeout is reached. During this time, all new calls are deadlocked and no channel states are changed.

In our case, the whole PBX freezes all new calls for about 2 minutes and then continues "normally". I tested this behaviour by pulling out the network cable of a connected client.


Sorry for not providing a core dump but I had to solve that problem quickly and wrote a little proxy that works asynchronously. If you really need it, I'll generate one outside office-hours.
Comments:

By: Kevin P. Fleming (kpfleming) 2005-04-20 10:10:33

Please try the attached patch to see if it helps your problem... it seems that we attempt to write to the manager socket before checking to see if it will accept any data.

By: Kevin P. Fleming (kpfleming) 2005-04-27 01:00:06

Have you tested the patch I supplied?

By: Michael Jerris (mikej) 2005-05-14 21:39:51

No reply.  I think this patch is useful regardless.  Can we commit and close out pending additional info from the reporter?

By: Michael Jerris (mikej) 2005-05-14 21:40:42

DViper01 can we get an update on this please.

By: Kevin P. Fleming (kpfleming) 2005-05-14 22:49:33

I've committed the posted patch to CVS HEAD in spite of the lack of response.

By: Michael Jerris (mikej) 2005-05-17 09:37:58

This fix caused bad CLI problems (see ASTERISK-4188); we need to take a step back on this one, I think.

By: Mark Spencer (markster) 2005-05-17 14:13:43

This patch is wrong. If you're calling carefulwrite, it is assumed you're calling it on a file descriptor that *does* have nonblock set. After applying this patch, it will now cause two system calls per write even in the general case when the write will succeed. By attempting the write first, you only make one system call, except when there is actually a *need* to wait, providing better performance. Please completely remove this patch, and fix whatever issue is there by making sure that the file descriptors that carefulwrite is being called on are nonblocking.

By: Russell Bryant (russell) 2005-05-17 14:46:02

This patch has been reverted, and the revert includes a note summing up what Mark said, explaining why the code is the way it is.

We need to find where carefulwrite is called on a blocking file descriptor.

By: Mark Spencer (markster) 2005-05-18 10:43:35

The only way this can occur is if the socket being written to has not had O_NONBLOCK set on it.
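
When hunting for a caller that passes a blocking descriptor, a small check like the one below may help. It is a sketch under assumptions: the helper names is_nonblocking_sketch and set_nonblocking_sketch are hypothetical and do not come from the Asterisk source; only the standard fcntl() calls are real.

```c
#include <fcntl.h>

/* Hypothetical helper: returns 1 if O_NONBLOCK is set on fd, 0 otherwise
 * (including when the flags cannot be read). */
static int is_nonblocking_sketch(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    return flags >= 0 && (flags & O_NONBLOCK) != 0;
}

/* Hypothetical helper: sets O_NONBLOCK on fd, preserving the other
 * status flags. Returns 0 on success, -1 on failure. */
static int set_nonblocking_sketch(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags < 0)
        return -1;
    if (flags & O_NONBLOCK)
        return 0; /* already nonblocking, nothing to do */
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}
```

Calling is_nonblocking_sketch() just before each write path, and logging when it returns 0, would be one way to locate the offending descriptor.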

By: Russell Bryant (russell) 2005-05-18 22:01:38

O_NONBLOCK is always set on the socket unless the option 'block-sockets' is enabled.

I'm seriously starting to sense some bogosity on this bug, especially since the supplied patch couldn't have fixed a problem.

By: Kevin P. Fleming (kpfleming) 2005-05-19 11:27:02

Given that we've also had no response from the original poster in over a month, I'm closing this bug. Sorry for the busted fix that I put in... I guess those discussions about commenting code do have some value :-)

By: Digium Subversion (svnbot) 2008-01-15 15:34:29.000-0600

Repository: asterisk
Revision: 5661

U   trunk/manager.c

r5661 | kpfleming | 2008-01-15 15:34:29 -0600 (Tue, 15 Jan 2008) | 2 lines

fix for dead manager connections to avoid deadlock (bug ASTERISK-3958)



By: Digium Subversion (svnbot) 2008-01-15 15:35:13.000-0600

Repository: asterisk
Revision: 5710

U   trunk/manager.c

r5710 | russell | 2008-01-15 15:35:12 -0600 (Tue, 15 Jan 2008) | 3 lines

remove call to poll on uninitialized fds
This function assumes that the fd is nonblocking (bug ASTERISK-3958)