[Home]

Summary:ASTERISK-05212: Manager connections randomly disconnected
Reporter:wunderkin (wunderkin)Labels:
Date Opened:2005-10-02 03:41:11Date Closed:2008-01-15 15:51:13.000-0600
Priority:MinorRegression?No
Status:Closed/CompleteComponents:Core/ManagerInterface
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:
Description:I always run 2 manager connections.  1 of them generates the call to the customer and the other generates a call to an agent.  The connection towards the agent randomly gets disconnected.  The connection to the agent was idle and got disconnected apparantly after generating an originate on the customer connection.  I've also seen it disconnect the agent connection seemily after it generated a call for an agent.  I've never seen the connection for the customers get disconnected yet.

When this happens I see this on the console:
 == Manager 'listen' logged off from 192.168.1.3
Oct  2 01:14:32 WARNING[7202]: manager.c:1501 manager_event: Disconnecting slow (or gone) manager session!

This seems to be caused from some of the latest manager changes.  There is a bug  ASTERISK-5205247 (pending stable) relating to manager disconnections.  Is this related - I'm not sure.  There is a different error message on that one.
Comments:By: Clod Patry (junky) 2005-10-03 22:56:18

Which version of manager.c?
Could you try with Revision: 1.121 ?
Do you see any LOG_DEBUG message?

Paste your 2 managers users.

By: wunderkin (wunderkin) 2005-10-04 18:11:57

I'll try this tonight with the newest HEAD, I guess.

By: wunderkin (wunderkin) 2005-10-05 13:14:50

Running the latest cvs HEAD as of this post.  I still have the problem.  I'm using version 1.123 of manager.c.  I don't have debug turned on, I can try that tonight.

Here's the output when the problem occurs:

   -- Executing SetCallerID("Local/*12222222222@autodial-outagent-a779,2", "<1111111111>") in new stack
Oct  5 09:31:24 WARNING[21343]: manager.c:1511 manager_event: Disconnecting slow (or gone) manager session!
   -- Executing GotoIf("Local/*12222222222@autodial-outagent-a779,2", "0?g2") in new stack
 == Manager 'listen' logged off from 192.168.2.2
Oct  5 09:31:24 WARNING[21343]: manager.c:1511 manager_event: Disconnecting slow (or gone) manager session!
   -- Executing Set("Local/*12222222222@autodial-outagent-a779,2", "GROUP()=g1") in new stack
Oct  5 09:31:24 WARNING[21343]: manager.c:1511 manager_event: Disconnecting slow (or gone) manager session!
   -- Executing NoOp("Local/*12222222222@autodial-outagent-a779,2", "2") in new stack
Oct  5 09:31:24 WARNING[21343]: manager.c:1511 manager_event: Disconnecting slow (or gone) manager session!
   -- Executing SetAccount("Local/*12222222222@autodial-outagent-a779,2", "1") in new stack
Oct  5 09:31:24 WARNING[21343]: manager.c:1511 manager_event: Disconnecting slow (or gone) manager session!
   -- Executing Dial("Local/*12222222222@autodial-outagent-a779,2", "Zap/g1/12222222222|17") in new stack
Oct  5 09:31:24 WARNING[21343]: manager.c:1511 manager_event: Disconnecting slow (or gone) manager session!

It looks like the problem is only occuring on the manager connection that is normally idle.  It seems to immediately disconnect after sending the originate.  It is no longer listed in show manager connections.  During the testing, I've been disconnecting the primary connection which normally contiunously originates calls.

During the other type of testing, I always had both manager connections running.  The primary was continously generating calls and the secondary was idle.  They both always stayed connected.

Here's my relevant manager.conf postings (requested by cpatry):
[general]
enabled = yes
port = 5038
bindaddr = 0.0.0.0
;displayconnects = yes

; agent connection [mostly idle]
[listen]
secret = listen
deny=0.0.0.0/0.0.0.0
permit=192.168.2.2/255.255.255.255
permit=192.168.1.3/255.255.255.255
read = agent,call
write = system,call,log,verbose,command,agent,user

; primary connection - normally continously generating originates
[connect]
secret = 1234
deny=0.0.0.0/0.0.0.0
permit=192.168.2.2/255.255.255.255
permit=192.168.1.3/255.255.255.255
read =
write = system,call,log,verbose,command,agent,user

and thank you for asking, that reminded me to change the bind to only the local IP :)

cpatry mentioned that he will login to my system tonight to help diagnose.

By: wunderkin (wunderkin) 2005-10-07 12:25:10

Unfortunately I was out all day on the 5th so I haven't met up with cpatry yet.  It actually looks like it may not always be immediately disconnected.  I'm using the same version of HEAD.

Oct  7 10:20:20 WARNING[6972]: app_sql_postgres.c:253 aPGSQL_connect: adding identifier
   -- Executing PGSQL("Local/*11111111111@autodial-dial-471e,1", "Query resultid 59 UPDATE leads SET disp='99' WHERE leadid='365'") in new stack
   -- Executing PGSQL("Local/*11111111111@autodial-dial-471e,1", "Query resultid 59 UPDATE autodial_calls SET status='QUEUE' WHERE leadid='365'") in new stack
   -- Executing PGSQL("Local/*11111111111@autodial-dial-471e,1", "Disconnect 59") in new stack
   -- Executing Queue("Local/*11111111111@autodial-dial-471e,1", "markq|r|||20") in new stack
   -- Executing GotoIf("Local/*12222222222@autodial-outagent-819e,2", "0?g2") in new stack
   -- Executing Set("Local/*12222222222@autodial-outagent-819e,2", "GROUP()=g1") in new stack
   -- Executing NoOp("Local/*12222222222@autodial-outagent-819e,2", "2") in new stack
   -- Executing SetAccount("Local/*12222222222@autodial-outagent-819e,2", "1") in new stack
   -- Executing Dial("Local/*12222222222@autodial-outagent-819e,2", "Zap/g3/12222222222|17") in new stack
   -- Requested transfer capability: 0x00 - SPEECH
   -- Called g3/12222222222
   -- Zap/50-1 is proceeding passing it to Local/*12222222222@autodial-outagent-819e,2
Oct  7 10:20:21 NOTICE[6976]: channel.c:2166 __ast_request_and_dial: Don't know what to do with control frame 15
   -- Zap/50-1 is ringing
   -- Zap/50-1 answered Local/*12222222222@autodial-outagent-819e,2
      > Channel Local/*12222222222@autodial-outagent-819e,1 was answered.
   -- Executing PGSQL("Local/*12222222222@autodial-outagent-819e,1", "Connect db host=192.168.2.2 user=www dbname=autodial-test") in new stack
Oct  7 10:20:34 WARNING[6976]: manager.c:1511 manager_event: Disconnecting slow (or gone) manager session!
 == Manager 'listen' logged off from 192.168.2.2
Oct  7 10:20:35 WARNING[6982]: manager.c:1511 manager_event: Disconnecting slow (or gone) manager session!

Looks like 15 seconds for this one.

By: Kevin P. Fleming (kpfleming) 2005-10-13 19:35:49

Can you try increasing the 'writetimeout' setting for that connection to see if it makes any difference? Otherwise I can't see what could cause this...

By: wunderkin (wunderkin) 2005-10-14 12:30:12

I updated to the latest -HEAD last night and added the following to the general section of manager.conf:

writetimeout = 2000

I made this change in manager.conf before I started up the new version.

This still seems to be occuring.  Shortly after the primary connection originated a call, the secondary connection disconnected.  It looks like 4-5 seconds after.

I always get this message when calling through Zap:
Oct 14 10:20:16 NOTICE[30908]: channel.c:2160 __ast_request_and_dial: Don't know what to do with control frame 15

then..
Oct 14 10:20:20 WARNING[30914]: manager.c:1511 manager_event: Disconnecting slow (or gone) manager session!
 == Manager 'listen' logged off from 192.168.2.2

By: Mark Spencer (markster) 2005-10-14 16:44:58

It's quite possible the issue exists within your application, but I just found one potential snafu in CVS head that could have caused it.  I've put a fix into CVS head, let me know if that takes care of it.

By: wunderkin (wunderkin) 2005-10-14 17:12:26

Ok, I was wondering about that too since it always seems to be this one in particular and never the primary one.  I was going to try logging in as an additional manager user which has all read access so I can maybe see if there is anything in particular that goes by every time when it disconnects.

I'm going out for the weekend but I should be able to try it at some point this weekend though so I'll let you know.

I also tried changing the read line to blank for the problem user and it didn't seem to disconnect anymore.  I'll update and do more testing.

Everyone have a safe trip back from Astricon, I wish I could have gone.



By: wunderkin (wunderkin) 2005-10-16 01:38:25

Ok, I updated to the latest HEAD again.  I tested with the read= line blank for the listen user for 30 minutes.  It stayed connected.

I tried with read=agent,call again and I was able to replicate it twice.  It is random but happens within a few calls.  I only tested for about 5-10 minutes.

It's just a PHP application that connects to the manager, logs in, and sends async originates.  It doesn't try to read from the socket or anything at all.  The base code for the connecting is the same as the primary application that runs fine.

I just noticed that the connect user is also blank for read.  Hmm.. interesting.

The same thing happens when it disconnects, the listen user is disconnected immediately after sending an originate (and says disconnecting slow user).

It must be something with my program, that when something is sent to the socket that it all eventually queues up and dies?  

Ok, well it looks like everything else is set at this point.  Should be ok to close now.  Thanks for the help.

By: Digium Subversion (svnbot) 2008-01-15 15:51:13.000-0600

Repository: asterisk
Revision: 6792

U   trunk/manager.c

------------------------------------------------------------------------
r6792 | markster | 2008-01-15 15:51:12 -0600 (Tue, 15 Jan 2008) | 2 lines

Fix carefulwrite and its usage within manager (bug ASTERISK-5212, maybe)

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=6792