Summary:ASTERISK-07612: [branch] Change of channel state takes several minutes
Reporter:jogi (jogi)Labels:
Date Opened:2006-08-28 11:11:56Date Closed:2011-06-07 14:02:44
Versions:Frequency of
Environment:Attachments:( 0) Hint-Log.zip
( 1) SIP-Debug.txt
Description:On our core * switch we have activated the hint extension for our virtual pbx customers. If a registered channel changes the state from idle to busy - it takes several minutes (e.g. 3 minutes) until asterisk detects the new channel state (show hints reports for several minutes the idle-state, before it is changed to busy) and sends the notify message to the sip device. The same happens vice versa: if the channel gets idle again, it takes several minutes until asterisk detects the new state and sends the notify message to the sip device.


On the test system the detection works without problem.

On both systems we are using realtime.

Can someone give me some advice, how to debug this problem? It is a productive system.
Comments:By: Serge Vecher (serge-v) 2006-08-28 12:14:50

jogi: Well you could start with a SIP debug trace to see if anything interesting pops up. Please do the following:
1) Prepare environment for testing(reduce the ammount of unrelated traffic on the server);
2) Make sure your logger.conf has the following line:
  console => notice,warning,error,debug
3) restart Asterik.
4) Enable SIP transaction logging with the following CLI commands:
set debug 4
set verbose 4
sip debug
5) Save complete console log to file and _attach_ said file to the bug.

By: jogi (jogi) 2006-08-28 13:04:55

vechers: we have a test environment and there everything works as expected. Only on the productive system the problem exists. That means - no load - no problem. Can I reach you on IRC?

By: Serge Vecher (serge-v) 2006-08-28 13:19:48

1. Well I didn't say no load, just light load.
2. Yes, serge-v in #asterisk-bugs

By: Olle Johansson (oej) 2006-08-30 09:49:18

How do you combine realtime with hints? As far as I know, this is not a supported solution.

By: jogi (jogi) 2006-08-30 09:59:10

We generate Text-Files from the database and include them in the extensions.conf. We try to get some more debug information. As I have mentioned earlier the hint does work on our test system. Therefore it is not a general problem.

By: Serge Vecher (serge-v) 2006-08-31 12:14:39

also, would be great to test this with 1.2.11 ...

By: jogi (jogi) 2006-08-31 12:26:57

Serge, we did some testing again and have uploaded a SIP-Debug for this problem. We have traced two devices. The first device (0443332248) initiates a call to 445 (test number). The second device has sent subscriptions for the extension 48, which is mapped to 0443332248. We have inserted time stamps.
Once the call is established, it took 2:45 until the LED of the second device was switched on. We did some show hints commands, so it can be seen, that * is reporting a wrong state of the device.
If we hang up again, it takes again over 2 minutes, until * reports the correct state of the channel.

By: jogi (jogi) 2006-09-16 06:55:38

We have done the testing with the latest Release The problem is still the same. We have a lot of sip register messages and subscribe messages. May this lead to this problem?

By: jogi (jogi) 2006-09-30 09:21:41

Anyone who can support me ?

By: Serge Vecher (serge-v) 2006-10-02 08:51:55

jogi: any possibility of testing this on 1.4-beta?

By: jogi (jogi) 2006-10-12 11:57:55

We did some more testing and got the following result:

1. 1.4Beta has a bug in the devicestate (ASTERISK-7238) and the state does not change to busy

2. The problem described with version (and earlier) seems to be a load problem. We can reproduce it in our lab. As soon as the number of registration requests is increased, the duration until * changes the decivestate increases as well.

By: jogi (jogi) 2006-10-18 14:20:10

Serge-v - any idea, what else we can do? We can now do more detailed debugging if required

By: Olle Johansson (oej) 2006-11-13 13:33:00.000-0600

How many users/registrations/subscriptions do you have on the server?

By: jogi (jogi) 2006-11-13 14:00:32.000-0600

In the lab the problems begin with around 55 registrations/second (tested with SIPp). Below this number, the hint works as wished.

By: Olle Johansson (oej) 2006-11-14 03:46:44.000-0600

Would you mind sharing the configuration for SIPP and Asterisk used in your test, so I can test too? If so, please e-mail to oej@edvina.net. Thanks!

By: Olle Johansson (oej) 2006-11-30 01:54:30.000-0600

No answer from reporter.

By: jogi (jogi) 2006-11-30 04:03:56.000-0600

I have sent the files on the 14th of November to Olle

By: Olle Johansson (oej) 2006-11-30 04:04:45.000-0600

hmmm. Please re-send, I don't remember seeing them. Thanks. Sorry for the trouble.

By: jogi (jogi) 2006-11-30 04:23:48.000-0600

I have sent the email again - can you please confirm the arrival

By: Olle Johansson (oej) 2006-11-30 10:55:00.000-0600

Just a brief recap:

All changes to device states are sent to the devicestate engine, which executes in a separate thread. THis thread queues things up. I guess we need to look into that part and see what goes on. I'll add some debugging for you to test so we can see.

By: Olle Johansson (oej) 2006-11-30 10:58:13.000-0600

Creating test branch statechange-7816 based on 1.2

By: jogi (jogi) 2006-11-30 11:04:11.000-0600

Olle - thanks for your support. We urgently need this to be solved. We look into the branch.

By: Olle Johansson (oej) 2006-11-30 12:06:22.000-0600

Ok, I've added a counter and debug logging to devicestate.c in the branch. You will now get a message when we queue an item (with an ID) and when we process the queue. You can propably see a latency showing up or get other messages. Try to run the tests, capturing all output with debug set to 4 and verbose set to 4. We don't need the SIP debugging right now.

Of course adding realtime to registrations means that you add database processing in addition to other things, since we update the database for every registration.

At some point, I think we have to process registrations in a separate thread from the rest of the SIP signalling.

By: Olle Johansson (oej) 2006-12-05 13:03:23.000-0600

Waiting for feedback...

By: jogi (jogi) 2006-12-11 12:35:34.000-0600

Attached you find the output of the logs of the patched 1.2-version. I have stress tested the system with up to 100 reg/sec. During the load test I have not seen hint delays taking longer than 6-8 seconds. This is far less than during my other tests few weeks ago. Olle, do you have done some changes in the code or just debug output?

By: Olle Johansson (oej) 2006-12-15 05:21:57.000-0600

In this branch, I've just added debug output, nothing else. There has been quite a few changes to chan_sip in general though.

By: Olle Johansson (oej) 2007-02-01 15:54:58.000-0600

Seems like the problem disappeared, since you haven't added any more reports to this issue. Please re-open if you still have problems.