Summary: ASTERISK-00938: libpri(q931.c) responds on CALLPROCEEDING with STATUS(wrong message)
Reporter: knut (knut)
Labels:
Date Opened: 2004-01-28 15:07:47.000-0600
Date Closed: 2004-09-25 02:52:16
Versions:
Frequency of
Environment:
Attachments: ( 0) call-fail-logs.tar
( 1) wrong-message.log.gz
Description: On a highly loaded E100P with only outgoing call setup, the Q.931 protocol analyser complains about a STATUS indicating that a WRONG MESSAGE was received in the wrong state.

Here is the sequence:
> SETUP   (to public switch)
> STATUS (cause 98 - wrong message)

This causes the call to fail.

Comments:
By: knut (knut) 2004-01-28 15:15:11.000-0600

Running the latest CVS version of libpri.

By: Mark Spencer (markster) 2004-02-01 19:38:41.000-0600

We're missing something in the logs here.  I don't see the SETUP going out, and it would appear that we think the call hasn't been set up at all.

By: knut (knut) 2004-02-02 07:56:19.000-0600

Look into the log called call-failed.pri-log and search for this comment:
(**** COMMENT: Here is the SETUP for 37292809 finally sent after due to resending.)

I have checked the bytes transmitted and they match the bytes logged by the protocol analyser. The strange thing is that I cannot find the original message being sent; I only see when the call is placed. Search for this comment:
   (****COMMENT: Here the call is placed. But it gets stuck....

As I said, there seems to be another fault regarding the numbering of frames. It is not connected to the bug covered in this report, and is filed in a separate report.

By: izo (izo) 2004-02-04 12:52:13.000-0600

knut, try uncommenting the ALERTING_NO_PROGRESS define in the libpri Makefile;
it helped me.
P.S. What kind of switch is it?

By: knut (knut) 2004-02-05 03:19:21.000-0600

We're running towards an AXE10 public switch. It was the operator who informed us about this unexpected behaviour.
Thanks for the workaround, I'll try it out. Do you have any idea about the root cause?
Mark, are the logs OK?

By: Mark Spencer (markster) 2004-02-05 09:17:41.000-0600

I'm still confused by the lack of a decoded SETUP in your logs.  In the log, we see Asterisk send "Wrong Message", but our call state is NULL, meaning that we know nothing about the call being referenced.  I'd like to know more about why that is happening.  Can you provide a full debug including the decoded SETUP?

By: knut (knut) 2004-02-06 05:34:49.000-0600

I have included a larger snapshot from the original log with several "Wrong Message" entries inside. The log name is message.log.gz
I hope you find what you want. If not, please specify which trace level is required. We have now agreed on a new test session with the public operator on Monday 9th of Feb. Then we can do new traces along with the protocol analyser.

By: Mark Spencer (markster) 2004-03-05 09:11:05.000-0600

It sounds as though we're not sending the original SETUP, most likely because the transmit window is full.  Ideally, I'd like to log in while you place the system under load, so I can verify that theory.  Will that be possible?

By: knut (knut) 2004-03-11 02:38:07.000-0600

That will be possible in a few weeks, since the system is currently isolated from the Internet. I'll be back when we have the test setup with Internet access ready.


By: James Golovich (jamesgolovich) 2004-03-15 03:23:41.000-0600

We just added some code to libpri and chan_zap (CVS HEAD only, not in stable) so you can do a 'pri show span <spanno>' from the CLI.  Could you update and check this when the problem is happening?

You need to modify the libpri Makefile and uncomment the #define LIBPRI_COUNTERS to get it to keep track of Q921/Q931 counters and the q921 queue.

Perhaps this would shed a bit of light on your problem.

By: Mark Spencer (markster) 2004-03-29 03:14:39.000-0600

Please update to latest CVS.  I've implemented (and tested) proper windowing so that will likely fix this issue.

By: Mark Spencer (markster) 2004-03-29 03:15:25.000-0600

and please be sure to "make clean ; make install" on libpri after you cvs update it.
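
For readers unfamiliar with the windowing mentioned above, here is a minimal sketch of the general idea, not the actual libpri change. Q.921 allows only k unacknowledged I-frames in flight; when the window is full, a new frame (such as the one carrying a SETUP) has to wait in a queue and be sent once acknowledgements free up space, rather than being lost. All names here (`tx_state`, `submit`, `ack`, `put_on_wire`) are hypothetical, and k = 7 and the queue size are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define WINDOW_K  7    /* assumed maximum number of outstanding I-frames */
#define QUEUE_MAX 32   /* illustrative capacity of the pending queue */
#define SEQ_MOD   128  /* Q.921 sequence numbers wrap modulo 128 */

struct tx_state {
    int v_s;                 /* next sequence number to send, V(S) */
    int v_a;                 /* oldest unacknowledged sequence number, V(A) */
    int pending[QUEUE_MAX];  /* ring buffer of frame ids waiting for window space */
    size_t head, count;
};

/* Hypothetical hook: a real stack would write the frame to the D-channel. */
static int last_sent_id = -1;
static void put_on_wire(int frame_id) { last_sent_id = frame_id; }

/* Number of frames sent but not yet acknowledged. */
static int outstanding(const struct tx_state *s)
{
    return (s->v_s - s->v_a + SEQ_MOD) % SEQ_MOD;
}

/* Send queued frames, oldest first, for as long as the window has room. */
static void flush(struct tx_state *s)
{
    while (s->count > 0 && outstanding(s) < WINDOW_K) {
        put_on_wire(s->pending[s->head]);
        s->head = (s->head + 1) % QUEUE_MAX;
        s->count--;
        s->v_s = (s->v_s + 1) % SEQ_MOD;
    }
}

/* Queue a frame and send it if the window allows.
 * Returns false only when the pending queue itself is full. */
static bool submit(struct tx_state *s, int frame_id)
{
    if (s->count == QUEUE_MAX)
        return false;
    s->pending[(s->head + s->count) % QUEUE_MAX] = frame_id;
    s->count++;
    flush(s);
    return true;
}

/* The peer acknowledged every frame before n_r: slide the window and
 * send whatever was waiting for space. */
static void ack(struct tx_state *s, int n_r)
{
    int acked = (n_r - s->v_a + SEQ_MOD) % SEQ_MOD;
    assert(acked <= outstanding(s));  /* n_r must lie within the sent range */
    s->v_a = n_r;
    flush(s);
}
```

With this structure, a frame submitted while seven frames are unacknowledged stays in `pending` until `ack` advances V(A), which matches the symptom in this report of a SETUP that only appears on the wire later.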

By: Mark Spencer (markster) 2004-04-03 21:08:08.000-0600

Any update? Were you able to test the new code?

By: knut (knut) 2004-04-05 16:15:24

The patch is now installed and works fine, with no side effects observed so far. We have not run the monster test case yet; that will still take a few weeks. But we're optimistic that the reported problem is fixed, since we don't see any channel problems anymore during regular traffic.

I'll report back when all test cases have been run.

By: Mark Spencer (markster) 2004-04-05 16:58:29

Excellent.  I'll go ahead and close this out, but feel free to reopen it if it becomes a problem.

By: Mark Spencer (markster) 2004-04-05 17:00:18

Fixed in CVS