
Summary: ASTERISK-09034: [patch] REINVITE before 200ok causes a call to be ended
Reporter: atca_pres (atca_pres)
Date Opened: 2007-03-16 13:28:44
Date Closed: 2007-11-15 05:29:17.000-0600
Priority: Major
Regression?: No
Status: Closed/Complete
Components: Channels/chan_sip/Transfers
Attachments:
(0) 2_Invite_too_fast-clear.cap
(1) BYE_b4_200OK.cap
(2) BYE_b4_200OK.txt
(3) BYE_b4_200OK-cancel.txt
(4) BYE_b4_200OK-CancelRetransmit.cap
(5) fix_confirmed.diff
(6) log_sip.txt
(7) probtrans.cap
(8) sip_reinvite4.diff
(9) sip_reinvite6.diff
(10) transOk.cap
(11) verbosedebug.txt
Description: When flash-hooking:
Box 1 sends an INVITE with a contact address of 0.0.0.0 (hold).
Asterisk sends an INVITE to box 2.
Box 2 sends a 100 Trying.
Asterisk sends a second INVITE to box 2 with a different CSeq, branch and session version (SDP). This triggers a 500 response.
The 500 is triggered because, according to the RFC, you cannot send a second INVITE while the first INVITE transaction on the dialog is still unfinished.

Note: No RTP Portal
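For reference, RFC 3261 section 14.2 spells out what the receiving UA must do here: a UAS that receives a second INVITE on a dialog before it has sent the final response to an earlier INVITE with a lower CSeq must answer 500 (Server Internal Error) with a Retry-After header of a random 0 to 10 seconds, and a UAS that receives an INVITE while its own INVITE on that dialog is still in progress must answer 491 (Request Pending). The C sketch below encodes only that decision; `struct dialog_state` and `check_incoming_invite()` are hypothetical names and are not taken from chan_sip.c or from any Mediatrix firmware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-dialog state (not a chan_sip.c structure). */
struct dialog_state {
    bool uas_invite_pending;   /* we received an INVITE and have not sent its final response */
    uint32_t uas_invite_cseq;  /* CSeq of that pending incoming INVITE */
    bool uac_invite_pending;   /* we sent an INVITE that is still in progress */
};

/* Returns the response code to send for an incoming (re-)INVITE, or 0
 * if it may be processed normally. For a 500, *retry_after receives a
 * random value between 0 and 10 seconds. Retransmissions (same CSeq)
 * are assumed to be absorbed by the transaction layer before this. */
int check_incoming_invite(const struct dialog_state *d, uint32_t cseq,
                          int *retry_after)
{
    assert(d != NULL && retry_after != NULL);
    if (d->uas_invite_pending && cseq > d->uas_invite_cseq) {
        *retry_after = rand() % 11;   /* RFC 3261 14.2: 0..10 seconds */
        return 500;                   /* Server Internal Error */
    }
    if (d->uac_invite_pending)
        return 491;                   /* Request Pending (glare) */
    return 0;
}
```

In the scenario above, box 2 is in the first situation: it has only sent 100 Trying for the first INVITE when the second one arrives, so the 500 it returns is the behaviour the RFC requires.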

****** ADDITIONAL INFORMATION ******

RFC 3261:

  The offer/answer model defines restrictions on when offers and answers
  can be made (for example, you cannot make a new offer while one is in
  progress).

  Once the UAS has sent or received an answer to the initial offer, it
  MUST NOT generate subsequent offers in any responses to the initial
  INVITE.  This means that a UAS based on this specification alone can
  never generate subsequent offers until completion of the initial
  transaction.

  Concretely, the above rules specify two exchanges for UAs compliant
  to this specification alone - the offer is in the INVITE, and the
  answer in the 2xx (and possibly in a 1xx as well, with the same
  value), or the offer is in the 2xx, and the answer is in the ACK.
  All user agents that support INVITE MUST support these two exchanges.

This means that you cannot send a new INVITE until you have received the 200 OK, i.e. until the offer/answer exchange is complete.

Comments:

By: Serge Vecher (serge-v) 2007-03-16 13:31:40

As per bug guidelines, you need to attach a SIP debug trace illustrating the problem. Please do the following:
1) Prepare test environment (reduce the amount of unrelated traffic on the server);
2) Make sure your logger.conf has the following line:
  console => notice,warning,error,debug
3) Restart Asterisk with the following command:
  'asterisk -Tvvvvvdddddngc | tee /tmp/verbosedebug.txt'
4) Enable SIP transaction logging with the following CLI commands (1.4/trunk commands in parentheses):
set debug 4 (core set debug 4)
set verbose 4 (core set verbose 4)
sip debug (sip set debug)
5) Reproduce the problem
6) Trim startup information and attach verbosedebug.txt to the issue.

By: Clod Patry (junky) 2007-03-16 13:36:17

I will take care of this one, since I work with atca_pres.

By: Olle Johansson (oej) 2007-03-18 15:42:35

That's an interesting combination that I think we don't handle. I really need to see a full SIP debug to check what's going on here.

By: atca_pres (atca_pres) 2007-03-19 06:52:45

I'm adding the verbose debug for oej. I like people that are interested in what they do :)

Don't forget that Junky is already assigned to this incident so ... collaborate!

Thanks

By: Caio Begotti (caio1982) 2007-03-26 09:38:05

I was testing the 1.4 branch last week for a customer and I got this strange behavior. I didn't know about this ticket at that time, so I couldn't save any logs.

Do you have any news regarding this issue, Junky? Thanks!

By: atca_pres (atca_pres) 2007-03-27 10:18:53

I just found something interesting.

If you change your dtmfmode from rfc2833 to info, the problem doesn't happen.

By: Clod Patry (junky) 2007-03-28 21:23:30

I just realized this bug was introduced between 1.4.0 and 1.4.1, so the latest SVN is affected by this bug.
caio1982: thanks for the follow-ups. Could you try with the 1.4.0 tarball and confirm that everything is okay with that version?
Also, which hardware did you test with?

By: Caio Begotti (caio1982) 2007-03-29 13:36:18

Junky: a Mediatrix 1104 with firmware version 5.0.3.35, running asterisk-1.4 revision 54103.
I'll try to look into this here in the next couple of days and then report back :-)

By: Daniel Weatherford (ddub) 2007-03-29 13:45:24

I believe this is the same problem as the one seen in issue 0009142.

By: Clod Patry (junky) 2007-03-29 13:50:01

Great, I'm now doing a co-op for Mediatrix.
I will check this out with our different firmware versions for the 1104.
(I've tested with 5.0.116.101: it worked fine with 1.4.0, but not with Asterisk SVN 59257.)

By: Clod Patry (junky) 2007-04-05 15:18:53

Apparently, there's one extra pending re-INVITE.

I've tried that patch with rev 54103 and the 1104, and now it works fine.
caio1982: could you test it in your setup, too?

oej:
Could it be associated with T.38 re-INVITEs?
There's probably an extra call to
  ast_set_flag(&p->flags[0], SIP_NEEDREINVITE);
or a missing call to
  ast_clear_flag(&p->flags[0], SIP_NEEDREINVITE);
What's your feeling about it?

By: Clod Patry (junky) 2007-04-05 15:20:41

Since I saw the problem in a lab, I'm setting this to confirmed.

By: Caio Begotti (caio1982) 2007-04-05 15:25:25

Amazing news, a patch! Just to let you know, I tested 1.4.0 yesterday as you asked, and it didn't work either. I'm going to build with your patch and I'll try to test it on Monday (it's a long national holiday here).

By: Clod Patry (junky) 2007-04-05 15:47:19

I tested with 1.4.2 too and it works so far, but I'm concerned about breaking something, since I ignore pending re-INVITEs.

By: Caio Begotti (caio1982) 2007-04-05 15:54:34

So is there some possible T.38 issue that I should be aware of when I test the patch?

By: Yehavi Bourvine (yehavi) 2007-04-10 00:13:36

The patch works!

I've downloaded the latest SVN (61116), verified that the problem exists and
then applied the patch. So far all works ok.

Thanks! Yehavi

By: Clod Patry (junky) 2007-04-10 08:42:35

This patch isn't ready for trunk, since I haven't discussed it with oej yet.

But I might have found something really interesting; I will analyze it further in the next 48 hours.

By: Steve Davies (one47) 2007-04-11 04:07:29

I suspect that a side-effect of this patch is that any SIP call where a reinvite is delayed awaiting another action will be treated as if "canreinvite=no" has been set for the call.

IMHO, the re-INVITE should not be discarded as it is in the patch; instead, it should be delayed until the 200 has been received.

The attached patch moves the pending reinvite check into the "Got a 200" code path, and out of all other cases. It is untested here (it compiles), but should be less harsh than the previous patch.

By: Yehavi Bourvine (yehavi) 2007-04-12 01:52:30

I've applied the new patch to the trunk version (61116) and so far it works ok.

Thanks, Yehavi

By: Steve Davies (one47) 2007-04-12 04:10:18

Can you confirm that you use canreinvite=yes, and that the calls are indeed re-INVITEd so that the media path no longer goes through Asterisk?

By: Yehavi Bourvine (yehavi) 2007-04-12 04:23:09

I use canreinvite=yes, and tcpdump confirmed that the media does not pass via Asterisk.

Thanks, Yehavi

By: Caio Begotti (caio1982) 2007-04-13 07:56:41

Ok, the patch sip_reinvite.diff worked for me as well using the latest trunk code.

However, some issues still remain, and I couldn't figure out whether they are something else or really related to this. When I make transfers for testing, the CLI and the log get filled with tons of messages like the following. Is that OK?

-- Native bridging SIP/5001-b6b033e8 and SIP/5003-0861f088
DEBUG[1978]: rtp.c:2942 bridge_native_loop: Oooh, something is weird, backing out

Also, if I set a MOH class to be used while I transfer one leg, sometimes the call just hangs up without much information (logs were attached to this ticket). I even tried changing the format of my MOH and the codecs used, to make sure they were OK. If I then disable MOH for the test, everything goes fine with the patch.

By: Caio Begotti (caio1982) 2007-04-13 08:01:23

Ok, I can't attach my /var/log/asterisk/full log (which is 1.6mb):

Database query failed. Error received from database was ASTERISK-1147: Got a packet bigger than 'max_allowed_packet' bytes for the query:

By: Steve Davies (one47) 2007-04-13 08:15:25

Okay, I can see one possible reason for this, though it is a bit of a guess. Could you describe the process you go through to cause that trace in a little more detail?

In the meantime, here is a trivial update to the patch to see if it helps.

By: Steve Davies (one47) 2007-04-13 08:20:46

Apologies - sip_reinvite3.diff should be better than sip_reinvite2.diff. Could a maintainer please remove the old sip_reinvite.diff and sip_reinvite2.diff files.

Thanks

By: Caio Begotti (caio1982) 2007-04-13 08:42:14

Which situation are you talking about, one47? The bridge_native_loop problem or the MOH one?

By: Steve Davies (one47) 2007-04-13 08:50:04

The bridge_native_loop one. The first sip_reinvite.diff inadvertently changed the logic so that a re-INVITE would be attempted even if a BYE was about to be sent.

This meant that we might try to bridge a call that was in the process of being closed down. The errors probably continue until the call does get closed down. As I say, this is a bit of a guess, but I thought it best to remove it as a possible cause of the error.

By: Steve Davies (one47) 2007-04-16 08:18:45

I have been running sip_reinvite3.diff with some success on a test system here - I can reproduce the issue on an unpatched build of 1.2.17, and then fix it using the patch (slightly modified for 1.2.x) - I am using Aastra 480i and 9133i phones which have the same issue with double INVITEs.

Has anybody else tried this patch, and what degree of success or failure have you had? What errors are reported, if any?

caio1982: Did the newer patch help with the bridge_native_loop error messages or the MoH issue? If not, can you describe in more detail how to reproduce the issue? If you still have the MoH problem, what format of MOH file are you using, and what codec is the phone using for the initial call? Is the codec being changed?

By: Olle Johansson (oej) 2007-04-18 15:35:10

one47: Please confirm your disclaimer, so that we can look at your files. Thank you.

By: Steve Davies (one47) 2007-04-18 16:31:41

Disclaimer is on file - Faxed in about 6 months back. Please let me know if I need to do anything else to log the fact on this system.

By: Caio Begotti (caio1982) 2007-04-25 07:00:49

Sorry for the delay, I couldn't update the issue before :-(

one47, the patch is OK and working pretty well, although tons of bridge_native_loop messages still occur, and there is the MOH issue, which I still can't attribute either to your patch or to something else.

You can reproduce the loop problem just by applying your sip_reinvite3.diff patch and monitoring the CLI while you do call transfers with Mediatrix equipment. Asterisk's logs will get filled anyway. We didn't do anything special to make this happen.

I still need to test the MOH issue in depth, but you can reproduce it by setting up a MOH class with a .mp3 or .wav file in it: when you do the transfer and the MOH starts, the call *sometimes* hangs up. The calls used "alaw" as the default codec, and the .wav file was even converted using the "file convert" command on the CLI.

I'll try to get some extra information about it, but I think we can concentrate on the loop issue.

By: Caio Begotti (caio1982) 2007-04-25 11:30:30

Well, something seems a little unstable here. I don't know what's wrong, but sometimes the transfer doesn't work with your latest patch, one47. We did some testing with calls coming from the PSTN: only after 10 calls did we notice it stop working, and then it worked again on the next call.

That's weird... do you have any idea what I should monitor to make sure it's an environment issue and not the patch's fault?

By: Steve Davies (one47) 2007-04-25 12:02:40

Okay, here is a reworked version of the patch, which makes even less of a change. It is _possible_ that the suppression of a re-INVITE after an ACK was preventing some necessary re-INVITEs from being sent in one direction or another.

sip_reinvite4.diff replaces sip_reinvite3.diff, and IMHO is clearer code than the previous version.

I am not sure I can reduce the change any further than this even if I wanted to!

By: Steve Davies (one47) 2007-04-25 12:10:00

In terms of monitoring, capturing the SIP trace of a working and of a not-working call setup would probably help. My personal preference is a PCAP trace from tcpdump or Ethereal, but using the usual asterisk methods (already detailed above) would also be okay.

I am really hoping that oej will take one look at this and spot the proper solution :)

By: Caio Begotti (caio1982) 2007-04-25 15:27:14

probtrans.cap is the capture of a non-working transfer.
transOk.cap was taken from a working one.

Anyway, I'll try your new patch, one47, thanks!
Those new uploaded files are just for documentation of the issue.

By: Steve Davies (one47) 2007-04-25 17:36:27

caio1982: I think that the re-INVITE patch is behaving correctly, and that you are seeing a different problem. The problem occurs at the point of the transfer. The REFER message is sent to Asterisk, which IMMEDIATELY responds "202 Accepted" and then sends a "200 OK" to indicate completion, when in fact it has not yet started the transfer process in the background.

At this point MxSipApp decides to hangup the 2 channels it just joined. If it does this quickly enough, the BYE request disrupts the transfer (which is not yet completed.) If you are lucky, MxSipApp is slow enough that the transfer has begun, and can proceed normally.

I searched the bugtracker for this problem, but did not find it anywhere. Have you installed any other patches which might change the REFER behaviour? It may be worth creating a new bug and attaching the 2 captures if you cannot find any reference to the problem.

By: atca_pres (atca_pres) 2007-04-26 14:39:40

caio1982
This is a known issue with Mediatrix products (known to me, anyway).
Enter a new issue for this one. The behavior has been there for a while. If the Mx sends BYE for all the calls, Asterisk starts acting weird. You can see in your probtrans capture that Asterisk sends a BYE on a call before receiving the 200 OK for the INVITE it just sent. Then, when the call is over, Asterisk seems to answer its own INVITE with a 200 OK to the Mx unit.

I suggest you use canreinvite=no and set the interop variable sipInteropReplacesConfig to useReplacesNoRequire. I know of two other variables that exist for transfers that you could try: sipInteropTransferVersion and sipInteropReplacesVersion.

But I suggest creating a new issue for this (and adding the current comment to the new one, if possible).

By: Caio Begotti (caio1982) 2007-05-02 07:00:58

Ok, done. Issue http://bugs.digium.com/view.php?id=9649
Thanks for the invaluable help guys.

By: Olle Johansson (oej) 2007-05-02 07:48:37

Finally reassigning this since Junky hasn't made a comment lately.

By: Olle Johansson (oej) 2007-05-02 07:52:04

junky: Going back to your comment - was T.38 enabled for these calls?

By: Olle Johansson (oej) 2007-05-02 07:56:04

Another issue: getting a 500 Internal Server Error on a re-INVITE should not hang up the call...

By: atca_pres (atca_pres) 2007-05-02 08:23:18

Even if the re-INVITE arrives within the first transaction of the first INVITE to this UA?

Well, anyway, it's "normal" that the behavior is weird, since this is forbidden by the RFC.

But I agree: when the Mediatrix units don't know what to do, they don't try to make it work; they end it. Not necessarily a bad thing, though.

Thank you oej

By: Yehavi Bourvine (yehavi) 2007-05-02 12:00:54

Now that this patch has fixed my problem, is there a chance that it will be incorporated into the Trunk's SVN?

Thanks! Yehavi

By: Olle Johansson (oej) 2007-05-10 01:41:33

We saw exactly this scenario in the Asterisk SIP masterclass yesterday.

By: atca_pres (atca_pres) 2007-05-10 16:02:21

Sorry to ask, but masterclass ? Is it good or bad ? What does that mean ?

By: Olle Johansson (oej) 2007-05-15 11:13:49

It means that I can confirm this behaviour, since I saw it happen during training. I haven't had time to consider a proper fix.

By: Denis Galvao (denisgalvao) 2007-06-26 10:32:15

I tried the patch sip_reinvite4.diff and it seems to work, but I got another problem (not related to this one at all).

When doing the transfer with the flash button, the call is transferred without problems:
1. "A" calls "B"
2. Re-INVITE from "A" to "B"
3. "B" transfers "A" to "C"
4. "A" listens to MOH
5. "B" talks to "C"
6. "B" hangs up
7. "A" speaks to "C"

This scenario is an attended transfer.

But when I do a blind transfer of the same call, this happens:
1. "A" calls "B"
2. Re-INVITE from "A" to "B"
3. "B" transfers "A" to "C"
4. "A" listens to MOH
5. "B" hangs up
6. "A" stops hearing MOH and "C" starts ringing
7. "A" doesn't hear anything until "C" picks up the phone and the conversation starts

I got some RTP debug output, and the MOH is being sent to "A", but I don't hear anything.

Someone got the same problem on 1.2 without re-INVITEs, so maybe it is related neither to trunk nor to this patch.

By: atca_pres (atca_pres) 2007-06-29 12:49:01

Hi oej,

Is there any chance of seeing this included in trunk?

Thanks

By: Caio Begotti (caio1982) 2007-06-29 12:57:58

If there is, I suppose one47 still needs to sign his license agreement for sip_reinvite4.diff in the new Mantis setup...

By: Steve Davies (one47) 2007-06-29 14:17:59

I have submitted the new disclaimer form just to make the system happy (This file pre-dates the change, so should not need it!)

If it helps, I have 10 smallish customers, about 300 snom/aastra phones all running this patch with no ill effects. Eyebeam might be upset by it slightly, but I cannot confirm that that is caused by this patch as I have not tried excluding it.



By: Steve Davies (one47) 2007-07-10 11:31:17

*bump*

This should be okay for oej to take a look at now. Thanks for the help on IRC in getting the license on the patches cleaned up.

By: atca_pres (atca_pres) 2007-08-09 08:30:58

By the way, this is more general than I first anticipated.

Asterisk doesn't seem to always be aware of whether a transaction is in progress with a UA.

Specifically, the exact same behavior as for the re-INVITE happens for BYE as well (BYEs are transactions too). A BYE can be sent to a UA before the 200 OK to an INVITE. And this doesn't seem to be corrected by One47's patch (thank you for the patch, by the way). With BYEs this is not noticeable right away, but after a transfer I see trailing 200 OKs from my Mediatrix units because of it.

Maybe One47 would also want to take a look at this ?

Thanks

By: Steve Davies (one47) 2007-08-10 05:45:55

I will be happy to look into this. I assume that all of the captures etc are recorded on bug ASTERISK-9366, so will follow up there when I have some feedback.

By: Steve Davies (one47) 2007-08-10 12:35:34

Where are you seeing this early-BYE behaviour happening? When Asterisk wants to stop a call, it has two choices: for an established connection it sends a BYE, and for a not-yet-established connection it sends a CANCEL. Sometimes it will wait and send a BYE when permitted.

This was not done quite right in 1.2, but looks fine in 1.4.10.1
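The choice described above can be sketched as a small decision function: BYE for an established call, CANCEL for one not yet established, and waiting otherwise. The "wait" case follows RFC 3261 section 9.1, which says a CANCEL must not be sent until a provisional response has arrived. The enum values and `choose_hangup_action()` are hypothetical names for illustration, not chan_sip.c code.

```c
#include <assert.h>

/* Hypothetical summary of an outgoing initial INVITE's progress. */
enum invite_progress {
    INV_SENT_NO_RESPONSE,   /* INVITE sent, nothing received yet */
    INV_GOT_PROVISIONAL,    /* 1xx received, no final response yet */
    INV_CONFIRMED           /* 2xx received and ACKed: call established */
};

enum hangup_action {
    HANGUP_DEFER,   /* remember the hangup and act on the next response */
    HANGUP_CANCEL,  /* abort the pending INVITE */
    HANGUP_BYE      /* tear down the established dialog */
};

enum hangup_action choose_hangup_action(enum invite_progress p)
{
    switch (p) {
    case INV_SENT_NO_RESPONSE:
        /* RFC 3261 9.1: no CANCEL before a provisional response. */
        return HANGUP_DEFER;
    case INV_GOT_PROVISIONAL:
        return HANGUP_CANCEL;
    case INV_CONFIRMED:
        return HANGUP_BYE;
    }
    assert(0 && "unknown invite_progress value");
    return HANGUP_DEFER;
}
```

This covers only the initial INVITE. The case reported in this issue differs: the dialog is already established, but a re-INVITE on it is still outstanding, so a BYE should wait until that transaction completes.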

By: atca_pres (atca_pres) 2007-08-13 07:02:10

Hi,

I attached two files named "BYE_b4_200OK": one is the Asterisk debug output, the other the Ethereal capture.

In the Ethereal capture you can see:
Packet 60: re-INVITE, CSeq 107
Packet 72: BYE, CSeq 108
Packet 73: 200 OK, CSeq 107

So, like I was saying, this is forbidden. Asterisk should wait for the 200 OK before sending the BYE. I think this is the same problem: Asterisk does not always know whether a transaction is in progress.

I hope these files will answer your question.

And thanks for taking a look at it !
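To check captures like this one for the same class of error, a small checker can walk the messages in order and flag any request that the checked side sends (other than a CANCEL or a retransmission of the INVITE itself) while its own INVITE still has no final response. This is a hypothetical C sketch: `struct sip_event` and `first_request_during_pending_invite()` are invented for illustration, and it assumes a single dialog and a capture that has already been parsed into events.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* One SIP message from a capture, reduced to what this check needs. */
struct sip_event {
    bool outgoing;        /* true if sent by the side being checked */
    bool is_response;
    const char *method;   /* request method, or the CSeq method of a response */
    unsigned cseq;
    int status;           /* response status code; 0 for requests */
};

/* Returns the index of the first outgoing request sent while one of our
 * own INVITE transactions had no final response yet, or -1 if none.
 * Only CANCEL is treated as legal meanwhile; extension methods such as
 * PRACK or UPDATE are out of scope for this sketch. */
int first_request_during_pending_invite(const struct sip_event *ev, size_t n)
{
    bool invite_pending = false;
    unsigned invite_cseq = 0;

    assert(ev != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        const struct sip_event *e = &ev[i];
        bool is_invite = strcmp(e->method, "INVITE") == 0;

        if (e->outgoing && !e->is_response) {
            bool retransmission = is_invite && invite_pending &&
                                  e->cseq == invite_cseq;
            if (invite_pending && !retransmission &&
                strcmp(e->method, "CANCEL") != 0)
                return (int)i;
            if (is_invite) {
                invite_pending = true;
                invite_cseq = e->cseq;
            }
        } else if (!e->outgoing && e->is_response && is_invite &&
                   e->cseq == invite_cseq && e->status >= 200) {
            invite_pending = false;   /* final response ends the transaction */
        }
    }
    return -1;
}
```

Fed the three packets quoted above (60, 72 and 73, in that order, with Asterisk as the checked side), it returns 1, the index of the BYE.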

By: Steve Davies (one47) 2007-08-13 10:49:26

Having seen how oej (I assume) has cleaned up the tracking of the state of an INVITE (or re-INVITE) in the newest 1.4 code, I am submitting a new patch. This should be standalone, and should fix the original early-reINVITE, AND the early BYE issues.

Basically, I think that the INVITE state-tracking was not being reset for a re-INVITE. This is a bit of a guess, so be forgiving, eh? :) Also, I do not have a 1.4 test environment at present, but this does compile. Let me know if it works.

Steve

By: atca_pres (atca_pres) 2007-08-13 15:01:39

Hi One47,

It's better, but not perfect :)

I'm attaching an Ethereal capture with your sip_reinvite patch applied. Asterisk sends a CANCEL, which is good. But the unit had already sent its 200 OK (a small race condition here), so it ignores the CANCEL (after a 200 OK, the expected answer is an ACK), which is fine. Then Asterisk sends the ACK and the BYE (great news), BUT (there is always a but) Asterisk now doesn't stop sending the CANCEL after the 200 OK and/or the BYE is over.

Let me know if you need an asterisk log, I don't really have time right now, so if needed, I'll take one tomorrow morning.

By: atca_pres (atca_pres) 2007-08-13 15:19:28

Just added the Asterisk log, got some time left :)

Thank you

By: Steve Davies (one47) 2007-08-14 08:14:31

Could a maintainer please delete sip_reinvite3.diff and sip_reinvite5.diff as these are non-functioning patches.

It is worse than you think. I could probably find why the CANCEL is repeating, but sip_reinvite5 also prevents the re-INVITE of the transferred call legs from working properly. I'll have to re-think the patch, perhaps preventing the redundant INVITE in the first place?

Steve

By: atca_pres (atca_pres) 2007-08-14 09:06:12

Maybe I live in a simple world, but isn't the server supposed to know at all times when:

A transaction is unfinished
A call is over or not

With this, it would be simple : If a transaction is not finished, don't send new ones, cancel. If the call is over (after a bye), clean whatever is left.

I'm no programmer :)

Just wanted to thank you One47 for your efforts.

By: Steve Davies (one47) 2007-08-14 09:43:31

Yes, the state of a call is tracked in the code, but SIP is not quite that straight-forward. An INVITE is used in several ways, and in Asterisk seems to be tracked differently depending on how it is used.

The key mechanisms affecting your system are 1) INVITE to Set-up a call, 2) INVITE to Redirect media during a reINVITE, 3) INVITE to un-Redirect media during a hangup.

1) was already tracked correctly.
2) is fixed by sip_reinvite4.diff
3) is still being missed.

I am uploading sip_reinvite6.diff, which if I am lucky will defer the BYE if a media-change INVITE is outstanding, and not at other times :-O

Cheers,
Steve

By: atca_pres (atca_pres) 2007-08-14 14:45:31

One47, I think you surpassed yourself on this last patch.

So far as my tests have gone, everything seems to be perfect.

I'll post something here if I find something. So far so good !

Thank you very much, once again.

By: Curt Moore (jcmoore) 2007-09-25 13:11:12

I was having a very similar problem where Asterisk would send a BYE immediately after receiving a "100 Trying" in response to a re-INVITE.  sip_reinvite6.diff seems to have fixed this for me.  Can we have some others look over this patch and try to get it in to 1.4/trunk ASAP as it's a pretty serious bug in the SIP stack.

The patch as it is seems to be correct but I'd prefer we had more eyes on it before committing it.

By: Olle Johansson (oej) 2007-11-06 01:56:55.000-0600

This bug needs to move to the top again and the patch needs to be looked at and possibly committed.

Note to myself: Get back to work :-)

By: Steve Davies (one47) 2007-11-12 11:41:23.000-0600

To paraphrase the patch, in case it helps in validating it:

1) The "Pending-Invite" flag has been set because we had a transaction in progress already. If a transaction is still in progress or we seem to be mid-INVITE still, then we cannot clear this flag yet, so leave it set and do nothing.

2) The "Pending-Bye" flag has been set because we had a transaction in progress. "BYE" cannot be sent mid-transaction (only CANCEL), so apply the same rules here to prevent sending a BYE.

By: Digium Subversion (svnbot) 2007-11-15 05:24:13.000-0600

Repository: asterisk
Revision: 89281

U   branches/1.4/channels/chan_sip.c

------------------------------------------------------------------------
r89281 | oej | 2007-11-15 05:24:12 -0600 (Thu, 15 Nov 2007) | 6 lines

Don't send re-invites during pending INVITE transactions.

Patch by one47 - thanks!

Closes issue ASTERISK-9034

------------------------------------------------------------------------

By: Digium Subversion (svnbot) 2007-11-15 05:29:17.000-0600

Repository: asterisk
Revision: 89283

_U  trunk/
U   trunk/channels/chan_sip.c

------------------------------------------------------------------------
r89283 | oej | 2007-11-15 05:29:17 -0600 (Thu, 15 Nov 2007) | 14 lines

Merged revisions 89281 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r89281 | oej | 2007-11-15 12:26:22 +0100 (Tor, 15 Nov 2007) | 6 lines

Don't send re-invites during pending INVITE transactions.

Patch by one47 - thanks!

Closes issue ASTERISK-9034

........

------------------------------------------------------------------------