Summary:ASTERISK-03833: [patch] chan_sip does not decode escaped characters as per the RFC
Reporter:Leif Madsen (lmadsen)Labels:
Date Opened:2005-04-01 13:23:14.000-0600Date Closed:2008-01-15 15:46:02.000-0600
Versions:Frequency of
Environment:Attachments:( 0) international3.txt
( 1) internationalsip.txt
( 2) internationalsip2.txt
( 3) sjphone-pedantic-debug.txt
Description:When attempting to register a Cisco 7960 to Asterisk with a username which contains a '#', the Cisco phone replaces the '#' with the escaped code equivelent (%23) as defined in RFC 2936 (Uniform Resource Identifiers (URI): Generic Syntax). For SIP compliance as I understand it, Asterisk should accept these escaped characters at all times.

chan_sip.c does contain the following bit of code (line 1125):

/*--- url_decode: Decode SIP URL  ---*/
static void url_decode(char *s)
       char *o = s;
       unsigned int tmp;
       while(*s) {
               switch(*s) {
               case '%':
                       if (strlen(s) > 2) {
                               if (sscanf(s + 1, "%2x", &tmp) == 1) {
                                       *o = tmp;
                                       s += 2; /* Will be incremented once more when we break out */
                       /* Fall through if something wasn't right with the formatting */
                       *o = *s;
       *o = '\0';

And is called in get_destination (line 5393) with the following bit of code:

if (pedanticsipchecking)

However, according to oej, apparently even when pedantic=yes, Asterisk doesn't actually use that code. He did mention he got it working when pedantic=yes, but I believe that escaped characters should be decoded in all instances.


The following section (19.1.2) from RFC 3261 (SIP) states:

SIP follows the requirements and guidelines of RFC 2396 [5] when
  defining the set of characters that must be escaped in a SIP URI, and
  uses its ""%" HEX HEX" mechanism for escaping.  From RFC 2396 [5]:

     The set of characters actually reserved within any given URI
     component is defined by that component.  In general, a character
     is reserved if the semantics of the URI changes if the character
     is replaced with its escaped US-ASCII encoding [5].  Excluded US-
     ASCII characters (RFC 2396 [5]), such as space and control
     characters and characters used as URI delimiters, also MUST be
     escaped.  URIs MUST NOT contain unescaped space and control

  For each component, the set of valid BNF expansions defines exactly
  which characters may appear unescaped.  All other characters MUST be

  For example, "@" is not in the set of characters in the user
  component, so the user "j@s0n" must have at least the @ sign encoded,
  as in "j%40s0n".

  Expanding the hname and hvalue tokens in Section 25 show that all URI
  reserved characters in header field names and values MUST be escaped.

Also, section 19.1.4 URI Comparison states:

o  Characters other than those in the "reserved" set (see RFC 2396
        [5]) are equivalent to their ""%" HEX HEX" encoding.

Comments:By: Olle Johansson (oej) 2005-04-01 13:25:14.000-0600

Working on a patch.

By: Olle Johansson (oej) 2005-04-01 13:31:50.000-0600

Since this is part of registration and pedantic setting (url decoding) I'm including this patch in the patch in bug report ASTERISK-3583

By: Brian West (bkw918) 2005-04-06 00:00:54

I agree with leif on this one.  Unless i'm reading the RFC wrong.. (which is easy to do with sip)


By: Olle Johansson (oej) 2005-05-01 15:31:23

We need to define character set for [name] in sip.conf - is it utf-8, ISO-8859-1 or US ASCII (7 bit) ?

Leif, we need to add that issue to the alphaexten document. It's growing.

I have a partial patch somewhere, will try to find it and extract it.

By: Leif Madsen (lmadsen) 2005-05-01 22:17:10

ok, let me know when you post it so I can test it. Obviously I'll be alerted if you post it here :)

Olle - I suppose it needs to be added to the slide that discusses which character set is being used.

By: Olle Johansson (oej) 2005-06-04 07:12:15

Waiting for feedback on the alphanumericextensions. A short fix may help temporarily, but it is not a solution.

By: Michael Jerris (mikej) 2005-06-23 06:15:25

oej, any update on this?  I worked with somone this morning with the same issue and pendantic fixed it.  Is this still an issue?

By: adomjan (adomjan) 2005-06-23 09:33:23

partially fix the pendatic...
the voice cable modems works fine, but the sjphones can't register if pedatic=yes.

By: Michael Jerris (mikej) 2005-06-23 22:33:54

can we get a sip debug of the sjphone with pendantic=yes.  Maybe we need to split the url decoding into a different option.

By: adomjan (adomjan) 2005-06-24 04:52:22

The debug file is uploaded.

By: Olle Johansson (oej) 2005-06-25 04:34:55

This quickly evolves into a much bigger issue than url decoding. See the discussion on character sets on the -dev list. We need not a simple workaround, but a solution. What is a peer name? What is an extension? ASCII, ISO-8859-1, UTF-8 or EBCDIC? :-)

By: adomjan (adomjan) 2005-06-25 04:55:04

peer name: 15800114, extensions 15800114, charset is UTF-8 but I'm not using not ASCII characters.
the extensions contains only: * # [0-9]

By: Olle Johansson (oej) 2005-06-25 07:04:08

admonjan: Your bug is not related to this bug report. Please open another bug report so we do not confuse the issues. There is another bug report with a patch for handling registration ASTERISK-3583 - check that patch.

By: adomjan (adomjan) 2005-06-27 05:57:31

the patch of 0003663 bug fixed the registration problem with sjphone, thanks!

By: Michael Jerris (mikej) 2005-06-27 08:55:59

oej-  So the way I am reading your statement, the immediate issues from this bug are now all working, but there are still some much bigger internationalization issues to address, correct?

By: Olle Johansson (oej) 2005-07-20 12:51:23

Moving this to the "todo" list for version 1.3. It requires too much of fixing to do it right. Might add a quick fix in a separate report.

By: Olle Johansson (oej) 2005-07-26 06:06:25

This patch (yes, I had to change my mind ;-) )
* Adds more url decoding if pedantic=yes
* Fixes a small bug in registration (answer 404 if peer name not found)
 Found while testing this
* Moves url_decode to utils.c
* Adds a new fucntion url_encode to utils.c (needed for replace/REFER later)
* Fixes some formatting in utils.c

This patch makes chan_sip better, but does not solve the problem. I can now register and use a peer named "olel#extra" with Xten (did not work before). I can't use a peer named "j├Ârgen" (UTF8).

By: Brian West (bkw918) 2005-07-29 11:34:04

the function names are kinda skewed urlencode_string vs url_decode?  Shouldn't we have some sort of standard here that makes no sense to me to name them very differntly.


By: Olle Johansson (oej) 2005-07-29 14:06:55

bkw: Good input, I'll update the patch. Thanks.

By: Olle Johansson (oej) 2005-07-30 07:53:56

Added new patch inspired by bkw918's suggestion.

By: Tilghman Lesher (tilghman) 2005-07-30 16:00:30

Given that those two functions are public, I would recommend adding an "ast_" prefix to both of them.

By: Kevin P. Fleming (kpfleming) 2005-08-22 18:13:47

I notice a few things that don't seem to be correct with this patch:

1) As Corydon noted, the API function names should start with ast_.

2) The functions work with URI strings, not necessarily URLs (being a subset of URIs). In fact, the strings in SIP are URIs, not URLs, so I would suggest that the function names/documentation refer to URI instead of URL.

3) There are lots of other unrelated changes in the patch, including reformatting a function in utils.h that is no longer in that file.

4) url_encode seems to be using some extra variables for its operations that are not needed, but that's a minor point.

By: Olle Johansson (oej) 2005-08-23 00:40:34

Will update.

The extra variable *will* be needed by other coming patches that I'm working with.

By: Olle Johansson (oej) 2005-08-29 17:31:49


By: Kevin P. Fleming (kpfleming) 2005-08-29 18:38:30

Committed to CVS HEAD, thanks!

By: Digium Subversion (svnbot) 2008-01-15 15:46:02.000-0600

Repository: asterisk
Revision: 6441

U   trunk/channels/chan_sip.c
U   trunk/include/asterisk/utils.h
U   trunk/utils.c

r6441 | kpfleming | 2008-01-15 15:46:01 -0600 (Tue, 15 Jan 2008) | 2 lines

encode/decode URIs in 'pedantic' mode (issue ASTERISK-3833)