ASTERISK-08070: URIENCODE doesn't handle '#' correctly. Could be others


Summary: ASTERISK-08070: URIENCODE doesn't handle '#' correctly. Could be others

Reporter: salaud (salaud) Labels:

Date Opened: 2006-11-06 03:52:35.000-0600 Date Closed: 2006-11-20 20:54:22.000-0600

Priority: Minor Regression? No

Status: Closed/Complete Components: Functions/func_uri

Versions: Frequency of Occurrence:

Related Issues:

Environment: Attachments:

Description: Doing URIENCODE(#) results in '#' instead of '%23'. I really need to pass strings like this through to AGI.

Comments: By: Joshua C. Colp (jcolp) 2006-11-09 22:14:27.000-0600

I believe it would actually break the RFC if we converted '#', as it specifically says:

The character "#" is excluded
because it is used to delimit a URI from a fragment identifier in URI
references (Section 4).

The RFC in question is 2396 if you want to take a look as well.
By: John Covert (jcovert) 2006-11-14 12:35:51.000-0600

A generic URI encoding function that meets general-purpose needs would require quite a bit of parameterization. Deciding exactly what to encode is non-trivial and depends on the component of a URI being encoded. According to RFC 2396, encoding can be done by "only the mechanism responsible for generating or interpreting that component".

As supplied, ${URIENCODE(...)} encodes characters with an char value greater than 128, the "space" character, and the ten characters which RFC 2396 specifies as "reserved":

reserved = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" | "$" | ","

The RFC further specifies a class of "unreserved" characters: "These include upper and lower case letters, decimal digits, and a limited set of punctuation marks and symbols."

unreserved = alphanum | mark

mark = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"
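The two character classes quoted above, and the as-supplied behaviour described before them, can be sketched in Python. This is only an illustration built from the description in this comment, not the actual func_uri source; the function names, the UTF-8 byte handling, and the upper-case hex digits are assumptions.

```python
import string

# Character classes as quoted from RFC 2396.
RESERVED = set(";/?:@&=+$,")
UNRESERVED = set(string.ascii_letters + string.digits + "-_.!~*'()")


def uriencode_as_described(data: str) -> str:
    """Escape space, the ten reserved characters, and bytes above 127.

    Hypothetical model of the behaviour described in this comment.
    """
    out = []
    for byte in data.encode("utf-8"):  # assumption: input handled as UTF-8 bytes
        ch = chr(byte)
        if byte > 127 or ch == " " or ch in RESERVED:
            out.append(f"%{byte:02X}")
        else:
            out.append(ch)
    return "".join(out)


def classify(ch: str) -> str:
    """Report which RFC 2396 class a single ASCII character belongs to."""
    if ch in RESERVED:
        return "reserved"
    if ch in UNRESERVED:
        return "unreserved"
    return "neither"


print(uriencode_as_described("a b&c"))  # a%20b%26c
print(uriencode_as_described("#"))      # # (passed through unchanged)
print(classify("#"))                    # neither
```

Under this model '#' falls outside both classes, which is why it passes through unescaped.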

The RFC further states:

Unreserved characters can be escaped without changing the semantics
of the URI, but this should not be done unless the URI is being used
in a context that does not allow the unescaped character to appear.

And then states:

Data must be escaped if it does not have a representation using an
unreserved character; this includes data that does not correspond to
a printable character of the US-ASCII coded character set, or that
corresponds to any US-ASCII character that is disallowed, as
explained below.

Thus it follows that contrary to the implementation, which escapes the reserved characters, what is actually supposed to be escaped are all characters except for the unreserved characters. However, that must be done with data within a URI rather than with the entire URI, because the reserved characters are typically delimiters, and should NOT be escaped when they are performing their function as a delimiter. This is why only the software actually responsible for building the URI can know what to encode; the encoding must be done on the components before they are assembled and separated by delimiters from the reserved characters.
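A minimal sketch of that rule, assuming Python and a hypothetical helper name (escape_component): every byte that is not an unreserved character is escaped, and the helper is meant to be applied to one component at a time, never to an assembled URI.

```python
import string

# RFC 2396: unreserved = alphanum | mark
UNRESERVED = set(string.ascii_letters + string.digits + "-_.!~*'()")


def escape_component(data: str) -> str:
    """Escape every byte of a single URI component except unreserved characters."""
    out = []
    for byte in data.encode("utf-8"):  # assumption: data is encoded as UTF-8
        ch = chr(byte)
        out.append(ch if ch in UNRESERVED else f"%{byte:02X}")
    return "".join(out)


print(escape_component("#"))      # %23
print(escape_component("a/b?c"))  # a%2Fb%3Fc
```

Because the delimiters themselves are escaped here, running a whole URI through this helper would destroy its structure; that is the reason the encoding has to happen before the components are joined.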

The "#" character is neither a reserved nor an unreserved character. Since all characters which are not unreserved should be escaped, it should generally be escaped. The component on the receiving side with the responsibility for interpreting it and using it as a "fragment identifier" in accordance with Section 4 of RFC 2396 is responsible for having decoded the data it is operating on before it gets to the stage where it would need to interpret a "#".

Bottom line: the AGI the submitter is passing the data to needs to be able to deal with the data component by component and do the URI encoding before sending the URI on to the destination, or the Dialplan code needs to handle the URI encoding by calling an enhanced URIENCODE function which accepts parameters controlling which characters are to be encoded and combining the results of possibly several calls to this function with unencoded delimiters.
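As a rough illustration of the second option, here is a Python sketch of a parameterized encoder whose results are combined with unencoded delimiters. Everything in it is hypothetical: the uriencode signature with its keep parameter, the build_agi_uri helper, the host address, and the idea of passing values as query parameters are assumptions for illustration, not an existing Asterisk interface.

```python
import string

# RFC 2396: unreserved = alphanum | mark
UNRESERVED = string.ascii_letters + string.digits + "-_.!~*'()"


def uriencode(data: str, keep: str = "") -> str:
    """Escape every byte except unreserved characters and those listed in keep."""
    allowed = set(UNRESERVED + keep)
    return "".join(
        chr(b) if b < 128 and chr(b) in allowed else f"%{b:02X}"
        for b in data.encode("utf-8")  # assumption: UTF-8 input
    )


def build_agi_uri(host: str, script: str, params: dict) -> str:
    """Encode each component separately, then join with literal delimiters."""
    path = uriencode(script, keep="/")  # '/' stays a path delimiter
    query = "&".join(
        f"{uriencode(key)}={uriencode(value)}" for key, value in params.items()
    )
    return f"agi://{host}/{path}?{query}"


print(build_agi_uri("192.0.2.10", "ivr/menu", {"digits": "12#", "name": "A & B"}))
# agi://192.0.2.10/ivr/menu?digits=12%23&name=A%20%26%20B
```

The '&', '=' and '?' that appear literally in the result are delimiters added during assembly; the same characters inside a value are escaped because each value was encoded before it was joined.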

/john
By: John Covert (jcovert) 2006-11-14 12:37:13.000-0600

"an char value greater than 128" should be
"a char value greater than 127"

/john
By: Joshua C. Colp (jcolp) 2006-11-14 13:28:07.000-0600

Excellent explanation from jcovert! I'm comfortable with closing this bug for now, but if you would like to contribute an enhanced URIENCODE as outlined feel free to. Peace!
By: salaud (salaud) 2006-11-14 13:39:17.000-0600

I didn't know that 'won't fix' is a resolution type. Shouldn't the issue remain open for anyone who is interested in fixing this? Or is 'closed - won't fix' still seen as 'open - someone else can fix'?

It needs to be fixed by someone and people need to be able to know that they can have the opportunity to fix it (aside from an attached note). This issue should affect almost everyone using FastAGI.

It may be just my confusion on the process.

Good thing that res_perl exists or I would be sunk. Too bad this issue isn't seen as needing any fixing. Lastly, the uri_escape function from Perl's URI::Escape module needs no parameters to work. It just works. Not sure how that matches with what jcovert formulated. Perhaps someone could just copy the reasoning of that code.
By: Joshua C. Colp (jcolp) 2006-11-14 13:57:17.000-0600

You need a generic URI encoding function, that's not what URIENCODE is. It follows specific parameters as outlined in the RFC a bit. Would I call this a bug? No, it's not working for how you need it to. As John said, you can either create a generic URI encoding function that gives you more control and operates how you need/want it to, or do things your own way as you have already done. I'm sure others would like a general-purpose URI encoding function, so if it is something you would like, spark up interest on the mailing list, pay a developer, etc., but traditionally we don't keep feature requests on here. As always, if you do think that we are wrong, talk more about why and we'll see. I'll keep this bug open for now and assign it to me until I hear your response, as I would definitely like to work this out.
By: salaud (salaud) 2006-11-14 14:17:37.000-0600

"You need a generic URI encoding function, that's not what URIENCODE is."

What is it, then? Why is it called 'URIENCODE' if it does not encode URIs? Perhaps a clear statement of what URIENCODE is designed to do might be in order, as the help suggests it is a function that encodes URIs.

"Would I call this a bug? No, it's not working for how you need it to."

It is definitely NOT working for how I need it to work. That's why I opened this bug. In short, if a function called URIENCODE doesn't encode a URI fully and correctly, I would call that a bug, not a feature.

Perhaps just change the name of the function to something that makes more sense with how it functions? URISORTAENCODE? :)

Bottom line is that there exists a CLEAR expectation that a function called URIENCODE performs as other functions that encode URIs do. I would say, in all seriousness, either (1) change the function name, (2) fix the function, (3) change the help and description of the function that appears in the CLI and also on the wiki to CLEARLY specify the scope of functionality and its uses, or (4) forget about the function being used.
By: Joshua C. Colp (jcolp) 2006-11-15 14:47:32.000-0600

I made a minor adjustment to the documentation, but if there's something specific you would like to see in it so that others will not run into the same problem, please make a note.

[scrollkeeper adds] r.47625 of 1.4

By: Joshua C. Colp (jcolp) 2006-11-20 20:54:22.000-0600

I'm closing this out now since it's been a bit. If you would like to expand on the minor documentation change I did please feel free to reopen this, submit a new bug, or head to the mailing list to get some feedback from other individuals. Peace!