Summary:ASTERISK-16727: [patch] func_curl CURLOPT lacks a querycomponent (+-decoding) hashcompat mode
Reporter:Walter Doekes (wdoekes)Labels:
Date Opened:2010-09-24 04:16:07Date Closed:2011-01-07 12:24:46.000-0600
Versions:Frequency of
Environment:Attachments:( 0) 20100930__issue18046.diff.txt
( 1) issue18046_func_curl_hashcompat_querycomponent-
( 2) issue18046_func_curl_hashcompat_querycomponent-

in https://issues.asterisk.org/view.php?id=18037 I've added a QSFIELD which is pretty much obsoleted by the hashcompat mode of CURLOPT.

However, the ast_uri_decode leaves +'es verbatim in the decoded output, where I expect them to be decoded to spaces.

The attached patch adds a hashcompat=querycomponent mode that decodes the pluses properly. (For backwards compatibility the ast_value() works for all other values.)

Walter Doekes
Comments:By: Tilghman Lesher (tilghman) 2010-09-26 12:12:18

I don't think this is the correct decoding of the '+' character.  RFC 1738 states that the '+' character is reserved, because it MAY be used within particular URI schemes, but the exact interpretation of the '+' character is left up to particular URI schemes.  In particular, SIP URIs specify no particular use of the '+' character and therefore, it SHOULD NOT appear within an encoded SIP URI.

The only URI specified within RFC 1738 which translates the '+' character into a space is the Gopher URI, in which it is used to separate search parameters.  Note that a space encoded as a %20 may still be used in a Gopher URI, as there is a specific difference between a literal space within a search string and a '+' character denoting an additional search parameter.  However, in the generic case, we really should not be making any assumptions about the use of the '+' character.

Additionally, note the following clause in Section 2.2 of RFC 3986:
"If a reserved character is found in a URI component and no delimiting role is known for that character, then it must be interpreted as representing the data octet corresponding to that character's encoding in US-ASCII."  In other words, a literal '+' character within a SIP URI _MUST_ be interpreted as a literal '+' character, NOT as a space character.

By: Walter Doekes (wdoekes) 2010-09-26 14:29:04

Your criticism is misdirected. I do not propose to replace ast_uri_decode() with my version (that would indeed break SIP). I only propose to _add_ the "query component" mode (RFC2396 3.4) to the CURL return value decoding options. (Look at the patch.)

> In other words, a literal '+' character within a SIP URI _MUST_ be
> interpreted as a literal '+' character, NOT as a space character.

I'm all with you. But we're not talking about SIP URIs here, per se. We're talking about user data retrieved through CURL.

My problem is the following: my customer wants to interface with its customers using a simple interface. Give the customer a list of key-value pairs (HTTP GET query string) and have them reply with an equally simple set of key-value pairs. These customers will expect to run some kind of urlencode() in their favorite web programming language on their dictionary of return values, and that function will generally be replacing spaces with pluses (standard application/x-www-form-urlencoded rules). And I would agree with them that that's logical.

By: Tilghman Lesher (tilghman) 2010-09-26 22:43:50

Yes, that is logical.  But if you look at the same RFCs, you'll find that while '+' is still a reserved character for the HTTP URI, encoding a space SHOULD be done as '%20', NOT as '+'.  In other words, you need to change whatever the other side is using to be compliant, not to add a mode to Asterisk that is equally non-compliant.

Please see RFC 1738, section 3.4.7 for an example of why a space is NEVER encoded as a '+'.  To do so is simply wrong.

By: Tilghman Lesher (tilghman) 2010-09-26 23:01:41

Your "query component" mode is not described in RFC 2396, but only in RFC 1866, which has been marked as obsolete (see RFC 2854, which does not describe this method).  Therefore, you should not be using it in any new applications.

By: Walter Doekes (wdoekes) 2010-09-27 02:24:57

I concede that you have superior RFC reading skills.

However, not having to beat every user with a clue-bat will save time and money. You must agree with me that the users will use instinctively use http_build_query, urllib.urlencode and so forth. x-www-form-urlencoding is not about to go away any time soon. And often these languages do not have an equivalent dictionary-to-string function that allows them to not encode spaces as pluses.

So. I'll gladly redo the patch and rename "querycomponent-mode" to "deprecated-mode" or whatever you prefer, but I'd still like some version of it included.


By: Tilghman Lesher (tilghman) 2010-09-28 10:01:28

and every language you've cited also notes that the +-encoding is deprecated in the documentation and provides an alternate which does NOT do that.


By: Tilghman Lesher (tilghman) 2010-09-30 12:48:38

Switching to 1.8, as 1.6.2 is feature frozen.

By: Walter Doekes (wdoekes) 2010-10-04 05:20:14

Thanks for your patch tilghman.

You missed one replacement at (original) line 496:
- hashcompat = (cur->value != NULL) ? 1 : 0;
+ hashcompat = (long) cur->value;

Other than that, it works like I need it to :)

By: Walter Doekes (wdoekes) 2011-01-06 04:09:51.000-0600

(Oh, and I believe you also missed a ast_calloc before new->value = ...)

Here is an updated patch for -- issue 18161 has been fixed -- without your socks-edits and with my two comments.

By: Digium Subversion (svnbot) 2011-01-06 11:28:33.000-0600

Repository: asterisk
Revision: 300840

U   trunk/funcs/func_curl.c

r300840 | tilghman | 2011-01-06 11:28:32 -0600 (Thu, 06 Jan 2011) | 7 lines

Add a hashcompat mode called "legacy", which translates a literal plus sign to a space.

(closes issue ASTERISK-16727)
Reported by: wdoekes
      20100930__issue18046.diff.txt uploaded by tilghman (license 14)



By: Walter Doekes (wdoekes) 2011-01-06 12:03:57.000-0600

I believe you missed the crucial part in that commit ;)

The bit with

+ if (hashcompat == HASHCOMPAT_LEGACY) {
+ char *plus;
+ while ((plus = strchr(piece, '+')) || (plus = strchr(name, '+'))) {
+ *plus = ' ';
+ }
+ }

around line 657...

By: Digium Subversion (svnbot) 2011-01-07 12:23:53.000-0600

Repository: asterisk
Revision: 301008

U   trunk/funcs/func_curl.c

r301008 | tilghman | 2011-01-07 12:23:53 -0600 (Fri, 07 Jan 2011) | 5 lines

Oops, missed the actual decoding part.

(closes issue ASTERISK-16727)
Reported by: wdoekes