Summary: ASTERISK-14057: [patch] Language handling for numbers, dates, etc is misbehaving when utilizing sub-regional languages
Reporter: Nir Simionovich (GreenfieldTech - Israel) (greenfieldtech)
Labels:
Date Opened: 2009-05-03 10:50:46
Date Closed: 2009-06-30 16:30:34
Priority: Minor
Regression?: No
Status: Closed/Complete
Components: Applications/General
Versions:
Frequency of Occurrence:
Related Issues:
Environment:
Attachments: ( 0) 20090511__issue15022.diff.txt
( 1) 20090519__issue15022.diff.txt
Description:
Per a discussion at Asterisk Euro DevCon, an issue was raised about languages defined in "language_variant" format, for example "fr_ca" (Canadian French).

According to KPF and russelb, Asterisk is supposed to play files from /var/lib/asterisk/sounds/[LANG_DIRECTORY] and /var/lib/asterisk/sounds/digits/[LANG_DIRECTORY] while, when saying numbers, applying the logic intended for the base "language" identified. However, after testing this edge case, we discovered that Asterisk falls back to the English grammar, so the whole number is spoken with the wrong grammar.

****** ADDITIONAL INFORMATION ******

The above case was confirmed by creating a very simple dialplan:

[ Context 'custom-saytest' created by 'pbx_config' ]
'_X.' =>          1. Answer()                                   [pbx_config]
                 2. wait(1)                                    [pbx_config]
                 3. Set(CHANNEL(language)=he)                  [pbx_config]
                 4. SayNumber(12345)                           [pbx_config]
                 5. Set(CHANNEL(language)=he_jr)               [pbx_config]
                 6. SayNumber(12345)                           [pbx_config]

Initially, we created the directory "/var/lib/asterisk/sounds/digits/he_jr", containing the same files as "/var/lib/asterisk/sounds/digits". If everything were working correctly, step 6 would have used Hebrew grammar with English recordings. Instead, both the recordings and the grammar were English: Asterisk fell back to the English grammar, as the code indicates.

Comments:

By: Tzafrir Cohen (tzafrir) 2009-05-08 09:03:54

No need for extra sound files to reproduce this. When verbose enough, Asterisk will show us what it wants to play.

sweetmorn*CLI> dialplan show custom-saytest
[ Context 'custom-saytest' created by 'pbx_config' ]
 '_X.' =>          1. Answer()                                   [pbx_config]
                   2. wait(1)                                    [pbx_config]
                   3. Set(CHANNEL(language)=he)                  [pbx_config]
                   4. SayNumber(${EXTEN})                        [pbx_config]
                   5. Set(CHANNEL(language)=he_jr)               [pbx_config]
                   6. SayNumber(${EXTEN})                        [pbx_config]

(Same as above, except using the given extension number)



sweetmorn*CLI> core set verbose 3
Verbosity is at least 3
sweetmorn*CLI> channel originate Local/201@custom-saytest application Echo
   -- Executing [201@custom-saytest:1] Answer("Local/201@custom-saytest-e55b;2", "") in new stack
   -- Launching Echo() on Local/201@custom-saytest-e55b;1
   -- Executing [201@custom-saytest:2] Wait("Local/201@custom-saytest-e55b;2", "1") in new stack
   -- Executing [201@custom-saytest:3] Set("Local/201@custom-saytest-e55b;2", "CHANNEL(language)=he") in new stack
   -- Executing [201@custom-saytest:4] SayNumber("Local/201@custom-saytest-e55b;2", "201") in new stack
   -- ast_say_digits_full: started. num: 201, options="(null)"
   -- ast_say_digits_full: num: 201, state=0, options="(null)", mf=-1
   -- ast_say_digits_full: num: 201, state=0, options="(null)", mf=-1, tmpnum=0
[May  8 17:00:23] WARNING[30356]: file.c:641 ast_openstream_full: File digits/200 does not exist in any format
[May  8 17:00:23] WARNING[30356]: file.c:924 ast_streamfile: Unable to open digits/200 (format 0x40 (slin)): No such file or directory
   -- ast_say_digits_full: num: 1, state=2, options="(null)", mf=-1, tmpnum=0
[May  8 17:00:23] WARNING[30356]: file.c:641 ast_openstream_full: File digits/ve does not exist in any format
[May  8 17:00:23] WARNING[30356]: file.c:924 ast_streamfile: Unable to open digits/ve (format 0x40 (slin)): No such file or directory
   -- ast_say_digits_full: num: 1, state=0, options="(null)", mf=-1, tmpnum=0
   -- <Local/201@custom-saytest-e55b;2> Playing 'digits/1.gsm' (language 'he')
   -- Executing [201@custom-saytest:5] Set("Local/201@custom-saytest-e55b;2", "CHANNEL(language)=he_jr") in new stack
   -- Executing [201@custom-saytest:6] SayNumber("Local/201@custom-saytest-e55b;2", "201") in new stack
   -- <Local/201@custom-saytest-e55b;2> Playing 'digits/2.gsm' (language 'he_jr')
   -- <Local/201@custom-saytest-e55b;2> Playing 'digits/hundred.gsm' (language 'he_jr')
   -- <Local/201@custom-saytest-e55b;2> Playing 'digits/1.gsm' (language 'he_jr')
   -- Auto fallthrough, channel 'Local/201@custom-saytest-e55b;2' status is 'UNKNOWN'


As we can see, for the language 'he', 201 is:

 200
 1

This is because Hebrew has a word for 200 (מאתיים).

For the language 'he_jr', however, we get:

 2
 hundred
 1

which is from the original English saynumber function.

By: Tzafrir Cohen (tzafrir) 2009-05-08 09:09:43

Looking at main/say.c , I see that en_GB has its own say_number_full function: ast_say_number_full_en_GB()

The proposed change will break that.

Note, however, that "pt" and "pt_BR" are explicitly using the same ast_say_number_full_pt().

By: Tzafrir Cohen (tzafrir) 2009-05-08 09:18:32

To compare the two functions:

perl -n -e 'print if (/static int ast_say_number_full_en\(.*\)$/../^}/)' main/say.c  >en

perl -n -e 'print if (/static int ast_say_number_full_en_GB\(.*\)$/../^}/)' main/say.c  >en_GB

diff -u en en_GB

As an unrelated note, the original _en function could also lose a level of indentation by turning the big 'else' into an 'else if'.

By: Tilghman Lesher (tilghman) 2009-05-11 12:36:53

In your description, you say that "initially, we've created a directory under '/var/lib/asterisk/sounds/digits/he_jr'."  However, the right directory for trunk should actually be '/var/lib/asterisk/sounds/he_jr/digits/'.  Note the swap of the final two directories.

Note that this behavior can be changed back to the original 1.4 path, if you set languageprefix=no in the [options] section of asterisk.conf.  However, in 1.6 and trunk, this option defaults to "yes".

I need you to confirm what this setting is, since it goes directly to how Asterisk behaves and MAY explain the behavior you're seeing.
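
To make the two directory layouts described above concrete, here is a minimal sketch in C. The helper name digits_dir and its parameters are hypothetical and are not part of the Asterisk API; it only illustrates how the final two path components swap depending on languageprefix.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not an Asterisk function): builds the directory
 * that would hold the "digits" prompts for a given language.
 *
 *   languageprefix != 0  ->  <root>/<language>/digits   (1.6 / trunk default)
 *   languageprefix == 0  ->  <root>/digits/<language>   (original 1.4 layout)
 *
 * Returns the value of snprintf(), i.e. the length the full path needs.
 */
static int digits_dir(char *out, size_t outlen, const char *sounds_root,
                      const char *language, int languageprefix)
{
    if (languageprefix) {
        return snprintf(out, outlen, "%s/%s/digits", sounds_root, language);
    }
    return snprintf(out, outlen, "%s/digits/%s", sounds_root, language);
}
```

With the root "/var/lib/asterisk/sounds" and the language "he_jr", the first branch corresponds to the trunk path quoted above and the second to the path used in the original report.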



By: Tzafrir Cohen (tzafrir) 2009-05-11 12:46:59

See my first note. The actual sound files are not really required to demonstrate the issue. So this is really a technical matter.

The first thing to decide is whether this actually needs changing, given that "en" and "en_GB" are actually two different "languages" today.

By: Tilghman Lesher (tilghman) 2009-05-11 14:38:39

tzafrir:  the issue, as I understand it, is that Asterisk is not "finding" the files necessary, which is why directory ordering is indeed germane to the issue.

EDIT: I think I see.  There are two problems here.  The first is that the directories are specified incorrectly, and so the wrong files are used (English is the fallback).  The second problem is the one you alluded to, which is that the grammar used is incorrect, because an exact string comparison is done, while a substring should be used instead.  Patch uploaded to fix this second problem.
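
The second problem can be illustrated with a small sketch. This is not the uploaded patch and not the code in main/say.c; the function names and the three-language table are assumptions made for illustration. It contrasts an exact comparison, which sends "he_jr" to the English fallback, with a comparison of the leading two-letter code, while still treating "en_GB" as its own case.

```c
#include <strings.h>  /* strcasecmp, strncasecmp (POSIX) */

/* Hypothetical: grammar selected when the whole tag must match exactly. */
static const char *grammar_exact(const char *language)
{
    if (!strcasecmp(language, "he")) return "he";
    if (!strcasecmp(language, "fr")) return "fr";
    return "en";  /* anything else, including "he_jr", falls back to English */
}

/* Hypothetical: grammar selected when only the leading code must match. */
static const char *grammar_prefix(const char *language)
{
    /* Check the special case first, since "en_GB" has its own grammar. */
    if (!strcasecmp(language, "en_GB")) return "en_GB";
    if (!strncasecmp(language, "he", 2)) return "he";
    if (!strncasecmp(language, "fr", 2)) return "fr";
    return "en";
}
```

Under these assumptions, grammar_exact("he_jr") yields "en" while grammar_prefix("he_jr") yields "he". Note that a bare two-character comparison also accepts any other tag that merely starts with the same two letters, so a real implementation may want to check that the next character is an underscore or the end of the string.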



By: Tzafrir Cohen (tzafrir) 2009-05-19 02:09:39

make that:

if (!strncasecmp(language, "en_GB", 5))

as well

By: Tzafrir Cohen (tzafrir) 2009-05-19 02:28:22

I'm trying to see how this should be documented, to understand if the coded behavior is consistent:

Sound files should reside in a subdirectory whose name is <CODE>[_<NAME>]

<CODE>: normally the ISO639-1 two-letter language code. e.g. "en" for English, "fr" for French. This will be used to set rules for syntax and numbers. As a special case, "en_GB" has a slightly different syntax than "en".

<NAME>: If you have more than one set of sound files for the same language, you can have several directories. Make sure that the first character after <CODE> is an underscore (_).
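
As a sketch of the naming rule drafted above, the following C helper splits a directory name of the form <CODE>[_<NAME>] at the first underscore. The function split_language is hypothetical and only mirrors the proposed wording; it is not taken from Asterisk.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: splits "<CODE>[_<NAME>]" at the first underscore.
 * Copies the <CODE> part into code[] (truncated to fit, always terminated
 * when codelen > 0) and returns a pointer to the <NAME> part inside tag,
 * or NULL when the tag has no underscore or codelen is 0.
 */
static const char *split_language(const char *tag, char *code, size_t codelen)
{
    const char *underscore = strchr(tag, '_');
    size_t n = underscore ? (size_t)(underscore - tag) : strlen(tag);

    if (codelen == 0) {
        return NULL;
    }
    if (n >= codelen) {
        n = codelen - 1;  /* truncate rather than overflow */
    }
    memcpy(code, tag, n);
    code[n] = '\0';

    return underscore ? underscore + 1 : NULL;
}
```

For "fr_ca" this gives the code "fr" and the name "ca". For "en_GB" it gives the code "en", which is why that tag has to be handled as a special case if it is to keep its own syntax.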


What happens if other countries turn out to be like en_GB? Are there such other cases?

By: Tilghman Lesher (tilghman) 2009-05-19 10:31:12

tzafrir:  Oh, there are lots of other cases.  Probably the worst of them are es_XX, where XX is a dialect of Spanish in each of their former colonies, whose languages have strayed in completely different ways over the centuries.

By: Tilghman Lesher (tilghman) 2009-05-19 13:05:27

tzafrir:  why would I need to do a strncasecmp(language, "en_GB", 5)?  There's no longer match possible.  That change does not improve the code in any conceivable way.
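
For reference, the two comparisons under discussion differ only for tags longer than "en_GB". A minimal sketch follows; the function names and the example tag "en_GB_voice2" are hypothetical, with the latter chosen to follow the <CODE>[_<NAME>] convention drafted earlier in this thread.

```c
#include <strings.h>  /* strcasecmp, strncasecmp (POSIX) */

/* Accepts only the tag "en_GB" itself, ignoring case. */
static int is_en_gb_exact(const char *language)
{
    return strcasecmp(language, "en_GB") == 0;
}

/* Accepts any tag whose first five characters are "en_GB", ignoring case. */
static int is_en_gb_prefix(const char *language)
{
    return strncasecmp(language, "en_GB", 5) == 0;
}
```

Both functions accept "en_GB" and reject "en". Only the second accepts a longer tag such as "en_GB_voice2", so the choice matters only if such tags are expected to exist.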

By: Tilghman Lesher (tilghman) 2009-05-19 13:36:32

Patch updated, because addition of Urdu broke the existing patch.

By: Leif Madsen (lmadsen) 2009-06-01 11:15:44

It would be ideal if the reporter of this issue could test the patch provided by Tilghman. Thanks!

By: Digium Subversion (svnbot) 2009-06-30 15:23:52

Repository: asterisk
Revision: 204556

U   branches/1.4/UPGRADE.txt
U   branches/1.4/main/say.c

------------------------------------------------------------------------
r204556 | tilghman | 2009-06-30 15:23:51 -0500 (Tue, 30 Jun 2009) | 6 lines

More incorrect language codes, plus ensuring that regionalizations use the specified language, and not English for grammar.
(closes issue ASTERISK-14057)
Reported by: greenfieldtech
Patches:
      20090519__issue15022.diff.txt uploaded by tilghman (license 14)

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=204556

By: Digium Subversion (svnbot) 2009-06-30 15:41:05

Repository: asterisk
Revision: 204563

_U  trunk/
U   trunk/UPGRADE.txt
U   trunk/main/say.c

------------------------------------------------------------------------
r204563 | tilghman | 2009-06-30 15:41:04 -0500 (Tue, 30 Jun 2009) | 13 lines

Merged revisions 204556 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
 r204556 | tilghman | 2009-06-30 15:23:51 -0500 (Tue, 30 Jun 2009) | 6 lines
 
 More incorrect language codes, plus ensuring that regionalizations use the specified language, and not English for grammar.
 (closes issue ASTERISK-14057)
  Reported by: greenfieldtech
  Patches:
        20090519__issue15022.diff.txt uploaded by tilghman (license 14)
........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=204563

By: Digium Subversion (svnbot) 2009-06-30 16:21:44

Repository: asterisk
Revision: 204581

_U  branches/1.6.0/
U   branches/1.6.0/UPGRADE.txt
U   branches/1.6.0/main/say.c

------------------------------------------------------------------------
r204581 | tilghman | 2009-06-30 16:21:44 -0500 (Tue, 30 Jun 2009) | 20 lines

Merged revisions 204563 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

................
 r204563 | tilghman | 2009-06-30 15:41:04 -0500 (Tue, 30 Jun 2009) | 13 lines
 
 Merged revisions 204556 via svnmerge from
 https://origsvn.digium.com/svn/asterisk/branches/1.4
 
 ........
   r204556 | tilghman | 2009-06-30 15:23:51 -0500 (Tue, 30 Jun 2009) | 6 lines
   
   More incorrect language codes, plus ensuring that regionalizations use the specified language, and not English for grammar.
   (closes issue ASTERISK-14057)
    Reported by: greenfieldtech
    Patches:
          20090519__issue15022.diff.txt uploaded by tilghman (license 14)
 ........
................

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=204581

By: Digium Subversion (svnbot) 2009-06-30 16:30:23

Repository: asterisk
Revision: 204611

_U  branches/1.6.2/
U   branches/1.6.2/UPGRADE.txt
U   branches/1.6.2/main/say.c

------------------------------------------------------------------------
r204611 | tilghman | 2009-06-30 16:30:23 -0500 (Tue, 30 Jun 2009) | 20 lines

Merged revisions 204563 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

................
 r204563 | tilghman | 2009-06-30 15:41:04 -0500 (Tue, 30 Jun 2009) | 13 lines
 
 Merged revisions 204556 via svnmerge from
 https://origsvn.digium.com/svn/asterisk/branches/1.4
 
 ........
   r204556 | tilghman | 2009-06-30 15:23:51 -0500 (Tue, 30 Jun 2009) | 6 lines
   
   More incorrect language codes, plus ensuring that regionalizations use the specified language, and not English for grammar.
   (closes issue ASTERISK-14057)
    Reported by: greenfieldtech
    Patches:
          20090519__issue15022.diff.txt uploaded by tilghman (license 14)
 ........
................

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=204611

By: Digium Subversion (svnbot) 2009-06-30 16:30:33

Repository: asterisk
Revision: 204612

_U  branches/1.6.1/
U   branches/1.6.1/UPGRADE.txt
U   branches/1.6.1/main/say.c

------------------------------------------------------------------------
r204612 | tilghman | 2009-06-30 16:30:33 -0500 (Tue, 30 Jun 2009) | 20 lines

Merged revisions 204563 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

................
 r204563 | tilghman | 2009-06-30 15:41:04 -0500 (Tue, 30 Jun 2009) | 13 lines
 
 Merged revisions 204556 via svnmerge from
 https://origsvn.digium.com/svn/asterisk/branches/1.4
 
 ........
   r204556 | tilghman | 2009-06-30 15:23:51 -0500 (Tue, 30 Jun 2009) | 6 lines
   
   More incorrect language codes, plus ensuring that regionalizations use the specified language, and not English for grammar.
   (closes issue ASTERISK-14057)
    Reported by: greenfieldtech
    Patches:
          20090519__issue15022.diff.txt uploaded by tilghman (license 14)
 ........
................

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=204612