Summary: | ASTERISK-05843: [post 1.4] improper handling of contexts with same name | ||
Reporter: | Luigi Rizzo (rizzo) | Labels: | |
Date Opened: | 2005-12-14 15:39:11.000-0600 | Date Closed: | 2008-03-12 17:47:41 |
Priority: | Major | Regression? | No |
Status: | Closed/Complete | Components: | Core/Configuration |
Versions: | Frequency of Occurrence | ||
Related Issues: | |||
Environment: | Attachments: | ||
Description: | [I have put reproducibility=always because the problem is deterministic, and severity=major because it is an undetected configuration error which may result in serious and hard-to-detect misbehaviours of the dialplan. Then, your mileage may vary]

When using regcontext=xyz where xyz is the name of a context already existing in extensions.conf (or probably some other config file as well), asterisk will create two instances of the context xyz. However, the extension lookup code stops the search at the first instance, which gives unexpected results. A stripped down example is below, where the _5. entry is in extensions.conf, and the 551 entry is the result of regcontext=local-users regexten=551 in sip.conf for a peer.

The source of the problem is that regcontext immediately creates the empty context in the global list, whereas pbx_config later builds contexts in a temporary list, then calls ast_merge_contexts_and_delete() at line 1776 to merge the two lists. Unfortunately the list merging only puts the local list in front of the existing contexts, without checking for duplicates.

I have no idea what the proper fix is, nor what the behaviour is on an 'extensions reload' or similar. Surely, at the very least the code should produce a big warning message when it finds one such misconfiguration (i.e. multiple contexts with the same name), if merging the two is not possible or too expensive.

On a related topic: most functions that compare context names are case-sensitive, however a few of them are not, e.g. ast_context_create(), complete_show_dialplan_context(), __ast_context_destroy() and possibly more. Apart from the inconsistency that needs to be fixed, there is also the issue that most of asterisk is case-insensitive when it comes to names, so i think you should state clearly what the policy is and why contexts are dealt with in a different way.
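One way to make the case policy explicit is to route every context-name comparison through a single helper, so that the decision lives in one place. The following is only a minimal sketch of that idea: `ctx_name_match` and `CTX_NAMES_IGNORE_CASE` are hypothetical names invented for illustration, not Asterisk APIs.

```c
#include <assert.h>
#include <string.h>
#include <strings.h>   /* strcasecmp (POSIX) */

/* Hypothetical policy switch: set to 1 to treat names that differ
 * only in case as the same context. */
#define CTX_NAMES_IGNORE_CASE 0

/* Hypothetical helper: the single place that decides whether two
 * context names refer to the same context. */
static int ctx_name_match(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
#if CTX_NAMES_IGNORE_CASE
    return strcasecmp(a, b) == 0;
#else
    return strcmp(a, b) == 0;
#endif
}
```

With the switch left at 0, "local-users" and "Local-Users" are different contexts; flipping it to 1 would change that for every caller at once.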
****** ADDITIONAL INFORMATION ******
*CLI> show dialplan local-users
[ Context 'local-users' created by 'pbx_config' ]
  '_5.' => 2. Dial(SIP/${EXTEN}) [pbx_config]
[ Context 'local-users' created by 'SIP' ]
  '551' => 1. Noop(551) [SIP]
*CLI> dial 551@local-users
No such extension '551' in context 'local-users'
*CLI> | ||
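The behaviour in the CLI session above can be reproduced with a toy model of the two lists. This is a sketch under simplifying assumptions, not the Asterisk code: the `toy_*` types and functions are invented here, each context holds a single extension, and extensions are compared as plain strings (no pattern matching), so the pbx_config entry uses a literal "100" rather than a pattern.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model only; these are not the real Asterisk structures. */
struct toy_context {
    const char *name;
    const char *registrar;
    const char *exten;            /* one extension per context, for brevity */
    struct toy_context *next;
};

/* Mimics the reported merge: the local list is simply put in front of
 * the global list, with no check for duplicate context names. */
static struct toy_context *toy_merge_prepend(struct toy_context *local,
                                             struct toy_context *global)
{
    struct toy_context *tail = local;
    if (local == NULL)
        return global;
    while (tail->next != NULL)
        tail = tail->next;
    tail->next = global;
    return local;
}

/* Mimics the lookup: stop at the first context whose name matches. */
static struct toy_context *toy_find_context(struct toy_context *head,
                                            const char *name)
{
    for (; head != NULL; head = head->next)
        if (strcmp(head->name, name) == 0)
            return head;
    return NULL;
}

static int toy_exten_exists(struct toy_context *head, const char *ctx,
                            const char *exten)
{
    struct toy_context *c = toy_find_context(head, ctx);
    return c != NULL && strcmp(c->exten, exten) == 0;
}

/* Returns 1 if "551" can be found in "local-users" after the merge. */
static int toy_demo_lookup_551(void)
{
    struct toy_context sip = { "local-users", "SIP", "551", NULL };
    struct toy_context cfg = { "local-users", "pbx_config", "100", NULL };
    struct toy_context *all = toy_merge_prepend(&cfg, &sip);
    return toy_exten_exists(all, "local-users", "551");
}
```

In this model `toy_demo_lookup_551()` returns 0: the lookup finds the pbx_config instance first and never reaches the SIP instance that holds 551.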
Comments: | By: Luigi Rizzo (rizzo) 2005-12-14 15:40:45.000-0600

BTW i do have the disclaimer on file... and the problem seems to be a long standing one, not related to a particular SVN version.

By: Olle Johansson (oej) 2005-12-15 05:24:34.000-0600

Let's try to take one problem per issue report. I would suggest focusing on the multiple contexts in this one and opening another to discuss the rules for context handling.

By: Luigi Rizzo (rizzo) 2005-12-15 05:58:40.000-0600

i generally try (modulo mistakes) to keep issues separate, but in this case i reported them together intentionally. The two issues are strongly related, because the fix has to decide when two contexts have the same name, and this is either case-sensitive or case-insensitive, and a decision has to be made. The obvious solution would be to define a function (or macro) ast_ctx_match() and use it consistently wherever we try a context name match, so we can revisit the decision very quickly in the future.

By: Matt O'Gorman (mogorman) 2006-01-17 11:02:44.000-0600

Rizzo I tried duplicating this behavior in our lab machine, and was unable to do so. The 2 contexts merged fine.

[ Context 'default' created by 'pbx_config' ]
  '1000' => 1. VoicemailMain() [pbx_config]
  '1234' => 1. Noop(polytest2) [SIP]
            2. dial(sip/linphone) [pbx_config]
  '4321' => 1. voicemailmain() [pbx_config]
  '501' => 1. Dial(SIP/polytest2) [pbx_config]
  '601' => 1. Dial(SIP/polytest) [pbx_config]
  '6025' => hint: Zap/3 [pbx_config]
  '6252' => hint: Zap/1 [pbx_config]
  '6266' => hint: Zap/2 [pbx_config]
  '7000' => 1. Dial(sip/linphone) [pbx_config]
  'linphone' => 1. Noop(linphone) [SIP]
  'polytest' => 1. Noop(polytest) [SIP]
  Include => 'parkedcalls' [pbx_config]

the only issue at all is that my regexten flattened a line in my config file, which some might not consider a bug, are you still able to replicate this as of svn trunk 8132

By: Luigi Rizzo (rizzo) 2006-01-17 11:21:03.000-0600

yes, see below.
Thing is, nothing has changed in the relevant code, so the analysis above (of which i am pretty confident now) still applies, please re-read it: the code does not check for duplicate contexts, it simply appends two lists. The only reason why you might not see the problem is if pbx_config runs before the external module - e.g. (just guessing) if you manually load chan_sip after pbx_config has run maybe ?

*CLI> show version
Asterisk SVN-trunk-r8127M built by luigi @ prova.iet.unipi.it on a i386 running FreeBSD on 2006-01-17 19:29:47 UTC
*CLI> show dialplan local-users
[ Context 'local-users' (1) created by 'pbx_config' ]
  '_99' => 2. Noop(test) [pbx_config]
[ Context 'local-users' (1) created by 'SIP' ]
  '551' => 1. Noop(551) [SIP]
  '552' => 1. Noop(552) [SIP]
-= 3 extensions (3 priorities) in 2 contexts. =-

By: Luigi Rizzo (rizzo) 2006-01-17 17:02:39.000-0600

I see two possible fixes:
- the easy way is to allow only one registrar per context, and report an error (in add_extension()) when one tries to register an extension in a context with a registrar different from the existing one. Then the merge function could simply replace (entirely) the contexts with the same name from the same registrar, or even all contexts from the same registrar (this is how the code works now, except that it doesn't check for the multiple-registrar case).
- the alternative, more expensive, to allow entries from different registrars in the same context, is to have the merge function call the equivalent of ast_add_extension2() (but using the already allocated entry) on each element <ctx,ext,pri> of the list to merge, replacing existing entries. Then the second parameter to ast_merge_contexts_and_delete() becomes useless.

By: Matt O'Gorman (mogorman) 2006-01-17 18:09:11.000-0600

hey rizzo, in commits 8162 and 8163 i changed the default load of the modules so that pbx_config and pbx_ael get loaded before channel structures as that is the way it should be anyways.
That should make your issue disappear.

By: Luigi Rizzo (rizzo) 2006-01-18 00:29:06.000-0600

just a note to remember that we need to revisit the issue. The change of load order only fixes the problem temporarily, because if you add an extension to the dialplan for a context that already "belongs" to another registrar, and issue an "extensions reload", you will see the problem again - a new context is created with the same name. I really believe that a proper fix involves one of the two approaches that I mentioned.

By: Leif Madsen (lmadsen) 2006-05-02 22:53:04

/housekeeping Since rizzo even noted that this needs to be revisited, I'm bringing it up for discussion.

By: jmls (jmls) 2006-10-31 03:41:15.000-0600

/housekeeping rizzo, regarding 0039786: any more thoughts on the fix required ?

By: jmls (jmls) 2006-11-19 13:29:26.000-0600

hey - another 20 days have passed. PING PING PING :)

By: Serge Vecher (serge-v) 2007-02-28 13:46:52.000-0600

perhaps Mr. Dialplan Wizard can rescue this bug

By: Steve Murphy (murf) 2007-03-01 12:26:29.000-0600

OK, been looking at the code. I have to add one more requirement to merge_contexts_and_delete: that it hold the locks for less than a frame time. Since the freeing and destruction of list elements is the most time-consuming part of the operation, (or, at least, WAS), I propose this algorithm:

4 lists are involved:
1. the list of contexts to merge into the dialplan (extcontexts)
2. the existing contexts (contexts)
3. a list containing extens to free
4. a list containing just contexts to free

The algorithm would go something like this:
1. get the conlock & hintlock
2. preserve the watchers as before
3. traverse the dp, and unlink all exten/prio that match registrar. Do Not remove any contexts (yet). Unlink them from the contexts list, and link them instead to the list of extens to free.
4. traverse the dp again, and for any empty contexts, that match registrar, unlink from contexts, and link to the contexts to free list.
   this and #3 might be tied into a single traversal.
5. Now, for each context in extcontexts, search for a match in contexts. (THIS MAKES ME NERVOUS IF THE DP IS BIG!)
   if found: either the context or something in it has a different registrar. go thru the contexts entry, and relink the exten/prios into the matching extcontext's entry. If there are exten collisions, (THIS SEARCH MAKES ME NERVOUS IF THE DP IS BIG!) then take all the contexts prios for that exten, and insert them into the collided extcontexts exten. Issue a warning only if there are collisions. Keep the extcontexts version. After all the prios are merged, then put the contexts exten into the free list. Now, Move the now empty contexts' context into the free context list. Then move the merged context into the contexts list from the extcontext list.
   if not found: link the context from extcontexts to contexts. This is the "quick and easy" path.
6. Restore the watchers, as is now being done.
7. Unlock the above locks,
8. Destroy the stuff in the to-be-freed lists
9. Return.

This will keep the regcontexts. If the dp is big, or the extcontexts big, this operation will run dangerously slow! Having O(1) search times for all 3 types of search (context, exten, prio) would be a big plus. Will this be sufficient? The key is getting part 5 right.

By: Brandon Kruse (bkruse) 2008-01-27 23:38:19.000-0600

Hey Guys, Throwing out some housekeeping. It has been almost a year. What is the status on this issue. Thanks! -bk

By: Steve Murphy (murf) 2008-03-05 11:39:40.000-0600

OK, I've published both intention and then completion of fixes to this bug on the asterisk-dev, both of which letters got absolutely no response (deer in the headlights?) They are:
http://lists.digium.com/pipermail/asterisk-dev/2008-February/032065.html
and:
http://lists.digium.com/pipermail/asterisk-dev/2008-March/032124.html
I have the fixes in team/murf/bug6002
Please review and test! I will commit these fixes to trunk soon if there are no objections.
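The numbered steps proposed in the 2007-03-01 comment can be sketched on a toy data model. This is an illustration under loud assumptions, not the code that was committed: all `toy_*` names are invented here, an "exten" stands for a whole exten/prio entry, locking and watcher handling (steps 1, 2, 6, 7) are only marked by comments, and an exten collision keeps the incoming version without the per-priority merge or the warning described in step 5.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-ins for the dialplan structures; not the real Asterisk types. */
struct toy_exten { const char *name, *registrar; struct toy_exten *next; };
struct toy_ctx { const char *name, *registrar; struct toy_exten *extens; struct toy_ctx *next; };

static struct toy_exten *toy_exten_new(const char *name, const char *reg, struct toy_exten *next)
{
    struct toy_exten *e = malloc(sizeof(*e));
    assert(e != NULL);
    e->name = name; e->registrar = reg; e->next = next;
    return e;
}

static struct toy_ctx *toy_ctx_new(const char *name, const char *reg, struct toy_exten *extens)
{
    struct toy_ctx *c = malloc(sizeof(*c));
    assert(c != NULL);
    c->name = name; c->registrar = reg; c->extens = extens; c->next = NULL;
    return c;
}

static int toy_has_exten(const struct toy_ctx *c, const char *name)
{
    const struct toy_exten *e;
    for (e = c->extens; e != NULL; e = e->next)
        if (strcmp(e->name, name) == 0)
            return 1;
    return 0;
}

/* Merge `incoming`, freshly built by `registrar`, into `*global`.
 * Returns the number of extensions destroyed. */
static int toy_merge(struct toy_ctx **global, struct toy_ctx *incoming, const char *registrar)
{
    struct toy_exten *free_extens = NULL, *e, **ep;
    struct toy_ctx *free_ctxs = NULL, *c, *old, **cp;
    int freed = 0;

    /* Steps 1-2 (take the locks, save the watchers) would go here. */
    /* Steps 3-4: unlink stale entries owned by `registrar`; no free() yet. */
    for (cp = global; (c = *cp) != NULL; ) {
        for (ep = &c->extens; (e = *ep) != NULL; ) {
            if (strcmp(e->registrar, registrar) == 0) {
                *ep = e->next;
                e->next = free_extens;
                free_extens = e;
            } else {
                ep = &e->next;
            }
        }
        if (c->extens == NULL && strcmp(c->registrar, registrar) == 0) {
            *cp = c->next;
            c->next = free_ctxs;
            free_ctxs = c;
        } else {
            cp = &c->next;
        }
    }

    /* Step 5: fold surviving old entries into each incoming context. */
    while ((c = incoming) != NULL) {
        incoming = c->next;
        for (cp = global; (old = *cp) != NULL; cp = &old->next)
            if (strcmp(old->name, c->name) == 0)
                break;
        if (old != NULL) {
            while ((e = old->extens) != NULL) {
                old->extens = e->next;
                if (toy_has_exten(c, e->name)) {   /* collision: keep new */
                    e->next = free_extens;
                    free_extens = e;
                } else {
                    e->next = c->extens;
                    c->extens = e;
                }
            }
            *cp = old->next;                       /* old context is now empty */
            old->next = free_ctxs;
            free_ctxs = old;
        }
        c->next = *global;                         /* the "quick and easy" link */
        *global = c;
    }

    /* Steps 6-7 (restore watchers, unlock) would go here. */
    /* Step 8: the slow part, done after the locks would have been released. */
    while ((e = free_extens) != NULL) { free_extens = e->next; free(e); freed++; }
    while ((c = free_ctxs) != NULL) { free_ctxs = c->next; free(c); }
    return freed;
}
```

As a usage example, start with a global 'local-users' context created by SIP that holds 551 (SIP) and a stale 100 (pbx_config), and merge an incoming 'local-users' from pbx_config that holds 200. In this model the result is a single 'local-users' context containing 551 and 200, and the stale 100 is destroyed. The linear searches are exactly the part the comment is nervous about for a big dialplan.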
By: Digium Subversion (svnbot) 2008-03-07 12:54:02.000-0600

Repository: asterisk
Revision: 106757

U trunk/apps/app_dial.c
U trunk/apps/app_meetme.c
U trunk/apps/app_queue.c
U trunk/channels/chan_iax2.c
U trunk/channels/chan_sip.c
U trunk/channels/chan_skinny.c
U trunk/include/asterisk/pbx.h
U trunk/include/asterisk/pval.h
U trunk/main/features.c
U trunk/main/pbx.c
U trunk/pbx/pbx_ael.c
U trunk/pbx/pbx_config.c
U trunk/res/ael/ael.flex
U trunk/res/ael/ael.tab.c
U trunk/res/ael/ael.tab.h
U trunk/res/ael/ael.y
U trunk/res/ael/ael_lex.c
U trunk/res/ael/pval.c
U trunk/utils/Makefile
U trunk/utils/ael_main.c
U trunk/utils/conf2ael.c
U trunk/utils/extconf.c
------------------------------------------------------------------------
r106757 | murf | 2008-03-07 12:53:59 -0600 (Fri, 07 Mar 2008) | 126 lines

(closes issue ASTERISK-5843)
Reported by: rizzo
Tested by: murf

Proposal of the changes to be made, and then an announcement of how they were accomplished:
http://lists.digium.com/pipermail/asterisk-dev/2008-February/032065.html
and:
http://lists.digium.com/pipermail/asterisk-dev/2008-March/032124.html

Here is a recap, file by file, of what I have done:

pbx/pbx_config.c
pbx/pbx_ael.c
All funcs that were passed a ptr to the context list, now will ALSO be passed a hashtab ptr to the same set. Why? because (for the time being), the dialplan is stored in both, to facilitate a quick, low-cost move to hash-tables to speed up dialplan processing. If it was deemed necessary to pass the context LIST, well, it is just as necessary to have the TABLE available. This is because the list/table in question might not be the global one, but temporary ones we would use to stage the dialplan on, and then swap into the global position when things are ready. We now have one external function for apps to use, "ast_context_find_or_create()" instead of the pre-existing "find" and "create", as all existing usages used both in tandem anyway.
pbx_config, and pbx_ael, will stage the reloaded dialplan into local lists and tables, and then call merge_contexts_and_delete, which will merge (now) existing contexts and priorities from other registrars into this local set by copying them. Then, merge_contexts_and_delete will lock down the contexts, swap the lists and tables, and unlock (real quick), and then destroy the old dialplan.

chan_sip.c
chan_iax2.c
chan_skinny.c
All the channel drivers that would add regcontexts now use ast_context_find_or_create. chan_sip also includes a small fix to get rid of warnings about removing priorities that never got entered.

apps/app_meetme.c
apps/app_dial.c
apps/app_queue.c
All the apps that added a context/exten/priority were also modified to use ast_context_find_or_create instead.

include/asterisk/pbx.h
ast_context_create() is removed. Find_or_create_ is the new method.
ast_context_find_or_create() interface gets the hashtab added.
ast_merge_contexts_and_delete() gets the local hashtab arg added.
ast_wrlock_contexts_version() is added so you can detect if someone else got a writelock between your readlocking and writelocking.
ast_hashtab_compare_contexts was made public for use in pbx_config/pbx_ael
ast_hashtab_hash_contexts was in like fashion made public.

include/asterisk/pval.h
ast_compile_ael2() interface changed to include the local hashtab table ptr.

main/features.c
For the sake of the parking context, we use ast_context_find_or_create().

main/pbx.c
I changed all the "tree" names to "table" instead. That's because the original implementation was based on binary trees. (had a free library). Then I moved to hashtabs. Now, the names move forward too.
refcount field added to contexts, so you can keep track of how many modules wanted this context to exist.
Some log messages that are warnings were inflated from LOG_NOTICE to LOG_WARNING.
Added some calls to ast_verb(3,...)
for debug messages
Lots of little mods to ast_context_remove_extension2, which is now exercised in ways it was not previously; one definite bug fixed.
find_or_create was upgraded to handle both local lists/tables as well as the globals.
context_merge() was added to do the per-context merging of the old/present contexts/extens/prios into the new/proposed local list/tables
ast_merge_contexts_and_delete() was heavily modified.
ast_add_extension2() was also upgraded to handle changes.
the context_destroy() code was re-engineered to handle the new way of doing things, by exten/prio instead of by context.

res/ael/pval.c
res/ael/ael.tab.c
res/ael/ael.tab.h
res/ael/ael.y
res/ael/ael_lex.c
res/ael/ael.flex
utils/ael_main.c
utils/extconf.c
utils/conf2ael.c
utils/Makefile
Had to change the interface to ast_compile_ael2(), to include the hashtab ptr. This ended up involving several external apps. The main gotcha was I had to include lock.h and hashtab.h in several places.

As a side note, I tested this stuff pretty thoroughly, I replicated the problems originally reported by Luigi, and made triply sure that reloads worked, and everything worked thru "stop gracefully". I found and fixed a few bugs as I was merging into trunk, that did not appear in my tests of bug6002.

How's this for verbose commit messages?

------------------------------------------------------------------------
http://svn.digium.com/view/asterisk?view=rev&revision=106757

By: Digium Subversion (svnbot) 2008-03-12 17:47:41

Repository: asterisk
Revision: 108351

_U branches/1.6.0/
------------------------------------------------------------------------
r108351 | russell | 2008-03-12 17:47:37 -0500 (Wed, 12 Mar 2008) | 133 lines

Blocked revisions 106757 via svnmerge

........
r106757 | murf | 2008-03-07 12:57:57 -0600 (Fri, 07 Mar 2008) | 126 lines

(closes issue ASTERISK-5843)
Reported by: rizzo
Tested by: murf
........
------------------------------------------------------------------------
http://svn.digium.com/view/asterisk?view=rev&revision=108351 |
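The r106757 commit message describes replacing the separate "find" and "create" calls with a single find-or-create entry point, plus a refcount on each context. A minimal sketch of that idea follows. It is an assumption-laden illustration, not the Asterisk implementation: the `toy_*` names are invented, the real function also takes a hashtab (per the commit message) which is omitted here, and bumping the refcount on every repeated request is this sketch's reading of "how many modules wanted this context to exist".

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy context; not the real struct ast_context. */
struct toy_context {
    char name[32];
    char registrar[16];         /* module that first created the context */
    int refcount;               /* how many callers asked for this context */
    struct toy_context *next;
};

/* Return the context called `name`, creating it if it does not exist.
 * Because lookup and creation are one operation, two registrars asking
 * for the same name can never end up with two contexts. */
static struct toy_context *toy_find_or_create(struct toy_context **list,
                                              const char *name,
                                              const char *registrar)
{
    struct toy_context *c;

    assert(list != NULL && name != NULL && registrar != NULL);
    for (c = *list; c != NULL; c = c->next) {
        if (strcmp(c->name, name) == 0) {
            c->refcount++;      /* assumption: one count per requester */
            return c;
        }
    }
    c = calloc(1, sizeof(*c));  /* zero-filled, so the strings stay terminated */
    if (c == NULL)
        return NULL;
    strncpy(c->name, name, sizeof(c->name) - 1);
    strncpy(c->registrar, registrar, sizeof(c->registrar) - 1);
    c->refcount = 1;
    c->next = *list;
    *list = c;
    return c;
}

/* Count contexts with a given name; more than 1 would be the original bug. */
static int toy_count(const struct toy_context *list, const char *name)
{
    int n = 0;
    for (; list != NULL; list = list->next)
        if (strcmp(list->name, name) == 0)
            n++;
    return n;
}
```

If a channel driver and pbx_config both call `toy_find_or_create()` for 'local-users' on the same list, they receive the same pointer and the list still holds exactly one context with that name.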