Summary: | ASTERISK-04275: [patch] [post 1.2] improved loader | ||
Reporter: | Luigi Rizzo (rizzo) | Labels: | |
Date Opened: | 2005-05-25 16:28:46 | Date Closed: | 2006-03-29 06:58:07.000-0600 |
Priority: | Major | Regression? | No |
Status: | Closed/Complete | Components: | Core/General |
Versions: | Frequency of Occurrence | ||
Related Issues: | |||
Environment: | Attachments: | ( 0) 20060325__bug4377.diff.txt ( 1) loader.diff ( 2) loader-part2.diff ( 3) loader-patch1a.diff ( 4) loader-patch2a | |
Description: | [disclaimer on file] [experimental feature] The module loader currently in asterisk has the following issues: 1. modules are loaded with RTLD_LAZY, because some modules may depend on symbols defined by other modules and resolving them at load time could be impossible. This is a rather severe safety problem because you can never know if a symbol will be resolved when you use it, and so you might panic at any time if you hit one such symbol in a module 2. some modules are loaded with RTLD_GLOBAL, either by default or when instructed so from the config files, to make their symbols available to other modules. The problem here is more on correctness -- by mistake, a module might export something is not meant to (by simply forgetting to declare the symbol 'static'), and another module might import something different from what he thinks 3. (not fixed by this patch) There is a bit of namespace pollution in the module definition, in that all modules must define a largish number of global symbols (key, description, load_module...) with rather common names, which cannot be reused for other purposes (e.g. variable or argument names) if you want to run the compiler with -Walias (a very useful warning in my opinion). It would be a lot easier to put all 'required' callbacks in one structure, and only require the module to export that single structure. ------------ end of problem description, start of solution ----- The attached patch enables the loader to load modules with RTLD_NOW and RTLD_LOCAL, thus fixing issues #1 and #2 above. kpfleming suggests that this is something to consider after 1.2. Nevertheless, i think it is useful that people can have a look at this proposal and comment it. The core of the patch is to include/asterisk/module.h (which has also a bit of documentation on how things work), and loader.c, which implements the loader logic. The rest is just mechanical changes to the other files. With these changes, all modules can be successfully loaded. Because exported symbols are refcounted, we don't have to play tricks with usecount() to prevent a module from being unloaded. See Additional Information for a description of the approach followed. ****** ADDITIONAL INFORMATION ****** The approach used is the following (see the description given in include/asterisk/module.h in the patch): -- DATA STRUCTURES -- * a module that wants to export symbols, puts there in a struct symbol_entry exported_symbols[] = { MOD_FUNC(some_function), MOD_DATA(data_structure), ... }; that contains name, address and size of the exported object (MOD_FUNC and MOD_DATA are macros that simplify filling up exported_symbols[]); Typically, a module will export a single struct, containing pointers to all the methods supplied by that module, i.e. its "interface". As an example, ast_monitor exports 4 methods, and we have #define METHOD(x) METHOD_BASE(ast_monitor_, x) static struct ast_monitor ast_monitor = { METHOD(start), METHOD(stop), METHOD(change_fname), METHOD(setjoinfiles), }; struct symbol_entry exported_symbols[] = { MOD_DATA(ast_monitor), { NULL, NULL} }; * a module that wants to use symbols defined externally, lists those symbols in a structure struct symbol_entry required_symbols[] = { MOD_WANT(some_function), MOD_WANT(data_structure), ... }; Typically, a module will require the pointers to all the interfaces of modules its uses. As an example, apps/app_queue uses ast_monitor and ast_features so it has: static struct ast_features *ast_features; static struct ast_monitor *ast_monitor; struct symbol_entry required_symbols[] = { MOD_WANT(ast_monitor), MOD_WANT(ast_features), { NULL, NULL } }; To use externally-supplied methods, one has to do a (rather mechanical) change in the current code to go through the struct pointer, e.g. ast_monitor_start() becomes ast_monitor->start() and so on. * The interface exported by a module, previously defined as functions in a header file, now is redefined as a struct containing one function pointer for each exported function, filled up in the module with METHOD() macros as in the above example. -- LOADER BEHAVIOUR -- Each module can be in one of the 4 states NEW, RESOLVED, FAIL, ACTIVE 1. (load phase) Loads the whole list of modules with RTLD_LOCAL | RTLD_NOW, so all references must be resolved immediately, and no global symbols will be made available externally by mistake. Mark all newly opened modules as NEW For each module the loader keeps track of exported and required symbols. 2. (fixup phase) Without cyclic dependencies, things are fairly simple: a) foreach module m in state NEW try to resolve all m->required_symbols[] using exported_symbols[]. If failed, mark the module as FAIL, otherwise as RESOLVED. b) for each module m in state RESOLVED if m->required_symbol[] is empty, or all providers of symbols in m->required_symbol[] are ACTIVE, then try load_module(). If successful, m becomes ACTIVE, otherwise becomes FAIL (and release all references to other modules) c) if any module changed state in step b), repeat step b) otherwise terminate. If there are no cyclic dependencies, at the end each modules is either ACTIVE or FAIL. In case of cyclic dependencies, there might be a subset of modules in state RESOLVED. To unblock things, assume that on one of them we can try load_module(), and restart from step b). Repeat until all modules are ok. Note: the core of the patch | ||
Comments: | By: BJ Weschke (bweschke) 2005-12-12 20:23:05.000-0600 This patch, given its age, no longer patches cleanly to the source tree. Are you going to bring it up to date or can this bug be closed? <Housekeeping/> By: Russell Bryant (russell) 2005-12-13 00:20:49.000-0600 I believe that he is watiting on feedback about the implementation before wasting time updating the patch since it is rather intrusive. Given that is the case, I don't think this needs to be updated at this point, nor should the bug be closed. By: Luigi Rizzo (rizzo) 2005-12-13 00:55:24.000-0600 before deciding what to do of this bug, i would appreciate that someone involved with architectural decisions could spend a bit of time reading the extensive notes above to see if the new feature is interesting or desirable or not. The actual patch is there just as a proof of concept, to show that those ideas can be implemented and work. I am happy to see the bug closed if considered not interesting, or to provide an updated/modified patch for commit (but only if there is a firm interest in committing it, because the changes are extensive and require a lot of work). I think closing reports without even a review in months, on the grounds that the author did not follow up with updated patches is not a sensible approach - it only serves to discourage contributors. By: BJ Weschke (bweschke) 2005-12-13 05:06:04.000-0600 My note was misguided. Please forgive the original note. I think this is a really interesting approach/design, and you've raised some very valid points about the potential issues with the loader as they exist today. I was just trying to refresh something that had been sitting idle in Mantis for nearly 8 months with no activity. By: Luigi Rizzo (rizzo) 2005-12-13 08:25:18.000-0600 bweschke: no problem, mine was a general comment on the handling of reports, not specifically referred to your note or my bugs. Perhaps we should add a "arch" category for those patches that need architectural review, without immediate commit requirements and before looking at the details of the code or the formatting. Of course this only works if the "arch" category gets some reviewer cycles to be analysed. Then, I understand that time is scarce and bugs may need to be closed because "sorry we couldn't find the resources to deal with your patch". By: Tilghman Lesher (tilghman) 2006-01-05 13:29:48.000-0600 1. Gotta use the builtin linkedlist structures. 2. We should name the structures which hold public API calls according to their module name, minus the .so extension. So: res_features->ast_masq_park_call, res_adsi->transmit_message, res_monitor->start, chan_modem->send. I never really liked having ast_ prefixes inside modules, anyway. Looks great, let's update this to current trunk, including the changes above, and we'll get this in. By: Tilghman Lesher (tilghman) 2006-01-05 13:32:19.000-0600 One more change: for efficiency sake, once we know a module is noload, we should notate that and auto-fail all modules that depend upon that one. By: Luigi Rizzo (rizzo) 2006-01-06 07:51:46.000-0600 corydon, thanks for the comments. I will provide patches in the following order: patch1) cleanup the existing loader.c using linked lists and code simplifications as in the current patch and corydon's suggestion and coding_guidelines. These are used but not strictly related to the new loader; patch2) slightly modified the "new loader" code so that both old-style and new-style modules can coexist (because we have a backward-compatibility issue with binary-only modules such as licensed codecs, right ?) patches3-N) changes for individual modules (or blocks of related modules) and their users. This way hopefully changes can be be integrated more easily. By: Luigi Rizzo (rizzo) 2006-01-06 20:17:13.000-0600 part 1 of the workplan - cleanup of the existing loader in preparation for the new stuff. Most of the changes to loader.c are to: - use linkedlist macros for internal data structures; - put repeated common code into functions; - add a few comments to the code; and some minor optimizations e.g. - exit early in load_modules() if the module directory cannot be opened; - exit early in key_matches() on a mismatch; - use O(N) instead of O(N^2) algorithm in printdigest(). Also, fix some bugs in the cli helper which did not generate the 'extra' names (such as 'extconfig', 'enum', 'rtp', ... that 'reload' accepts); Part of this change puts the name and the handlers for these special names into a table, so the handlers (ast_cdr_engine_reload(), dnsmgr_reload(), read_config_maps() ... ) must all have the same type. For consistency and future use, i have made all to be int (*f)(void), so the patch also includes fixes to the appropriate header and C files. Except for fixing CLI bugs, this patch preserves the existing behaviour. The next step will be to move the 'struct module' definition in include/asterisk/module.h and implement the new loader algorithm in parallel to the old one. New modules can be told from old ones because they will export only one symbol (the struct module, properly extended) instead of all the callbacks, and the loader will be able to work on them in a fully backward compatible way. By: Tilghman Lesher (tilghman) 2006-02-13 14:46:57.000-0600 Okay, I don't know if loader-patch1.diff just has bitrot or what, but I just attempted to apply it, and it won't compile against current trunk. By: Luigi Rizzo (rizzo) 2006-02-13 15:17:04.000-0600 updated patch against SVN9817 By: Tilghman Lesher (tilghman) 2006-02-14 16:09:31.000-0600 Okay, part 1 is now committed, as of revision 10084. Let's start looking at part 2. By: Kevin P. Fleming (kpfleming) 2006-02-14 17:42:05.000-0600 And wherever you end up, you _must_ provide a method for old modules to safely be ignored. I have no problem releasing new versions of the binary codec modules once Asterisk 1.4 is released (or sooner, once this has stabilized), but we cannot have someone's system fail to start because they have an old-format module present in their directory when they don't have source for it. By: Luigi Rizzo (rizzo) 2006-02-15 08:25:26.000-0600 kevin, what is exactly that you want to achieve ? in my design (already implemented in my branch on svn) both old-style (as they are now) and new-style modules can coexist. The former are still loaded with RTLD_GLOBAL|RTLD_LAZY, and with them you risk a run-time panic (same as now) as there is a chance to dereference a NULL pointer. With the new style modules, everything is resolved (and errors are detected) at load time. By: Luigi Rizzo (rizzo) 2006-02-15 08:32:27.000-0600 for part 2: the real patch only involves module.h and loader.c However, there are several modules (basically all codecs/codec_*c) which use a mix of the standard macros and direct manipulation of locks and localusecount. Let me think a bit on what is the best way out that doesn't require to undo all the cleaning i have put in module.h... By: Luigi Rizzo (rizzo) 2006-02-15 08:56:22.000-0600 ok part2 ready. as i said the real patch is only loader.c and module.h To fix the issue with codecs/codec_*c i have provided a temporary macro OLD_STANDARD_USECOUNT() which should go away as the codecs are fixed to use the local*stuff in a correct way. The other offending piece of code was res/res_smdi.c which had an inline expansion of STANDARD_USECOUNT_DECL, STANDARD_DECREMENT_USECOUNT and STANDARD_INCREMENT_USECOUNT. The patch replaces the code with the macros, which is a good thing anyways. By: Tilghman Lesher (tilghman) 2006-02-15 11:16:18.000-0600 Rather than committing the "this is a hack, linkedlists.h needs to be updated" portions, I'd like to see your proposed revisions to linkedlists.h first as a separate patch. By: Tilghman Lesher (tilghman) 2006-02-23 16:50:19.000-0600 Could we get an updated patch, please? By: Luigi Rizzo (rizzo) 2006-02-24 06:53:44.000-0600 updated to today's version. By: Tilghman Lesher (tilghman) 2006-03-25 00:05:14.000-0600 Round 2 of the loader updates has been committed. I think the updates to individual modules can be completed as a series of janitor projects. If there are other major changes that need to happen, please reopen this bug, otherwise thank you for your contribution. By: Tilghman Lesher (tilghman) 2006-03-25 19:45:40.000-0600 Patch has been reverted due to breakage. By: Tilghman Lesher (tilghman) 2006-03-25 20:38:05.000-0600 New patch uploaded. I need people who had trouble with the previous commit of this patch to test this. By: Luigi Rizzo (rizzo) 2006-03-28 09:34:01.000-0600 i will take care of this code, thanks. By: Tilghman Lesher (tilghman) 2006-03-28 11:03:12.000-0600 rizzo: if you're taking this over, note that this needs to be finalized and go into the trunk before the architectural freeze this Friday. By: Luigi Rizzo (rizzo) 2006-03-29 06:58:05.000-0600 patch applied. |