Summary:ASTERISK-04275: [patch] [post 1.2] improved loader
Reporter:Luigi Rizzo (rizzo)Labels:
Date Opened:2005-05-25 16:28:46Date Closed:2006-03-29 06:58:07.000-0600
Versions:Frequency of
Environment:Attachments:( 0) 20060325__bug4377.diff.txt
( 1) loader.diff
( 2) loader-part2.diff
( 3) loader-patch1a.diff
( 4) loader-patch2a
Description:[disclaimer on file]
[experimental feature]
The module loader currently in asterisk has the following issues:
1. modules are loaded with RTLD_LAZY, because some modules may depend on
  symbols defined by other modules and resolving them at load time
  could be impossible.
  This is a rather severe safety problem because you can never know if
  a symbol will be resolved when you use it, and so you might panic
  at any time if you hit one such symbol in a module

2. some modules are loaded with RTLD_GLOBAL, either by default or
  when instructed so from the config files, to make their symbols
  available to other modules.
  The problem here is more on correctness -- by mistake, a module
  might export something is not meant to (by simply forgetting to
  declare the symbol 'static'), and another module might import
  something different from what he thinks

3. (not fixed by this patch) There is a bit of namespace pollution
  in the module definition, in that all modules must define a largish
  number of global symbols (key, description, load_module...) with
  rather common names, which cannot be reused for other purposes
  (e.g. variable or argument names) if you want to run the compiler
  with -Walias (a very useful warning in my opinion).
  It would be a lot easier to put all 'required' callbacks in
  one structure, and only require the module to export that single

------------ end of problem description, start of solution -----

The attached patch enables the loader to load modules with
RTLD_NOW and RTLD_LOCAL, thus fixing issues #1 and #2 above.
kpfleming suggests that this is something to consider after 1.2.
Nevertheless, i think it is useful that people can have a look
at this proposal and comment it.

The core of the patch is to include/asterisk/module.h (which has
also a bit of documentation on how things work), and loader.c, which
implements the loader logic.
The rest is just mechanical changes to the other files. With these changes,
all modules can be successfully loaded. Because exported symbols are
refcounted, we don't have to play tricks with usecount() to prevent
a module from being unloaded.

See Additional Information for a description of the approach followed.


The approach used is the following (see the description given
in include/asterisk/module.h in the patch):


* a module that wants to export symbols, puts there in a
       struct symbol_entry exported_symbols[] = {

  that contains name, address and size of the exported object
  (MOD_FUNC and MOD_DATA are macros that simplify filling up
  Typically, a module will export a single struct, containing
  pointers to all the methods supplied by that module, i.e.
  its "interface".
  As an example, ast_monitor exports 4 methods, and we have

       #define        METHOD(x)       METHOD_BASE(ast_monitor_, x)
       static struct ast_monitor ast_monitor = {

       struct symbol_entry exported_symbols[] = {
              { NULL, NULL}
* a module that wants to use symbols defined externally, lists
 those symbols in a structure

       struct symbol_entry required_symbols[] = {

  Typically, a module will require the pointers to all the interfaces
  of modules its uses.
  As an example, apps/app_queue uses ast_monitor and ast_features so it has:

       static struct ast_features *ast_features;
       static struct ast_monitor *ast_monitor;
       struct symbol_entry required_symbols[] = {
               { NULL, NULL }

 To use externally-supplied methods, one has to do a (rather mechanical)
 change in the current code to go through the struct pointer, e.g.

       ast_monitor_start()     becomes         ast_monitor->start()

 and so on.

* The interface exported by a module, previously defined as functions
 in a header file, now is redefined as a struct containing one
 function pointer for each exported function, filled up in the
 module with METHOD() macros as in the above example.


Each module can be in one of the 4 states NEW, RESOLVED, FAIL, ACTIVE
1. (load phase)
  Loads the whole list of modules with RTLD_LOCAL | RTLD_NOW, so
  all references must be resolved immediately, and no global symbols
  will be made available externally by mistake.
  Mark all newly opened modules as NEW
  For each module the loader keeps track of exported and required symbols.

2. (fixup phase)
  Without cyclic dependencies, things are fairly simple:
  a) foreach module m in state NEW
     try to resolve all m->required_symbols[] using exported_symbols[].
     If failed, mark the module as FAIL, otherwise as RESOLVED.
  b) for each module m in state RESOLVED
     if m->required_symbol[] is empty, or all providers of symbols in
     m->required_symbol[] are ACTIVE, then try load_module(). If
     successful, m becomes ACTIVE, otherwise becomes FAIL
     (and release all references to other modules)
  c) if any module changed state in step b), repeat step b) otherwise
  If there are no cyclic dependencies, at the end each modules is
  either ACTIVE or FAIL. In case of cyclic dependencies, there might be
  a subset of modules in state RESOLVED. To unblock things, assume that
  on one of them we can try load_module(), and restart from step b).
  Repeat until all modules are ok.

Note: the core of the patch

Comments:By: BJ Weschke (bweschke) 2005-12-12 20:23:05.000-0600

This patch, given its age, no longer patches cleanly to the source tree. Are you going to bring it up to date or can this bug be closed?

By: Russell Bryant (russell) 2005-12-13 00:20:49.000-0600

I believe that he is watiting on feedback about the implementation before wasting time updating the patch since it is rather intrusive.  Given that is the case, I don't think this needs to be updated at this point, nor should the bug be closed.

By: Luigi Rizzo (rizzo) 2005-12-13 00:55:24.000-0600

before deciding what to do of this bug, i would appreciate that someone involved
with architectural decisions could spend a bit of time reading the extensive
notes above to see if the new feature is interesting or desirable or not.
The actual patch is there just as a proof of concept, to show that those
ideas can be implemented and work.

I am happy to see the bug closed if considered not interesting, or to
provide an updated/modified patch for commit (but only if there is a
firm interest in committing it, because the changes are extensive
and require a lot of work).

I think closing reports without even a review in months, on
the grounds that the author did not follow up with updated patches is
not a sensible approach - it only serves to discourage contributors.

By: BJ Weschke (bweschke) 2005-12-13 05:06:04.000-0600

My note was misguided. Please forgive the original note. I think this is a really interesting approach/design, and you've raised some very valid points about the potential issues with the loader as they exist today. I was just trying to refresh something that had been sitting idle in Mantis for nearly 8 months with no activity.

By: Luigi Rizzo (rizzo) 2005-12-13 08:25:18.000-0600

bweschke: no problem, mine was a general comment on the handling of reports,
not specifically referred to your note or my bugs.

Perhaps we should add a "arch" category for those
patches that need architectural review, without immediate commit
requirements and before looking at the details of the code or the formatting.
Of course this only works if the "arch" category gets some reviewer cycles
to be analysed. Then, I understand that time is scarce and bugs may
need to be closed because "sorry we couldn't find the resources to deal
with your patch".

By: Tilghman Lesher (tilghman) 2006-01-05 13:29:48.000-0600

1.  Gotta use the builtin linkedlist structures.
2.  We should name the structures which hold public API calls according to their module name, minus the .so extension.  So:  res_features->ast_masq_park_call, res_adsi->transmit_message, res_monitor->start, chan_modem->send.  I never really liked having ast_ prefixes inside modules, anyway.

Looks great, let's update this to current trunk, including the changes above, and we'll get this in.

By: Tilghman Lesher (tilghman) 2006-01-05 13:32:19.000-0600

One more change:  for efficiency sake, once we know a module is noload, we should notate that and auto-fail all modules that depend upon that one.

By: Luigi Rizzo (rizzo) 2006-01-06 07:51:46.000-0600

corydon, thanks for the comments.
I will provide patches in the following order:

 cleanup the existing loader.c using linked lists and code simplifications
 as in the current patch and corydon's suggestion and coding_guidelines.
 These are used but not strictly related to the new loader;

  slightly modified the "new loader" code so that both old-style and
  new-style modules can coexist (because we have a backward-compatibility
  issue with binary-only modules such as licensed codecs, right ?)

  changes for individual modules (or blocks of related modules) and their

This way hopefully changes can be be integrated more easily.

By: Luigi Rizzo (rizzo) 2006-01-06 20:17:13.000-0600

part 1 of the workplan - cleanup of the existing loader in
preparation for the new stuff.

Most of the changes to loader.c are to:
- use linkedlist macros for internal data structures;
- put repeated common code into functions;
- add a few comments to the code;
and some minor optimizations e.g.
- exit early in load_modules() if the module directory cannot be opened;
- exit early in key_matches() on a mismatch;
- use O(N) instead of O(N^2) algorithm in printdigest().

Also, fix some bugs in the cli helper which did not generate the 'extra'
names (such as 'extconfig', 'enum', 'rtp', ... that 'reload' accepts);

Part of this change puts the name and the handlers for these special
names into a table, so the handlers (ast_cdr_engine_reload(), dnsmgr_reload(),
read_config_maps() ... ) must all have the same type. For consistency
and future use, i have made all to be int (*f)(void), so the patch also
includes fixes to the appropriate header and C files.

Except for fixing CLI bugs, this patch preserves the existing behaviour.

The next step will be to move the 'struct module' definition in
include/asterisk/module.h and implement the new loader algorithm in
parallel to the old one. New modules can be told from old ones because
they will export only one symbol (the struct module, properly extended)
instead of all the callbacks, and the loader will be able to work on
them in a fully backward compatible way.

By: Tilghman Lesher (tilghman) 2006-02-13 14:46:57.000-0600

Okay, I don't know if loader-patch1.diff just has bitrot or what, but I just attempted to apply it, and it won't compile against current trunk.

By: Luigi Rizzo (rizzo) 2006-02-13 15:17:04.000-0600

updated patch against SVN9817

By: Tilghman Lesher (tilghman) 2006-02-14 16:09:31.000-0600

Okay, part 1 is now committed, as of revision 10084.  Let's start looking at part 2.

By: Kevin P. Fleming (kpfleming) 2006-02-14 17:42:05.000-0600

And wherever you end up, you _must_ provide a method for old modules to safely be ignored. I have no problem releasing new versions of the binary codec modules once Asterisk 1.4 is released (or sooner, once this has stabilized), but we cannot have someone's system fail to start because they have an old-format module present in their directory when they don't have source for it.

By: Luigi Rizzo (rizzo) 2006-02-15 08:25:26.000-0600

kevin, what is exactly that you want to achieve ?

in my design (already implemented in my branch on svn)
both old-style (as they are now) and new-style modules can
coexist. The former are still loaded with RTLD_GLOBAL|RTLD_LAZY,
and with them you risk a run-time panic (same as now)
as there is a chance to dereference a NULL pointer.
With the new style modules, everything is resolved (and errors
are detected) at load time.

By: Luigi Rizzo (rizzo) 2006-02-15 08:32:27.000-0600

for part 2:
the real patch only involves module.h and loader.c
However, there are several modules (basically all codecs/codec_*c)
which use a mix of the standard macros and direct manipulation
of locks and localusecount. Let me think a bit on what is the
best way out that doesn't require to undo all the cleaning i
have put in module.h...

By: Luigi Rizzo (rizzo) 2006-02-15 08:56:22.000-0600

ok part2 ready.
as i said the real patch is only loader.c and module.h
To fix the issue with codecs/codec_*c i have provided
a temporary macro OLD_STANDARD_USECOUNT() which should go
away as the codecs are fixed to use the local*stuff in
a correct way.
The other offending piece of code was res/res_smdi.c
which had an inline expansion of STANDARD_USECOUNT_DECL,
The patch replaces the code with the macros, which is a good
thing anyways.

By: Tilghman Lesher (tilghman) 2006-02-15 11:16:18.000-0600

Rather than committing the "this is a hack, linkedlists.h needs to be updated" portions, I'd like to see your proposed revisions to linkedlists.h first as a separate patch.

By: Tilghman Lesher (tilghman) 2006-02-23 16:50:19.000-0600

Could we get an updated patch, please?

By: Luigi Rizzo (rizzo) 2006-02-24 06:53:44.000-0600

updated to today's version.

By: Tilghman Lesher (tilghman) 2006-03-25 00:05:14.000-0600

Round 2 of the loader updates has been committed.  I think the updates to individual modules can be completed as a series of janitor projects.  If there are other major changes that need to happen, please reopen this bug, otherwise thank you for your contribution.

By: Tilghman Lesher (tilghman) 2006-03-25 19:45:40.000-0600

Patch has been reverted due to breakage.

By: Tilghman Lesher (tilghman) 2006-03-25 20:38:05.000-0600

New patch uploaded.  I need people who had trouble with the previous commit of this patch to test this.

By: Luigi Rizzo (rizzo) 2006-03-28 09:34:01.000-0600

i will take care of this code, thanks.

By: Tilghman Lesher (tilghman) 2006-03-28 11:03:12.000-0600

rizzo:  if you're taking this over, note that this needs to be finalized and go into the trunk before the architectural freeze this Friday.

By: Luigi Rizzo (rizzo) 2006-03-29 06:58:05.000-0600

patch applied.