[Home]

Summary:DAHLIN-00280: dahdi_dynamic_eth(ethmf,loc)
Reporter:Pavel Selivanov (biohumanoid)Labels:
Date Opened:2012-01-25 13:59:32.000-0600Date Closed:2012-04-03 15:15:38
Priority:BlockerRegression?
Status:Closed/CompleteComponents:dahdi_dynamic_eth
Versions:2.6.0 Frequency of
Occurrence
Constant
Related
Issues:
Environment:Attachments:( 0) 0001-dahdi-Update-dev_set_name-dev_name-for-RHEL-5.6.patch
( 1) 0002-dahdi_dynamic_eth-Move-tx-packet-flushing-to-process.patch
( 2) 0003-dahdi_dynamic-Since-dynamic-devices-are-parentless-w.patch
( 3) 2012.04.02.tar.gz
( 4) dahdi_dynamic_eth.c.patch
( 5) dahdi_dynamic.patch
( 6) dahlin-280.patch
( 7) dynamic_hang.log
( 8) owner.patch
Description:Jan 25 19:42:41 PACS kernel: [ 4660.852384] BUG: unable to handle kernel NULL pointer dereference at 00000074

dahdi_dynamic.c: dahdi_register_device(d->ddev, d->dev).
In DAHDI 2.6.0, d->dev is always NULL.

Underlying driver (eth, ethmf, loc) must set dahdi_dynamic->dev to it's "parent", I think.
dahdi_dynamic.c:ztdeth_create(): dyn->dev = &z->dev->dev;
The patch file is attached.

But the trouble is a little bit deeper.
dahdi_sysfs_add_device are calling
dev_set_name(dev, "%s:%s", parent->bus->name, dev_name(parent));
but parent->bus is NULL... (at least for Inter e100 driver...).
Not sure how to fix it not to break the idea (manual SPAN/chan assignment).

In any way, right now, dahdi_dynamic* is not working at all !
Comments:By: Russ Meyerriecks (rmeyerriecks) 2012-01-27 09:50:34.465-0600

We are aware of this issue and working on a fix for it to be released in dahdi 2.6.1

By: Pavel Selivanov (biohumanoid) 2012-01-27 10:39:37.145-0600

Thank you, I was confused by "Triage" state, expected it will change.

By: Pavel Selivanov (biohumanoid) 2012-03-19 03:25:22.541-0500

Dear all,

I wonder if there are any news on this bug ?

Seems, there are some more bugs in dahdi_dynamic.
If you will apply the patches provided - driver will break OS:
on first transmitted packet
on module unload

Note: provided patch should be extended, as long as struct device->name (in Linux < 3.0 ?) limited to 20 bytes (dev_set_name will concatenate "eth1:12:34:56:12:34:56:0" to "eth1:12:34:56:12:34:".
Thank you.

By: Shaun Ruffell (sruffell) 2012-03-19 10:39:51.108-0500

I actually was working on this yesterday. With the change Oron made to allow parentless devices, I just needed to name the dynamic devices, but I ran into an issue with a backport on RHEL 4 where I couldn't really just use the bus_id field on the dahdi_device but must set the name on the underlying kobject.

I need to test a bit more, but I fully expect this to be fixed in the next release.

By: Shaun Ruffell (sruffell) 2012-03-19 10:41:07.149-0500

I just looked at your patch, which was essentially the same thing I had done, but it failed to register on Centos 5.7.

By: Shaun Ruffell (sruffell) 2012-03-21 11:50:27.357-0500

Pavel, I've attached 3 patches that I tested here.  Hopefully you're around to look at them, but I probably will just commit them to trunk / 2.6, since they are better than what we currently have.

I decided to *not* use the dynamic address in the name of the dahdi_device in order to work around to 20 character limit, and there is more work to be done I believe in order to get the devices working when spans aren't auto-assigned, but....


By: Shaun Ruffell (sruffell) 2012-03-21 14:24:03.437-0500

I went ahead and committed these onto [trunk|http://svnview.digium.com/svn/dahdi?view=revision&revision=10563] and the head of the [2.6 branch|http://svnview.digium.com/svn/dahdi?view=revision&revision=10571]. Please let me know if you still have any problems, otherwise they will be in the next release.

By: Pavel Selivanov (biohumanoid) 2012-03-21 23:33:56.470-0500

I've compiled dahdi-linux from trunk rev 10584.
Now troubles with loading dynamic module, span creation, function with asterisk (pulse, dtmf, voice).
On unload, dynamic will crash OS (dynamic_hang.log).

By: Shaun Ruffell (sruffell) 2012-03-22 01:25:43.443-0500

What config are you using?  I tested this out with on 3.3 and 2.6.18-308 and was able to run pattern test between the local spans and between the machines on the eth spans.  Perhaps I should try with your configuration.

By: Shaun Ruffell (sruffell) 2012-03-22 01:49:17.084-0500

With the following conf, when loaded with another device on the other side:
{noformat}
dynamic=eth,eth0/00:0b:db:92:1b:90,24,1
bchan=1-23
dchan=24
{noformat}

I get:

{noformat}
# cat /proc/dahdi/1
Span 1: DYN/eth/eth0/00:0b:db:92:1b:90 "Dynamic 'eth' span at 'eth0/00:0b:db:92:1b:90'" (MASTER)

          1 DYN/eth/eth0/00:0b:db:92:1b:90/1 Clear
          2 DYN/eth/eth0/00:0b:db:92:1b:90/2 Clear
          3 DYN/eth/eth0/00:0b:db:92:1b:90/3 Clear
          4 DYN/eth/eth0/00:0b:db:92:1b:90/4 Clear
          5 DYN/eth/eth0/00:0b:db:92:1b:90/5 Clear
          6 DYN/eth/eth0/00:0b:db:92:1b:90/6 Clear
          7 DYN/eth/eth0/00:0b:db:92:1b:90/7 Clear
          8 DYN/eth/eth0/00:0b:db:92:1b:90/8 Clear
          9 DYN/eth/eth0/00:0b:db:92:1b:90/9 Clear
         10 DYN/eth/eth0/00:0b:db:92:1b:90/10 Clear
         11 DYN/eth/eth0/00:0b:db:92:1b:90/11 Clear
         12 DYN/eth/eth0/00:0b:db:92:1b:90/12 Clear
         13 DYN/eth/eth0/00:0b:db:92:1b:90/13 Clear
         14 DYN/eth/eth0/00:0b:db:92:1b:90/14 Clear
         15 DYN/eth/eth0/00:0b:db:92:1b:90/15 Clear
         16 DYN/eth/eth0/00:0b:db:92:1b:90/16 Clear
         17 DYN/eth/eth0/00:0b:db:92:1b:90/17 Clear
         18 DYN/eth/eth0/00:0b:db:92:1b:90/18 Clear
         19 DYN/eth/eth0/00:0b:db:92:1b:90/19 Clear
         20 DYN/eth/eth0/00:0b:db:92:1b:90/20 Clear
         21 DYN/eth/eth0/00:0b:db:92:1b:90/21 Clear
         22 DYN/eth/eth0/00:0b:db:92:1b:90/22 Clear
         23 DYN/eth/eth0/00:0b:db:92:1b:90/23 Clear
         24 DYN/eth/eth0/00:0b:db:92:1b:90/24 HDLCFCS
{noformat}

and everything is working fine for me.  I can both unload and use dahdi_cfg -s.  So I'm curious what is different in your configuration besides the kernel version.

By: Pavel Selivanov (biohumanoid) 2012-03-22 01:55:55.341-0500

I'm running Linux 2.6.26 (Debian, x86).
dahdi(trunk) is working, but driver will crash OS on dahdi_dynamic_, right here: __module_get(d->driver->owner);

I applied owner.patch - now I can load/unload module without troubles (almost).
I'll describe this "almost" in next comment, else one more bug.

By: Pavel Selivanov (biohumanoid) 2012-03-22 01:57:36.706-0500

dynamic=eth,eth0/00:55:55:55:55:00,30,1

fxsls=8-15
fxols=1-7,16-30
alaw=1-30

loadzone=ru
defaultzone=ru


By: Shaun Ruffell (sruffell) 2012-03-22 02:10:23.055-0500

Hmm..I do not think the owner patch is good. The owner should be the one that is set in the specific dynamic driver when it's registered. For example, for the eth driver, it's set here:
{noformat}
static struct dahdi_dynamic_driver ztd_eth = {
       .owner = THIS_MODULE,
       .name = "eth",
       .desc = "Ethernet",
       .create = ztdeth_create,
       .destroy = ztdeth_destroy,
       .transmit = ztdeth_transmit,
       .flush = ztdeth_flush,
};
{noformat}

Still thinking about why your setup wouldn't have a valid reference the the module there.

By: Shaun Ruffell (sruffell) 2012-03-22 02:18:33.529-0500

Using your configuration works for me on 2.6.18 and 3.3.0.

{noformat}
Span 1: DYN/eth/eth0/00:0b:db:92:1b:90 "Dynamic 'eth' span at 'eth0/00:0b:db:92:1b:90'" (MASTER)

          1 DYN/eth/eth0/00:0b:db:92:1b:90/1 FXOLS
          2 DYN/eth/eth0/00:0b:db:92:1b:90/2 FXOLS
          3 DYN/eth/eth0/00:0b:db:92:1b:90/3 FXOLS
          4 DYN/eth/eth0/00:0b:db:92:1b:90/4 FXOLS
          5 DYN/eth/eth0/00:0b:db:92:1b:90/5 FXOLS
          6 DYN/eth/eth0/00:0b:db:92:1b:90/6 FXOLS
          7 DYN/eth/eth0/00:0b:db:92:1b:90/7 FXOLS
          8 DYN/eth/eth0/00:0b:db:92:1b:90/8 FXSLS
          9 DYN/eth/eth0/00:0b:db:92:1b:90/9 FXSLS
         10 DYN/eth/eth0/00:0b:db:92:1b:90/10 FXSLS
         11 DYN/eth/eth0/00:0b:db:92:1b:90/11 FXSLS
         12 DYN/eth/eth0/00:0b:db:92:1b:90/12 FXSLS
         13 DYN/eth/eth0/00:0b:db:92:1b:90/13 FXSLS
         14 DYN/eth/eth0/00:0b:db:92:1b:90/14 FXSLS
         15 DYN/eth/eth0/00:0b:db:92:1b:90/15 FXSLS
         16 DYN/eth/eth0/00:0b:db:92:1b:90/16 FXOLS
         17 DYN/eth/eth0/00:0b:db:92:1b:90/17 FXOLS
         18 DYN/eth/eth0/00:0b:db:92:1b:90/18 FXOLS
         19 DYN/eth/eth0/00:0b:db:92:1b:90/19 FXOLS
         20 DYN/eth/eth0/00:0b:db:92:1b:90/20 FXOLS
         21 DYN/eth/eth0/00:0b:db:92:1b:90/21 FXOLS
         22 DYN/eth/eth0/00:0b:db:92:1b:90/22 FXOLS
         23 DYN/eth/eth0/00:0b:db:92:1b:90/23 FXOLS
         24 DYN/eth/eth0/00:0b:db:92:1b:90/24 FXOLS
         25 DYN/eth/eth0/00:0b:db:92:1b:90/25 FXOLS
         26 DYN/eth/eth0/00:0b:db:92:1b:90/26 FXOLS
         27 DYN/eth/eth0/00:0b:db:92:1b:90/27 FXOLS
         28 DYN/eth/eth0/00:0b:db:92:1b:90/28 FXOLS
         29 DYN/eth/eth0/00:0b:db:92:1b:90/29 FXOLS
         30 DYN/eth/eth0/00:0b:db:92:1b:90/30 FXOLS
{noformat}

By: Pavel Selivanov (biohumanoid) 2012-03-22 02:23:23.872-0500

One more noisy bug:
while [ 1 ]; do dahdi_cfg ; sleep 0.1; dahdi_cfg -s; sleep 0.1; done

It's a matter of time, how soon will it crash.
It will crash OS in dahdi_dynamic_receive.

Flow (my opinion):
res = dtd->create(d, d->addr);
dev_add_pack
packet from eth0
dahdi_dynamic_receive
kernel crash...

If ethernet will receive a packet 0xd00d after dtd->create & before dahdi_register_device -
dahdi_dynamic_receive will use/pass unitialized span.

I think, it's a good idea to add filed "initialized" in struct dahdi_dynamic and to check it in dahdi_dynamic_receive.


By: Pavel Selivanov (biohumanoid) 2012-03-22 02:34:58.844-0500

> Hmm..I do not think the owner patch is good. The owner should be the one that is set in the specific dynamic driver when it's registered. For example, for the eth driver, it's set here:

> static struct dahdi_dynamic_driver ztd_eth = {
>        .owner = THIS_MODULE,
Oops, I missed it :-) I thought it's NULL.
It's more strange, because, if .owner is dahdi_dynamic_eth - my OS crash.
Once I change it to dahdi_dynamic - it's not crashing.

>Still thinking about why your setup wouldn't have a valid reference the the module there.
I'll double check it. It could be "partial" rebuild.
I'm testing current trunk (as far as I understand, you've already patched trunk), I didn't applied any patches.

By: Shaun Ruffell (sruffell) 2012-03-22 02:39:25.635-0500

Yes, I see your point. Instead of adding a new flag, what do you think about something using the registered flag that's in the span already. This will not be set until after spanno / channel number assignment so it should be safe:

{noformat}
diff --git a/drivers/dahdi/dahdi_dynamic_eth.c b/drivers/dahdi/dahdi_dynamic_eth.c
index f68a4ea..de75385 100644
--- a/drivers/dahdi/dahdi_dynamic_eth.c
+++ b/drivers/dahdi/dahdi_dynamic_eth.c
@@ -71,7 +71,7 @@ static struct dahdi_span *ztdeth_getspan(unsigned char *addr, unsigned short sub
       if (z)
               span = z->span;
       spin_unlock_irqrestore(&zlock, flags);
-       return span;
+       return (test_bit(DAHDI_FLAGBIT_REGISTERED, &span->flags)) ? span : NULL;
}
{noformat}



By: Shaun Ruffell (sruffell) 2012-03-22 02:49:04.258-0500

That doesn't work. Give me a minute. I'm looking at fixing that crash.

By: Shaun Ruffell (sruffell) 2012-03-22 03:01:50.522-0500

Ok, this works for me:
{noformat}
diff --git a/drivers/dahdi/dahdi_dynamic_eth.c b/drivers/dahdi/dahdi_dynamic_eth.c
index f68a4ea..ecf46ea 100644                                                                                                                      
--- a/drivers/dahdi/dahdi_dynamic_eth.c                                                                                                            
+++ b/drivers/dahdi/dahdi_dynamic_eth.c                                                                                                            
@@ -71,6 +71,8 @@ static struct dahdi_span *ztdeth_getspan(unsigned char *addr, unsigned short sub                                                
       if (z)  
               span = z->span;
       spin_unlock_irqrestore(&zlock, flags);
+       if (!span || !test_bit(DAHDI_FLAGBIT_REGISTERED, &span->flags))                                                                            
+               return NULL;                                                                                                                      
       return span;
}
{noformat}

Care to try it on your setup? You should be able to apply it via:
curl "https://github.com/sruffell/dahdi-linux/commit/95b5fe829.patch" | patch -p1  onto your dahdi-linux directory.

By: Pavel Selivanov (biohumanoid) 2012-03-22 08:34:07.831-0500

> Care to try it on your setup?
Works perfect !

The troubles with dahdi_cfg -s still persists...
{panel:title=system.conf}
dynamic=eth,eth0/00:55:55:55:55:00,30,1

fxsls=8-15
fxols=1-7,16-30
alaw=1-30

loadzone=ru
defaultzone=ru
{panel}

dahdi_cfg

cat /proc/modules  | grep dahdi
dahdi_dynamic_eth 4824 0 - Live 0xf8d07000
dahdi_dynamic 9264 1 dahdi_dynamic_eth, Live 0xf8d03000
dahdi 191304 1 dahdi_dynamic, Live 0xf8e04000
crc_ccitt 2080 1 dahdi, Live 0xf8cc7000

reference counter to dahdi_dynamic_eth must be 1 !
I'll take a look tomorrow, what's up.
dahdi 2.4.1.2 works perfect on the same machine/kernel...

By: Shaun Ruffell (sruffell) 2012-03-22 09:07:27.266-0500

The module refcount should no longer be one unless user space has an open reference. In the 2.5 series there were many changes to allow the drivers to be stopped via "modprobe -r dahdi_dynamic_eth" like all the other board drivers when user space doesn't have any open references. The drivers will unwind (should) unwind themselves.

So...this might not be an issue.

By: Pavel Selivanov (biohumanoid) 2012-03-22 10:09:55.826-0500

I'm sorry, no problems with "owner".
I had dahdi 2.4.1.2 installed and automatically started on boot, and a script to unload dahdi / load dahdi without installing it.
I removed dahdi 2.4.0 and some stuff, rebooted machine.
Now, dahdi works fine !

As a feedback:
1. I can remove dahdi while asterisk is running
rmmod dahdi_dynamic_eth; rmmod dahdi_dynamic; rmmod dahdi;
Asterisk received an event, freed /dev/dahdi/channel
2. I can unload dahdi_dynamic without "dahdi_cfg -s", shared pointer implementation (dahdi_get/put) seems to work perfect.

BTW: please, take a look at DAHLIN-278 DAHLIN-279.

By: Pavel Selivanov (biohumanoid) 2012-03-22 10:10:04.266-0500

Give me a hint please, should I open continue my patch here
https://issues.asterisk.org/jira/browse/DAHLIN-26
or open a new ticket ?

Seems somebody had simular problem DAHLIN-245, but the patch provided are ugly:
1. in fack, tasklets are not used (on reveive)
2. no fifo. Necessary on high load systems
3. no tdmoe stats (welcome to tcpdump...)

I'm maintaining my fifo-based patches for years - it works perfect.
As long as dahdi do not support Linux < 2.6.18 - take a look once again.
I will rework it (sprintf -> seq_printf, transmit on sync_tick).


By: Shaun Ruffell (sruffell) 2012-03-22 10:52:14.183-0500

If you agree, I think continuing it on DAHLIN-26 makes sense (I wasn't a watcher so I didn't see your update in February).  Especially since 2.6.9 support was dropped. There should be at least a couple of more months before 2.7 is released.

So commit the patch above and close this issue?

By: Pavel Selivanov (biohumanoid) 2012-03-22 12:38:58.838-0500

> So commit the patch above and close this issue?
yes.

Thank you.

By: Shaun Ruffell (sruffell) 2012-03-22 13:42:31.402-0500

Committed onto [trunk|http://svnview.digium.com/svn/dahdi?view=rev&rev=10587] and onto the [2.6 branch|http://svnview.digium.com/svn/dahdi?view=rev&rev=10591]. Will be in 2.6.1 when it's released.

By: Pavel Selivanov (biohumanoid) 2012-04-02 03:05:37.820-0500

Hi,

It's sad, but dynamic is crashing debian lenny with 2.6.26 kernel, right here: __module_get(d->driver->owner);
I've installed clean debian lenny on VMWare - it's 100% crashing on unload...

I've attached configs/logs/packages as 2012.04.02.tar.gz.

PS: Debian Squeeze with 2.6.32 works fine...


By: Shaun Ruffell (sruffell) 2012-04-03 00:50:15.496-0500

Pavel,

I have 5 changes currently out on my [for-trunk branch|http://git.asterisk.org/gitweb/?p=team/sruffell/dahdi-linux.git;a=shortlog;h=refs/heads/for-trunk].  There are more changes than I would like, but getting rid of the __module_get calls was more involved than I was expecting. I did try this out on debian lenny in addition to 2.6.18 and 3.3.

Can you try those changes from the git repo? If not I've attached them all combined into one patch on this issue as dahlin-280.patch.

To run from the git repo
{noformat}
git clone git://git.asterisk.org/team/sruffell/dahdi-linux
cd dahdi-linux
git checkout -b for-trunk --track origin/for-trunk
make install, etc...
{noformat}

Thanks,
Shaun

By: Pavel Selivanov (biohumanoid) 2012-04-03 04:33:21.960-0500

Linux 2.6.26 tests

span create/shutdown - OK
span create/shutdown loop with "sleep 0.1" - OK
rmmod witout "dahdi_cfg -s" - OK
voice/pulse/dtmf - OK
rmmod dahdi_dynamic_eth with running Asterisk - Failed (busy)

=== patched (for-trunk):
lsmod | grep dahdi
dahdi_dynamic_eth       4792  30
dahdi_dynamic           9040  31 dahdi_dynamic_eth
dahdi                 191468  61 dahdi_dynamic
crc_ccitt               2080  1 dahdi

=== unpatched (trunk)
dahdi_dynamic_eth       4760  0
dahdi_dynamic           9200  61 dahdi_dynamic_eth
dahdi                 191304  61 dahdi_dynamic
crc_ccitt               2080  1 dahdi
rmmod dahdi_dynamic_eth
[Apr  3 09:32:09] NOTICE[7790]: chan_dahdi.c:9450 handle_init_event: Got DAHDI_EVENT_REMOVED. Destroying channel 1


By: Shaun Ruffell (sruffell) 2012-04-03 15:15:38.786-0500

Ok..closing again. These changes (with some very slight modifications) are on trunk and the 2.6 branch and will be in the 2.6.1 release again.

Thanks for testing (which reminds me...I didn't add your name to the last round of commits...sorry...)

By: Pavel Selivanov (biohumanoid) 2012-04-04 00:45:24.640-0500

Probably you've missed it in report, but I can't rmmod dahdi_dynamic_eth while asterisk is running/using DAHDI spans.
dahdi_dynamic_eth 4792 30
dahdi_dynamic 9040 31 dahdi_dynamic_eth
It's not a crash, so we can leave with it, but it's not so comfortable as it could/should be...

Should I open new bugreport ?

In r10596, I had
dahdi_dynamic_eth 4760 0
dahdi_dynamic 9200 61 dahdi_dynamic_eth
and I could rmmod dahdi_dynamic_eth

Thank you.
Using kref gave a great improvements to DAHDI/Asterisk :-)

By: Shaun Ruffell (sruffell) 2012-04-04 10:10:30.337-0500

Actually, my understanding is that the current behavior is the intended behavior.  The physical card drivers work this way, and dynamic_spans were made this way in [r9573 "dahdi_dynamic: dynamic drivers should not reference count themselves"|http://svnview.digium.com/svn/dahdi?view=revision&revision=9573].

Normally we try to prevent surprise channel removals while Asterisk is running (since it doesn't handle it very well), but we've (most Tzafrir and Oron) have started trying to accommodate surprise removal since users sometimes yank out the USB devices unintentionally.

Does this cause any particular problem for you?

By: Pavel Selivanov (biohumanoid) 2012-04-18 07:41:48.003-0500

Hi,
I'm sorry for a huge delay with an answer.

To be sure I get the idea, current (intended) design:
- prevent unloading dahdi while used /dev/dahdi/*
- permit unloading dahdi_dynamic without "dahdi -s"
Is it clear ?

In r10596 I could unload dahdi_dynamic_eth (and, I think, any other driver like wct1xxp) even with /dev/dahdi/* openned.
And it was great !

Up to me, it's a bad idea to remove driver while it's used.
I'd use: asterisk -rx "module unload -f chan_dahdi.so"
But the world is cruel... So I'd prefer to be able to rmmod dahdi_dynamic_eth at any time...
modprobe dahdi_dynamic.ko remove_on_rmmod=1 ?