Summary: | DAHLIN-00280: dahdi_dynamic_eth(ethmf,loc) | ||
Reporter: | Pavel Selivanov (biohumanoid) | Labels: | |
Date Opened: | 2012-01-25 13:59:32.000-0600 | Date Closed: | 2012-04-03 15:15:38 |
Priority: | Blocker | Regression? | |
Status: | Closed/Complete | Components: | dahdi_dynamic_eth |
Versions: | 2.6.0 | Frequency of Occurrence | Constant |
Related Issues: | |||
Environment: | Attachments: | ( 0) 0001-dahdi-Update-dev_set_name-dev_name-for-RHEL-5.6.patch ( 1) 0002-dahdi_dynamic_eth-Move-tx-packet-flushing-to-process.patch ( 2) 0003-dahdi_dynamic-Since-dynamic-devices-are-parentless-w.patch ( 3) 2012.04.02.tar.gz ( 4) dahdi_dynamic_eth.c.patch ( 5) dahdi_dynamic.patch ( 6) dahlin-280.patch ( 7) dynamic_hang.log ( 8) owner.patch | |
Description: | Jan 25 19:42:41 PACS kernel: [ 4660.852384] BUG: unable to handle kernel NULL pointer dereference at 00000074 dahdi_dynamic.c: dahdi_register_device(d->ddev, d->dev). In DAHDI 2.6.0, d->dev is always NULL. Underlying driver (eth, ethmf, loc) must set dahdi_dynamic->dev to it's "parent", I think. dahdi_dynamic.c:ztdeth_create(): dyn->dev = &z->dev->dev; The patch file is attached. But the trouble is a little bit deeper. dahdi_sysfs_add_device are calling dev_set_name(dev, "%s:%s", parent->bus->name, dev_name(parent)); but parent->bus is NULL... (at least for Inter e100 driver...). Not sure how to fix it not to break the idea (manual SPAN/chan assignment). In any way, right now, dahdi_dynamic* is not working at all ! | ||
Comments: | By: Russ Meyerriecks (rmeyerriecks) 2012-01-27 09:50:34.465-0600 We are aware of this issue and working on a fix for it to be released in dahdi 2.6.1 By: Pavel Selivanov (biohumanoid) 2012-01-27 10:39:37.145-0600 Thank you, I was confused by "Triage" state, expected it will change. By: Pavel Selivanov (biohumanoid) 2012-03-19 03:25:22.541-0500 Dear all, I wonder if there are any news on this bug ? Seems, there are some more bugs in dahdi_dynamic. If you will apply the patches provided - driver will break OS: on first transmitted packet on module unload Note: provided patch should be extended, as long as struct device->name (in Linux < 3.0 ?) limited to 20 bytes (dev_set_name will concatenate "eth1:12:34:56:12:34:56:0" to "eth1:12:34:56:12:34:". Thank you. By: Shaun Ruffell (sruffell) 2012-03-19 10:39:51.108-0500 I actually was working on this yesterday. With the change Oron made to allow parentless devices, I just needed to name the dynamic devices, but I ran into an issue with a backport on RHEL 4 where I couldn't really just use the bus_id field on the dahdi_device but must set the name on the underlying kobject. I need to test a bit more, but I fully expect this to be fixed in the next release. By: Shaun Ruffell (sruffell) 2012-03-19 10:41:07.149-0500 I just looked at your patch, which was essentially the same thing I had done, but it failed to register on Centos 5.7. By: Shaun Ruffell (sruffell) 2012-03-21 11:50:27.357-0500 Pavel, I've attached 3 patches that I tested here. Hopefully you're around to look at them, but I probably will just commit them to trunk / 2.6, since they are better than what we currently have. I decided to *not* use the dynamic address in the name of the dahdi_device in order to work around to 20 character limit, and there is more work to be done I believe in order to get the devices working when spans aren't auto-assigned, but.... By: Shaun Ruffell (sruffell) 2012-03-21 14:24:03.437-0500 I went ahead and committed these onto [trunk|http://svnview.digium.com/svn/dahdi?view=revision&revision=10563] and the head of the [2.6 branch|http://svnview.digium.com/svn/dahdi?view=revision&revision=10571]. Please let me know if you still have any problems, otherwise they will be in the next release. By: Pavel Selivanov (biohumanoid) 2012-03-21 23:33:56.470-0500 I've compiled dahdi-linux from trunk rev 10584. Now troubles with loading dynamic module, span creation, function with asterisk (pulse, dtmf, voice). On unload, dynamic will crash OS (dynamic_hang.log). By: Shaun Ruffell (sruffell) 2012-03-22 01:25:43.443-0500 What config are you using? I tested this out with on 3.3 and 2.6.18-308 and was able to run pattern test between the local spans and between the machines on the eth spans. Perhaps I should try with your configuration. By: Shaun Ruffell (sruffell) 2012-03-22 01:49:17.084-0500 With the following conf, when loaded with another device on the other side: {noformat} dynamic=eth,eth0/00:0b:db:92:1b:90,24,1 bchan=1-23 dchan=24 {noformat} I get: {noformat} # cat /proc/dahdi/1 Span 1: DYN/eth/eth0/00:0b:db:92:1b:90 "Dynamic 'eth' span at 'eth0/00:0b:db:92:1b:90'" (MASTER) 1 DYN/eth/eth0/00:0b:db:92:1b:90/1 Clear 2 DYN/eth/eth0/00:0b:db:92:1b:90/2 Clear 3 DYN/eth/eth0/00:0b:db:92:1b:90/3 Clear 4 DYN/eth/eth0/00:0b:db:92:1b:90/4 Clear 5 DYN/eth/eth0/00:0b:db:92:1b:90/5 Clear 6 DYN/eth/eth0/00:0b:db:92:1b:90/6 Clear 7 DYN/eth/eth0/00:0b:db:92:1b:90/7 Clear 8 DYN/eth/eth0/00:0b:db:92:1b:90/8 Clear 9 DYN/eth/eth0/00:0b:db:92:1b:90/9 Clear 10 DYN/eth/eth0/00:0b:db:92:1b:90/10 Clear 11 DYN/eth/eth0/00:0b:db:92:1b:90/11 Clear 12 DYN/eth/eth0/00:0b:db:92:1b:90/12 Clear 13 DYN/eth/eth0/00:0b:db:92:1b:90/13 Clear 14 DYN/eth/eth0/00:0b:db:92:1b:90/14 Clear 15 DYN/eth/eth0/00:0b:db:92:1b:90/15 Clear 16 DYN/eth/eth0/00:0b:db:92:1b:90/16 Clear 17 DYN/eth/eth0/00:0b:db:92:1b:90/17 Clear 18 DYN/eth/eth0/00:0b:db:92:1b:90/18 Clear 19 DYN/eth/eth0/00:0b:db:92:1b:90/19 Clear 20 DYN/eth/eth0/00:0b:db:92:1b:90/20 Clear 21 DYN/eth/eth0/00:0b:db:92:1b:90/21 Clear 22 DYN/eth/eth0/00:0b:db:92:1b:90/22 Clear 23 DYN/eth/eth0/00:0b:db:92:1b:90/23 Clear 24 DYN/eth/eth0/00:0b:db:92:1b:90/24 HDLCFCS {noformat} and everything is working fine for me. I can both unload and use dahdi_cfg -s. So I'm curious what is different in your configuration besides the kernel version. By: Pavel Selivanov (biohumanoid) 2012-03-22 01:55:55.341-0500 I'm running Linux 2.6.26 (Debian, x86). dahdi(trunk) is working, but driver will crash OS on dahdi_dynamic_, right here: __module_get(d->driver->owner); I applied owner.patch - now I can load/unload module without troubles (almost). I'll describe this "almost" in next comment, else one more bug. By: Pavel Selivanov (biohumanoid) 2012-03-22 01:57:36.706-0500 dynamic=eth,eth0/00:55:55:55:55:00,30,1 fxsls=8-15 fxols=1-7,16-30 alaw=1-30 loadzone=ru defaultzone=ru By: Shaun Ruffell (sruffell) 2012-03-22 02:10:23.055-0500 Hmm..I do not think the owner patch is good. The owner should be the one that is set in the specific dynamic driver when it's registered. For example, for the eth driver, it's set here: {noformat} static struct dahdi_dynamic_driver ztd_eth = { .owner = THIS_MODULE, .name = "eth", .desc = "Ethernet", .create = ztdeth_create, .destroy = ztdeth_destroy, .transmit = ztdeth_transmit, .flush = ztdeth_flush, }; {noformat} Still thinking about why your setup wouldn't have a valid reference the the module there. By: Shaun Ruffell (sruffell) 2012-03-22 02:18:33.529-0500 Using your configuration works for me on 2.6.18 and 3.3.0. {noformat} Span 1: DYN/eth/eth0/00:0b:db:92:1b:90 "Dynamic 'eth' span at 'eth0/00:0b:db:92:1b:90'" (MASTER) 1 DYN/eth/eth0/00:0b:db:92:1b:90/1 FXOLS 2 DYN/eth/eth0/00:0b:db:92:1b:90/2 FXOLS 3 DYN/eth/eth0/00:0b:db:92:1b:90/3 FXOLS 4 DYN/eth/eth0/00:0b:db:92:1b:90/4 FXOLS 5 DYN/eth/eth0/00:0b:db:92:1b:90/5 FXOLS 6 DYN/eth/eth0/00:0b:db:92:1b:90/6 FXOLS 7 DYN/eth/eth0/00:0b:db:92:1b:90/7 FXOLS 8 DYN/eth/eth0/00:0b:db:92:1b:90/8 FXSLS 9 DYN/eth/eth0/00:0b:db:92:1b:90/9 FXSLS 10 DYN/eth/eth0/00:0b:db:92:1b:90/10 FXSLS 11 DYN/eth/eth0/00:0b:db:92:1b:90/11 FXSLS 12 DYN/eth/eth0/00:0b:db:92:1b:90/12 FXSLS 13 DYN/eth/eth0/00:0b:db:92:1b:90/13 FXSLS 14 DYN/eth/eth0/00:0b:db:92:1b:90/14 FXSLS 15 DYN/eth/eth0/00:0b:db:92:1b:90/15 FXSLS 16 DYN/eth/eth0/00:0b:db:92:1b:90/16 FXOLS 17 DYN/eth/eth0/00:0b:db:92:1b:90/17 FXOLS 18 DYN/eth/eth0/00:0b:db:92:1b:90/18 FXOLS 19 DYN/eth/eth0/00:0b:db:92:1b:90/19 FXOLS 20 DYN/eth/eth0/00:0b:db:92:1b:90/20 FXOLS 21 DYN/eth/eth0/00:0b:db:92:1b:90/21 FXOLS 22 DYN/eth/eth0/00:0b:db:92:1b:90/22 FXOLS 23 DYN/eth/eth0/00:0b:db:92:1b:90/23 FXOLS 24 DYN/eth/eth0/00:0b:db:92:1b:90/24 FXOLS 25 DYN/eth/eth0/00:0b:db:92:1b:90/25 FXOLS 26 DYN/eth/eth0/00:0b:db:92:1b:90/26 FXOLS 27 DYN/eth/eth0/00:0b:db:92:1b:90/27 FXOLS 28 DYN/eth/eth0/00:0b:db:92:1b:90/28 FXOLS 29 DYN/eth/eth0/00:0b:db:92:1b:90/29 FXOLS 30 DYN/eth/eth0/00:0b:db:92:1b:90/30 FXOLS {noformat} By: Pavel Selivanov (biohumanoid) 2012-03-22 02:23:23.872-0500 One more noisy bug: while [ 1 ]; do dahdi_cfg ; sleep 0.1; dahdi_cfg -s; sleep 0.1; done It's a matter of time, how soon will it crash. It will crash OS in dahdi_dynamic_receive. Flow (my opinion): res = dtd->create(d, d->addr); dev_add_pack packet from eth0 dahdi_dynamic_receive kernel crash... If ethernet will receive a packet 0xd00d after dtd->create & before dahdi_register_device - dahdi_dynamic_receive will use/pass unitialized span. I think, it's a good idea to add filed "initialized" in struct dahdi_dynamic and to check it in dahdi_dynamic_receive. By: Pavel Selivanov (biohumanoid) 2012-03-22 02:34:58.844-0500 > Hmm..I do not think the owner patch is good. The owner should be the one that is set in the specific dynamic driver when it's registered. For example, for the eth driver, it's set here: > static struct dahdi_dynamic_driver ztd_eth = { > .owner = THIS_MODULE, Oops, I missed it :-) I thought it's NULL. It's more strange, because, if .owner is dahdi_dynamic_eth - my OS crash. Once I change it to dahdi_dynamic - it's not crashing. >Still thinking about why your setup wouldn't have a valid reference the the module there. I'll double check it. It could be "partial" rebuild. I'm testing current trunk (as far as I understand, you've already patched trunk), I didn't applied any patches. By: Shaun Ruffell (sruffell) 2012-03-22 02:39:25.635-0500 Yes, I see your point. Instead of adding a new flag, what do you think about something using the registered flag that's in the span already. This will not be set until after spanno / channel number assignment so it should be safe: {noformat} diff --git a/drivers/dahdi/dahdi_dynamic_eth.c b/drivers/dahdi/dahdi_dynamic_eth.c index f68a4ea..de75385 100644 --- a/drivers/dahdi/dahdi_dynamic_eth.c +++ b/drivers/dahdi/dahdi_dynamic_eth.c @@ -71,7 +71,7 @@ static struct dahdi_span *ztdeth_getspan(unsigned char *addr, unsigned short sub if (z) span = z->span; spin_unlock_irqrestore(&zlock, flags); - return span; + return (test_bit(DAHDI_FLAGBIT_REGISTERED, &span->flags)) ? span : NULL; } {noformat} By: Shaun Ruffell (sruffell) 2012-03-22 02:49:04.258-0500 That doesn't work. Give me a minute. I'm looking at fixing that crash. By: Shaun Ruffell (sruffell) 2012-03-22 03:01:50.522-0500 Ok, this works for me: {noformat} diff --git a/drivers/dahdi/dahdi_dynamic_eth.c b/drivers/dahdi/dahdi_dynamic_eth.c index f68a4ea..ecf46ea 100644 --- a/drivers/dahdi/dahdi_dynamic_eth.c +++ b/drivers/dahdi/dahdi_dynamic_eth.c @@ -71,6 +71,8 @@ static struct dahdi_span *ztdeth_getspan(unsigned char *addr, unsigned short sub if (z) span = z->span; spin_unlock_irqrestore(&zlock, flags); + if (!span || !test_bit(DAHDI_FLAGBIT_REGISTERED, &span->flags)) + return NULL; return span; } {noformat} Care to try it on your setup? You should be able to apply it via: curl "https://github.com/sruffell/dahdi-linux/commit/95b5fe829.patch" | patch -p1 onto your dahdi-linux directory. By: Pavel Selivanov (biohumanoid) 2012-03-22 08:34:07.831-0500 > Care to try it on your setup? Works perfect ! The troubles with dahdi_cfg -s still persists... {panel:title=system.conf} dynamic=eth,eth0/00:55:55:55:55:00,30,1 fxsls=8-15 fxols=1-7,16-30 alaw=1-30 loadzone=ru defaultzone=ru {panel} dahdi_cfg cat /proc/modules | grep dahdi dahdi_dynamic_eth 4824 0 - Live 0xf8d07000 dahdi_dynamic 9264 1 dahdi_dynamic_eth, Live 0xf8d03000 dahdi 191304 1 dahdi_dynamic, Live 0xf8e04000 crc_ccitt 2080 1 dahdi, Live 0xf8cc7000 reference counter to dahdi_dynamic_eth must be 1 ! I'll take a look tomorrow, what's up. dahdi 2.4.1.2 works perfect on the same machine/kernel... By: Shaun Ruffell (sruffell) 2012-03-22 09:07:27.266-0500 The module refcount should no longer be one unless user space has an open reference. In the 2.5 series there were many changes to allow the drivers to be stopped via "modprobe -r dahdi_dynamic_eth" like all the other board drivers when user space doesn't have any open references. The drivers will unwind (should) unwind themselves. So...this might not be an issue. By: Pavel Selivanov (biohumanoid) 2012-03-22 10:09:55.826-0500 I'm sorry, no problems with "owner". I had dahdi 2.4.1.2 installed and automatically started on boot, and a script to unload dahdi / load dahdi without installing it. I removed dahdi 2.4.0 and some stuff, rebooted machine. Now, dahdi works fine ! As a feedback: 1. I can remove dahdi while asterisk is running rmmod dahdi_dynamic_eth; rmmod dahdi_dynamic; rmmod dahdi; Asterisk received an event, freed /dev/dahdi/channel 2. I can unload dahdi_dynamic without "dahdi_cfg -s", shared pointer implementation (dahdi_get/put) seems to work perfect. BTW: please, take a look at DAHLIN-278 DAHLIN-279. By: Pavel Selivanov (biohumanoid) 2012-03-22 10:10:04.266-0500 Give me a hint please, should I open continue my patch here https://issues.asterisk.org/jira/browse/DAHLIN-26 or open a new ticket ? Seems somebody had simular problem DAHLIN-245, but the patch provided are ugly: 1. in fack, tasklets are not used (on reveive) 2. no fifo. Necessary on high load systems 3. no tdmoe stats (welcome to tcpdump...) I'm maintaining my fifo-based patches for years - it works perfect. As long as dahdi do not support Linux < 2.6.18 - take a look once again. I will rework it (sprintf -> seq_printf, transmit on sync_tick). By: Shaun Ruffell (sruffell) 2012-03-22 10:52:14.183-0500 If you agree, I think continuing it on DAHLIN-26 makes sense (I wasn't a watcher so I didn't see your update in February). Especially since 2.6.9 support was dropped. There should be at least a couple of more months before 2.7 is released. So commit the patch above and close this issue? By: Pavel Selivanov (biohumanoid) 2012-03-22 12:38:58.838-0500 > So commit the patch above and close this issue? yes. Thank you. By: Shaun Ruffell (sruffell) 2012-03-22 13:42:31.402-0500 Committed onto [trunk|http://svnview.digium.com/svn/dahdi?view=rev&rev=10587] and onto the [2.6 branch|http://svnview.digium.com/svn/dahdi?view=rev&rev=10591]. Will be in 2.6.1 when it's released. By: Pavel Selivanov (biohumanoid) 2012-04-02 03:05:37.820-0500 Hi, It's sad, but dynamic is crashing debian lenny with 2.6.26 kernel, right here: __module_get(d->driver->owner); I've installed clean debian lenny on VMWare - it's 100% crashing on unload... I've attached configs/logs/packages as 2012.04.02.tar.gz. PS: Debian Squeeze with 2.6.32 works fine... By: Shaun Ruffell (sruffell) 2012-04-03 00:50:15.496-0500 Pavel, I have 5 changes currently out on my [for-trunk branch|http://git.asterisk.org/gitweb/?p=team/sruffell/dahdi-linux.git;a=shortlog;h=refs/heads/for-trunk]. There are more changes than I would like, but getting rid of the __module_get calls was more involved than I was expecting. I did try this out on debian lenny in addition to 2.6.18 and 3.3. Can you try those changes from the git repo? If not I've attached them all combined into one patch on this issue as dahlin-280.patch. To run from the git repo {noformat} git clone git://git.asterisk.org/team/sruffell/dahdi-linux cd dahdi-linux git checkout -b for-trunk --track origin/for-trunk make install, etc... {noformat} Thanks, Shaun By: Pavel Selivanov (biohumanoid) 2012-04-03 04:33:21.960-0500 Linux 2.6.26 tests span create/shutdown - OK span create/shutdown loop with "sleep 0.1" - OK rmmod witout "dahdi_cfg -s" - OK voice/pulse/dtmf - OK rmmod dahdi_dynamic_eth with running Asterisk - Failed (busy) === patched (for-trunk): lsmod | grep dahdi dahdi_dynamic_eth 4792 30 dahdi_dynamic 9040 31 dahdi_dynamic_eth dahdi 191468 61 dahdi_dynamic crc_ccitt 2080 1 dahdi === unpatched (trunk) dahdi_dynamic_eth 4760 0 dahdi_dynamic 9200 61 dahdi_dynamic_eth dahdi 191304 61 dahdi_dynamic crc_ccitt 2080 1 dahdi rmmod dahdi_dynamic_eth [Apr 3 09:32:09] NOTICE[7790]: chan_dahdi.c:9450 handle_init_event: Got DAHDI_EVENT_REMOVED. Destroying channel 1 By: Shaun Ruffell (sruffell) 2012-04-03 15:15:38.786-0500 Ok..closing again. These changes (with some very slight modifications) are on trunk and the 2.6 branch and will be in the 2.6.1 release again. Thanks for testing (which reminds me...I didn't add your name to the last round of commits...sorry...) By: Pavel Selivanov (biohumanoid) 2012-04-04 00:45:24.640-0500 Probably you've missed it in report, but I can't rmmod dahdi_dynamic_eth while asterisk is running/using DAHDI spans. dahdi_dynamic_eth 4792 30 dahdi_dynamic 9040 31 dahdi_dynamic_eth It's not a crash, so we can leave with it, but it's not so comfortable as it could/should be... Should I open new bugreport ? In r10596, I had dahdi_dynamic_eth 4760 0 dahdi_dynamic 9200 61 dahdi_dynamic_eth and I could rmmod dahdi_dynamic_eth Thank you. Using kref gave a great improvements to DAHDI/Asterisk :-) By: Shaun Ruffell (sruffell) 2012-04-04 10:10:30.337-0500 Actually, my understanding is that the current behavior is the intended behavior. The physical card drivers work this way, and dynamic_spans were made this way in [r9573 "dahdi_dynamic: dynamic drivers should not reference count themselves"|http://svnview.digium.com/svn/dahdi?view=revision&revision=9573]. Normally we try to prevent surprise channel removals while Asterisk is running (since it doesn't handle it very well), but we've (most Tzafrir and Oron) have started trying to accommodate surprise removal since users sometimes yank out the USB devices unintentionally. Does this cause any particular problem for you? By: Pavel Selivanov (biohumanoid) 2012-04-18 07:41:48.003-0500 Hi, I'm sorry for a huge delay with an answer. To be sure I get the idea, current (intended) design: - prevent unloading dahdi while used /dev/dahdi/* - permit unloading dahdi_dynamic without "dahdi -s" Is it clear ? In r10596 I could unload dahdi_dynamic_eth (and, I think, any other driver like wct1xxp) even with /dev/dahdi/* openned. And it was great ! Up to me, it's a bad idea to remove driver while it's used. I'd use: asterisk -rx "module unload -f chan_dahdi.so" But the world is cruel... So I'd prefer to be able to rmmod dahdi_dynamic_eth at any time... modprobe dahdi_dynamic.ko remove_on_rmmod=1 ? |