Summary: | DAHLIN-00134: Server crash after I do ntpdate -u ntp.nasa.gov | ||
Reporter: | missnebun (missnebun) | Labels: | |
Date Opened: | 2009-08-03 22:14:03 | Date Closed: | 2009-11-12 12:38:53.000-0600 |
Priority: | Minor | Regression? | No |
Status: | Closed/Complete | Components: | dahdi_dummy |
Versions: | 2.2.0.2 | Frequency of Occurrence | |
Related Issues: | |||
Environment: | Attachments: | ||
Description: | I compiled the latest version ... same result ... When I run ntpdate -u ntp.nasa.gov, the error below repeats for a few minutes and then the server crashes.
Aug 3 22:44:34 pbx1 kernel: BUG: soft lockup - CPU#1 stuck for 10s! [swapper:0]
Aug 3 22:44:34 pbx1 kernel:
Aug 3 22:44:34 pbx1 kernel: Pid: 0, comm: swapper
Aug 3 22:44:34 pbx1 kernel: EIP: 0060:[<c060f4d8>] CPU: 1
Aug 3 22:44:34 pbx1 kernel: EIP is at _spin_unlock_irqrestore+0x8/0x9
Aug 3 22:44:34 pbx1 kernel: EFLAGS: 00000246 Tainted: G (2.6.18-128.2.1.el5PAE #1)
Aug 3 22:44:34 pbx1 kernel: EAX: f8f737f0 EBX: 00000002 ECX: 00000246 EDX: 00000200
Aug 3 22:44:34 pbx1 kernel: ESI: 00000001 EDI: f8f819e0 EBP: 00000000 DS: 007b ES: 007b
Aug 3 22:44:34 pbx1 kernel: CR0: 8005003b CR2: b7d54000 CR3: 0072c000 CR4: 000006f0
Aug 3 22:44:34 pbx1 kernel: [<f8f69b90>] dahdi_receive+0x755/0x776 [dahdi]
Aug 3 22:44:34 pbx1 kernel: [<f8f6d147>] dahdi_transmit+0x11/0x48d [dahdi]
Aug 3 22:44:34 pbx1 kernel: [<f8efd26b>] dahdi_dummy_timer+0x8b/0xcc [dahdi_dummy]
Aug 3 22:44:34 pbx1 kernel: [<f8efd1e0>] dahdi_dummy_timer+0x0/0xcc [dahdi_dummy]
Aug 3 22:44:34 pbx1 kernel: [<c042c5b1>] run_timer_softirq+0xfb/0x151
Aug 3 22:44:34 pbx1 kernel: [<c0429047>] __do_softirq+0x87/0x114
Aug 3 22:44:34 pbx1 kernel: [<c04073d7>] do_softirq+0x52/0x9c
Aug 3 22:44:34 pbx1 kernel: [<c04059d7>] apic_timer_interrupt+0x1f/0x24
Aug 3 22:44:34 pbx1 kernel: [<c0403bb0>] default_idle+0x0/0x59
Aug 3 22:44:34 pbx1 kernel: [<c0403be1>] default_idle+0x31/0x59
Aug 3 22:44:34 pbx1 kernel: [<c0403ca8>] cpu_idle+0x9f/0xb9
Aug 3 22:44:34 pbx1 kernel: =======================
Aug 3 22:44:44 pbx1 kernel: BUG: soft lockup - CPU#1 stuck for 10s! [swapper:0]
Aug 3 22:44:44 pbx1 kernel:
Aug 3 22:44:44 pbx1 kernel: Pid: 0, comm: swapper
Aug 3 22:44:44 pbx1 kernel: EIP: 0060:[<c060f4d8>] CPU: 1
Aug 3 22:44:44 pbx1 kernel: EIP is at _spin_unlock_irqrestore+0x8/0x9
Aug 3 22:44:44 pbx1 kernel: EFLAGS: 00000246 Tainted: G (2.6.18-128.2.1.el5PAE #1)
Aug 3 22:44:44 pbx1 kernel: EAX: f8f737f0 EBX: 00000002 ECX: 00000246 EDX: 00000200
Aug 3 22:44:44 pbx1 kernel: ESI: 00000001 EDI: f8f819e0 EBP: 00000000 DS: 007b ES: 007b
Aug 3 22:44:44 pbx1 kernel: CR0: 8005003b CR2: b7d54000 CR3: 0072c000 CR4: 000006f0
Aug 3 22:44:44 pbx1 kernel: [<f8f69b90>] dahdi_receive+0x755/0x776 [dahdi]
Aug 3 22:44:44 pbx1 kernel: [<f8f69bad>] dahdi_receive+0x772/0x776 [dahdi]
Aug 3 22:44:44 pbx1 kernel: [<f8efd26b>] dahdi_dummy_timer+0x8b/0xcc [dahdi_dummy]
Aug 3 22:44:44 pbx1 kernel: [<f8efd1e0>] dahdi_dummy_timer+0x0/0xcc [dahdi_dummy]
Aug 3 22:44:44 pbx1 kernel: [<c042c5b1>] run_timer_softirq+0xfb/0x151
Aug 3 22:44:44 pbx1 kernel: [<c0429047>] __do_softirq+0x87/0x114
Aug 3 22:44:44 pbx1 kernel: [<c04073d7>] do_softirq+0x52/0x9c
Aug 3 22:44:44 pbx1 kernel: [<c04059d7>] apic_timer_interrupt+0x1f/0x24
Aug 3 22:44:44 pbx1 kernel: [<c0403bb0>] default_idle+0x0/0x59
Aug 3 22:44:44 pbx1 kernel: [<c0403be1>] default_idle+0x31/0x59
Aug 3 22:44:44 pbx1 kernel: [<c0403ca8>] cpu_idle+0x9f/0xb9
Aug 3 22:44:44 pbx1 kernel: =======================
Aug 3 22:44:54 pbx1 kernel: BUG: soft lockup - CPU#1 stuck for 10s! [swapper:0]
Aug 3 22:44:54 pbx1 kernel:
Aug 3 22:44:54 pbx1 kernel: Pid: 0, comm: swapper
Aug 3 22:44:54 pbx1 kernel: EIP: 0060:[<c060f4d8>] CPU: 1
Aug 3 22:44:54 pbx1 kernel: EIP is at _spin_unlock_irqrestore+0x8/0x9
Aug 3 22:44:54 pbx1 kernel: EFLAGS: 00000246 Tainted: G (2.6.18-128.2.1.el5PAE #1)
Aug 3 22:44:54 pbx1 kernel: EAX: f8f737f0 EBX: 00000002 ECX: 00000246 EDX: 00000200
Aug 3 22:44:54 pbx1 kernel: ESI: 00000001 EDI: f8f859f0 EBP: 00000000 DS: 007b ES: 007b
Aug 3 22:44:54 pbx1 kernel: CR0: 8005003b CR2: b7d54000 CR3: 0072c000 CR4: 000006f0
Aug 3 22:44:54 pbx1 kernel: [<f8f69b90>] dahdi_receive+0x755/0x776 [dahdi]
Aug 3 22:44:54 pbx1 kernel: [<f8f69440>] dahdi_receive+0x5/0x776 [dahdi]
Aug 3 22:44:54 pbx1 kernel: [<f8efd26b>] dahdi_dummy_timer+0x8b/0xcc [dahdi_dummy]
Aug 3 22:44:54 pbx1 kernel: [<f8efd1e0>] dahdi_dummy_timer+0x0/0xcc [dahdi_dummy]
Aug 3 22:44:54 pbx1 kernel: [<c042c5b1>] run_timer_softirq+0xfb/0x151
Aug 3 22:44:54 pbx1 kernel: [<c0429047>] __do_softirq+0x87/0x114
Aug 3 22:44:54 pbx1 kernel: [<c04073d7>] do_softirq+0x52/0x9c
Aug 3 22:44:54 pbx1 kernel: [<c04059d7>] apic_timer_interrupt+0x1f/0x24
Aug 3 22:44:54 pbx1 kernel: [<c0403bb0>] default_idle+0x0/0x59
Aug 3 22:44:54 pbx1 kernel: [<c0403be1>] default_idle+0x31/0x59
Aug 3 22:44:54 pbx1 kernel: [<c0403ca8>] cpu_idle+0x9f/0xb9
Aug 3 22:44:54 pbx1 kernel: ======================= | ||
Comments: | By: mihpel (mihpel) 2009-08-04 10:41:09
I am facing the same problem with elastix dahdi-2.1.0.4-24 http://bugs.elastix.org/view.php?id=148

By: Shaun Ruffell (sruffell) 2009-08-04 10:54:00
I'll have an update for this later today. My current hypothesis is that, since dahdi_dummy uses the actual passage of time to know how many times to tick dahdi, after you run ntpdate dahdi thinks it has many, many ticks to process in order to catch up. So....I'll fix dahdi_dummy to place an upper limit on the number of times it tries to catch up (which I should have had in there to begin with). My guess is that you could also resolve this by ensuring that ntp is started before dahdi in your init scripts on system boot.

By: Digium Subversion (svnbot) 2009-08-04 11:22:56
Repository: dahdi
Revision: 6933
U linux/trunk/drivers/dahdi/dahdi_dummy.c
------------------------------------------------------------------------
r6933 | sruffell | 2009-08-04 11:22:55 -0500 (Tue, 04 Aug 2009) | 10 lines
dahdi_dummy: Do not allow jumps in system time to lock up the system.
Since dahdi_dummy uses the number of milliseconds that has actually passed to determine how many times to call dahdi_receive, it is possible that if the system time shifts after dahdi is started, that the system can appear to lock up while dahdi_dummy attempts to catch up. This change prevents soft lock ups under these conditions.
(closes issue DAHLIN-134)
Reported by: missnebun
------------------------------------------------------------------------
http://svn.digium.com/view/dahdi?view=rev&revision=6933

By: Digium Subversion (svnbot) 2009-08-04 11:24:40
Repository: dahdi
Revision: 6934
_U linux/branches/2.2/
U linux/branches/2.2/drivers/dahdi/dahdi_dummy.c
------------------------------------------------------------------------
r6934 | sruffell | 2009-08-04 11:24:40 -0500 (Tue, 04 Aug 2009) | 18 lines
Merged revisions 6933 via svnmerge from https://origsvn.digium.com/svn/dahdi/linux/trunk
........
r6933 | sruffell | 2009-08-04 11:22:39 -0500 (Tue, 04 Aug 2009) | 10 lines
dahdi_dummy: Do not allow jumps in system time to lock up the system.
Since dahdi_dummy uses the number of milliseconds that has actually passed to determine how many times to call dahdi_receive, it is possible that if the system time shifts after dahdi is started, that the system can appear to lock up while dahdi_dummy attempts to catch up. This change prevents soft lock ups under these conditions.
(closes issue DAHLIN-134)
Reported by: missnebun
........
------------------------------------------------------------------------
http://svn.digium.com/view/dahdi?view=rev&revision=6934

By: missnebun (missnebun) 2009-08-04 23:41:49
I know this is a stupid question ... but how do I apply this fix? I have no idea about svn ... can you please provide me with the commands? Thank you

By: Shaun Ruffell (sruffell) 2009-08-05 09:15:20
No worries....that isn't a stupid question. I've committed the change to the head of the dahdi-linux 2.2 branch. So what would be easiest:
> svn co http://svn.asterisk.org/svn/dahdi/linux/branches/2.2 dahdi-linux-2.2
> cd dahdi-linux-2.2
> make install
> /etc/init.d/dahdi stop
> /etc/init.d/dahdi start

By: Shaun Ruffell (sruffell) 2009-09-23 23:02:18
Will be in 2.3.0 release and any 2.2 release after the 2.2.0.2.

By: Digium Subversion (svnbot) 2009-10-29 13:31:18
Repository: dahdi
Revision: 7437
U linux/trunk/drivers/dahdi/dahdi-base.c
------------------------------------------------------------------------
r7437 | sruffell | 2009-10-29 13:31:17 -0500 (Thu, 29 Oct 2009) | 10 lines
dahdi-base: Do not allow jumps in system time to lock up the system w/core_timer
Since dahdi coretimer uses the number of milliseconds that has actually passed to determine how many times to call dahdi_receive, it is possible that if the system time shifts after dahdi is started, that the system can appear to lock up while the core timer attempts to catch up. This change prevents soft lock ups under these conditions. This is brings the dahdi_dummy changes in r6933 into dahdi-base.
(related to issue DAHLIN-134)
------------------------------------------------------------------------
http://svn.digium.com/view/dahdi?view=rev&revision=7437 |
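
The failure mode discussed in the comments (a timer that derives its tick count from elapsed wall-clock time, then tries to catch up after ntpdate steps the clock) can be illustrated with a small userspace sketch. This is not the code from r6933 or r7437: `ticks_to_run` and `MAX_CATCHUP_TICKS` are hypothetical names, the cap of 100 is an arbitrary value chosen for illustration, and returning 0 when the clock steps backwards is likewise an assumption rather than the driver's documented behaviour.

```c
#include <stdint.h>

/* Hypothetical upper bound on ticks processed per timer callback.
 * The real limit used by dahdi_dummy is not stated in this report. */
#define MAX_CATCHUP_TICKS 100

/* Hypothetical helper: given the milliseconds that appear to have elapsed
 * since the last callback, return how many 1 ms ticks to process now. */
static unsigned int ticks_to_run(int64_t elapsed_ms)
{
    if (elapsed_ms <= 0) {
        /* Clock stepped backwards (or no time passed): nothing to catch up. */
        return 0;
    }
    if (elapsed_ms > MAX_CATCHUP_TICKS) {
        /* Clock jumped forwards: drop the excess instead of looping
         * through every missed millisecond inside the timer callback. */
        return MAX_CATCHUP_TICKS;
    }
    return (unsigned int)elapsed_ms;
}
```

Assuming one tick per millisecond, a one-hour forward step looks like 3,600,000 pending ticks; without a cap the callback would try to process all of them in one go, which matches the soft lockup pattern in the trace. With the cap, the callback handles at most `MAX_CATCHUP_TICKS` and discards the rest.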