Summary: DAHLIN-00134: Server crash after I do ntpdate -u ntp.nasa.gov
Reporter: missnebun (missnebun)
Labels:
Date Opened: 2009-08-03 22:14:03
Date Closed: 2009-11-12 12:38:53.000-0600
Priority: Minor
Regression?: No
Status: Closed/Complete
Components: dahdi_dummy
Versions: 2.2.0.2
Frequency of Occurrence:
Related Issues:
Environment:
Attachments:
Description: I compiled the latest version ... the same ...
When I run ntpdate -u ntp.nasa.gov, it takes a few minutes with the error below and then the server crashes.

Aug  3 22:44:34 pbx1 kernel: BUG: soft lockup - CPU#1 stuck for 10s! [swapper:0]
Aug  3 22:44:34 pbx1 kernel:
Aug  3 22:44:34 pbx1 kernel: Pid: 0, comm:              swapper
Aug  3 22:44:34 pbx1 kernel: EIP: 0060:[<c060f4d8>] CPU: 1
Aug  3 22:44:34 pbx1 kernel: EIP is at _spin_unlock_irqrestore+0x8/0x9
Aug  3 22:44:34 pbx1 kernel:  EFLAGS: 00000246    Tainted: G       (2.6.18-128.2.1.el5PAE #1)
Aug  3 22:44:34 pbx1 kernel: EAX: f8f737f0 EBX: 00000002 ECX: 00000246 EDX: 00000200
Aug  3 22:44:34 pbx1 kernel: ESI: 00000001 EDI: f8f819e0 EBP: 00000000 DS: 007b ES: 007b
Aug  3 22:44:34 pbx1 kernel: CR0: 8005003b CR2: b7d54000 CR3: 0072c000 CR4: 000006f0
Aug  3 22:44:34 pbx1 kernel:  [<f8f69b90>] dahdi_receive+0x755/0x776 [dahdi]
Aug  3 22:44:34 pbx1 kernel:  [<f8f6d147>] dahdi_transmit+0x11/0x48d [dahdi]
Aug  3 22:44:34 pbx1 kernel:  [<f8efd26b>] dahdi_dummy_timer+0x8b/0xcc [dahdi_dummy]
Aug  3 22:44:34 pbx1 kernel:  [<f8efd1e0>] dahdi_dummy_timer+0x0/0xcc [dahdi_dummy]
Aug  3 22:44:34 pbx1 kernel:  [<c042c5b1>] run_timer_softirq+0xfb/0x151
Aug  3 22:44:34 pbx1 kernel:  [<c0429047>] __do_softirq+0x87/0x114
Aug  3 22:44:34 pbx1 kernel:  [<c04073d7>] do_softirq+0x52/0x9c
Aug  3 22:44:34 pbx1 kernel:  [<c04059d7>] apic_timer_interrupt+0x1f/0x24
Aug  3 22:44:34 pbx1 kernel:  [<c0403bb0>] default_idle+0x0/0x59
Aug  3 22:44:34 pbx1 kernel:  [<c0403be1>] default_idle+0x31/0x59
Aug  3 22:44:34 pbx1 kernel:  [<c0403ca8>] cpu_idle+0x9f/0xb9
Aug  3 22:44:34 pbx1 kernel:  =======================
Aug  3 22:44:44 pbx1 kernel: BUG: soft lockup - CPU#1 stuck for 10s! [swapper:0]
Aug  3 22:44:44 pbx1 kernel:
Aug  3 22:44:44 pbx1 kernel: Pid: 0, comm:              swapper
Aug  3 22:44:44 pbx1 kernel: EIP: 0060:[<c060f4d8>] CPU: 1
Aug  3 22:44:44 pbx1 kernel: EIP is at _spin_unlock_irqrestore+0x8/0x9
Aug  3 22:44:44 pbx1 kernel:  EFLAGS: 00000246    Tainted: G       (2.6.18-128.2.1.el5PAE #1)
Aug  3 22:44:44 pbx1 kernel: EAX: f8f737f0 EBX: 00000002 ECX: 00000246 EDX: 00000200
Aug  3 22:44:44 pbx1 kernel: ESI: 00000001 EDI: f8f819e0 EBP: 00000000 DS: 007b ES: 007b
Aug  3 22:44:44 pbx1 kernel: CR0: 8005003b CR2: b7d54000 CR3: 0072c000 CR4: 000006f0
Aug  3 22:44:44 pbx1 kernel:  [<f8f69b90>] dahdi_receive+0x755/0x776 [dahdi]
Aug  3 22:44:44 pbx1 kernel:  [<f8f69bad>] dahdi_receive+0x772/0x776 [dahdi]
Aug  3 22:44:44 pbx1 kernel:  [<f8efd26b>] dahdi_dummy_timer+0x8b/0xcc [dahdi_dummy]
Aug  3 22:44:44 pbx1 kernel:  [<f8efd1e0>] dahdi_dummy_timer+0x0/0xcc [dahdi_dummy]
Aug  3 22:44:44 pbx1 kernel:  [<c042c5b1>] run_timer_softirq+0xfb/0x151
Aug  3 22:44:44 pbx1 kernel:  [<c0429047>] __do_softirq+0x87/0x114
Aug  3 22:44:44 pbx1 kernel:  [<c04073d7>] do_softirq+0x52/0x9c
Aug  3 22:44:44 pbx1 kernel:  [<c04059d7>] apic_timer_interrupt+0x1f/0x24
Aug  3 22:44:44 pbx1 kernel:  [<c0403bb0>] default_idle+0x0/0x59
Aug  3 22:44:44 pbx1 kernel:  [<c0403be1>] default_idle+0x31/0x59
Aug  3 22:44:44 pbx1 kernel:  [<c0403ca8>] cpu_idle+0x9f/0xb9
Aug  3 22:44:44 pbx1 kernel:  =======================
Aug  3 22:44:54 pbx1 kernel: BUG: soft lockup - CPU#1 stuck for 10s! [swapper:0]
Aug  3 22:44:54 pbx1 kernel:
Aug  3 22:44:54 pbx1 kernel: Pid: 0, comm:              swapper
Aug  3 22:44:54 pbx1 kernel: EIP: 0060:[<c060f4d8>] CPU: 1
Aug  3 22:44:54 pbx1 kernel: EIP is at _spin_unlock_irqrestore+0x8/0x9
Aug  3 22:44:54 pbx1 kernel:  EFLAGS: 00000246    Tainted: G       (2.6.18-128.2.1.el5PAE #1)
Aug  3 22:44:54 pbx1 kernel: EAX: f8f737f0 EBX: 00000002 ECX: 00000246 EDX: 00000200
Aug  3 22:44:54 pbx1 kernel: ESI: 00000001 EDI: f8f859f0 EBP: 00000000 DS: 007b ES: 007b
Aug  3 22:44:54 pbx1 kernel: CR0: 8005003b CR2: b7d54000 CR3: 0072c000 CR4: 000006f0
Aug  3 22:44:54 pbx1 kernel:  [<f8f69b90>] dahdi_receive+0x755/0x776 [dahdi]
Aug  3 22:44:54 pbx1 kernel:  [<f8f69440>] dahdi_receive+0x5/0x776 [dahdi]
Aug  3 22:44:54 pbx1 kernel:  [<f8efd26b>] dahdi_dummy_timer+0x8b/0xcc [dahdi_dummy]
Aug  3 22:44:54 pbx1 kernel:  [<f8efd1e0>] dahdi_dummy_timer+0x0/0xcc [dahdi_dummy]
Aug  3 22:44:54 pbx1 kernel:  [<c042c5b1>] run_timer_softirq+0xfb/0x151
Aug  3 22:44:54 pbx1 kernel:  [<c0429047>] __do_softirq+0x87/0x114
Aug  3 22:44:54 pbx1 kernel:  [<c04073d7>] do_softirq+0x52/0x9c
Aug  3 22:44:54 pbx1 kernel:  [<c04059d7>] apic_timer_interrupt+0x1f/0x24
Aug  3 22:44:54 pbx1 kernel:  [<c0403bb0>] default_idle+0x0/0x59
Aug  3 22:44:54 pbx1 kernel:  [<c0403be1>] default_idle+0x31/0x59
Aug  3 22:44:54 pbx1 kernel:  [<c0403ca8>] cpu_idle+0x9f/0xb9
Aug  3 22:44:54 pbx1 kernel:  =======================
Comments:

By: mihpel (mihpel) 2009-08-04 10:41:09

I am facing the same problem with elastix dahdi-2.1.0.4-24
http://bugs.elastix.org/view.php?id=148

By: Shaun Ruffell (sruffell) 2009-08-04 10:54:00

I'll have an update for this later today.  My current hypothesis is that, because dahdi_dummy uses the actual passage of time to decide how many times to tick dahdi, dahdi thinks after you run ntpdate that it has a great many ticks to process in order to catch up.  So I'll fix dahdi_dummy to place an upper limit on the number of ticks it tries to catch up on (which I should have had in there to begin with).

My guess is that you could also resolve this by ensuring that ntp is started before dahdi in your init scripts on system boot.
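
The idea of capping the catch-up can be sketched in user-space C as follows. This is only an illustration of the approach described above, not the code committed to dahdi_dummy: `MAX_CATCHUP_TICKS`, `process_one_tick` and `catch_up` are hypothetical names, and the limit of 100 is an assumed value, since the real limit is not stated in this issue.

```c
/* Hypothetical limit, chosen only for illustration; this issue does not
 * state the value used by the actual fix. */
#define MAX_CATCHUP_TICKS 100L

/* Stand-in for the per-millisecond work that the driver performs
 * (the dahdi_receive/dahdi_transmit calls seen in the trace). */
static long ticks_processed;

static void process_one_tick(void)
{
    ticks_processed++;
}

/* Run one tick per elapsed millisecond, but never more than the cap.
 * Returns the number of ticks actually run. */
static long catch_up(long elapsed_ms)
{
    long n = elapsed_ms;
    long i;

    if (n < 0)
        n = 0;                  /* clock was stepped backwards */
    if (n > MAX_CATCHUP_TICKS)
        n = MAX_CATCHUP_TICKS;  /* clock was stepped far forwards */

    for (i = 0; i < n; i++)
        process_one_tick();

    return n;
}
```

With this clamp, a forward step of several minutes costs at most `MAX_CATCHUP_TICKS` iterations in one timer callback instead of one iteration per skipped millisecond.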

By: Digium Subversion (svnbot) 2009-08-04 11:22:56

Repository: dahdi
Revision: 6933

U   linux/trunk/drivers/dahdi/dahdi_dummy.c

------------------------------------------------------------------------
r6933 | sruffell | 2009-08-04 11:22:55 -0500 (Tue, 04 Aug 2009) | 10 lines

dahdi_dummy: Do not allow jumps in system time to lock up the system.

Since dahdi_dummy uses the number of milliseconds that has actually passed to
determine how many times to call dahdi_receive, it is possible that if the
system time shifts after dahdi is started, that the system can appear to lock
up while dahdi_dummy attempts to catch up.  This change prevents soft lock ups
under these conditions.

(closes issue DAHLIN-134)
Reported by: missnebun
------------------------------------------------------------------------

http://svn.digium.com/view/dahdi?view=rev&revision=6933
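
To give a sense of scale for the commit message above, the following user-space C sketch computes the elapsed milliseconds between two wall-clock timestamps. It assumes the tick count is derived from wall-clock time as the commit message describes; `elapsed_ms` is a hypothetical helper and is not taken from dahdi_dummy.c.

```c
#include <sys/time.h>

/* Milliseconds between two wall-clock timestamps.
 * Hypothetical helper for illustration only. */
static long long elapsed_ms(struct timeval then, struct timeval now)
{
    long long usec = (long long)(now.tv_sec - then.tv_sec) * 1000000LL
                   + (long long)(now.tv_usec - then.tv_usec);

    return usec / 1000LL;
}
```

If the clock is stepped forward by five minutes between two timer callbacks, this returns roughly 300000, so an uncapped loop would try to run about 300000 ticks in a single softirq.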

By: Digium Subversion (svnbot) 2009-08-04 11:24:40

Repository: dahdi
Revision: 6934

_U  linux/branches/2.2/
U   linux/branches/2.2/drivers/dahdi/dahdi_dummy.c

------------------------------------------------------------------------
r6934 | sruffell | 2009-08-04 11:24:40 -0500 (Tue, 04 Aug 2009) | 18 lines

Merged revisions 6933 via svnmerge from
https://origsvn.digium.com/svn/dahdi/linux/trunk

........
 r6933 | sruffell | 2009-08-04 11:22:39 -0500 (Tue, 04 Aug 2009) | 10 lines
 
 dahdi_dummy: Do not allow jumps in system time to lock up the system.
 
 Since dahdi_dummy uses the number of milliseconds that has actually passed to
 determine how many times to call dahdi_receive, it is possible that if the
 system time shifts after dahdi is started, that the system can appear to lock
 up while dahdi_dummy attempts to catch up.  This change prevents soft lock ups
 under these conditions.
 
 (closes issue DAHLIN-134)
 Reported by: missnebun
........

------------------------------------------------------------------------

http://svn.digium.com/view/dahdi?view=rev&revision=6934

By: missnebun (missnebun) 2009-08-04 23:41:49

I know this is a stupid question, but how do I apply this fix? I have no idea about svn. Can you please provide me with the command?

Thank you

By: Shaun Ruffell (sruffell) 2009-08-05 09:15:20

No worries....that isn't a stupid question.

I've committed the change to the head of the dahdi-linux 2.2 branch.  So the easiest approach would be:

> svn co http://svn.asterisk.org/svn/dahdi/linux/branches/2.2 dahdi-linux-2.2
> cd dahdi-linux-2.2
> make install
> /etc/init.d/dahdi stop
> /etc/init.d/dahdi start

By: Shaun Ruffell (sruffell) 2009-09-23 23:02:18

This will be in the 2.3.0 release and any 2.2 release after 2.2.0.2.

By: Digium Subversion (svnbot) 2009-10-29 13:31:18

Repository: dahdi
Revision: 7437

U   linux/trunk/drivers/dahdi/dahdi-base.c

------------------------------------------------------------------------
r7437 | sruffell | 2009-10-29 13:31:17 -0500 (Thu, 29 Oct 2009) | 10 lines

dahdi-base: Do not allow jumps in system time to lock up the system w/core_timer

Since dahdi coretimer uses the number of milliseconds that has actually passed
to determine how many times to call dahdi_receive, it is possible that if the
system time shifts after dahdi is started, that the system can appear to lock
up while the core timer attempts to catch up.  This change prevents soft lock
ups under these conditions.  This brings the dahdi_dummy changes in r6933
into dahdi-base.

(related to issue DAHLIN-134)
------------------------------------------------------------------------

http://svn.digium.com/view/dahdi?view=rev&revision=7437