Summary: ASTERISK-00670: zaptel driver hard-locks kernel
Reporter: Andrew Kohlsmith (akohlsmith)
Labels:
Date Opened: 2003-12-16 12:14:21.000-0600
Date Closed: 2008-06-07 10:33:47
Priority: Critical
Regression?: No
Status: Closed/Complete
Components: Core/General
Versions:
Frequency of Occurrence:
Related Issues:
Environment:
Attachments:
Description: This is the second time this has happened in a week.  I have a Duron 1300 in an ECS L7VMM2 motherboard.  The only add-on card in it is the T100P, and it's connected to a Carrier Access Channel Bank 1 (12FXS/12FXO).

The *only* unusual thing I can say about this system is that I do run UML on it for our intranet (the old intranet box died and we're slowly moving services over, running it in user-mode linux was the quickest fix).

Anyway I get a kernel panic now and again and ksymoops points to zaptel.  Both crashes have _identical_ call traces, although the processor registers have been different.

When I reset the box and it boots up, my channel bank is in a very funky state: channels are either very choppy and jittery or they are all unavailable (the CB itself shows all idle, but * can't place any calls and gives fast busies). Once I power-cycle the CB1, everything's normal again.

The call trace is as follows:
>>EIP; c0115c80 <__wake_up+20/60>   <=====

>>esi; c2158c50 <_end+1d7faec/1fc3aefc>
>>ebp; c88a7a90 <_end+84ce92c/1fc3aefc>
>>esp; c88a7a7c <_end+84ce918/1fc3aefc>

Trace; e00548d8 <[zaptel]process_timers+38/50>
Trace; e004fe2c <[zaptel]zt_receive+6c/f10>
Trace; e007d820 <[wct1xxp]t1xxp_receiveprep+190/360>
Trace; e007c942 <[wct1xxp]t1xxp_interrupt+1b2/1f0>
Trace; c013aa40 <__block_prepare_write+1d0/330>
Trace; c011a12d <qm_refs+13d/190>
Trace; c010a2a8 <do_IRQ+68/b0>
Trace; c010ca48 <call_do_IRQ+5/d>
Trace; c012c8c2 <do_generic_file_write+252/3e0>
Trace; c012cd53 <generic_file_write+103/120>
Trace; c01610b2 <ext3_file_write+22/c0>

Code;  c0115c80 <__wake_up+20/60>
00000000 <_EIP>:
Code;  c0115c80 <__wake_up+20/60>   <=====
  0:   8b 53 fc                  mov    0xfffffffc(%ebx),%edx   <=====
Code;  c0115c83 <__wake_up+23/60>
  3:   8b 02                     mov    (%edx),%eax
Code;  c0115c85 <__wake_up+25/60>
  5:   85 c7                     test   %eax,%edi
Code;  c0115c87 <__wake_up+27/60>
  7:   75 17                     jne    20 <_EIP+0x20>
Code;  c0115c89 <__wake_up+29/60>
  9:   8b 16                     mov    (%esi),%edx
Code;  c0115c8b <__wake_up+2b/60>
  b:   39 f3                     cmp    %esi,%ebx
Code;  c0115c8d <__wake_up+2d/60>
  d:   75 f1                     jne    0 <_EIP>
Code;  c0115c8f <__wake_up+2f/60>
  f:   ff 75 f0                  pushl  0xfffffff0(%ebp)
Code;  c0115c92 <__wake_up+32/60>
 12:   9d                        popf
Code;  c0115c93 <__wake_up+33/60>
 13:   8d 00                     lea    (%eax),%eax

<0>Kernel panic: Aiee, killing interrupt handler!
Comments:

By: Andrew Kohlsmith (akohlsmith) 2003-12-16 12:16:13.000-0600

I forgot to mention: zaptel is CVS from Dec 11.  Same for libpri and asterisk itself.

new-intra*CLI> show version
Asterisk CVS-11/05/03-02:13:03 built by andrew@new-intra on a i686 running Linux

By: zoa (zoa) 2003-12-16 12:27:03.000-0600

I've had it twice on 2 different servers in a period of 4 months, one with a TE410P, the other with an X100P.

I use kernel 2.4.20 on both machines.

By: Brian West (bkw918) 2004-01-07 00:11:23.000-0600

Any other input on this?

By: Andrew Kohlsmith (akohlsmith) 2004-01-07 08:00:54.000-0600

Just a datapoint -- I have had this happen about once or twice a week since I originally reported it.  Call traces are always the same.  It's almost as if it'll die when it's been doing HDD activity and gets a zaptel interrupt.  The IDE controller and the T100P are not on the same interrupt (the T100P is on its own entirely, as is the IDE controller).

I hope to test this theory soon by creating metric buttloads of IDE activity and seeing if the crash occurs more often.

By: zoa (zoa) 2004-01-07 08:08:25.000-0600

i have no ide stuff in my server... happens anyway, so i don't think that can be the problem.

By: Brian West (bkw918) 2004-01-10 17:57:46.000-0600

Make sure you do not have MMX turned on when you compile zaptel and report those findings.

By: zoa (zoa) 2004-01-10 19:30:10.000-0600

there is no such thing as mmx in the zaptel Makefile

By: zoa (zoa) 2004-01-10 19:38:27.000-0600

So you mean zconfig.h, because he is using a Duron CPU?

By: Andrew Kohlsmith (akohlsmith) 2004-01-12 05:36:48.000-0600

I *do* have that set.

# Define if you want MMX optimizations in zaptel
#
KFLAGS+=-DCONFIG_ZAPTEL_MMX

however I was under the impression that my CPU knew what MMX was:

# cat /proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 6
model           : 8
model name      : AMD Athlon(tm) XP 2000+
stepping        : 1
cpu MHz         : 1673.812
cache size      : 256 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow
bogomips        : 3342.33

(you'll note the 'mmx' in the flags)

I will remove that though to see if it makes a difference.

By: jrollyson (jrollyson) 2004-01-14 03:14:33.000-0600

Did that define fix the issue?

By: James Golovich (jamesgolovich) 2004-01-15 11:00:00.000-0600

It's pretty much a rule (unwritten afaik) that if you have an athlon enabled kernel you should not enable MMX support in zaptel.

By: Malcolm Davenport (mdavenport) 2004-01-15 11:19:20.000-0600

In my experience, the problem seemed to arise only on Athlon chips that included SSE optimizations (i.e. XP and later MPs), both in the CPU and in the kernel.  I have an old Thunderbird-core Athlon Socket A, no SSE, that works just fine with MMX enabled in Zaptel and the kernel compiled for the Athlon CPU type.  Not a hard rule, just sharing what I've seen.
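
The pattern described in this comment (an AMD CPU that reports SSE alongside MMX) could be checked against the "flags" line of /proc/cpuinfo. The following is only a rough sketch in C: the function names `cpu_has_flag` and `mmx_looks_risky` are hypothetical, `sample_flags` is copied from the /proc/cpuinfo output posted earlier in this issue, and the heuristic encodes an anecdotal observation from this thread, not a confirmed root cause.

```c
#include <string.h>

/* Flags line copied from the /proc/cpuinfo output posted in this issue. */
static const char sample_flags[] =
    "fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat "
    "pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow";

/* Return 1 if `flag` appears as a whole whitespace-separated token in
 * `flags`, else 0. Whole-token matching keeps "mmx" from matching "mmxext". */
static int cpu_has_flag(const char *flags, const char *flag)
{
    size_t n = strlen(flag);
    const char *p = flags;

    if (n == 0)
        return 0;
    while ((p = strstr(p, flag)) != NULL) {
        int starts = (p == flags) || p[-1] == ' ' || p[-1] == '\t';
        int ends = p[n] == '\0' || p[n] == ' ' || p[n] == '\t' || p[n] == '\n';
        if (starts && ends)
            return 1;
        p += 1; /* keep scanning after a partial match */
    }
    return 0;
}

/* Hypothetical heuristic: report the combination described in this thread,
 * i.e. an AMD CPU advertising both mmx and sse. */
static int mmx_looks_risky(const char *vendor_id, const char *flags)
{
    return strcmp(vendor_id, "AuthenticAMD") == 0
        && cpu_has_flag(flags, "mmx")
        && cpu_has_flag(flags, "sse");
}
```

With the flags posted above, `mmx_looks_risky("AuthenticAMD", sample_flags)` reports the risky combination, while a flags line without `sse` (as on the Thunderbird-core Athlon mentioned here) does not.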

By: James Golovich (jamesgolovich) 2004-01-15 12:48:10.000-0600

Does anyone know what the real cause of this is, or a way to test for it?  If there was then it would be simple to put together a program that tests for the best build options (or allow them to be specified manually in case of cross-compiling)

or perhaps just some general hard rules in the zapconf.h or wherever these things are defined now.  Like checking if CONFIG_MK7 then don't set MMX
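
A guard along the lines suggested here might look like the sketch below. It assumes the kernel configuration defines `CONFIG_MK7` for Athlon/Duron builds and that the MMX option arrives as `CONFIG_ZAPTEL_MMX` (the define shown earlier in this issue). The two `#define` lines are stand-ins so the fragment compiles on its own, and `zaptel_mmx_enabled` is a hypothetical helper added only to make the outcome visible. This is not the change that was committed.

```c
/* Stand-ins for this sketch only: in a real build CONFIG_MK7 would come from
 * the kernel configuration and CONFIG_ZAPTEL_MMX from the zaptel build flags. */
#define CONFIG_MK7 1
#define CONFIG_ZAPTEL_MMX 1

/* Suggested guard: drop the MMX option when the kernel targets Athlon/Duron. */
#if defined(CONFIG_MK7) && defined(CONFIG_ZAPTEL_MMX)
#undef CONFIG_ZAPTEL_MMX
#endif

/* Hypothetical helper so the result of the guard can be inspected. */
static int zaptel_mmx_enabled(void)
{
#ifdef CONFIG_ZAPTEL_MMX
    return 1;
#else
    return 0;
#endif
}
```

With both stand-in defines present, the guard removes `CONFIG_ZAPTEL_MMX`, so `zaptel_mmx_enabled()` returns 0. Deleting the `CONFIG_MK7` stand-in leaves MMX enabled.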

By: Mark Spencer (markster) 2004-02-06 22:25:32.000-0600

I did at least document the incompatibility but I don't know how to try to make it work better.

By: Digium Subversion (svnbot) 2008-06-07 10:33:47

Repository: dahdi
Revision: 310

U   trunk/zconfig.h

------------------------------------------------------------------------
r310 | markster | 2008-06-07 10:33:46 -0500 (Sat, 07 Jun 2008) | 2 lines

Warn of AMD incompatibility with MMX in zconfig.h (bug ASTERISK-670)

------------------------------------------------------------------------

http://svn.digium.com/view/dahdi?view=rev&revision=310