Summary: ASTERISK-01315: Zaptel module makes system unstable
Reporter: zoa (zoa)
Labels:
Date Opened: 2004-03-31 03:55:45.000-0600
Date Closed: 2011-06-07 14:05:32
Priority: Major
Regression?: No
Status: Closed/Complete
Components: Core/General
Versions:
Frequency of Occurrence:
Related Issues:
Environment:
Attachments:
Description: This is kind of a weird bug report, I think, but I have reason to believe that the current zaptel makes servers unstable by causing random segfaults.



****** ADDITIONAL INFORMATION ******

I noticed that on one of my servers I had some issues with random segfaults of simple commands like ping, ls, mc, ftp etc.

I thought I had bad memory, so I ran a memory check and found nothing. I then suspected the CPU was overheating and checked the temperature, but nothing seemed wrong, so I assumed it was the mainboard.


Then I suddenly noticed the same behaviour on a second server. Some quick checks showed that as long as I don't load any zaptel modules, the server is rock stable.

I was running Asterisk CVS-03/27/04-11:26:53 on both servers. Both are all-Intel dual Xeon machines (different motherboards, CPUs and memory; both SCSI).

I'm using wct4xxp.so and zaptel.so on kernel 2.4.18

Anyone else experiencing weird behaviour?

Comments:

By: alric (alric) 2004-03-31 12:39:34.000-0600

Were the zaptel drivers updated on 3/27/04 as well, or are they older than the asterisk version mentioned?

By: zoa (zoa) 2004-03-31 13:02:08.000-0600

They were updated as well, AFAIK.

By: Mark Spencer (markster) 2004-03-31 14:13:21.000-0600

Did you compile for SMP?  Does the problem go away if you run UP?

By: Mark Spencer (markster) 2004-03-31 14:21:20.000-0600

Also, can you confirm you do NOT have MMX optimization turned on in the Makefile?
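
One way to answer that question is to search the build file for active MMX settings. The following is a minimal sketch, not part of the zaptel tooling: the path in ZAPTEL_MAKEFILE and the helper name active_mmx_lines are assumptions, and depending on the zaptel version the switch may live in a different file.

```shell
#!/bin/sh
# Sketch: list active (uncommented) lines that mention MMX in a build file.
# The default path is an assumption; point it at your own zaptel checkout.
ZAPTEL_MAKEFILE="${ZAPTEL_MAKEFILE:-/usr/src/zaptel/Makefile}"

# Print every line mentioning MMX whose first non-blank character is not '#'.
# Returns non-zero when no such line exists.
active_mmx_lines() {
    grep -n 'MMX' "$1" | grep -v '^[0-9]*:[[:space:]]*#'
}

if [ -r "$ZAPTEL_MAKEFILE" ]; then
    if active_mmx_lines "$ZAPTEL_MAKEFILE"; then
        echo "MMX appears to be enabled in $ZAPTEL_MAKEFILE"
    else
        echo "no active MMX lines found in $ZAPTEL_MAKEFILE"
    fi
fi
```

If the script reports an active line, comment it out and rebuild with make clean ; make install before retesting.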

By: zoa (zoa) 2004-04-01 03:26:53.000-0600

The box was taken out of the colo to go into RMA. It will be put back tonight, I think, and I'll have a closer look at it then.

By: zoa (zoa) 2004-04-01 09:19:17.000-0600

Hmm, I tried today's CVS. After running insmod zaptel, then insmod wct4xxp, then ztcfg:
Boom.
Kernel panic.

By: zoa (zoa) 2004-04-01 09:22:52.000-0600

Hmm, not a kernel panic; the box just hung. The last thing on the screen is the standard ztcfg output saying that it found the Xilinx.
(Tried it twice already, hung twice... and the filesystem got corrupted.)

By: zoa (zoa) 2004-04-01 09:53:45.000-0600

Looks like some people on the mailing list have the same issues today.


To: asterisk-users@lists.digium.com
Subject: Re: [Asterisk-Users] Zaptel/PRI problem



On Thursday 01 April 2004 10:35, Mickey Binder wrote:
> >I used the following in the zaptel cvs directory to roll back the zaptel
> >sources:
> >cvs up -D "2004-03-05 10:28"
> Do you have to recompile both zaptel and asterisk in order to use the older
> zaptel?
> If I try to recompile after issuing the cvs up -D "2004-03-05 10:28"
> command the compiler tells me I need a newer zaptel. I therefore downgraded
> my Asterisk to same date, but are there any other options?


Yeah, that happened to me as well today. So I tried the latest zaptel version
which is currently locking up my system. I'm currently trying to get some
debug output from zaptel/wct4xxp for one of the Asterisk devs.

By: zoa (zoa) 2004-04-01 09:59:32.000-0600

OK, Asterisk doesn't lock up with yesterday's CVS. Looks like citats' SMP patch has some special effects :)

By: thansen (thansen) 2004-04-01 10:09:01.000-0600

It still locks up my system, as written on the mailing list. I am using MMX optimizations, however. Is this bad?

By: zoa (zoa) 2004-04-01 10:10:38.000-0600

I have now installed yesterday's CVS (cvs checkout -D yesterday) with MMX disabled. It no longer hard-locks the server; instead, Asterisk hangs as soon as the first call attempt is made.

By: Mark Spencer (markster) 2004-04-02 02:17:11.000-0600

Please be sure you did a make clean ; make install on all of zaptel, libpri, and asterisk.

By: zoa (zoa) 2004-04-02 03:06:13.000-0600

I removed /usr/src before grabbing the goodies.
I'm now running last week's libpri and zaptel, and yesterday's version of Asterisk. That seems to work.

By: Mark Spencer (markster) 2004-04-02 10:47:36.000-0600

And again, can you test latest CVS and see if it's broken?  Again please try just zaptel, then zaptel and libpri, then all three, so we can narrow down the problem.  In each case, be sure to "make clean ; make install" after updating and in the case of zaptel be sure to unload both your wct4xxp *and* zaptel modules.  With recent CVS, I will also want to pay close attention to the timing.
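
The staged test requested above could be scripted roughly as follows. This is only a sketch under assumptions: the sources are taken to live in /usr/src/zaptel, /usr/src/libpri and /usr/src/asterisk, the helper names (run, unload_zaptel, rebuild, stage) are hypothetical, and by default the script only prints the commands (DRY_RUN=1) instead of executing them.

```shell
#!/bin/sh
# Sketch of a staged rebuild: update and reinstall only the named components.
# SRC_ROOT and the directory names are assumptions about the local layout.
SRC_ROOT="${SRC_ROOT:-/usr/src}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# Print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Unload the card driver before the core module it depends on.
unload_zaptel() {
    run rmmod wct4xxp
    run rmmod zaptel
}

# Update one component from CVS, then "make clean ; make install".
rebuild() {
    dir="$SRC_ROOT/$1"
    run sh -c "cd '$dir' && cvs update"
    run make -C "$dir" clean
    run make -C "$dir" install
}

# Rebuild the listed components in order, e.g. "stage zaptel libpri".
stage() {
    for component in "$@"; do
        if [ "$component" = "zaptel" ]; then
            unload_zaptel
        fi
        rebuild "$component"
    done
}

# With no arguments, run the first stage (zaptel only).
if [ "$#" -eq 0 ]; then
    set -- zaptel
fi
stage "$@"
```

Run it once with "zaptel", reload the modules (insmod zaptel, insmod wct4xxp, ztcfg, as earlier in this report) and test; then repeat with "zaptel libpri", and finally with "zaptel libpri asterisk". Set DRY_RUN=0 to actually execute the commands.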

By: zoa (zoa) 2004-04-02 11:20:15.000-0600

I just played around with it all afternoon.

Somehow I feel it might be something very weird: I tried to revert to a version that was stable for me and is still on my disk in /usr/src/asterisk-23mar.

When I recompile it (libpri, wct4xxp and asterisk), it now always hangs on the first call.
I did a make clean,
removed all asterisk files I could find in /usr/bin/, /usr/sbin/ and /usr/lib/,
then did make and make install for libpri, zaptel and asterisk in turn.

If I have enough time left (and the fsck finishes fast enough) I'll try again with the latest CVS today.

By: zoa (zoa) 2004-04-02 11:21:49.000-0600

Oh, by the way, on the other unstable server I was using kernel 2.4.20, so the kernel version doesn't seem to influence it.

I'm on Debian woody 3.0 with gcc 2.95.

Would changing the compile flags influence the results?  (I could not find any info on -O6.)

By: zoa (zoa) 2004-04-02 11:26:52.000-0600

This is starting to look like this bug (but on another server):

http://bugs.digium.com/bug_view_page.php?bug_id=0001224

By: James Golovich (jamesgolovich) 2004-04-02 12:24:07.000-0600

I've not seen this issue on any of my boxes.  If reverting to a version that was stable before doesn't work then I strongly suspect asterisk is fine and it's probably related to the system.  When you're reverting to older versions, are you also reverting to older versions of libpri and zaptel?

The exception would be a newer module that was added in one of the releases and never removed, so you might want to 'rm /usr/lib/asterisk/modules/*' and recompile/install asterisk.

By: Mark Spencer (markster) 2004-04-02 12:54:23.000-0600

Fixed in CVS

By: zoa (zoa) 2004-04-03 03:41:49.000-0600

I had some more fun with the box.

Looks like bad voodoo happens on restart or shutdown:

*CLI> stop now
Beginning asterisk shutdown....
Executing last minute cleanups
 == Destroying any remaining musiconhold processes
Yuck! Error in buffer handling...: Connection reset by peer
Yuck! Error in buffer handling...: Connection reset by peer
Yuck! Error in buffer handling...: Connection reset by peer
Asterisk cleanly ending (0).

-> It stayed like this for some minutes. Meanwhile ps -auwx showed defunct asterisk threads and an asterisk process at 99,9% CPU.

When I pressed Ctrl-C, it showed me once:
Beginning asterisk shutdown....
so it was not completely unresponsive yet.

I was able to killall asterisk, which killed all threads (including the CLI thread) except for one thread that took 99,9% CPU.

By: zoa (zoa) 2004-04-03 05:02:48.000-0600

Maybe related, maybe not (this was on a Ctrl-C or a stop now, I don't know for sure).

And again it leaves 3 processes at 99,9% CPU.


Reading symbols from /usr/lib/asterisk/modules/app_nbscat.so...done.
Loaded symbols for /usr/lib/asterisk/modules/app_nbscat.so
Reading symbols from /usr/lib/asterisk/modules/format_g726.so...done.
Loaded symbols for /usr/lib/asterisk/modules/format_g726.so
#0  0x40021e90 in pthread_mutex_lock () from /lib/libpthread.so.0
(gdb)
(gdb) bt full
#0  0x40021e90 in pthread_mutex_lock () from /lib/libpthread.so.0
No symbol table info available.
#1  0x4010da8b in free () from /lib/libc.so.6
No symbol table info available.
#2  0x08093a0a in el_end (el=0x80d31c8) at el.c:125
       el = (EditLine *) 0x80d31c8
#3  0x08082a3c in quit_handler (num=2, nice=0, safeshutdown=1, restart=0)
   at asterisk.c:549
       nice = 0
       safeshutdown = -1073746744
       restart = 0
       filename = "/root/.asterisk_history", '\0' <repeats 56 times>
       s = 1080985669
       e = 1080985669
       x = -1073746744
#4  0x08085701 in __quit_handler (num=2) at asterisk.c:597
       num = -1069183166
#5  0x40023f54 in pthread_sighandler () from /lib/libpthread.so.0
No symbol table info available.
#6  0x400c86b8 in sigaction () from /lib/libc.so.6
No symbol table info available.
#7  0x08095d3f in read_char (el=0x80d31c8, cp=0xbffff1ab "") at read.c:302
       el = (EditLine *) 0x0
---Type <return> to continue, or q <return> to quit---
       cp = 0x0
       num_read = -1069183166
       tried = 0
#8  0x08095ddd in el_getc (el=0x80d31c8, cp=0xbffff1ab "") at read.c:347
       el = (EditLine *) 0x80d31c8
       cp = 0xbffff1ab ""
       num_read = -1069183166
       ma = (c_macro_t *) 0x80d3454
#9  0x08095c4a in read_getcmd (el=0x80d31c8, cmdnum=0xbffff1aa "",
   ch=0xbffff1ab "") at read.c:243
       el = (EditLine *) 0x80d31c8
       cmdnum = (el_action_t *) 0xc0458f42 <Address 0xc0458f42 out of bounds>
       ch = 0xbffff1ab ""
       cmd = 255 'ÿ'
       num = -1069183166
#10 0x08096085 in el_gets (el=0x80d31c8, nread=0xbffff1f0) at read.c:443
       el = (EditLine *) 0x80d31c8
       retval = 1075530176
       cmdnum = 0 '\0'
       num = -1
       ch = 0 '\0'
#11 0x08085216 in main (argc=2, argv=0xbffff584) at asterisk.c:1635
       title = "Asterisk Console on 'imroVOIP1' (pid 1105)\0@ÃÃ\n@\fóÿ¿Òd\0@ì0\---Type <return> to continue, or q <return> to quit---
001@xV\n@Èõ\002@ql\0@ì0\001@Ï\a\0\0Ð<\001@ql\0@\0\0\0\0&cedil;\004\0\0Ð<\001@\fã\001@\234Ó\001@L¿\001@Ð<\001@\0\0\0\0\0\0\0\0\b\0\0\0<ø\002@\002\0\0\0&acute;òÿ¿xV\n@\f÷\002@{{\027\006{{\027\006\bóÿ¿Èõ\002@ÃÃ\n@\024>\001@d¬E\006d¬E\006Xóÿ¿,É\001@Ð<\001@\fóÿ¿2\226\0@"...
       argc = -1073745412
       c = 1075530176
       filename = "/root/.asterisk_history", '\0' <repeats 56 times>
       hostname = "imroVOIP1\0\002@\006\0\0\0\034ó\n@Àôÿ¿<o\0@\0\0\0\0Áò\n@\230ôÿ¿ql\0@ãÎ\004\b\0.\0\0H\237\n@Ý\a\0\0ø¢\n@x!\n@Èõ\002@\b\0\0\0<ø\002@\006\0\0\0Hôÿ¿\024\236\004\b&frac14;7\001@\017S\216\a\017S\216\aÜôÿ¿x6\001@)Ó\004\bÈõ\002@\b\0\0\0H\237\n@Èõ\002@xôÿ¿\230k\n@\f÷\002@taß\003taß\003\fõÿ¿Èõ\002@øÐ\n@Èõ\002@\b\0\0\0\230k\n@>\\\002@àê\002@ ~\e@\002\0\0\0\216ÿw\001àôÿ¿"...
       tmp = "\e[1;37;40mAsterisk Ready.\n\e[0;37;40m\0U\n@\0\0\0\0\202\211¹\n\034ó\n@\220ôÿ¿<o\0@\225ò\n@\223ý\004\bÝ\a\0\0ql\0@ì0\001@"
       xarg = 0x0
       x = 1075530176
       f = (FILE *) 0xbffff1fc
       sigs = {__val = {134238211, 0 <repeats 31 times>}}
       num = 1770
       buf = 0xbffff1fc "Asterisk Console on 'imroVOIP1' (pid 1105)"

By: zoa (zoa) 2004-04-03 11:26:05.000-0600

Hmm, now I noticed that Asterisk has 2 threads at 99% CPU; however, Asterisk is still working fine.

(As long as I don't do a restart now or a stop now, as that will cause Asterisk to stop while the ports for iax2 etc. stay in use.)

By: Mark Spencer (markster) 2004-04-03 20:58:08.000-0600

The last crash was unrelated.  That's just because editline doesn't exit cleanly always.  Anyway if you attach gdb to Asterisk when it's taking 99% CPU what do you see?
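
A rough way to find the process to attach to is sketched below. It is only an illustration: the helper name busiest_asterisk is hypothetical, the process is assumed to be named exactly "asterisk", and the printed gdb command assumes a gdb build that supports -batch and -ex (with an older gdb, attach with gdb -p and type the command interactively). Attaching stops the process while gdb inspects it, so the script only prints the command rather than running it.

```shell
#!/bin/sh
# Sketch: pick the asterisk process using the most CPU, then suggest how to
# dump backtraces for all of its threads.

# Read "PID %CPU COMMAND" lines on stdin; print the PID of the busiest line
# whose command name is exactly "asterisk" (prints nothing if none match).
busiest_asterisk() {
    awk '$3 == "asterisk" && ($2 + 0 > max || pid == "") { max = $2 + 0; pid = $1 }
         END { if (pid != "") print pid }'
}

pid=$(ps -eo pid=,pcpu=,comm= | busiest_asterisk)
if [ -n "$pid" ]; then
    echo "busiest asterisk process: $pid"
    echo "to inspect it: gdb -p $pid -batch -ex 'thread apply all bt'"
else
    echo "no running asterisk process found"
fi
```

The "thread apply all bt" output is the part worth posting, since it shows which function the spinning thread is in.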

By: Mark Spencer (markster) 2004-04-04 12:32:37

zoa: did you have a TDM card in this system too?  If so, please cvs update to latest zaptel.  I found a problem which might have led to some problems in the TDM driver.

By: zoa (zoa) 2004-04-05 03:56:05

The only card in the server is a TE410P.

I'll try to attach gdb later today.

By: zoa (zoa) 2004-04-05 04:56:08

imroVOIP1:~# gdb -p 232
GNU gdb 2002-04-01-cvs
Copyright 2002 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-linux".
Attaching to process 232
Reading symbols from /usr/sbin/asterisk...done.
Reading symbols from /lib/libdl.so.2...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/libpthread.so.0...done.
[New Thread 1024 (LWP 232)]
[New Thread 2049 (LWP 233)]
[New Thread 1026 (LWP 234)]
[New Thread 2051 (LWP 235)]
[New Thread 3076 (LWP 236)]
[New Thread 4101 (LWP 237)]
[New Thread 5126 (LWP 238)]

<Not responsive>

This is what happens when I try to attach gdb.
After that, Asterisk is no longer responsive either.


Immediately after I start Asterisk (before I make a call), it seems to have 1 thread with 99,9% CPU usage.

At the same time, Asterisk is slowly restarting channels (about 5 seconds per channel).

By: zoa (zoa) 2004-04-05 09:13:23

The latest CVS fix by citats seems to resolve this 99,9% CPU usage.

I haven't checked it very thoroughly so far; I will do so for the rest of the day.

By: zoa (zoa) 2004-04-05 10:26:19

OK, I did some more tests; the latest CVS no longer seems to hang.

The frequent non-Asterisk coredumps remain, but my colleague claims that on 1 of the 2 servers they went away after an upgrade to kernel 2.6.5.

By: zoa (zoa) 2004-04-05 12:19:54

I upgraded to kernel 2.6.5 and now I see no more coredumps, whiiiiiiiiii.

By: James Golovich (jamesgolovich) 2004-04-05 12:29:49

The fix I made in CVS today had nothing to do with this at all, just something I noticed while looking around.

By: Mark Spencer (markster) 2004-04-05 15:48:49

Apparent kernel incompatibility.