Summary: | ASTERISK-01315: Zaptel module makes system unstable |
Reporter: | zoa (zoa) | Labels: | |
Date Opened: | 2004-03-31 03:55:45.000-0600 | Date Closed: | 2011-06-07 14:05:32 |
Priority: | Major | Regression? | No |
Status: | Closed/Complete | Components: | Core/General |
Versions: | | Frequency of Occurrence: | |
Related Issues: | |
Environment: | | Attachments: | |
Description: | This is kind of a weird bug report, I think, but I have reason to believe that the current zaptel makes servers unstable by causing random segfaults.

****** ADDITIONAL INFORMATION ******

I noticed that one of my servers had random segfaults in simple commands like ping, ls, mc, ftp, etc. I thought I had bad memory, so I ran a memory check and found nothing. I thought the CPU was overheating, so I checked the temperature; nothing seemed wrong, so I suspected the mainboard. Then I noticed the same behaviour on a second server, did some quick checks, and found that as long as I didn't load any zaptel modules the server was rock stable.

I was running Asterisk CVS-03/27/04-11:26:53 on the two servers. Both are full Intel dual Xeons (different motherboards, CPUs and memory; both SCSI). I'm using wct4xxp.so and zaptel.so on kernel 2.4.18.

Is anyone else experiencing weird behaviour? | ||
Comments: | By: alric (alric) 2004-03-31 12:39:34.000-0600
Were the zaptel drivers updated on 3/27/04 as well, or are they older than the asterisk version mentioned?

By: zoa (zoa) 2004-03-31 13:02:08.000-0600
They were updated as well, afaik.

By: Mark Spencer (markster) 2004-03-31 14:13:21.000-0600
Did you compile for SMP? Does the problem go away if you run UP?

By: Mark Spencer (markster) 2004-03-31 14:21:20.000-0600
Also, can you confirm you do NOT have MMX optimization turned on in the Makefile?

By: zoa (zoa) 2004-04-01 03:26:53.000-0600
The box was taken out of the colo to go into RMA. It will be placed back tonight, I think, and I'll have a closer look at it.

By: zoa (zoa) 2004-04-01 09:19:17.000-0600
Hmm, I tried today's CVS. When doing insmod zaptel, then insmod wct4xxp, then ztcfg: boom, kernel panic.

By: zoa (zoa) 2004-04-01 09:22:52.000-0600
Hmm, no kernel panic, the box just hung. The last thing on the screen is the standard ztcfg output saying that it found the Xilinx. (Tried it twice already, hung twice... fs fuxored.)

By: zoa (zoa) 2004-04-01 09:53:45.000-0600
Looks like some people on the mailing list have the same issues today.

To: asterisk-users@lists.digium.com
Subject: Re: [Asterisk-Users] Zaptel/PRI problem

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Thursday 01 April 2004 10:35, Mickey Binder wrote:
> >I used the following in the zaptel cvs directory to roll back the zaptel
> >sources:
> >cvs up -D "2004-03-05 10:28"
> Do you have to recompile both zaptel and asterisk in order to use the older
> zaptel?
> If I try to recompile after issuing the cvs up -D "2004-03-05 10:28"
> command the compiler tells my I need a newer zaptel. I therefore downgraded
> my Asterisk to same date, but are there any other options?

Yeah, that happened to me as well today. So I tried the latest zaptel version which is currently locking up my system. I'm currently trying to get some debug output from zaptel/wct4xxp for one of the Asterisk devs.

By: zoa (zoa) 2004-04-01 09:59:32.000-0600
OK, asterisk doesn't lock up with yesterday's CVS. Looks like citats' SMP patch has some special effects :)

By: thansen (thansen) 2004-04-01 10:09:01.000-0600
It still locks up my system, as I wrote on the mailing list. I am using MMX optimizations, however. Is this bad?

By: zoa (zoa) 2004-04-01 10:10:38.000-0600
I have now installed yesterday's CVS (cvs checkout -D yesterday) with MMX disabled. It no longer hard-locks the server; asterisk just hangs when the first call attempt is made.

By: Mark Spencer (markster) 2004-04-02 02:17:11.000-0600
Please be sure you did a make clean ; make install on all of zaptel, libpri, and asterisk.

By: zoa (zoa) 2004-04-02 03:06:13.000-0600
I removed /usr/src before grabbing the goodies. I'm now running last week's libpri and zaptel, and yesterday's version of asterisk; that seems to work.

By: Mark Spencer (markster) 2004-04-02 10:47:36.000-0600
And again, can you test latest CVS and see if it's broken? Again, please try just zaptel, then zaptel and libpri, then all three, so we can narrow down the problem. In each case, be sure to "make clean ; make install" after updating, and in the case of zaptel be sure to unload both your wct4xxp *and* zaptel modules. With recent CVS, I will also want to pay close attention to the timing.

By: zoa (zoa) 2004-04-02 11:20:15.000-0600
I just played around with it all afternoon. Somehow I feel that it might be something very weird: I tried to revert to a version that was stable for me and is still on my HD in /usr/src/asterisk-23mar. When I recompile it (libpri, wct4xxp and asterisk), it now always hangs on the first call. I did a make clean and removed all the asterisk files I could find in /usr/bin/, /usr/sbin/ and /usr/lib/, then did make and make install for libpri, then zaptel, then asterisk. If I have enough time left (and the fsck finishes fast enough) I'll try again with the latest CVS today.

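The narrowing-down procedure Mark describes above (update one component at a time, rebuild each with make clean ; make install, and unload both wct4xxp and zaptel before loading the rebuilt modules) could be sketched as a small script. This is only a sketch under assumptions that are not in the thread: the /usr/src/<component> checkout layout, the function names, the use of modprobe to reload, and the DRY_RUN switch are all hypothetical. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of the staged update: zaptel only, then zaptel + libpri, then all three.
# SRC_ROOT, the directory layout and DRY_RUN are assumptions, not from the thread.
SRC_ROOT="${SRC_ROOT:-/usr/src}"
DRY_RUN="${DRY_RUN:-1}"

run() {
    # Print the command in dry-run mode; execute it otherwise.
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

rebuild() {
    # Update one component from CVS and reinstall it from a clean tree.
    comp="$1"
    run cd "$SRC_ROOT/$comp"
    run cvs update
    run make clean
    run make install
}

reload_zaptel() {
    # Unload the card driver before the core module, then load them again.
    run rmmod wct4xxp
    run rmmod zaptel
    run modprobe zaptel
    run modprobe wct4xxp
    run ztcfg
}

stage() {
    # Stage 1: zaptel only; stage 2: zaptel + libpri; stage 3: all three.
    case "$1" in
        1) comps="zaptel" ;;
        2) comps="zaptel libpri" ;;
        3) comps="zaptel libpri asterisk" ;;
        *) echo "usage: stage 1|2|3" >&2; return 1 ;;
    esac
    for c in $comps; do
        rebuild "$c"
    done
    reload_zaptel
}
```

Running `stage 1`, testing a call, and only then moving on to `stage 2` and `stage 3` keeps each test tied to a single newly updated component, which is the point of the request.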
By: zoa (zoa) 2004-04-02 11:21:49.000-0600
Oh, btw, on the other unstable server I was using kernel 2.4.20, so that doesn't seem to influence it. I'm on Debian woody 3.0 with gcc 2.95. Would changing the compile flags influence the results? (I could not find any info on -O6.)

By: zoa (zoa) 2004-04-02 11:26:52.000-0600
This starts to look like this bug (but then on another server): http://bugs.digium.com/bug_view_page.php?bug_id=0001224

By: James Golovich (jamesgolovich) 2004-04-02 12:24:07.000-0600
I've not seen this issue on any of my boxes. If reverting to a version that was stable before doesn't work, then I strongly suspect asterisk is fine and it's probably related to the system. When you're reverting to older versions, are you also reverting to older versions of libpri and zaptel? The other possibility is that a newer module was added in one of the releases and never removed, so you might want to 'rm /usr/lib/asterisk/modules/*' and recompile/install asterisk.

By: Mark Spencer (markster) 2004-04-02 12:54:23.000-0600
Fixed in CVS

By: zoa (zoa) 2004-04-03 03:41:49.000-0600
I had some more fun with the box. Looks like bad voodoo happens on restart or shutdown:

*CLI> stop now
Beginning asterisk shutdown....
Executing last minute cleanups
== Destroying any remaining musiconhold processes
Yuck! Error in buffer handling...: Connection reset by peer
Yuck! Error in buffer handling...: Connection reset by peer
Yuck! Error in buffer handling...: Connection reset by peer
Asterisk cleanly ending (0).

It stayed like this for some minutes. Meanwhile ps -auwx showed defunct asterisk threads and an asterisk process at 99,9% CPU. When I pressed Ctrl-C, it showed "Beginning asterisk shutdown...." once, so it was not completely unresponsive yet. I was able to killall asterisk, which killed all threads (including the CLI thread) except for one thread that took 99,9% CPU.

By: zoa (zoa) 2004-04-03 05:02:48.000-0600
Maybe related, maybe not. (This was on a Ctrl-C or a stop now, I don't know for sure.)
And again it leaves three processes at 99,9% CPU.

Reading symbols from /usr/lib/asterisk/modules/app_nbscat.so...done.
Loaded symbols for /usr/lib/asterisk/modules/app_nbscat.so
Reading symbols from /usr/lib/asterisk/modules/format_g726.so...done.
Loaded symbols for /usr/lib/asterisk/modules/format_g726.so
#0 0x40021e90 in pthread_mutex_lock () from /lib/libpthread.so.0
(gdb)
(gdb) bt full
#0 0x40021e90 in pthread_mutex_lock () from /lib/libpthread.so.0
No symbol table info available.
#1 0x4010da8b in free () from /lib/libc.so.6
No symbol table info available.
#2 0x08093a0a in el_end (el=0x80d31c8) at el.c:125
        el = (EditLine *) 0x80d31c8
#3 0x08082a3c in quit_handler (num=2, nice=0, safeshutdown=1, restart=0) at asterisk.c:549
        nice = 0
        safeshutdown = -1073746744
        restart = 0
        filename = "/root/.asterisk_history", '\0' <repeats 56 times>
        s = 1080985669
        e = 1080985669
        x = -1073746744
#4 0x08085701 in __quit_handler (num=2) at asterisk.c:597
        num = -1069183166
#5 0x40023f54 in pthread_sighandler () from /lib/libpthread.so.0
No symbol table info available.
#6 0x400c86b8 in sigaction () from /lib/libc.so.6
No symbol table info available.
#7 0x08095d3f in read_char (el=0x80d31c8, cp=0xbffff1ab "") at read.c:302
        el = (EditLine *) 0x0
        cp = 0x0
        num_read = -1069183166
        tried = 0
#8 0x08095ddd in el_getc (el=0x80d31c8, cp=0xbffff1ab "") at read.c:347
        el = (EditLine *) 0x80d31c8
        cp = 0xbffff1ab ""
        num_read = -1069183166
        ma = (c_macro_t *) 0x80d3454
#9 0x08095c4a in read_getcmd (el=0x80d31c8, cmdnum=0xbffff1aa "", ch=0xbffff1ab "") at read.c:243
        el = (EditLine *) 0x80d31c8
        cmdnum = (el_action_t *) 0xc0458f42 <Address 0xc0458f42 out of bounds>
        ch = 0xbffff1ab ""
        cmd = 255 'ÿ'
        num = -1069183166
#10 0x08096085 in el_gets (el=0x80d31c8, nread=0xbffff1f0) at read.c:443
        el = (EditLine *) 0x80d31c8
        retval = 1075530176
        cmdnum = 0 '\0'
        num = -1
        ch = 0 '\0'
#11 0x08085216 in main (argc=2, argv=0xbffff584) at asterisk.c:1635
        title = "Asterisk Console on 'imroVOIP1' (pid 1105)\0@ÃÃ\n@\fóÿ¿Òd\0@ì0\001@xV\n@Èõ\002@ql\0@ì0\001@Ï\a\0\0Ð<\001@ql\0@\0\0\0\0¸\004\0\0Ð<\001@\fã\001@\234Ó\001@L¿\001@Ð<\001@\0\0\0\0\0\0\0\0\b\0\0\0<ø\002@\002\0\0\0´òÿ¿xV\n@\f÷\002@{{\027\006{{\027\006\bóÿ¿Èõ\002@ÃÃ\n@\024>\001@d¬E\006d¬E\006Xóÿ¿,É\001@Ð<\001@\fóÿ¿2\226\0@"...
        argc = -1073745412
        c = 1075530176
        filename = "/root/.asterisk_history", '\0' <repeats 56 times>
        hostname = "imroVOIP1\0\002@\006\0\0\0\034ó\n@Àôÿ¿<o\0@\0\0\0\0Áò\n@\230ôÿ¿ql\0@ãÎ\004\b\0.\0\0H\237\n@Ý\a\0\0ø¢\n@x!\n@Èõ\002@\b\0\0\0<ø\002@\006\0\0\0Hôÿ¿\024\236\004\b¼7\001@\017S\216\a\017S\216\aÜôÿ¿x6\001@)Ó\004\bÈõ\002@\b\0\0\0H\237\n@Èõ\002@xôÿ¿\230k\n@\f÷\002@taß\003taß\003\fõÿ¿Èõ\002@øÐ\n@Èõ\002@\b\0\0\0\230k\n@>\\\002@àê\002@ ~\e@\002\0\0\0\216ÿw\001àôÿ¿"...
        tmp = "\e[1;37;40mAsterisk Ready.\n\e[0;37;40m\0U\n@\0\0\0\0\202\211¹\n\034ó\n@\220ôÿ¿<o\0@\225ò\n@\223ý\004\bÝ\a\0\0ql\0@ì0\001@"
        xarg = 0x0
        x = 1075530176
        f = (FILE *) 0xbffff1fc
        sigs = {__val = {134238211, 0 <repeats 31 times>}}
        num = 1770
        buf = 0xbffff1fc "Asterisk Console on 'imroVOIP1' (pid 1105)"

By: zoa (zoa) 2004-04-03 11:26:05.000-0600
Hmm, now I noticed that asterisk has 2 threads at 99% CPU; however, asterisk is still working fine. (As long as I don't do a restart now or a stop now, as that will cause asterisk to stop but the ports for iax2 etc. will stay in use.)

By: Mark Spencer (markster) 2004-04-03 20:58:08.000-0600
The last crash was unrelated. That's just because editline doesn't always exit cleanly. Anyway, if you attach gdb to Asterisk when it's taking 99% CPU, what do you see?

By: Mark Spencer (markster) 2004-04-04 12:32:37
zoa: did you have a TDM card in this system too? If so, please cvs update to latest zaptel. I found a problem which might have led to some problems in the TDM driver.

By: zoa (zoa) 2004-04-05 03:56:05
The only card in the server is a te410p. I'll try to attach gdb later today.

By: zoa (zoa) 2004-04-05 04:56:08
imroVOIP1:~# gdb -p 232
GNU gdb 2002-04-01-cvs
Copyright 2002 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-linux".
Attaching to process 232
Reading symbols from /usr/sbin/asterisk...done.
Reading symbols from /lib/libdl.so.2...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/libpthread.so.0...done.
[New Thread 1024 (LWP 232)]
[New Thread 2049 (LWP 233)]
[New Thread 1026 (LWP 234)]
[New Thread 2051 (LWP 235)]
[New Thread 3076 (LWP 236)]
[New Thread 4101 (LWP 237)]
[New Thread 5126 (LWP 238)]
<Not responsive>

That is what happens when I try to attach gdb. After that, asterisk is also no longer responsive. Immediately after I start asterisk (before I make a call), asterisk seems to have one thread at 99,9% CPU usage. At the same time, asterisk is slowly restarting channels (it takes 5 seconds per channel).

By: zoa (zoa) 2004-04-05 09:13:23
The latest CVS fix by citats seems to resolve this 99,9% CPU usage. I haven't checked it very well so far; I will do so for the rest of the day.

By: zoa (zoa) 2004-04-05 10:26:19
OK, I did some more tests; the latest CVS no longer seems to hang. The frequent non-asterisk coredumps remain, but my colleague claims that on one of the two servers they went away after an upgrade to kernel 2.6.5.

By: zoa (zoa) 2004-04-05 12:19:54
I upgraded to kernel 2.6.5 and now I see no more coredumps, whiiiiiiiiii.

By: James Golovich (jamesgolovich) 2004-04-05 12:29:49
The fix I made in CVS today had nothing to do with this at all, just something I noticed while looking around.

By: Mark Spencer (markster) 2004-04-05 15:48:49
Apparent kernel incompatibility. |
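The interactive gdb attach in the thread left both gdb and asterisk unresponsive. One way to shorten the time the process spends stopped is a one-shot, non-interactive attach that prints every thread's backtrace and then exits. The sketch below only builds the command lines so they can be reviewed before being run. It assumes a gdb that supports -batch and -ex (the 2002-era gdb shown in the thread may not) and a procps ps that supports -L; the function names are hypothetical.

```shell
#!/bin/sh
# Sketch: build commands for inspecting a busy multi-threaded process.
# Function names are hypothetical; nothing here is taken from the thread.

valid_pid() {
    # Accept only a non-empty string of digits.
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

busy_threads_cmd() {
    # ps -L prints one row per thread (LWP) with its CPU share.
    valid_pid "$1" || { echo "usage: busy_threads_cmd PID" >&2; return 1; }
    echo "ps -L -o lwp,pcpu,comm -p $1"
}

bt_all_cmd() {
    # Attach, dump a full backtrace of every thread, then detach and exit.
    valid_pid "$1" || { echo "usage: bt_all_cmd PID" >&2; return 1; }
    echo "gdb -p $1 -batch -ex 'thread apply all bt full'"
}
```

For the process in the report this would be used as `eval "$(bt_all_cmd 232)" > asterisk-bt.txt`, after checking with the output of `busy_threads_cmd 232` which LWP is the one spinning.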