[Home]

Summary:ASTERISK-12369: Crash in iax2_destroy at chan_iax2.c:1309
Reporter:Matt Riddell (zx81)Labels:
Date Opened:2008-07-12 19:12:34Date Closed:2008-07-17 08:43:28
Priority:CriticalRegression?No
Status:Closed/CompleteComponents:Channels/chan_iax2
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) 13063_bt_full.TXT
( 1) 13063_core2_bt_full.TXT
( 2) 13063_core2_thread_apply_all_bt.TXT
( 3) 13063_thread_apply_all_bt.TXT
Description:Brand new system taken from 1.4svn (r129803).

The machine is at my house and the network is particularly bad today (~60% packet loss).

I have a caching name server, and have registrations to three Asterisk 1.4 machines I own (trunking enabled).

There are no users on the machine as it is being built for production for next week.

With no activity except for the outgoing registrations and qualify statements, Asterisk crashed with the following:

#0  0xb732a9dc in iax2_destroy (callno=4176) at chan_iax2.c:1309
#1  0xb7334dc3 in __iax2_poke_noanswer (data=0x81f02e0) at chan_iax2.c:8869
#2  0xb734d3d5 in iax2_process_thread (data=0x81f40d0) at chan_iax2.c:8642
#3  0x080ff690 in dummy_start (data=0x81e9c00) at utils.c:912
#4  0xb7f37240 in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
ASTERISK-1  0xb7e5249e in clone () from /lib/tls/i686/cmov/libc.so.6

The machine is:

2.6.18-5-686 #1 SMP Tue Dec 18 21:24:20 UTC 2007 i686 GNU/Linux

It crashed twice earlier (within 10 minutes of each other), but has not crashed in the last 2 hours.

The only difference I can see (seeing as the box is doing nothing) is that in the period where it crashed twice, the packet loss was extreme.

This may prove to be a difficult one to track down, however I have recompiled Asterisk with DEBUG_THREADS and DONT_OPTIMIZE and will upload another core if I get one.

****** ADDITIONAL INFORMATION ******

Thread 29 (process 3140):
#0  0xb7f4f410 in ?? ()
#1  0xbf90d7a8 in ?? ()
#2  0x00000001 in ?? ()
#3  0xbf90d8fb in ?? ()
#4  0xb7f3c82b in __read_nocancel () from /lib/tls/i686/cmov/libpthread.so.0
ASTERISK-1  0x08109911 in read_char (el=0x8170fc8, cp=0xbf90d8fb "") at read.c:296
ASTERISK-2  0x08104a82 in el_getc (el=0x8170fc8, cp=0xbf90d8fb "") at read.c:347
ASTERISK-3  0x08104e8b in el_gets (el=0x8170fc8, nread=0xbf90de74) at read.c:243
ASTERISK-4  0x0806f493 in main (argc=4, argv=0xbf90df14) at asterisk.c:3140
ASTERISK-5  0xb7d9bea8 in __libc_start_main () from /lib/tls/i686/cmov/libc.so.6
ASTERISK-6 0x08057e11 in ?? () at ../sysdeps/i386/elf/start.S:119

Thread 28 (process 3146):
#0  0xb7f4f410 in ?? ()
#1  0xb7d85338 in ?? ()
#2  0xffffffff in ?? ()
#3  0x00000001 in ?? ()
#4  0xb7e488f3 in poll () from /lib/tls/i686/cmov/libc.so.6
ASTERISK-1  0x0806adce in listener (unused=0x0) at asterisk.c:996
ASTERISK-2  0x080ff690 in dummy_start (data=0x8172db0) at utils.c:912
ASTERISK-3  0xb7f37240 in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
ASTERISK-4  0xb7e5249e in clone () from /lib/tls/i686/cmov/libc.so.6

Thread 27 (process 3147):
#0  0xb7f4f410 in ?? ()
#1  0xb7d49408 in ?? ()
#2  0x08176a30 in ?? ()
#3  0x00000000 in ?? ()

Thread 26 (process 3148):
#0  0xb7f4f410 in ?? ()
#1  0xb7d0d388 in ?? ()
#2  0x00001388 in ?? ()
#3  0x00000001 in ?? ()
#4  0xb7e488f3 in poll () from /lib/tls/i686/cmov/libc.so.6
ASTERISK-1  0x080b40de in accept_thread (ignore=0x0) at manager.c:2399
ASTERISK-2  0x080ff690 in dummy_start (data=0x817a558) at utils.c:912
ASTERISK-3  0xb7f37240 in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
ASTERISK-4  0xb7e5249e in clone () from /lib/tls/i686/cmov/libc.so.6

Thread 25 (process 3149):
#0  0xb7f4f410 in ?? ()
#1  0xb7cd1418 in ?? ()
#2  0x00000011 in ?? ()
#3  0x00000000 in ?? ()

Thread 24 (process 3150):
#0  0xb7f4f410 in ?? ()
#1  0xb7c75418 in ?? ()
#2  0x00000000 in ?? ()

Thread 23 (process 3151):
#0  0xb7f4f410 in ?? ()
#1  0xb7b73398 in ?? ()
#2  0x000003e8 in ?? ()
#3  0x00000001 in ?? ()
#4  0xb7e488f3 in poll () from /lib/tls/i686/cmov/libc.so.6
ASTERISK-1  0x080ac824 in ast_io_wait (ioc=0x817e3e8, howlong=1000) at io.c:266
ASTERISK-2  0xb7b938db in do_monitor (data=0x0) at chan_mgcp.c:3514
ASTERISK-3  0x080ff690 in dummy_start (data=0x8189fd0) at utils.c:912
ASTERISK-4  0xb7f37240 in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
ASTERISK-5  0xb7e5249e in clone () from /lib/tls/i686/cmov/libc.so.6

Thread 22 (process 3152):
#0  0xb7f4f410 in ?? ()
#1  0xb7a7ff88 in ?? ()
#2  0x000003e8 in ?? ()
#3  0x00000000 in ?? ()

Thread 21 (process 3153):
#0  0xb7f4f410 in ?? ()
#1  0xb7a3a018 in ?? ()
#2  0xb7eb3ff4 in ?? () from /lib/tls/i686/cmov/libc.so.6
#3  0xb7a3a004 in ?? ()
#4  0xb7e14ab6 in nanosleep () from /lib/tls/i686/cmov/libc.so.6
ASTERISK-1  0xb7e148db in sleep () from /lib/tls/i686/cmov/libc.so.6
ASTERISK-2  0xb7a3ce1f in scan_thread (unused=0x0) at pbx_spool.c:446
ASTERISK-3  0x080ff690 in dummy_start (data=0x8194b88) at utils.c:912
ASTERISK-4  0xb7f37240 in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
ASTERISK-5  0xb7e5249e in clone () from /lib/tls/i686/cmov/libc.so.6

Thread 20 (process 3154):
#0  0xb7f4f410 in ?? ()
#1  0xb79dc418 in ?? ()
#2  0x00000000 in ?? ()

Thread 19 (process 3155):
#0  0xb7f4f410 in ?? ()
#1  0xb76af318 in ?? ()
#2  0x000003e8 in ?? ()
#3  0x00000001 in ?? ()
#4  0xb7e488f3 in poll () from /lib/tls/i686/cmov/libc.so.6
ASTERISK-1  0x080ac824 in ast_io_wait (ioc=0x81b5ef0, howlong=1000) at io.c:266
ASTERISK-2  0xb76f1f02 in do_monitor (data=0x0) at chan_sip.c:15817
ASTERISK-3  0x080ff690 in dummy_start (data=0x81ba6e8) at utils.c:912
ASTERISK-4  0xb7f37240 in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
ASTERISK-5  0xb7e5249e in clone () from /lib/tls/i686/cmov/libc.so.6

Thread 18 (process 3156):
#0  0xb7f4f410 in ?? ()
#1  0xb765a418 in ?? ()
#2  0xb765a308 in ?? ()
#3  0xb765a388 in ?? ()
#4  0xb7e4b131 in select () from /lib/tls/i686/cmov/libc.so.6
ASTERISK-1  0xb765db91 in sound_thread (arg=0x81cc4c0) at /usr/src/asterisk/include/asterisk/channel.h:1320
ASTERISK-2  0x080ff690 in dummy_start (data=0x81b7b68) at utils.c:912
ASTERISK-3  0xb7f37240 in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
ASTERISK-4  0xb7e5249e in clone () from /lib/tls/i686/cmov/libc.so.6

Thread 17 (process 3157):
#0  0xb7f4f410 in ?? ()
#1  0xb761e418 in ?? ()
#2  0xb761e308 in ?? ()
#3  0xb761e388 in ?? ()
#4  0xb7e4b131 in select () from /lib/tls/i686/cmov/libc.so.6
ASTERISK-1  0xb765db91 in sound_thread (arg=0x81ccb78) at /usr/src/asterisk/include/asterisk/channel.h:1320
ASTERISK-2  0x080ff690 in dummy_start (data=0x81bc4b8) at utils.c:912
ASTERISK-3  0xb7f37240 in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
ASTERISK-4  0xb7e5249e in clone () from /lib/tls/i686/cmov/libc.so.6

Thread 16 (process 3158):
#0  0xb7f4f410 in ?? ()
#1  0xb75bb418 in ?? ()
#2  0x0000000f in ?? ()
#3  0x00000000 in ?? ()

Thread 15 (process 3159):
#0  0xb7f4f410 in ?? ()
#1  0xb74f9418 in ?? ()
#2  0xb750caf8 in ?? () from /usr/lib/asterisk/modules/chan_skinny.so
#3  0xb74f9350 in ?? ()
#4  0xb7f3ca48 in accept () from /lib/tls/i686/cmov/libpthread.so.0
ASTERISK-1  0xb74fe043 in accept_thread (ignore=0x0) at chan_skinny.c:4586
ASTERISK-2  0x080ff690 in dummy_start (data=0x81e5d40) at utils.c:912
ASTERISK-3  0xb7f37240 in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
ASTERISK-4  0xb7e5249e in clone () from /lib/tls/i686/cmov/libc.so.6

Thread 14 (process 3160):
#0  0xb7f4f410 in ?? ()
#1  0xb74bd3b8 in ?? ()
#2  0x000003e8 in ?? ()
#3  0x00000000 in ?? ()

Thread 13 (process 3166):
#0  0xb7f4f410 in ?? ()
#1  0xb7292418 in ?? ()
#2  0x00000029 in ?? ()
#3  0x00000000 in ?? ()

Thread 12 (process 3167):
#0  0xb7f4f410 in ?? ()
#1  0xb7256418 in ?? ()
#2  0x00000029 in ?? ()
#3  0x00000000 in ?? ()

Thread 11 (process 3168):
#0  0xb7f4f410 in ?? ()
#1  0xb721a418 in ?? ()
#2  0x00000029 in ?? ()
#3  0x00000000 in ?? ()

Thread 10 (process 3169):
#0  0xb7f4f410 in ?? ()
#1  0xb71de418 in ?? ()
#2  0x00000029 in ?? ()
#3  0x00000000 in ?? ()

Thread 9 (process 3171):
#0  0xb7f4f410 in ?? ()
#1  0xb7166418 in ?? ()
#2  0x00000027 in ?? ()
#3  0x00000000 in ?? ()

Thread 8 (process 3172):
#0  0xb7f4f410 in ?? ()
#1  0xb712a418 in ?? ()
#2  0x00000027 in ?? ()
#3  0x00000000 in ?? ()

Thread 7 (process 3173):
#0  0xb7f4f410 in ?? ()
#1  0xb70ee418 in ?? ()
#2  0x00000027 in ?? ()
#3  0x00000000 in ?? ()

Thread 6 (process 3174):
#0  0xb7f4f410 in ?? ()
#1  0xb70b2418 in ?? ()
#2  0x00000027 in ?? ()
#3  0x00000000 in ?? ()

Thread 5 (process 3175):
#0  0xb7f4f410 in ?? ()
#1  0xb7076418 in ?? ()
#2  0x00000027 in ?? ()
#3  0x00000000 in ?? ()

Thread 4 (process 3176):
#0  0xb7f4f410 in ?? ()
#1  0xb703a400 in ?? ()
#2  0x000002fb in ?? ()
#3  0x00000000 in ?? ()

Thread 3 (process 3177):
#0  0xb7f4f410 in ?? ()
#1  0xb6ffe368 in ?? ()
#2  0xffffffff in ?? ()
#3  0x00000002 in ?? ()
#4  0xb7e488f3 in poll () from /lib/tls/i686/cmov/libc.so.6
ASTERISK-1  0x080ac824 in ast_io_wait (ioc=0x81e80d0, howlong=-1) at io.c:266
ASTERISK-2  0xb7334b95 in network_thread (ignore=0x0) at chan_iax2.c:9138
ASTERISK-3  0x080ff690 in dummy_start (data=0x81ee5d8) at utils.c:912
ASTERISK-4  0xb7f37240 in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
ASTERISK-5  0xb7e5249e in clone () from /lib/tls/i686/cmov/libc.so.6

Thread 2 (process 3178):
#0  0xb7f4f410 in ?? ()
#1  0xb6ee43e8 in ?? ()
#2  0xffffffff in ?? ()
#3  0x00000001 in ?? ()
#4  0xb7e488f3 in poll () from /lib/tls/i686/cmov/libc.so.6
ASTERISK-1  0x08068cf1 in monitor_sig_flags (unused=0x0) at asterisk.c:2599
ASTERISK-2  0x080ff690 in dummy_start (data=0x81b57a8) at utils.c:912
ASTERISK-3  0xb7f37240 in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
ASTERISK-4  0xb7e5249e in clone () from /lib/tls/i686/cmov/libc.so.6

Thread 1 (process 3170):
#0  0xb732a9dc in iax2_destroy (callno=4176) at chan_iax2.c:1309
#1  0xb7334dc3 in __iax2_poke_noanswer (data=0x81f02e0) at chan_iax2.c:8869
#2  0xb734d3d5 in iax2_process_thread (data=0x81f40d0) at chan_iax2.c:8642
#3  0x080ff690 in dummy_start (data=0x81e9c00) at utils.c:912
#4  0xb7f37240 in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
ASTERISK-1  0xb7e5249e in clone () from /lib/tls/i686/cmov/libc.so.6
Comments:By: Matt Riddell (zx81) 2008-07-12 19:37:36

Sorry for posting the thread bt in additional info - don't need to be reminded to attach it.

Uptime with recompiled Asterisk is now 25 mins, will try to force packet loss and see if I can get it to crash again.

By: snuffy (snuffy) 2008-07-12 20:04:26

if you want to force some packet loss within linux.. can follow these few examples
http://www.kdedevelopers.org/node/1878

i'm sure there are others.. but this came to my google search 1st..

By: Matt Riddell (zx81) 2008-07-12 21:26:56

cool, thanks - except it just states delay.  But for loss do:

tc qdisc add dev eth0 root handle 1: netem delay 100ms
tc qdisc add dev eth0 parent 1:1 handle 10: netem loss 60%

Oh, and make sure you're by the machine - my ssh connection isn't too great from the other room now :)

Will post updates with these settings.

By: Matt Riddell (zx81) 2008-07-12 22:03:46

heh, I had another crash, but in the interim I'd had to restart the machine.

When it starts up from the init script it doesn't dump core (guess I could add -g to startup script).

I've now started it back up with safe_asterisk (which dumps the core to /tmp), and I'll leave it running and see if it dumps it again.

I'm still running with 60 percent packet loss so if that's the cause it should happen again.

By: Matt Riddell (zx81) 2008-07-12 22:08:40

Heh, as I was typing that I got two core dumps.  Will attach gdb logs in a sec.

Anything in particular I should add from the cores?

By: Matt Riddell (zx81) 2008-07-13 00:31:12

I now have the following cores:
-rw------- 1 root root 17596416 2008-07-13 15:09 core.hostname-2008-07-13T15:09:18+1200
-rw------- 1 root root 17092608 2008-07-13 15:14 core.hostname-2008-07-13T15:14:02+1200
-rw------- 1 root root 17727488 2008-07-13 15:40 core.hostname-2008-07-13T15:40:16+1200
-rw------- 1 root root 17113088 2008-07-13 15:44 core.hostname-2008-07-13T15:44:50+1200
-rw------- 1 root root 16965632 2008-07-13 16:06 core.hostname-2008-07-13T16:06:04+1200
-rw------- 1 root root 17108992 2008-07-13 16:12 core.hostname-2008-07-13T16:12:43+1200
-rw------- 1 root root 17006592 2008-07-13 16:16 core.hostname-2008-07-13T16:16:48+1200
-rw------- 1 root root 17125376 2008-07-13 16:20 core.hostname-2008-07-13T16:20:37+1200
-rw------- 1 root root 16957440 2008-07-13 16:28 core.hostname-2008-07-13T16:28:16+1200
-rw------- 1 root root 17125376 2008-07-13 17:00 core.hostname-2008-07-13T17:00:13+1200
-rw------- 1 root root 17104896 2008-07-13 17:04 core.hostname-2008-07-13T17:04:27+1200
-rw------- 1 root root 16957440 2008-07-13 17:08 core.hostname-2008-07-13T17:08:31+1200
-rw------- 1 root root 17121280 2008-07-13 17:13 core.hostname-2008-07-13T17:13:14+1200
-rw------- 1 root root 17096704 2008-07-13 17:18 core.hostname-2008-07-13T17:18:20+1200
-rw------- 1 root root 17121280 2008-07-13 17:23 core.hostname-2008-07-13T17:23:35+1200
-rw------- 1 root root 16961536 2008-07-13 17:28 core.hostname-2008-07-13T17:28:04+1200
-rw------- 1 root root 16965632 2008-07-13 17:32 core.hostname-2008-07-13T17:32:19+1200
-rw------- 1 root root 17125376 2008-07-13 17:37 core.hostname-2008-07-13T17:37:13+1200
-rw------- 1 root root 16965632 2008-07-13 17:41 core.hostname-2008-07-13T17:41:17+1200
-rw------- 1 root root 17092608 2008-07-13 17:45 core.hostname-2008-07-13T17:45:16+1200
-rw------- 1 root root 16961536 2008-07-13 17:57 core.hostname-2008-07-13T17:57:06+1200
-rw------- 1 root root 17096704 2008-07-13 18:08 core.hostname-2008-07-13T18:08:24+1200
-rw------- 1 root root 17121280 2008-07-13 18:12 core.hostname-2008-07-13T18:12:28+1200
-rw------- 1 root root 16957440 2008-07-13 18:16 core.hostname-2008-07-13T18:16:54+1200
-rw------- 1 root root 16957440 2008-07-13 18:23 core.hostname-2008-07-13T18:23:56+1200
-rw------- 1 root root 17231872 2008-07-13 18:28 core.hostname-2008-07-13T18:28:00+1200
-rw------- 1 root root 17121280 2008-07-13 18:31 core.hostname-2008-07-13T18:31:50+1200
-rw------- 1 root root 17121280 2008-07-13 18:36 core.hostname-2008-07-13T18:36:40+1200
-rw------- 1 root root 17096704 2008-07-13 18:40 core.hostname-2008-07-13T18:40:48+1200
-rw------- 1 root root 17121280 2008-07-13 18:45 core.hostname-2008-07-13T18:45:32+1200
-rw------- 1 root root 16973824 2008-07-13 18:49 core.hostname-20



By: Michiel van Baak (mvanbaak) 2008-07-13 14:16:08

Do all cores show it dies in the same place ?

By: Matt Riddell (zx81) 2008-07-13 16:06:19

Yep.

I now have around 200 cores.

While I can remove the tc packet loss simulation, I can't guarantee the network won't have packet loss, so I still get cores.

I've also seen this happen on an embedded machine.

What was the revision of the last release (I'm assuming its not happening there) so I can go back and test to see how long this has been a problem.

By: Matt Riddell (zx81) 2008-07-13 16:12:29

Downloading revision 126590 now (1.4.20.1)  will post back if it is experiencing the same problem.



By: Michiel van Baak (mvanbaak) 2008-07-13 16:14:18

126590 is the rev for 1.4.21.1

It can be a bit harder to find the correct version in branches/1.4 because 1.4.21.1 was copied from the tag 1.4.21 and got a fix in

By: Matt Riddell (zx81) 2008-07-13 19:54:34

Ok, so I've run a few tests.

I checked out the revision that was related to the last release (assuming that there would have been more reports if it was in release code).

That was 126590 and did not crash.  I've then gone through the commit logs and found the revision numbers of changes to chan_iax2.c

I then went through and picked the revision half way between current and not crashing.  If it crashed I went halfway between that one and the non crasher.

126590 - [checked] does not crash (after 15 minutes)
126999 - [checked] does not crash (after 4 and a half hours)
127068 - [checked] does crash (after 3 hours)
127069 - [checked] does crash (after 53 minutes)
128639 - [checked] does crash (after 24 minutes)
128737 - [checked] does crash (after 27 minutes)
128795 - [checked] does crash (after 4 minutes)
129047 - [unchecked]
129803 - [checked] does crash (after 6 minutes)
130169 - [checked] does crash (after 10 minutes)



By: Matt Riddell (zx81) 2008-07-14 16:13:45

After running for a couple of days, it has again crashed in the same place (this time with 128737).

By: Matt Riddell (zx81) 2008-07-14 16:14:31

This time though there was no network cable plugged in.

By: Matt Riddell (zx81) 2008-07-14 16:16:47

I had been removing chan_iax2 then updating.  Will do a fresh checkout and see how that goes.

By: Matt Riddell (zx81) 2008-07-14 16:17:19

Oh, and by the way, the only IAX2 accounts are using trunking.

By: Matt Riddell (zx81) 2008-07-14 16:36:36

Have gone back to the 1.4.21.1 release and it seems to be working, although peppered with:

WARNING[17183]: chan_iax2.c:1595 __find_callno: Old call search code found call number 8891 that was not in hash table!

By: Matt Riddell (zx81) 2008-07-14 18:18:56

Box leaves tomorrow, so that will be the end of any testing I can do on it.  If there is anything required, let me know before then.

It's not really a show stopper for me, will just use an older version, but I am available for any testing until then.

By: Matt Riddell (zx81) 2008-07-14 18:36:03

[Jul 15 11:43:08] Connected to Asterisk SVN-branch-1.4-r126573
Verbosity is at least 100
wickliffe*CLI> show uptime
System uptime: 2 hours, 26 minute, 36 seconds



By: Matt Riddell (zx81) 2008-07-14 18:59:35

Will do extended tests again.

By: Matt Riddell (zx81) 2008-07-14 19:19:29

Doing a 2 hour test of 127069 now.

By: Matt Riddell (zx81) 2008-07-14 21:33:17

Ok, so, turns out its in 127069 too:

[Jul 15 14:40:31] Connected to Asterisk SVN-branch-1.4-r127069 currently running on wickliffe (pid = 31973)
Verbosity is at least 100

#0  0xb72c7ddc in iax2_destroy (callno=11570) at chan_iax2.c:1308
       _sched_res = <value optimized out>
       pvt = (struct chan_iax2_pvt *) 0x0
       owner = (struct ast_channel *) 0xb7363250
       __PRETTY_FUNCTION__ = "iax2_destroy"
#1  0xb72d9743 in __iax2_poke_noanswer (data=0x82661a8) at chan_iax2.c:8854
       callno = 11570
       __PRETTY_FUNCTION__ = "__iax2_poke_noanswer"
#2  0xb72f13b5 in iax2_process_thread (data=0x825ea30) at chan_iax2.c:8627
       curelm = <value optimized out>
       _buffer = {__routine = 0xb72ce8f0 <iax2_process_thread_cleanup>, __arg = 0x825ea30, __canceltype = 135599656,
 __prev = 0xb700f43c}
       thread = <value optimized out>
       ts = {tv_sec = -1209669838, tv_nsec = -1208847168}
       put_into_idle = 1
#3  0x080ff4a0 in dummy_start (data=0x8259450) at utils.c:912
       _buffer = {__routine = 0x806a620 <ast_unregister_thread>, __arg = 0xb700fbb0, __canceltype = 0, __prev = 0x0}
       ret = <value optimized out>
#4  0xb7fa9240 in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
No symbol table info available.
ASTERISK-1  0xb7ec449e in clone () from /lib/tls/i686/cmov/libc.so.6
No symbol table info available.

By: Matt Riddell (zx81) 2008-07-14 22:25:09

Running 2 hour test of 126999

By: Matt Riddell (zx81) 2008-07-15 16:25:19

[Jul 16 09:31:51] Connected to Asterisk SVN-branch-1.4-r126999 currently running on xxx (pid = 31973)
Verbosity is at least 100
xxx*CLI> show uptime
System uptime: 18 hours, 40 minutes, 33 seconds

Ok, so 126999 looks stable



By: Matt Riddell (zx81) 2008-07-15 16:27:13

I'd really like to avoid getting my head around the iax2 code, so if anyone else can pitch in it would be cool.  I'll run two more tests, and post the results.

Running two hour test of 127068.



By: Matt Riddell (zx81) 2008-07-15 22:04:36

127068 crashed after 3 hours.  Going back to the previous to test again

By: Matt Riddell (zx81) 2008-07-15 22:33:34

Running 126999 with 80% packet loss.  If it's going to crash this should do it.

48 minutes so far



By: Matt Riddell (zx81) 2008-07-16 03:23:02

So, 126999 has been running for just under four and a half hours.

If the tests are correct, then it should actually be the 127068 and 127069 commits.

Basically just one commit.

http://svn.digium.com/view/asterisk?view=rev&rev=127068
http://svn.digium.com/view/asterisk?view=rev&rev=127069

There are actually no other commits to the Asterisk 1.4 branch between 126999 and 127068.  There are so many team branches!  I looked in my svn-commits folder and didn't find the patches between so I went to the above URL and went through each revision between 126999 and 127068.

As a side note there are some numbers which return a python error which dumps some info on the server such as file path and the fact it is running on a 64bit os.  Doesn't really matter, but would probably be cleaner if it returned an error:

http://svn.digium.com/view/asterisk?view=rev&revision=127036

The actual line of code that it crashes in was added in that commit.

The diff is here:

http://svn.digium.com/view/asterisk/branches/1.4/channels/chan_iax2.c?r1=127069&r2=127067&pathrev=127069

I'm recompiling and retesting 127069, so it should replicate the same result by crashing.  

Inside the function:

static unsigned char compress_subclass(int subclass)

There is the entry at 1309 which now does:

AST_SCHED_DEL(sched, iaxs[callno]->lagid);
AST_SCHED_DEL(sched, iaxs[callno]->pingid);

In a block which is inside an if(!owner) block.  Why its crashing in AST_SCHED_DEL(sched, iaxs[callno]->pingid); I don't know.  I have over 400 cores if someone wants some :)

Let me know if there's any other info I can provide.



By: Matt Riddell (zx81) 2008-07-16 03:30:56

Heh, true to form 127069 crashed within about 10 minutes.

By: Matt Riddell (zx81) 2008-07-16 16:19:11

Yep, and 126999 keeps running:

Connected to Asterisk SVN-branch-1.4-r126999 currently running on xxx(pid = 25016)
Verbosity is at least 100
xxx*xCLI> core show uptime
System uptime: 12 hours, 21 minutes, 18 seconds

By: Matt Riddell (zx81) 2008-07-16 17:01:59

Ok, as per discussion with Corydon76-dig on asterisk-dev irc, I'm now running a test with 131480.

[Jul 17 10:02:31] Connected to Asterisk SVN-branch-1.4-r131480 currently running on xxx (pid = 28783)

Will post results.

By: Matt Riddell (zx81) 2008-07-16 17:52:21

Ok, looks like this fixes it.

It's been running an hour under simulated packet loss without any crashes.

This can probably be closed with a note: "Fixed in revision 131480".

/me wants some karma for the testing :D

By: Matt Riddell (zx81) 2008-07-16 20:51:40

Been running fine for four hours now.

By: Tilghman Lesher (tilghman) 2008-07-17 08:43:28

Fixed in revision 131480