Summary: ASTERISK-06735: Asterisk randomly segfaults - Appears to be chan_iax2
Reporter: Trevor Hammonds (trevmeister)
Labels:
Date Opened: 2006-04-08 09:11:39
Date Closed: 2006-05-11 04:38:59
Priority: Critical
Regression? No
Status: Closed/Complete
Components: Core/General
Versions:
Frequency of Occurrence:
Related Issues:
Environment:
Attachments:
( 0) output.txt
( 1) output2.txt
( 2) output2-server2.txt
( 3) output-server2.txt
Description: Not sure why this is happening, but it has been happening very often since I updated two days ago.

Server is a Dell PowerEdge 2850, dual Xeon 3GHz, 2GB RAM, CentOS 4.3, Kernel 2.6.9-34.ELsmp

Let me know if there is any more information I may provide that can help...
Comments:

By: Andrey S Pankov (casper) 2006-04-08 16:21:13

Maybe some logs...

By: Andrey S Pankov (casper) 2006-04-08 16:24:28

print iaxs[callno]->sockfd
print iaxs[callno]->addr
or something more appropriate (at line 1629) in gdb

By: Trevor Hammonds (trevmeister) 2006-04-08 22:22:39

FYI: I downgraded through several revisions until I found a stable one.  SVN rev 16006 has not crashed for several hours (all day), whereas 16306 was the last rev that crashed regularly with the same condition.

By: Trevor Hammonds (trevmeister) 2006-04-08 22:27:14

Casper,
Sorry, I have no idea how to do what you are asking.  I will revert to the most current revision and get the info you need, if you let me know what I should do.

As to logs, they do not indicate anything before the crash.



By: BJ Weschke (bweschke) 2006-04-10 19:57:48

fixed in r16386 of /trunk

By: Trevor Hammonds (trevmeister) 2006-04-11 04:36:21

Sorry to re-open this, but this issue was not fixed in r16386.  I tried r16759, r16745, r16671, r16558, and r16386.  All had the same random segfault.  IAX calls were not usually in progress when the crashes happened.

By: Andrey S Pankov (casper) 2006-04-11 04:46:47

Any logs with 'set verbose 4', 'set debug 4', 'iax2 debug' and log output enabled for warning,notice,error,verbose,debug?

Can you run these in gdb:
(gdb) bt <press Enter>
(gdb) print iaxs[callno]->sockfd <press Enter>
(gdb) print iaxs[callno]->addr <press Enter>

It seems like sockfd or addr is null there...



By: BJ Weschke (bweschke) 2006-04-11 07:00:43

Trevmeister - we're up to 19000+ with the commits at this point and chan_iax2 has had more fixes/improvements since 16386 which was a pretty significant bug fix. If you can, please test on the most current /trunk and post a complete bt to this bug and we'll get the right folks to take a look at it. Thanks.

By: Trevor Hammonds (trevmeister) 2006-04-11 09:27:14

Updated to latest SVN trunk.  Asterisk died within a couple of hours.  I have reverted to the last known stable revision (16006) for my setup.

I have attached the latest gdb information.  The first line of the file, however, is what appeared on the console:

*** glibc detected *** double free or corruption (!prev): 0x0000002a97c61490 ***

I don't see anything mentioning IAX in this backtrace, so I suspect this has nothing to do with chan_iax.  Perhaps someone can make a correlation between the two backtraces?

Thanks again, guys.



By: Trevor Hammonds (trevmeister) 2006-04-11 09:47:53

The "output-server2.txt" file is from an identically-configured server, sans the Sangoma A104D.  It is running SVN-trunk-r19160, and I will leave it at that revision, as it is a less-critical server.  

Please note that this backtrace is nearly identical to the original...  

Is it possible that this is related to libpthread or libc from CentOS 4.3 x86_64?

By: BJ Weschke (bweschke) 2006-04-11 09:53:05

joshnet: this one is pretty odd. Can you take a look at the chan_iax2 bt's?

By: BJ Weschke (bweschke) 2006-04-11 10:04:26

Trevmeister - can you attach the relevant sections of extensions.conf when these crashes are happening? There are a few of us now scratching our heads on this one.

By: BJ Weschke (bweschke) 2006-04-11 10:07:46

moving to core since we don't have a real good idea what's causing various parts of the platform to dump.

By: Joshua C. Colp (jcolp) 2006-04-11 10:12:51

Would it be possible to get access to one of the boxes that are exhibiting this problem so that I can put in some extra debug information so we can see what's causing the segfault?

By: Trevor Hammonds (trevmeister) 2006-04-11 22:20:09

joshnet:  Certainly.  Contact me directly at <address removed>.

bweschke:  The "server2" crashes are happening when the server is just sitting idle.  Do you want the entire extensions.conf?

Here is the message from /var/log/messages from the most recent crash on "server 2":
Apr 11 20:02:11 XXXXXXXX kernel: asterisk[31669]: segfault at 0000000000000000 rip 0000002a96b5d005 rsp 0000000040595130 error 4

Backtrace from this crash has been posted as output2-server2.txt.
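
For anyone reading the kernel line above: the fields can be decoded mechanically. Below is a minimal Python sketch, not part of Asterisk or any existing tool, that pulls the fault address, instruction pointer, and error code out of a 2.6-era x86_64 segfault message. The function name decode_segfault and the dictionary keys are hypothetical. It assumes the "error" value is the x86 page-fault error code printed in hex, where bit 0 means the page was present, bit 1 means a write access, and bit 2 means the fault came from user mode.

```python
import re

# Pattern for lines such as:
#   kernel: asterisk[31669]: segfault at <addr> rip <rip> rsp <rsp> error <code>
# The layout is taken from the log line quoted in this report; other kernel
# versions format this message differently.
SEGFAULT_RE = re.compile(
    r"(?P<proc>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"rip (?P<rip>[0-9a-f]+) rsp (?P<rsp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
)


def decode_segfault(line):
    """Return a dict describing the fault, or None if the line does not match.

    Hypothetical helper for illustration only.
    """
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    err = int(m.group("err"), 16)  # assumed to be printed in hex
    return {
        "process": m.group("proc"),
        "pid": int(m.group("pid")),
        "fault_addr": int(m.group("addr"), 16),
        "rip": int(m.group("rip"), 16),
        "rsp": int(m.group("rsp"), 16),
        "page_present": bool(err & 1),               # bit 0
        "access": "write" if err & 2 else "read",    # bit 1
        "mode": "user" if err & 4 else "kernel",     # bit 2
    }


line = ("Apr 11 20:02:11 XXXXXXXX kernel: asterisk[31669]: segfault at "
        "0000000000000000 rip 0000002a96b5d005 rsp 0000000040595130 error 4")
info = decode_segfault(line)
print(info)
```

Under those assumptions, this line decodes to a user-mode read of address 0 on a page that is not mapped, which is consistent with a NULL pointer dereference. It does not say which pointer was NULL; that still has to come from the backtrace.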



By: Serge Vecher (serge-v) 2006-05-05 15:28:40

trevmeister: any chance of an update here to see if commit in r24422 fixes the issue?

By: Mark Spencer (markster) 2006-05-11 03:55:53

Trevmeister: please confirm whether the issue has now been fixed in latest trunk.  Thanks!

By: Trevor Hammonds (trevmeister) 2006-05-11 04:30:39

The problem appears to have been corrected, though I have not tried Trunk on a heavily-loaded server, yet.  I will re-open the bug if it is still an issue in the future.  Thanks for all your great work, guys.

By: Joshua C. Colp (jcolp) 2006-05-11 04:38:59

Issue has not reappeared on reporter's machine. If it does, don't hesitate to reopen. Have a great day!