Summary: | ASTERISK-15320: [patch] Lots of crashes after upgrading to latest 1.6.0.20-rc1 | ||
Reporter: | Andrey Solovyev (corruptor) | Labels: | |
Date Opened: | 2009-12-16 04:23:50.000-0600 | Date Closed: | 2010-01-05 09:03:50.000-0600 |
Priority: | Critical | Regression? | No |
Status: | Closed/Complete | Components: | Core/General |
Versions: | Frequency of Occurrence | ||
Related Issues: | |||
Environment: | Attachments: | ( 0) 20091221__issue16452.diff.txt ( 1) backtrace.txt ( 2) backtrace2.txt ( 3) backtrace5.txt ( 4) full.txt ( 5) valgrind.txt | |
Description: | I tried to upgrade my Asterisk 1.6.19 to the latest RC. Asterisk crashes every 30-40 minutes. There are about 15 concurrent calls. All calls are recorded using MixMonitor. Asterisk crashes after one of the calls hangs up. I have uploaded the backtrace. Asterisk 1.6.19 with the same config works OK. | ||
Comments: |
By: Leif Madsen (lmadsen) 2009-12-16 07:51:22.000-0600
Any chance you could also provide the scenario / dialplan that causes this? We may need to reproduce it, but this looks serious enough. Also, are you utilizing any database connectivity? Just looking really for any additional information which may help to narrow this down. Thanks for the backtrace!

By: Leif Madsen (lmadsen) 2009-12-16 07:55:30.000-0600
Also, just to confirm, you're using a Linux distribution, and not an alternative OS, right?

By: Leif Madsen (lmadsen) 2009-12-16 08:13:27.000-0600
Per IRC:
<mvanbaak> DocAwesome: not that it matters whether you use linux or an alternative
<mvanbaak> DocAwesome: "it works on linux, wontfix" <--- not acceptable in my setup ;)
...
<DocAwesome> although, I mostly wanted to know for reproducibility
<DocAwesome> not for deciding whether to fix it or not

By: Andrey Solovyev (corruptor) 2009-12-16 10:35:48.000-0600
I use CentOS 5. I've uploaded a second backtrace. It seems different to me; maybe it will help. I've noticed this happens on our outgoing calls. I've uploaded the part of the full log related to the first backtrace. Asterisk crashed after the last message in this log. It doesn't matter whether the call was answered or not. I use database connectivity. We write QueueLog messages straight to a MySQL database. You can see that in the full log. It seems to work OK. I also write CDR to MySQL (works OK also). Unfortunately I am not able to upload other backtraces because I have installed the earlier version. I can try tomorrow.

By: Russell Bryant (russell) 2009-12-16 10:56:11.000-0600
Your backtraces are very different. It sort of looks like an invalid channel is being accessed and has resulted in two different crashes. It's hard to know. We're going to try to do a bit of testing with the information we have, but we will most likely need more information from you to address this. Some analysis using valgrind would be very useful. See the instructions in doc/valgrind.txt. However, note that this will introduce a lot of extra load on your system, so it may not be able to handle your full load while doing this testing.

By: Russell Bryant (russell) 2009-12-16 10:56:40.000-0600
Also, if you'd like, join us in #asterisk-bugs on IRC (Freenode), and we may be able to look at this in real time.

By: Andrey Solovyev (corruptor) 2009-12-17 04:02:41.000-0600
I should say that we use very small patches to app_queue, but they are very important for us and I can't run asterisk without them for a long time in our office. They are definitely not related to these crashes. I have just tried running an absolutely clean asterisk-1.6.0.20-rc1 with addons 1.6.0.3 for 2 hours and got 3 crashes. I have uploaded one more backtrace. I am also in the #asterisk-bugs channel on IRC. I think it won't be possible to use valgrind because the call center has to work.

By: Tilghman Lesher (tilghman) 2009-12-17 10:18:07.000-0600
We will not be able to proceed until you upload the full set of those patches. When we require Valgrind, it's usually because something that you wouldn't think is related is the cause.

By: Andrey Solovyev (corruptor) 2009-12-17 12:29:20.000-0600
As I've said, the last backtrace5.txt is without any patches, and in the future I won't use any patches regarding this issue. OK, I will try Valgrind and see if it's possible to get information from it.

By: Tilghman Lesher (tilghman) 2009-12-18 17:51:07.000-0600
corruptor: have you had any success with running Valgrind (and by that, I mean do you have any suspicious output, regardless of whether a crash occurred)?

By: Andrey Solovyev (corruptor) 2009-12-21 10:17:10.000-0600
I've tried to use valgrind now for half an hour... Some people hate me now :). Unfortunately asterisk hasn't crashed and I haven't been able to run valgrind any longer. I have uploaded the output I got.

By: Leif Madsen (lmadsen) 2009-12-21 12:15:46.000-0600
It's possible it wouldn't crash in valgrind, so hopefully the information is useful here. Thanks!

By: Tilghman Lesher (tilghman) 2009-12-21 17:10:25.000-0600
I have uploaded a patch, based upon the output in your valgrind. What it looks like is that the SIP session timers are firing after they have been deallocated, so I've moved the deallocation elsewhere and added a flag so that it short-circuits the session timers. Hopefully, that corrects the crash condition.

By: Andrey Solovyev (corruptor) 2009-12-22 02:21:56.000-0600
Thank you. I will test the patch a little bit later. By the way, could it be related to this issue https://issues.asterisk.org/view.php?id=16270 in some way? That issue is also related to SIP session timers and that patch hasn't fixed it for me, so I am going to reopen it. (I wasn't able to test it because I was on vacation.)

By: Tilghman Lesher (tilghman) 2009-12-22 11:29:27.000-0600
ASTERISK-15161 does not appear to be at all related to SIP timers, but to a reference counting issue. This is a completely different issue. Are you still having problems with UDP ports staying open in 1.6.0.20-rc1? If not, then that issue is solved.

By: Andrey Solovyev (corruptor) 2009-12-23 02:38:08.000-0600
I have applied the patch. So far asterisk has been running for more than 2 hours without a crash. I will report later, after our workday. Tilghman, yes, UDP ports stay open in 1.6.0.20-rc1, so I am reopening issue 0016270.

By: Andrey Solovyev (corruptor) 2009-12-23 13:53:20.000-0600
More than 12 hours of uptime. Asterisk works fine.

By: Digium Subversion (svnbot) 2009-12-29 17:05:46.000-0600
Repository: asterisk
Revision: 236802
U trunk/channels/chan_sip.c
------------------------------------------------------------------------
r236802 | tilghman | 2009-12-29 17:05:46 -0600 (Tue, 29 Dec 2009) | 7 lines
Shut down the SIP session timers more gracefully, in order to prevent a possible crash.
(closes issue ASTERISK-15320)
Reported by: corruptor
Patches: 20091221__issue16452.diff.txt uploaded by tilghman (license 14)
Tested by: corruptor
------------------------------------------------------------------------
http://svn.digium.com/view/asterisk?view=rev&revision=236802

By: Digium Subversion (svnbot) 2009-12-29 17:08:06.000-0600
Repository: asterisk
Revision: 236803
_U branches/1.6.1/
U branches/1.6.1/channels/chan_sip.c
------------------------------------------------------------------------
r236803 | tilghman | 2009-12-29 17:08:06 -0600 (Tue, 29 Dec 2009) | 14 lines
Merged revisions 236802 via svnmerge from https://origsvn.digium.com/svn/asterisk/trunk
........
r236802 | tilghman | 2009-12-29 17:05:45 -0600 (Tue, 29 Dec 2009) | 7 lines
Shut down the SIP session timers more gracefully, in order to prevent a possible crash.
(closes issue ASTERISK-15320)
Reported by: corruptor
Patches: 20091221__issue16452.diff.txt uploaded by tilghman (license 14)
Tested by: corruptor
........
------------------------------------------------------------------------
http://svn.digium.com/view/asterisk?view=rev&revision=236803

By: Digium Subversion (svnbot) 2009-12-29 17:08:14.000-0600
Repository: asterisk
Revision: 236804
_U branches/1.6.2/
U branches/1.6.2/channels/chan_sip.c
------------------------------------------------------------------------
r236804 | tilghman | 2009-12-29 17:08:14 -0600 (Tue, 29 Dec 2009) | 14 lines
Merged revisions 236802 via svnmerge from https://origsvn.digium.com/svn/asterisk/trunk
........
r236802 | tilghman | 2009-12-29 17:05:45 -0600 (Tue, 29 Dec 2009) | 7 lines
Shut down the SIP session timers more gracefully, in order to prevent a possible crash.
(closes issue ASTERISK-15320)
Reported by: corruptor
Patches: 20091221__issue16452.diff.txt uploaded by tilghman (license 14)
Tested by: corruptor
........
------------------------------------------------------------------------
http://svn.digium.com/view/asterisk?view=rev&revision=236804

By: Digium Subversion (svnbot) 2009-12-29 17:11:01.000-0600
Repository: asterisk
Revision: 236805
_U branches/1.6.0/
U branches/1.6.0/channels/chan_sip.c
------------------------------------------------------------------------
r236805 | tilghman | 2009-12-29 17:11:00 -0600 (Tue, 29 Dec 2009) | 14 lines
Merged revisions 236802 via svnmerge from https://origsvn.digium.com/svn/asterisk/trunk
........
r236802 | tilghman | 2009-12-29 17:05:45 -0600 (Tue, 29 Dec 2009) | 7 lines
Shut down the SIP session timers more gracefully, in order to prevent a possible crash.
(closes issue ASTERISK-15320)
Reported by: corruptor
Patches: 20091221__issue16452.diff.txt uploaded by tilghman (license 14)
Tested by: corruptor
........
------------------------------------------------------------------------
http://svn.digium.com/view/asterisk?view=rev&revision=236805 |
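Note on the fix: the 2009-12-21 comment describes the approach as moving the timer deallocation elsewhere and adding a flag that short-circuits session timers that fire late. The following is only a rough sketch of that general pattern in C, not the committed chan_sip.c change. Every name in it (`struct session_timer`, `struct dialog`, `session_timer_cb`, and the helper functions) is hypothetical and chosen for illustration; the callback convention (nonzero to reschedule, 0 to stop) is likewise an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical per-dialog session-timer state (not the chan_sip.c layout). */
struct session_timer {
    bool active;     /* cleared on hang-up; checked by the callback */
    int refreshes;   /* how many times the timer did real work */
};

/* Hypothetical dialog object that owns the timer state. */
struct dialog {
    struct session_timer *stimer;
};

/* Allocate a dialog with an armed session timer; returns NULL on failure. */
static struct dialog *dialog_new(void)
{
    struct dialog *d = calloc(1, sizeof(*d));
    if (d == NULL) {
        return NULL;
    }
    d->stimer = calloc(1, sizeof(*d->stimer));
    if (d->stimer == NULL) {
        free(d);
        return NULL;
    }
    d->stimer->active = true;
    return d;
}

/* Scheduler-style callback (assumed convention: nonzero = reschedule). */
static int session_timer_cb(const void *data)
{
    const struct dialog *d = data;

    /* Short-circuit: a timer that fires after hang-up does nothing. */
    if (d == NULL || d->stimer == NULL || !d->stimer->active) {
        return 0;
    }
    d->stimer->refreshes++;
    return 1;
}

/* Hang-up path: only flip the flag, so the memory stays valid for any
 * callback that is already scheduled. */
static void dialog_stop_session_timer(struct dialog *d)
{
    assert(d != NULL);
    if (d->stimer != NULL) {
        d->stimer->active = false;
    }
}

/* Final teardown: the only place the timer state is freed. It must run
 * after no scheduled callback can still reference the dialog. */
static void dialog_destroy(struct dialog *d)
{
    if (d == NULL) {
        return;
    }
    free(d->stimer);
    free(d);
}
```

In this sketch the hang-up path never frees the timer state, so a callback that fires late reads a cleared flag rather than freed memory. The sketch does not model how the real code guarantees that `dialog_destroy` runs after the last scheduled callback.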