
Summary: ASTERISK-15320: [patch] Lots of crashes after upgrading to latest 1.6.0.20-rc1
Reporter: Andrey Solovyev (corruptor)
Labels:
Date Opened: 2009-12-16 04:23:50.000-0600
Date Closed: 2010-01-05 09:03:50.000-0600
Priority: Critical
Regression?: No
Status: Closed/Complete
Components: Core/General
Versions:
Frequency of Occurrence:
Related Issues:
Environment:
Attachments:
( 0) 20091221__issue16452.diff.txt
( 1) backtrace.txt
( 2) backtrace2.txt
( 3) backtrace5.txt
( 4) full.txt
( 5) valgrind.txt
Description: I tried to upgrade my Asterisk 1.6.19 to the latest RC. Asterisk now crashes every 30-40 minutes. There are about 15 concurrent calls. All calls are recorded using MixMonitor. Asterisk crashes after one of the calls hangs up.
I have uploaded the backtrace.
Asterisk 1.6.19 with the same config works OK.
Comments:

By: Leif Madsen (lmadsen) 2009-12-16 07:51:22.000-0600

Any chance you could also provide the scenario / dialplan that causes this? We may need to reproduce it, but this looks serious enough. Also, are you utilizing any database connectivity? Just looking really for any additional information which may help to narrow this down.

Thanks for the backtrace!

By: Leif Madsen (lmadsen) 2009-12-16 07:55:30.000-0600

Also, just to confirm, you're using a Linux distribution, and not an alternative OS right?

By: Leif Madsen (lmadsen) 2009-12-16 08:13:27.000-0600

Per IRC:

<mvanbaak> DocAwesome: not that it matters whether you use linux or an alternative
<mvanbaak> DocAwesome: "it works on linux, wontfix" <--- not acceptable in my setup ;)
...
<DocAwesome> although, I mostly wanted to know for reproducibility
<DocAwesome> not for deciding whether to fix it or not

By: Andrey Solovyev (corruptor) 2009-12-16 10:35:48.000-0600

I use CentOS 5.
I've uploaded a second backtrace. It looks different to me, so maybe it will help.
I've noticed this happens on our outgoing calls. I've uploaded the part of the full log related to the first backtrace. Asterisk crashed after the last message in this log. It doesn't matter whether the call was answered or not.

I use database connectivity. We write QueueLog messages straight to a MySQL database. You can see that in the full log. It seems to work OK. I also write CDR to MySQL (that works OK too).

Unfortunately I am not able to upload other backtraces because I have reinstalled the earlier version. I can try tomorrow.

By: Russell Bryant (russell) 2009-12-16 10:56:11.000-0600

Your backtraces are very different.  It sort of looks like an invalid channel is being accessed and has resulted in two different crashes.  It's hard to know.  We're going to try to do a bit of testing with the information we have, but we will most likely need more information from you to address this.

Some analysis using valgrind would be very useful.  See the instructions in doc/valgrind.txt.  However, note that this will introduce a lot of extra load on your system, so it may not be able to handle your full load while doing this testing.

By: Russell Bryant (russell) 2009-12-16 10:56:40.000-0600

Also, if you'd like, join us in #asterisk-bugs on IRC (Freenode), and we may be able to look at this in real time.

By: Andrey Solovyev (corruptor) 2009-12-17 04:02:41.000-0600

I should say that we use some very small patches to app_queue that are very important for us, and I can't run Asterisk without them for long in our office. They are definitely not related to these crashes. I have just tried running an absolutely clean asterisk-1.6.0.20-rc1 with addons 1.6.0.3 for 2 hours and got 3 crashes. I have uploaded one more backtrace. I am also in the #asterisk-bugs channel on IRC.
I don't think it will be possible to use valgrind because the call center has to keep working.

By: Tilghman Lesher (tilghman) 2009-12-17 10:18:07.000-0600

We will not be able to proceed until you upload the full set of those patches.  When we require Valgrind, it's usually because something that you wouldn't think is related turns out to be the cause.

By: Andrey Solovyev (corruptor) 2009-12-17 12:29:20.000-0600

As I've said the last backtrace5.txt is without any patches and in the future I won't use any patches regarding this issue.

Ok I will try Valgrind and see if it's possible to get information from it.

By: Tilghman Lesher (tilghman) 2009-12-18 17:51:07.000-0600

corruptor: have you had any success with running Valgrind (and by that, I mean do you have any suspicious output, regardless of whether a crash occurred)?

By: Andrey Solovyev (corruptor) 2009-12-21 10:17:10.000-0600

I've now tried running under valgrind for half an hour... Some people hate me now :).
Unfortunately Asterisk didn't crash, and I wasn't able to run valgrind any longer.
I have uploaded the output I got.

By: Leif Madsen (lmadsen) 2009-12-21 12:15:46.000-0600

It's possible it wouldn't crash in valgrind, so hopefully the information is useful here. Thanks!

By: Tilghman Lesher (tilghman) 2009-12-21 17:10:25.000-0600

I have uploaded a patch, based upon the output in your valgrind.  What it looks like is that the SIP session timers are firing after they have been deallocated, so I've moved the deallocation elsewhere and added a flag so that it short-circuits the session timers.  Hopefully, that corrects the crash condition.
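
The approach described above (stop freeing the timer on the hang-up path, and set a flag that makes a late timer callback return early) can be illustrated with a minimal sketch. This is not the code from 20091221__issue16452.diff.txt or chan_sip.c; every name here (`struct dialog`, `struct session_timer`, `active`, `session_timer_cb`, `dialog_stop_timer`, `dialog_destroy`) is hypothetical and only models the pattern under simplified, single-threaded assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a per-dialog session-timer record. */
struct session_timer {
    bool active;    /* cleared at hang-up so late callbacks become no-ops */
    int refreshes;  /* counts how often the timer did real work */
};

/* Hypothetical stand-in for a SIP dialog that owns a session timer. */
struct dialog {
    struct session_timer *stimer;
};

/* Scheduler callback: returns 1 if it did work, 0 if it short-circuited. */
static int session_timer_cb(const void *data)
{
    struct dialog *d = (struct dialog *) data;

    if (!d || !d->stimer || !d->stimer->active) {
        return 0; /* timer was stopped; touch nothing */
    }
    d->stimer->refreshes++;
    return 1;
}

/* Hang-up path: only mark the timer as stopped, do not free it here. */
static void dialog_stop_timer(struct dialog *d)
{
    if (d->stimer) {
        d->stimer->active = false;
    }
}

/* Final destructor: the only place where the timer memory is released. */
static void dialog_destroy(struct dialog *d)
{
    free(d->stimer);
    d->stimer = NULL;
    free(d);
}

static struct dialog *dialog_create(void)
{
    struct dialog *d = calloc(1, sizeof(*d));

    if (!d) {
        return NULL;
    }
    d->stimer = calloc(1, sizeof(*d->stimer));
    if (!d->stimer) {
        free(d);
        return NULL;
    }
    d->stimer->active = true;
    return d;
}

/* One normal tick, a hang-up, then a late tick that must be ignored.
 * Returns the number of refreshes performed, or -1 on failure. */
static int simulate_late_timer(void)
{
    struct dialog *d = dialog_create();
    int first, late, refreshes;

    if (!d) {
        return -1;
    }
    first = session_timer_cb(d);   /* timer active: does work */
    dialog_stop_timer(d);          /* call hangs up */
    late = session_timer_cb(d);    /* late firing: short-circuits */
    refreshes = d->stimer->refreshes;
    dialog_destroy(d);
    return (first == 1 && late == 0) ? refreshes : -1;
}
```

In this sketch the late callback still finds valid memory because freeing is deferred to `dialog_destroy`, and the `active` flag keeps it from doing any work after hang-up. A real multi-threaded implementation would also need locking or reference counting around the flag and the destructor.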

By: Andrey Solovyev (corruptor) 2009-12-22 02:21:56.000-0600

Thank you. I will test the patch a little bit later.
By the way, could this be related in some way to issue https://issues.asterisk.org/view.php?id=16270? That issue is also related to SIP session timers, and its patch hasn't fixed it for me, so I am going to reopen it. (I wasn't able to test it earlier because I was on vacation.)

By: Tilghman Lesher (tilghman) 2009-12-22 11:29:27.000-0600

ASTERISK-15161 does not appear to be at all related to SIP timers, but to a reference counting issue.  This is a completely different issue.  Are you still having problems with UDP ports staying open in 1.6.0.20-rc1?  If not, then that issue is solved.

By: Andrey Solovyev (corruptor) 2009-12-23 02:38:08.000-0600

I have applied the patch. So far Asterisk has been running for more than 2 hours without a crash. I will report back later, after our workday.

Tilghman, yes, UDP ports stay open in 1.6.0.20-rc1, so I am reopening issue 0016270.

By: Andrey Solovyev (corruptor) 2009-12-23 13:53:20.000-0600

More than 12 hours of uptime. Asterisk works fine.

By: Digium Subversion (svnbot) 2009-12-29 17:05:46.000-0600

Repository: asterisk
Revision: 236802

U   trunk/channels/chan_sip.c

------------------------------------------------------------------------
r236802 | tilghman | 2009-12-29 17:05:46 -0600 (Tue, 29 Dec 2009) | 7 lines

Shut down the SIP session timers more gracefully, in order to prevent a possible crash.
(closes issue ASTERISK-15320)
Reported by: corruptor
Patches:
      20091221__issue16452.diff.txt uploaded by tilghman (license 14)
Tested by: corruptor

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=236802

By: Digium Subversion (svnbot) 2009-12-29 17:08:06.000-0600

Repository: asterisk
Revision: 236803

_U  branches/1.6.1/
U   branches/1.6.1/channels/chan_sip.c

------------------------------------------------------------------------
r236803 | tilghman | 2009-12-29 17:08:06 -0600 (Tue, 29 Dec 2009) | 14 lines

Merged revisions 236802 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

........
 r236802 | tilghman | 2009-12-29 17:05:45 -0600 (Tue, 29 Dec 2009) | 7 lines
 
 Shut down the SIP session timers more gracefully, in order to prevent a possible crash.
 (closes issue ASTERISK-15320)
  Reported by: corruptor
  Patches:
        20091221__issue16452.diff.txt uploaded by tilghman (license 14)
  Tested by: corruptor
........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=236803

By: Digium Subversion (svnbot) 2009-12-29 17:08:14.000-0600

Repository: asterisk
Revision: 236804

_U  branches/1.6.2/
U   branches/1.6.2/channels/chan_sip.c

------------------------------------------------------------------------
r236804 | tilghman | 2009-12-29 17:08:14 -0600 (Tue, 29 Dec 2009) | 14 lines

Merged revisions 236802 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

........
 r236802 | tilghman | 2009-12-29 17:05:45 -0600 (Tue, 29 Dec 2009) | 7 lines
 
 Shut down the SIP session timers more gracefully, in order to prevent a possible crash.
 (closes issue ASTERISK-15320)
  Reported by: corruptor
  Patches:
        20091221__issue16452.diff.txt uploaded by tilghman (license 14)
  Tested by: corruptor
........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=236804

By: Digium Subversion (svnbot) 2009-12-29 17:11:01.000-0600

Repository: asterisk
Revision: 236805

_U  branches/1.6.0/
U   branches/1.6.0/channels/chan_sip.c

------------------------------------------------------------------------
r236805 | tilghman | 2009-12-29 17:11:00 -0600 (Tue, 29 Dec 2009) | 14 lines

Merged revisions 236802 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

........
 r236802 | tilghman | 2009-12-29 17:05:45 -0600 (Tue, 29 Dec 2009) | 7 lines
 
 Shut down the SIP session timers more gracefully, in order to prevent a possible crash.
 (closes issue ASTERISK-15320)
  Reported by: corruptor
  Patches:
        20091221__issue16452.diff.txt uploaded by tilghman (license 14)
  Tested by: corruptor
........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=236805