Summary: ASTERISK-15066: full system crash every other day
Reporter: moshe Teitelbaum (moshe)
Labels:
Date Opened: 2009-11-03 00:19:56.000-0600
Date Closed: 2009-11-17 07:54:42.000-0600
Versions:
Frequency of
Environment:
Attachments: ( 0) backtrace.txt
Description: We have a server running as MTE with 52 tenants and over 200 extensions. Recently Asterisk has been crashing every other day. Obviously we have no way of knowing how to recreate the problem. It's possible that whatever causes the crash can be repeated with 100% crash-success, but we still can't figure out what specifically is causing it.

The following is the last CLI output before the crash.

[Nov  2 15:15:42] ERROR[1928]: /usr/src/asterisk/1.4.26/asterisk- __ast_pthread_mutex_lock: chan_local.c line 542 (local_hangup): Error obtaining mutex: Invalid argument
[Nov  2 15:15:42] ERROR[1928]: /usr/src/asterisk/1.4.26/asterisk- __ast_pthread_mutex_unlock: chan_local.c line 597 (local_hangup): mutex '&p->lock' freed more times than we've locked!
[Nov  2 15:15:42] ERROR[1928]: /usr/src/asterisk/1.4.26/asterisk- __ast_pthread_mutex_unlock: chan_local.c line 597 (local_hangup): Error releasing mutex: Invalid argument
[Nov  2 15:15:42] ERROR[1928]: /usr/src/asterisk/1.4.26/asterisk- __ast_pthread_mutex_destroy: chan_local.c line 158 (local_pvt_destroy): Error: attempt to destroy invalid mutex '&pvt->lock'.
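The four messages above all come from lock-debugging wrappers around pthread mutex calls, and they are consistent with local_hangup working on a lock that was no longer valid. As a rough illustration of how a wrapper can raise this kind of report, here is a minimal sketch in C. It is not Asterisk's lock code: the `dbg_mutex` struct and the `dbg_*` functions are hypothetical names invented for this example, and the sketch assumes the wrapper's own memory is still valid (it cannot detect a struct that has already been freed).

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>

/* Hypothetical debug wrapper, for illustration only. */
struct dbg_mutex {
    pthread_mutex_t mutex;
    int initialized;  /* 1 between dbg_init() and dbg_destroy() */
    int lock_count;   /* locks currently held through this wrapper */
};

static int dbg_init(struct dbg_mutex *m)
{
    assert(m != NULL);
    int res = pthread_mutex_init(&m->mutex, NULL);
    m->initialized = (res == 0);
    m->lock_count = 0;
    return res;
}

static int dbg_lock(struct dbg_mutex *m)
{
    if (!m->initialized) {
        fprintf(stderr, "Error obtaining mutex: Invalid argument\n");
        return EINVAL;
    }
    int res = pthread_mutex_lock(&m->mutex);
    if (res == 0) {
        m->lock_count++;  /* updated while the mutex is held */
    }
    return res;
}

static int dbg_unlock(struct dbg_mutex *m)
{
    /* Refuse an unlock that has no matching lock. */
    if (!m->initialized || m->lock_count <= 0) {
        fprintf(stderr, "mutex freed more times than we've locked!\n");
        return EINVAL;
    }
    m->lock_count--;
    return pthread_mutex_unlock(&m->mutex);
}

static int dbg_destroy(struct dbg_mutex *m)
{
    if (!m->initialized) {
        fprintf(stderr, "Error: attempt to destroy invalid mutex\n");
        return EINVAL;
    }
    if (m->lock_count > 0) {
        return EBUSY;  /* still locked, leave it alone */
    }
    m->initialized = 0;
    return pthread_mutex_destroy(&m->mutex);
}
```

In this sketch, calling `dbg_lock`, `dbg_unlock`, and `dbg_destroy` on a wrapper that has already been destroyed yields the same sequence of complaints as the log: a failed lock, an unmatched unlock, and a destroy of an invalid mutex. The counter check is only meaningful for the thread that owns the lock.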

I can supply a backtrace.
Comments:

By: moshe Teitelbaum (moshe) 2009-11-03 10:12:24.000-0600

I'm not sure if it is related, but every other call is getting the following warning:
[Nov  3 11:05:35] WARNING[12706]: app_dial.c:1275 dial_exec_full: Unable to create channel of type 'SIP' (cause 20 - Unknown)

By: moshe Teitelbaum (moshe) 2009-11-04 08:29:57.000-0600

Additional errors are coming up every now and then, and again I'm not sure if they are related:

[Nov  4 09:14:43] ERROR[27826]: utils.c:966 ast_carefulwrite: write() returned error: Connection reset by peer
[Nov  4 09:14:43] ERROR[27826]: utils.c:966 ast_carefulwrite: write() returned error: Broken pipe

as well as the following warning, which is new to me (I hadn't seen it until today):

[Nov  4 05:56:52] WARNING[1979]: chan_sip.c:7053 determine_firstline_parts: Bad request protocol OK

I would like to know how I could expedite things around here.

By: Joshua C. Colp (jcolp) 2009-11-04 15:14:54.000-0600

Can you please try to reproduce this issue with 1.4 from SVN? It looks like something that has already been fixed. Thanks!

By: Erik Smith (eeman) 2009-11-05 11:34:17.000-0600

file, are you referring to a changelog remark in SVN regarding issue 16027? This is a heavy-use production box and he is wary of buying new problems with the SVN branch. If it's this particular issue's fix, can he just remove the one line in chan_sip.c detailed in the notes of the issue?

By: Leif Madsen (lmadsen) 2009-11-06 09:25:57.000-0600

Just assigned to file for comment back. Move back to appropriate status after commenting. Thanks!

By: Leif Madsen (lmadsen) 2009-11-13 08:48:25.000-0600

Do you happen to be using an AGI here? If so, this could possibly be related to a couple other issues I've just found.

By: Erik Smith (eeman) 2009-11-13 09:00:58.000-0600

Negative, this is just a macro for a ring group that invokes a bunch of local/exten@context technologies.

By: Leif Madsen (lmadsen) 2009-11-13 10:36:36.000-0600

OK thanks, so this is a separate issue.

By: Leif Madsen (lmadsen) 2009-11-17 07:31:50.000-0600

Are you able to test on the latest release candidates? There is a feeling this may already be fixed. Thanks!

By: Erik Smith (eeman) 2009-11-17 07:38:01.000-0600

Well, what I did was backport the patch in 16027

+  if (c) {
+ }

and recompiled. I have found that upgrading sometimes means buying a new problem in the trade-off. We haven't had a crash since, but it's only been 4 production days. However, it used to crash every 2 - 3 production days. If there is no crash by Friday, November 20th, I will assume it resolved the problem.
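The fragment quoted above shows only a guard of the form `if (c) { }`; what the patch in 16027 puts inside the braces is not given here. As a generic illustration of that pattern, and nothing more, the sketch below skips the work when the pointer is NULL. The `demo_channel` struct and the `demo_request_hangup` function are hypothetical names for this example and have nothing to do with chan_sip.c.

```c
/* Hypothetical channel type, for illustration only. */
struct demo_channel {
    int hangup_requested;
};

/*
 * Mark a channel for hangup if one is attached.
 * Returns 1 when the channel was touched, 0 when there was none.
 */
static int demo_request_hangup(struct demo_channel *c)
{
    if (c) {
        /* Only dereference the pointer inside the guard. */
        c->hangup_requested = 1;
        return 1;
    }
    return 0;
}
```

Without the guard, a caller that passes a NULL pointer would crash the process on the dereference; with it, the call becomes a no-op.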

By: Leif Madsen (lmadsen) 2009-11-17 07:54:41.000-0600

I'm going to close this issue as that is the one I figured had resolved this. If you're still having issues going forward, please open a new issue, but for now this one is resolved. Thanks for reporting back!