Summary: ASTERISK-03469: Asterisk deadlocks from time to time
Reporter: hal (hal)
Labels:
Date Opened: 2005-02-10 06:50:06.000-0600
Date Closed: 2011-06-07 14:00:18
Priority: Blocker
Regression?: No
Status: Closed/Complete
Components: Core/General
Versions:
Frequency of Occurrence:
Related Issues:
Environment:
Attachments: ( 0) Asterisk_debug.txt
( 1) make_fails.txt
Description:
All connected phones (SNOM 190) show NR (not registered).
No incoming or outgoing calls are processed.
Commands in the CLI have no effect.
Only killing all mpg123 processes with killall -9 mpg123, followed by /etc/init.d/asterisk stop and start, brings the system back.
Neither the system logs nor the Asterisk full log gave me a hint.
The debug file log/asterisk/full is attached. Tell me if other logs or debug output are needed and I'll try to get them.
If something was wrong with this bug report, please be kind to my karma ;-)

****** ADDITIONAL INFORMATION ******

This problem shows up on two systems with different hardware.
To prevent these deadlocks, we reboot these systems every 24 h via a cron job to release memory and clean things up.
The systems run Fedora Core 2 with kernel 2.6.8-1.521.
Hardware: ~1400 Athlon, 512 MB RAM, 120 GB HD, no telephony hardware installed. We use Patton/INALP ISDN gateways, so SIP is the only protocol being used. We don't use any external connections, so routing and firewalling shouldn't be a problem...
Comments:

By: Andrey S Pankov (casper) 2005-02-10 07:14:41.000-0600

Why do you think this is a deadlock?

In asterisk Makefile change:
DEBUG_THREADS = #-DDEBUG_THREADS #-DDO_CRASH #-DDETECT_DEADLOCKS
to be:
DEBUG_THREADS = -DDEBUG_THREADS -DDETECT_DEADLOCKS #-DDO_CRASH

make clean; make valgrind

Then follow the instructions in http://www.voip-info.org/wiki-Asterisk+debugging
to find where asterisk is locked.
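
The Makefile edit described above can also be scripted. The following is a minimal Python sketch, assuming the Makefile contains exactly one DEBUG_THREADS assignment like the one quoted; the function name enable_deadlock_detection is hypothetical and not part of Asterisk.

```python
import re

# The two lines quoted in the comment above.
OLD_LINE = "DEBUG_THREADS = #-DDEBUG_THREADS #-DDO_CRASH #-DDETECT_DEADLOCKS"
NEW_LINE = "DEBUG_THREADS = -DDEBUG_THREADS -DDETECT_DEADLOCKS #-DDO_CRASH"


def enable_deadlock_detection(makefile_text: str) -> str:
    """Return the Makefile text with the DEBUG_THREADS assignment replaced.

    Assumption: there is exactly one DEBUG_THREADS assignment in the file.
    """
    pattern = re.compile(r"^DEBUG_THREADS[ \t]*=.*$", flags=re.MULTILINE)
    new_text, count = pattern.subn(NEW_LINE, makefile_text)
    if count != 1:
        raise ValueError(f"expected exactly one DEBUG_THREADS line, found {count}")
    return new_text
```

Reading the Makefile, passing its text through this function, and writing the result back would apply the change; rebuilding still has to be done as described in the comment.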

By: hal (hal) 2005-02-10 07:36:04.000-0600

Casper,
thanks for your really quick response.
I found the definition of an Asterisk deadlock here:
http://www.voip-info.org/wiki-Asterisk+deadlock
Reading that leads me to suspect that we have a "real" deadlock in our systems.
I'll follow your tip, enable deadlock detection and bring back some more information.
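
For readers unfamiliar with the term, a classic deadlock arises when two threads each hold one lock and wait for the lock the other holds. The sketch below illustrates that pattern in Python; it is a generic illustration, not Asterisk code, and it uses a timeout on the second acquisition so the demonstration terminates instead of hanging. All names in it are hypothetical.

```python
import threading


def demonstrate_lock_order_deadlock(timeout: float = 0.2) -> dict:
    """Show two threads blocking each other by taking two locks in opposite order.

    Returns a dict mapping thread name to whether it obtained its second lock.
    """
    lock_a = threading.Lock()
    lock_b = threading.Lock()
    both_hold_first = threading.Barrier(2)    # both threads own their first lock
    both_tried_second = threading.Barrier(2)  # both threads finished their attempt
    results = {}

    def worker(name, first, second):
        with first:
            both_hold_first.wait()
            # Without the timeout this call would block forever: a deadlock.
            got = second.acquire(timeout=timeout)
            results[name] = got
            # Keep holding the first lock until the other thread has also tried.
            both_tried_second.wait()
            if got:
                second.release()

    t1 = threading.Thread(target=worker, args=("t1", lock_a, lock_b))
    t2 = threading.Thread(target=worker, args=("t2", lock_b, lock_a))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return results
```

Neither thread can obtain its second lock, because the other thread holds it for the whole attempt. In a real program without timeouts, both threads would simply stop making progress, which matches the symptom of a CLI that no longer responds.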

By: hal (hal) 2005-02-10 08:38:26.000-0600

make valgrind fails with errors.
Since this is a separate problem, I'm not sure how to handle it correctly in this bug report.
Please give me some advice.
In the hope of not getting shot for this, I attached the output of make as make_fails.txt.

Thanks

By: hal (hal) 2005-02-10 09:01:27.000-0600

I found the patch for lock.h enabling detection of deadlocks.
Please ignore the make failure.

By: Andrey S Pankov (casper) 2005-02-10 09:40:07.000-0600

BTW, you can delete your message if there is no need for it:

HAL
02-10-05 08:38

[ Edit ][ Delete ]
          ^^^^^^^^

By: Mark Spencer (markster) 2005-02-10 10:35:39.000-0600

Also even without deadlock detection it would be helpful to follow the directions provided in the link from the bug tracker guidelines for attaching gdb to the deadlocked asterisk and supplying the results of a "thread apply all bt", again when in the deadlocked state.  Without any more information there will be no way to assist you with your issue.
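
As a rough sketch of the gdb step requested above, the following Python helper builds a non-interactive gdb invocation that attaches to a running process and prints a backtrace of every thread. It assumes a reasonably recent gdb that supports the -p, -batch and -ex options; the helper names are hypothetical, and the process ID has to be looked up separately.

```python
import shlex


def gdb_backtrace_argv(pid: int) -> list:
    """Build a gdb command line that attaches to pid and dumps all thread backtraces."""
    if pid <= 0:
        raise ValueError("pid must be a positive integer")
    # -p attaches to the process, -batch exits after the commands have run,
    # -ex runs the given gdb command ("thread apply all bt").
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def as_shell_command(argv: list) -> str:
    """Render an argument list as a single, safely quoted shell command."""
    return " ".join(shlex.quote(arg) for arg in argv)
```

The resulting list can be passed to subprocess.run with captured output, and the text saved to a file for attaching to the bug report. It only helps if it is run while the process is in the hung state.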

By: hal (hal) 2005-02-10 11:24:12.000-0600

I have prepared one of the systems for detecting deadlocks and enabled thread debugging.
Now I'm waiting for the deadlock, or whatever we want to call this behaviour.
I hope I can provide you with this information in the next few days.
BTW, what can I do if "thread apply all bt" does not give any information?
I have another bug here with the CLI reload command, which puts Asterisk in the same (?) state as this bug, and neither the full log nor gdb provides further information.

Thanks for your patience

By: nick (nick) 2005-02-10 19:13:53.000-0600

Just for giggles, are you using mpg123 .59r?

By: Brian West (bkw918) 2005-02-10 23:52:29.000-0600

Nick i'll take "NO" for 800 please....

By: hal (hal) 2005-02-11 02:48:06.000-0600

Nick, yes, mpg123.59r is installed on this system.
I had heard of some security-related issues with this version, but thought they wouldn't affect our installation...
But your question and bkw918's statement set my alarm bells ringing ;-)
On our Asterisk production server we have been using the mpg123 version provided in the Asterisk source for 3 weeks. But in the past, with the 59r version, we didn't have any issues like the ones described in this bug.
Do you have further information concerning this mpg123 issue?

Thanks

By: Anthony Minessale (anthm) 2005-02-11 12:38:00.000-0600

Why not convert all your music to slin? It's more efficient anyway.
See contrib/utils/README.rawplayer

or use moh_files and format_mp3 from asterisk-addons
or use moh_files and decode your music into every possible format for efficiency.

Not to sweep your problem under the rug, but if this is a prod system you may want to try one of these alternatives to help determine whether it's mpg player related.
The worst that could happen is your problem goes away.

By: Mark Spencer (markster) 2005-02-11 16:30:09.000-0600

You have to have a working gdb.  Without working gdb, there will be nothing we can do for you.

By: hal (hal) 2005-02-11 16:52:32.000-0600

Mark,
gdb is configured and I'll start it as soon as the deadlock happens again.
Since I don't know how to reproduce this deadlock, all I can do at the moment is wait for it. Following Murphy's law, it will never happen again ;-)

By: hal (hal) 2005-02-11 17:25:57.000-0600

Anthm,
thanks for your tips on alternatives to mpg123. I'll have a look at them.

By: Clod Patry (junky) 2005-02-12 01:13:37.000-0600

Also, since your version is a month old, try updating to current CVS (and tell us which version you are testing with now) and give us some feedback on this.

Also, since this bug is set to Blocker, I'd like to see it closed as soon as possible.

Thanks.

In my opinion, you shouldn't have to reboot your system to clean up memory. That's a really bad thing to do: it means there are memory leaks somewhere, and if that's really the case we have to find out exactly where.
Do you have any information related to that memory leak? Add some output of meminfo, etc., so we can see whether a lot of memory is being consumed.
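
One way to collect the memory figures asked for here is to parse text in the /proc/meminfo format. The sketch below is a minimal Python example; the helper names are hypothetical, and it relies only on the MemTotal and MemFree fields. Note that MemFree does not count memory used for buffers and cache, so this fraction overstates what applications actually consume.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into a dict of field name -> integer value."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def used_fraction(values: dict) -> float:
    """Fraction of total memory that is not reported as free."""
    total = values["MemTotal"]
    free = values["MemFree"]
    return (total - free) / total
```

Sampling these values periodically, for example from a cron job, and comparing them over several hours would show whether memory use grows steadily.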

edited on: 02-12-05 01:18

By: hal (hal) 2005-02-12 07:56:54.000-0600

Junky, I agree with you, this bug should be closed as soon as possible.
I think we have two options for doing so:
closing this bug, upgrading to a newer CVS version, seeing whether it happens again
and, if so, reopening it
or
leaving it open and waiting until the deadlock occurs again, so I can provide you with the gdb output.
Since there seem to be some other issues in the CVS release from 01/08/05 (transferring calls coming from a queue, and the CLI reload command also deadlocking the system), I think I should really update to current CVS.
But as I remember, I updated all our Asterisk installations at the beginning of January precisely because of these deadlocks, and nothing seemed to change.
Perhaps changing from CVS to STABLE, and possibly losing some features, could be a solution too.

>Reboots/24h
I don't think we have had a memory leak, since after each of the deadlocks described above I examined the system and could not find any hints (top, meminfo, logfiles etc.). All other applications on these servers (postfix, cyrus-imap, spamassassin etc.) were running fine.
But these reboots seemed to help prevent deadlocks in Asterisk, since without them we get deadlocks nearly daily.
If you think it could be helpful, I'll post the complete Asterisk configuration in use as a text file.

Once more, thanks to all for your great patience and helpfulness.

By: Clod Patry (junky) 2005-02-12 10:41:22.000-0600

I'm closing this bug now. If you need to re-open it, HAL, feel free to contact us on #asterisk-bugs on freenode.