ASTERISK-04954: [patch] Using printf in the urg_handler implicated in a deadlock

[Home]

Summary: ASTERISK-04954: [patch] Using printf in the urg_handler implicated in a deadlock

Reporter: Steve Davies . (stevedavies) Labels:

Date Opened: 2005-09-01 05:46:40 Date Closed: 2008-01-15 15:46:45.000-0600

Priority: Blocker Regression? No

Status: Closed/Complete Components: Core/General

Versions: Frequency of
Occurrence

Related
Issues:

Environment: Attachments: ( 0) deadlock.gdb
( 1) noprintf.patch

Description: For my glibc, at least, the stdio library is thread-safe through the use internal locking. See flockfile(3) for comments about this. (My box is gentoo, glibc-2.3.4.20041102-r1)

So I really don't think it's a good idea to use printf in the urg_handler. It can be called at any time and block a thread. (Never quite figured out the value of the message anyway).

I have a reproducible deadlock which involves this printf call in the urg_handler. Commenting out the call to printf eliminated the deadlock. I'll attach the trivial patch.

You've got my disclaimer.

****** ADDITIONAL INFORMATION ******

I've attached some gdb output from attaching to the deadlock asterisk.
The Asterisk was running a test with non-stop test calls over 4 loopbacked PRIs. Calls generated via callfiles.

Here's my analysis of the gdb session:

Thread 128 was holding msglist_lock, and in the middle of fputs (I think holding stdio's lock too, but busy doing funlockfile). This thread was using 100% of the CPU.

Thread 18 was blocked waiting on iflock at chan_zap.c line 6247. Then, the urg_handler was called, which calls printf which tries to do flockfile and blocks again.

Thread 9 is a fun one. It's the iax2 network_thread and was doing a iax2_do_register via the sched_runq. This called iax2_send, which, in iax2_transmit does pthread_kill(netthreadid, SIGURG), which is us ourselves. The urg_handler is entered which calls printf which tries to do flockfile and blocks. This thread is blocked with iax2's sched->lock locked; can't find anything else.

Thread 4 was trying to write out the "Avoided initial deadlock..." message, but got blocked again via stdio's flockfile.

Thread 1 was blocked via stdio's flockfile as it tried to do some cursor moving stuff I think.

Pretty much all other threads were also blocked by the flockfile.

I don't quite see why thread 128, which was holding the lock, was looping and not releasing the lock - perhaps its some issue with so many threads blocked - a glibc/pthread issue of some sort...

Comments: By: Michael Jerris (mikej) 2005-09-01 06:49:39

Is this a new deadlock, or did you just find it and it has been around for a while. The reason I ask is because that message was just changed recently (it's somewhat longer now) and I was wondering if it had any weird relation to that change or not.
By: Steve Davies . (stevedavies) 2005-09-01 12:58:31

Hi Mike,

Can't say for sure if this is new IIRC, the urg_handler only ever did this printf if you ran with debug on, and most people don't run that way, and also don't run so heavily loaded as I load my test machines.

To see this deadlock you'll need to run with debug on, and I think with lots of load. In my case, I run 120 concurrent calls via 4 PRIs looped 2 -> 2. That's a load of 120 calls, with a new call initiated every 1/2 second or so. On that setup, the deadlock generally occurs within an hour or two.

I did see that the message changed a little, but I doubt that that is the cause.

Steve
By: Kevin P. Fleming (kpfleming) 2005-09-01 18:02:51

Based on your excellent analysis, I've committed this one to CVS HEAD. Thanks!

(I never understood the value of the 'urgent handler' debug message anyway, but I'm sure Mark put it there for some reason)
By: Digium Subversion (svnbot) 2008-01-15 15:46:45.000-0600

Repository: asterisk
Revision: 6492

U trunk/asterisk.c

------------------------------------------------------------------------
r6492 | kpfleming | 2008-01-15 15:46:45 -0600 (Tue, 15 Jan 2008) | 2 lines

eliminate the urgent handler message, since it can causing blocking in the stdio library (issue ASTERISK-4954)

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=6492