Summary: ASTERISK-21389: res_timing_pthread fails to return from write, causing timer dependent operations to block indefinitely
Reporter: Matt Jordan (mjordan)
Labels:
Date Opened: 2013-04-08 15:16:47
Date Closed: 2013-04-19 11:00:03
Status: Closed/Complete
Components: Channels/chan_sip/General Resources/res_timing_pthread
Versions: Frequency of
must be completed before resolving ASTERISK-21773 Asterisk Open Blockers
must be completed before resolving ASTERISK-21774 Asterisk 11.4.0 Open Blockers
is related to ASTERISK-19754 Deadlock in chan_sip / pthread_timing
is related to ASTERISK-20577 Asterisk deadlocks waiting for timer in res_timing_pthread while running AGI script
is related to ASTERISK-14050 [patch] Asterisk loses SIP phones, possible deadlock,
is related to ASTERISK-17436 random deadlocks - SIP messages not being processed
is related to ASTERISK-17458 Deadlocks when using pthread timer
Environment:
Attachments:
( 0) 0001-res_timing_pthread-Reduce-probability-of-deadlocking.patch
( 1) backtrace1-threads.txt
( 2) backtrace-threads.txt
Description:
Tony from Schmooze Com reported that since upgrading to Asterisk 1.8.20.x, there have been numerous apparent lockups in {{chan_sip}}, in which all calls stop being processed. Other operations (such as querying from the CLI) continue to work correctly.

Two backtraces taken from affected systems point to a call to {{res_timing_pthread}}'s {{write_byte}} not returning. This call attempts to write a single byte into a timer's pipe. Even though the pipe is in blocking mode, this operation should normally write the byte and return immediately. Failure to return typically indicates that the pipe is full, which means the bytes written for earlier ticks are not being read out.
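To illustrate the failure mode described above, here is a minimal sketch (not taken from {{res_timing_pthread}}) that fills a pipe one byte at a time with nobody reading it. The helper name {{fill_pipe_nonblocking}} is hypothetical. The write end is switched to non-blocking mode so that the full pipe shows up as {{EAGAIN}}; on a blocking pipe, the same final {{write()}} would simply never return, which is what the backtraces appear to show.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only.
 * Puts the write end of a pipe into non-blocking mode and writes single
 * bytes until the kernel reports that the pipe is full.
 * Returns the number of bytes accepted, or -1 on an unexpected error.
 */
static long fill_pipe_nonblocking(int wfd)
{
	char byte = 0;
	long written = 0;
	int flags = fcntl(wfd, F_GETFL);

	if (flags == -1 || fcntl(wfd, F_SETFL, flags | O_NONBLOCK) == -1) {
		return -1;
	}

	for (;;) {
		ssize_t res = write(wfd, &byte, 1);

		if (res == 1) {
			written++;
			continue;
		}
		if (res == -1 && errno == EINTR) {
			continue; /* interrupted by a signal, retry */
		}
		if (res == -1 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
			/* Pipe is full. A blocking write would hang at this point. */
			return written;
		}
		return -1;
	}
}
```

The number of bytes a pipe accepts before filling up is system dependent, so the sketch only reports it rather than assuming a value.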

It isn't clear right now which timer is actually causing the problem, nor is it clear why a read operation isn't being performed on some timer. However, as noted in ASTERISK-14050, having a single timer run into this state will cause this exact problem.
Comments:
By: Shaun Ruffell (sruffell) 2013-04-08 17:45:33.515-0500

I would like to review this later, but I have prepared a [patch against trunk|http://git.asterisk.org/gitweb/?p=team/sruffell/asterisk-working.git;a=patch;h=419f633aebc1c1f545d9f024973940464ae41c7c] which also [applies cleanly to 1.8|http://git.asterisk.org/gitweb/?p=team/sruffell/asterisk-working.git;a=shortlog;h=refs/heads/svn_1.8-res_timing_pthread].

This patch can be applied to your Asterisk working copy with:
curl "http://git.asterisk.org/gitweb/?p=team/sruffell/asterisk-working.git;a=patch;h=419f633aebc1c1f545d9f024973940464ae41c7c" | patch -p1

This patch puts the pipe in non-blocking mode and uses only the "pending_ticks" member of the timer to track how many ticks are pending, rather than also relying on the number of bytes in the pipe. The pipe now has just two states: either it holds a byte, so that poll returns for as long as there are pending ticks, or it is empty because there are no pending ticks.
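The two-state approach described above can be sketched roughly as follows. This is not the code from the patch: the struct {{sketch_timer}}, its fields, and all function names are hypothetical, and the locking is simplified to one mutex per timer. The counter alone holds the number of pending ticks, and the pipe holds a byte only while that counter is non-zero.

```c
#include <fcntl.h>
#include <poll.h>
#include <pthread.h>
#include <unistd.h>

/* Hypothetical timer state, for illustration only. */
struct sketch_timer {
	int pipe_fds[2];            /* [0] is polled by the consumer, [1] is written by the ticker */
	unsigned int pending_ticks; /* the only record of how many ticks are outstanding */
	pthread_mutex_t lock;
};

static int set_nonblocking(int fd)
{
	int flags = fcntl(fd, F_GETFL);
	return (flags == -1) ? -1 : fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

static int sketch_timer_init(struct sketch_timer *t)
{
	if (pipe(t->pipe_fds) == -1) {
		return -1;
	}
	if (set_nonblocking(t->pipe_fds[0]) == -1 || set_nonblocking(t->pipe_fds[1]) == -1) {
		close(t->pipe_fds[0]);
		close(t->pipe_fds[1]);
		return -1;
	}
	t->pending_ticks = 0;
	return pthread_mutex_init(&t->lock, NULL) == 0 ? 0 : -1;
}

/* Ticker side: count the tick, and signal the pipe only on the 0 -> 1 transition. */
static void sketch_timer_tick(struct sketch_timer *t)
{
	char byte = 0;

	pthread_mutex_lock(&t->lock);
	if (t->pending_ticks++ == 0) {
		/* Non-blocking write: it cannot hang, whatever state the pipe is in. */
		if (write(t->pipe_fds[1], &byte, 1) == -1) {
			/* Ignored in this sketch; real code would log the error. */
		}
	}
	pthread_mutex_unlock(&t->lock);
}

/* Consumer side: acknowledge ticks, and drain the pipe once none remain. */
static void sketch_timer_ack(struct sketch_timer *t, unsigned int count)
{
	char byte;

	pthread_mutex_lock(&t->lock);
	if (count > t->pending_ticks) {
		count = t->pending_ticks;
	}
	if (count > 0) {
		t->pending_ticks -= count;
		if (t->pending_ticks == 0) {
			/* Non-blocking read: stops as soon as the pipe is empty. */
			while (read(t->pipe_fds[0], &byte, 1) == 1) {
			}
		}
	}
	pthread_mutex_unlock(&t->lock);
}

/* Returns 1 if poll() would report the timer as readable right now, else 0. */
static int sketch_timer_readable(struct sketch_timer *t)
{
	struct pollfd pfd = { .fd = t->pipe_fds[0], .events = POLLIN };

	return poll(&pfd, 1, 0) == 1 && (pfd.revents & POLLIN) != 0;
}
```

Because at most one byte is ever in the pipe, the ticker cannot fill it no matter how far the consumer falls behind, which is the property the blocking one-byte-per-tick scheme lacked.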

By: Shaun Ruffell (sruffell) 2013-04-10 15:12:20.259-0500

Attached a patch, which is also posted at https://reviewboard.asterisk.org/r/2441/