Summary: ASTERISK-21389: res_timing_pthread fails to return from write, causing timer dependent operations to block indefinitely
Reporter: Matt Jordan (mjordan)
Date Opened: 2013-04-08 15:16:47
Date Closed: 2013-04-19 11:00:03
Priority: Major
Status: Closed/Complete
Components: Channels/chan_sip/General, Resources/res_timing_pthread
Versions: 1.8.20.0, 1.8.21.0
Attachments: (0) 0001-res_timing_pthread-Reduce-probability-of-deadlocking.patch, (1) backtrace1-threads.txt, (2) backtrace-threads.txt
Description: Tony from Schmooze Com reported that since upgrading to Asterisk 1.8.20.x, there have been numerous apparent lockups in {{chan_sip}}, in which all calls stop being processed. Other operations (such as queries from the CLI) continue to work correctly.
Two backtraces taken from affected systems point to a call to {{res_timing_pthread}}'s {{write_byte}} that never returns. This call attempts to write a byte into a timer's pipe. Although the pipe is in blocking mode, such a write normally completes and returns immediately; a write that never returns typically means the pipe is full because nothing is reading from it. It is not yet clear which timer is causing the problem, nor why some timer's pipe is not being read. However, as noted in ASTERISK-14050, a single timer in this state is enough to cause exactly this problem.
Comments:
By: Shaun Ruffell (sruffell) 2013-04-08 17:45:33.515-0500

I would like to review this later, but I have prepared a [patch against trunk|http://git.asterisk.org/gitweb/?p=team/sruffell/asterisk-working.git;a=patch;h=419f633aebc1c1f545d9f024973940464ae41c7c] that also [applies cleanly to 1.8|http://git.asterisk.org/gitweb/?p=team/sruffell/asterisk-working.git;a=shortlog;h=refs/heads/svn_1.8-res_timing_pthread]. You can apply this patch to your Asterisk working copy like this:

{noformat}
curl "http://git.asterisk.org/gitweb/?p=team/sruffell/asterisk-working.git;a=patch;h=419f633aebc1c1f545d9f024973940464ae41c7c" | patch -p1
{noformat}

This patch basically puts the pipe in non-blocking mode and uses only the {{pending_ticks}} member of the timer, rather than also the number of bytes in the pipe, to track how many ticks are pending. The pipe now has just two states: either it holds a single byte, so that poll() reports it readable while there are pending ticks, or it is empty when there are no pending ticks.

By: Shaun Ruffell (sruffell) 2013-04-10 15:12:20.259-0500

Attached the patch, which is also posted at https://reviewboard.asterisk.org/r/2441/