
Summary: ASTERISK-14326: [patch] lock in sip_tcp_helper_thread
Reporter: pj (pj)
Date Opened: 2009-06-17 06:17:39
Date Closed: 2011-06-07 14:00:48
Priority: Major
Regression? No
Status: Closed/Complete
Components: Channels/chan_sip/TCP-TLS
Attachments:
( 0) 20090828__issue15343.diff.txt
( 1) gdb.txt
( 2) lock.txt
Description: Asterisk locked up during normal SIP call processing. The bug appears to be in the SIP/TCP handling (one peer was using SIP over TCP when the lockup happened).


****** ADDITIONAL INFORMATION ******

Attached are the "core show locks" output and the gdb output.
gdb was attached to the locked Asterisk process.
"thread apply all bt full" gave no output, but "bt" and "bt full" worked.

Build Options: DONT_OPTIMIZE, DEBUG_THREADS, LOADABLE_MODULES, DEBUG_FD_LEAKS, MALLOC_DEBUG

Comments:

By: Mark Michelson (mmichelson) 2009-06-18 14:30:08

This is interesting. I can see from the "core show locks" output that the tcptls_session->lock is held by thread -1275495568. This lock is not supposed to be held for very long; in fact, the only thing done while holding it is calling fgets() to read input from the socket. For some reason, this call to fgets() is blocking indefinitely. That should not happen, though, because ast_wait_for_input() is called beforehand.

I'm not sure why this is happening.
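A minimal, self-contained C sketch (plain POSIX, not Asterisk code) of the gap described above. Assumptions: a pipe stands in for the TCP socket, poll() stands in for ast_wait_for_input(), and `probe_partial_line()` / `struct probe_result` are hypothetical names. It shows that a readiness check can succeed while no complete line has arrived yet, which is the situation in which a blocking fgets() keeps waiting.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical result type: what a readiness check sees vs. what fgets() needs. */
struct probe_result {
    int readable;     /* poll() reported POLLIN */
    ssize_t bytes;    /* bytes actually available right now */
    int has_newline;  /* a complete line ('\n') is among them */
};

/* Simulates a peer that has sent `msg` and then gone quiet. */
static int probe_partial_line(const char *msg, struct probe_result *res)
{
    int fds[2];
    char buf[256];
    size_t len = strlen(msg);

    if (pipe(fds) != 0)
        return -1;
    if (write(fds[1], msg, len) != (ssize_t)len) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    /* Step 1: readiness check (stand-in for ast_wait_for_input()). */
    struct pollfd pfd = { .fd = fds[0], .events = POLLIN };
    res->readable = poll(&pfd, 1, 0) > 0 && (pfd.revents & POLLIN);

    /* Step 2: inspect the data without blocking. A blocking fgets() here
       would wait for the rest of the line, since the write end stays open. */
    fcntl(fds[0], F_SETFL, fcntl(fds[0], F_GETFL) | O_NONBLOCK);
    res->bytes = read(fds[0], buf, sizeof(buf));
    res->has_newline = res->bytes > 0 &&
                       memchr(buf, '\n', (size_t)res->bytes) != NULL;

    close(fds[0]);
    close(fds[1]);
    return 0;
}
```

With a peer that sends `INVITE sip:alice@example.com SIP/2.0` and stops before the CRLF, the probe reports readable data (36 bytes) but no newline; a blocking fgets() on that descriptor would keep waiting for the rest of the line while the caller still holds its lock.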

By: pj (pj) 2009-06-22 09:06:13

I'm not sure whether this bug report is related to ASTERISK-1431464, because the lockup described in ASTERISK-1431464 occurs with SIP/UDP, whereas this report (ASTERISK-1516343) describes a lockup in SIP/TCP handling.

By: Mark Michelson (mmichelson) 2009-06-22 09:20:29

I am certain that this is not related to ASTERISK-13569 and ASTERISK-14326. I will remove the relationship.

By: Tilghman Lesher (tilghman) 2009-08-28 17:36:06

mmichelson: the call to ast_wait_for_input() only tells us that at least one character is available to be read, not that a complete line is. I think we could switch from fgets() to read() on a non-blocking fd, cycle the lock (release it and reacquire it between reads), and save ourselves some grief.
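The suggestion above can be sketched as a small line reader over a non-blocking descriptor that holds the lock only while touching the buffer and calling read(), and releases it while waiting for more data. This is an illustration under assumptions, not the attached patch: `struct line_session`, `set_nonblocking()` and `read_line()` are hypothetical, and the 1024-byte buffer and per-wait timeout are arbitrary choices.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <pthread.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical session (not Asterisk's ast_tcptls_session_instance). */
struct line_session {
    int fd;                /* must be O_NONBLOCK */
    pthread_mutex_t lock;  /* guards buf/len and reads on fd */
    char buf[1024];
    size_t len;            /* bytes buffered, not yet returned */
};

static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    return flags < 0 ? -1 : fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Reads one '\n'-terminated line into out (NUL-terminated).
   Returns the line length, 0 on EOF, or -1 on error/timeout.
   timeout_ms applies to each individual wait, not the whole call. */
static ssize_t read_line(struct line_session *s, char *out, size_t outsz,
                         int timeout_ms)
{
    for (;;) {
        pthread_mutex_lock(&s->lock);

        /* 1. A complete line already buffered? Hand it out. */
        char *nl = memchr(s->buf, '\n', s->len);
        if (nl != NULL) {
            size_t linelen = (size_t)(nl - s->buf) + 1;
            if (linelen >= outsz) {
                pthread_mutex_unlock(&s->lock);
                return -1; /* caller's buffer too small */
            }
            memcpy(out, s->buf, linelen);
            out[linelen] = '\0';
            memmove(s->buf, s->buf + linelen, s->len - linelen);
            s->len -= linelen;
            pthread_mutex_unlock(&s->lock);
            return (ssize_t)linelen;
        }
        if (s->len == sizeof(s->buf)) {
            pthread_mutex_unlock(&s->lock);
            return -1; /* line longer than our buffer */
        }

        /* 2. Pull in whatever is available; never blocks (O_NONBLOCK). */
        ssize_t n = read(s->fd, s->buf + s->len, sizeof(s->buf) - s->len);
        int err = errno;
        if (n > 0)
            s->len += (size_t)n;
        pthread_mutex_unlock(&s->lock);

        if (n > 0)
            continue;
        if (n == 0)
            return 0; /* EOF */
        if (err != EAGAIN && err != EWOULDBLOCK && err != EINTR)
            return -1;

        /* 3. Nothing yet: wait with the lock RELEASED, so a peer that
           stalls mid-line cannot pin the lock for other threads. */
        struct pollfd pfd = { .fd = s->fd, .events = POLLIN };
        if (poll(&pfd, 1, timeout_ms) <= 0)
            return -1; /* timeout or poll error */
    }
}
```

The design choice is that the lock never spans a wait: a partial line simply stays in `buf` until the next call, and a caller that times out can decide what to do without having blocked anyone else.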

By: Tilghman Lesher (tilghman) 2009-08-28 18:40:13

This should fix it. I think the problem is that the SSL layer may need to do renegotiation, and when that happens, SSL_read() and SSL_write() may return errors indefinitely. Therefore, the underlying implementation of the FILE pointer needs to have visibility into the locking (and, in fact, control the locking).
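The idea that the FILE implementation itself should control the locking can be sketched with glibc's fopencookie() (BSD systems provide funopen() for the same purpose). This is not the attached patch: `struct locked_stream` and `locked_fdopen()` are hypothetical names, plain read()/write() stand in for SSL_read()/SSL_write(), and the read hook waits for data with poll() before taking the lock, so a caller blocked inside fgets() does not hold the lock while waiting.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <poll.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical cookie: owns the fd and the lock guarding it. */
struct locked_stream {
    int fd;
    pthread_mutex_t lock;
};

static ssize_t ls_read(void *cookie, char *buf, size_t size)
{
    struct locked_stream *ls = cookie;
    struct pollfd pfd = { .fd = ls->fd, .events = POLLIN };

    /* Wait for data WITHOUT the lock; take it only for the actual read.
       A TLS version would call SSL_read() here (assumption). */
    if (poll(&pfd, 1, -1) < 0)
        return -1;
    pthread_mutex_lock(&ls->lock);
    ssize_t n = read(ls->fd, buf, size);
    pthread_mutex_unlock(&ls->lock);
    return n; /* 0 = EOF, -1 = error, as fopencookie expects */
}

static ssize_t ls_write(void *cookie, const char *buf, size_t size)
{
    struct locked_stream *ls = cookie;
    pthread_mutex_lock(&ls->lock);
    ssize_t n = write(ls->fd, buf, size);
    pthread_mutex_unlock(&ls->lock);
    return n < 0 ? 0 : n; /* write hook must return 0, not -1, on error */
}

static int ls_close(void *cookie)
{
    struct locked_stream *ls = cookie;
    int rc = close(ls->fd);
    pthread_mutex_destroy(&ls->lock);
    free(ls);
    return rc;
}

/* Wraps fd in a FILE whose I/O hooks do their own locking. */
static FILE *locked_fdopen(int fd, const char *mode)
{
    struct locked_stream *ls = malloc(sizeof *ls);
    if (ls == NULL)
        return NULL;
    ls->fd = fd;
    pthread_mutex_init(&ls->lock, NULL);

    cookie_io_functions_t funcs = {
        .read = ls_read, .write = ls_write, .seek = NULL, .close = ls_close
    };
    FILE *f = fopencookie(ls, mode, funcs);
    if (f == NULL) {
        pthread_mutex_destroy(&ls->lock);
        free(ls);
    }
    return f;
}
```

Callers keep using fgets()/fputs(); because the lock lives inside the stream's hooks, stdio can loop over several short reads to assemble a line without the lock being held across the gaps.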

By: pj (pj) 2009-09-01 12:54:31

Patch applied. I can't tell yet whether it will help, because the lockups previously appeared at random.

By: Tilghman Lesher (tilghman) 2009-09-15 11:33:19

pj: given that it's been 2 weeks, is it safe to say that the patch has fixed your lockups?

By: pj (pj) 2009-09-16 03:32:57

It hasn't locked up in the last few weeks. But I also can't confirm that this patch resolved the original issue: if I read your comments correctly, you fixed something in the SSL/TLS handling, but my original lockup occurred with plain SIP/TCP (I still have TLS disabled). If you think the patch helps with something else, it can be committed; I haven't found any negative impact of this patch on Asterisk's operation ;-)
Thanks!

By: Leif Madsen (lmadsen) 2010-01-06 11:00:21.000-0600

Closing this as the reporter can't reproduce, so no change is required at this time.