[Home]

Summary:ASTERISK-13569: [patch] lock during simple call processing
Reporter:pj (pj)Labels:
Date Opened:2009-02-12 06:27:37.000-0600Date Closed:2009-08-10 09:30:19
Priority:MajorRegression?No
Status:Closed/CompleteComponents:Channels/chan_sip/General
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) gdb.txt
( 1) showlocks.txt
( 2) sip_inf_loop.patch
Description:During normal operation, pure sip-sip calls, without special features (eg. transfer), asterisk locks. Asterisk server had very small utilization - three concurrent sip calls at maximum. After this lockout, all peers becomed unreachable and asterisk must be manually restarted. CLI was not locked.
"core show locks" attached


****** ADDITIONAL INFORMATION ******

=== <pending> <lock#> (<file>): <lock type> <line num> <function> <lock name> <lock addr> (times locked)
===
=== Thread ID: -1216238704 (do_monitor           started at [21149] chan_sip.c restart_monitor())
=== ---> Lock #0 (chan_sip.c): MUTEX 20656 handle_request_do &netlock 0xb78e18a0 (1)
       asterisk(ast_bt_get_addresses+0x19) [0x80fe866]
       /usr/lib/asterisk/modules/chan_sip.so [0xb785b6a9]
       /usr/lib/asterisk/modules/chan_sip.so [0xb78b4227]
       /usr/lib/asterisk/modules/chan_sip.so [0xb78b3fa9]
       asterisk(ast_io_wait+0x14d) [0x80f4581]
       /usr/lib/asterisk/modules/chan_sip.so [0xb78b5bf0]
       asterisk [0x8175261]
       /lib/i686/libpthread.so.0 [0xb7c694e2]
       /lib/i686/libc.so.6(clone+0x5e) [0xb7ec047e]
=== ---> Lock #1 (astobj2.c): MUTEX 198 ao2_trylock &p->priv_data.lock 0xb7351928 (1)
       asterisk(ast_bt_get_addresses+0x19) [0x80fe866]
       asterisk [0x808266a]
       asterisk(ao2_trylock+0x4e) [0x80825d5]
       /usr/lib/asterisk/modules/chan_sip.so [0xb786f0b0]
       /usr/lib/asterisk/modules/chan_sip.so [0xb78b4243]
       /usr/lib/asterisk/modules/chan_sip.so [0xb78b3fa9]
       asterisk(ast_io_wait+0x14d) [0x80f4581]
       /usr/lib/asterisk/modules/chan_sip.so [0xb78b5bf0]
       asterisk [0x8175261]
       /lib/i686/libpthread.so.0 [0xb7c694e2]
       /lib/i686/libc.so.6(clone+0x5e) [0xb7ec047e]
Comments:By: Mark Michelson (mmichelson) 2009-02-12 15:46:18.000-0600

This is a case where it is very difficult to tell where the problem is just based on the core show locks output.

If the problem happens again, could you upload both core show locks and a thread apply all bt full from gdb? That way, we can see the state of all the threads on the system when the lock occurs. Thanks!

By: pj (pj) 2009-02-12 15:58:30.000-0600

OK, when next lock will appear, I will try to do:
gdb /usr/sbin/asterisk <pid of main * process>
and then:
thread apply all bt full
core show locks

so, please keep this bugreport open for some time

By: pj (pj) 2009-03-17 08:32:25

locked again, uploaded gdb debug output from locked asterisk, and "core show locks"
Asterisk SVN-trunk-r176174



By: pj (pj) 2009-03-22 04:34:50

feedback and required traces was given, please change status of this bugreport.

By: pj (pj) 2009-04-10 07:35:47

Do you have sufficient information to find source of this lockout yet? I posted gdb, that you require in the past.
If you can't find where problem is, according to supplied debugs, you can close this bugreport, because I can't supply better outputs, like valgrind, because I don't know, how to invoke this lock. And also can't run from valgrind all the time, for eg. until another lock appears again. thanks for understanding!



By: Leif Madsen (lmadsen) 2009-06-10 12:42:31

Assigned to Mark to review the information. If this is useful information, but you don't have the time to resolve this issue, please un-assign yourself and set status to Acknowledged. Thanks!

By: Mark Michelson (mmichelson) 2009-06-10 17:28:47

I just took a look at this, and it has me a bit baffled.

I can see from the "core show locks" output that the SIP monitor thread is blocking at some point. The problem is that the corresponding thread in the gdb output (thread 6) has a bunch of ?? frames. In addition, the portion of the stack that is not ?? has garbage (e.g. free() does not call poll()).

It's very frustrating that the most important thread from the gdb output is the only one that has garbage data in it.

By: Mark Michelson (mmichelson) 2009-06-11 09:20:49

I don't know if I can trust the gdb output for thread 6, but I just noticed that the "free" in frame 2 says that it came from libresolv.so.2, so this may mean that the SIP monitor thread is blocking during a DNS lookup for some reason.

By: pj (pj) 2009-06-11 15:20:12

so, do you agree, that this is real bug?
I think so, because asterisk stability shouldn't rely on dns system and definitively shouldn't lock in case of slow or no response from dns.

By: Mark Michelson (mmichelson) 2009-06-11 17:07:06

I don't even know for sure if the problem is related to DNS. That thread in the trace provided might be completely bogus. That's what's a bit frustrating with this issue right now. It's hard to tell what the cause is, and the tools that are supposed to help with that are not helpful at the moment.

By: David Brillert (aragon) 2009-06-17 23:23:10

Possibly related to this revision?
I only began seeing this issue in 1.4.25, it did not occur in 1.4.24.1

2009-05-28 15:27 +0000 [r197588]  Mark Michelson <mmichelson@digium.com>

* main/rtp.c, channels/chan_sip.c, include/asterisk/rtp.h: Allow
 for media to arrive from an alternate source when responding to a
 reinvite with 491. When we receive a SIP reinvite, it is possible
 that we may not be able to process the reinvite immediately since
 we have also sent a reinvite out ourselves. The problem is that
 whoever sent us the reinvite may have also sent a reinvite out to
 another party, and that reinvite may have succeeded. As a result,
 even though we are not going to accept the reinvite we just
 received, it is important for us to not have problems if we
 suddenly start receiving RTP from a new source. The fix for this
 is to grab the media source information from the SDP of the
 reinvite that we receive. This information is passed to the RTP
 layer so that it will know about the alternate source for media.
 Review: https://reviewboard.asterisk.org/r/252



By: David Brillert (aragon) 2009-06-17 23:26:46

I must concur with jvandal on his note
https://issues.asterisk.org/view.php?id=15213#105717
jvandal (reporter)
2009-05-29 10:28

If I check on my server, the working revision for is r197562 but fail with r197588

-ASTERISK_FILE_VERSION(__FILE__, "$Revision: 197562M $")
+ASTERISK_FILE_VERSION(__FILE__, "$Revision: 197588M $")
jvandal (reporter)
2009-05-29 10:28



By: Mark Michelson (mmichelson) 2009-06-18 11:02:38

aragon, if you are also seeing this issue, would you mind uploading "core show locks" from the Asterisk CLI and "thread apply all bt" from gdb when the issue occurs? When pj tried, gdb crapped all over the thread I cared about. Maybe you'll have better luck.

By: David Brillert (aragon) 2009-06-18 11:24:37

I opened another bug report for this here
https://issues.asterisk.org/view.php?id=15345

All of my debugs are uploaded to that ticket...

By: Mark Michelson (mmichelson) 2009-06-18 12:50:50

Discussion on issue 15345 leads me to believe that the problem is due to an infinite loop under certain circumstances. Please try applying sip_inf_loop.patch and see if you still experience the problem. Thanks!

By: pj (pj) 2009-06-18 13:34:17

it will be hard to prove, if this patch will solve initial issue, because, as I wrote before, I don't know, how to invoke this lock and currently I haven't even access to system, that was originally affected.
But recently, I discovered another lockout, for what I created bugreport 0015343 so you can look at it, if can be eventually related to issue, for what you created patch.

By: Digium Subversion (svnbot) 2009-06-22 09:34:11

Repository: asterisk
Revision: 202336

U   branches/1.4/channels/chan_sip.c

------------------------------------------------------------------------
r202336 | mmichelson | 2009-06-22 09:34:05 -0500 (Mon, 22 Jun 2009) | 25 lines

Fix a possible infinite loop in SDP parsing during glare situation.

There was a while loop in get_ip_and_port_from_sdp which was controlled
by a call to get_sdp_iterate. The loop would exit either if what we were
searching for was found or if the return was NULL. The problem is that
get_sdp_iterate never returns NULL. This means that if what we were searching
for was not present, the loop would run infinitely. This modification of the
loop fixes the problem.

(closes issue ASTERISK-14217)
Reported by: schmidts

(closes issue ASTERISK-14332)
Reported by: samy

(closes issue ASTERISK-13569)
Reported by: pj

(closes issue ASTERISK-14328)
Reported by: aragon
Patches:
     sip_inf_loop.patch uploaded by mmichelson (license 60)
Tested by: aragon


------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=202336

By: Digium Subversion (svnbot) 2009-06-22 09:35:12

Repository: asterisk
Revision: 202337

_U  trunk/
U   trunk/channels/chan_sip.c

------------------------------------------------------------------------
r202337 | mmichelson | 2009-06-22 09:35:10 -0500 (Mon, 22 Jun 2009) | 31 lines

Merged revisions 202336 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
 r202336 | mmichelson | 2009-06-22 09:34:05 -0500 (Mon, 22 Jun 2009) | 25 lines
 
 Fix a possible infinite loop in SDP parsing during glare situation.
 
 There was a while loop in get_ip_and_port_from_sdp which was controlled
 by a call to get_sdp_iterate. The loop would exit either if what we were
 searching for was found or if the return was NULL. The problem is that
 get_sdp_iterate never returns NULL. This means that if what we were searching
 for was not present, the loop would run infinitely. This modification of the
 loop fixes the problem.
 
 (closes issue ASTERISK-14217)
 Reported by: schmidts
 
 (closes issue ASTERISK-14332)
 Reported by: samy
 
 (closes issue ASTERISK-13569)
 Reported by: pj
 
 (closes issue ASTERISK-14328)
 Reported by: aragon
 Patches:
       sip_inf_loop.patch uploaded by mmichelson (license 60)
 Tested by: aragon
........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=202337

By: Digium Subversion (svnbot) 2009-06-22 09:35:38

Repository: asterisk
Revision: 202338

_U  branches/1.6.0/
U   branches/1.6.0/channels/chan_sip.c

------------------------------------------------------------------------
r202338 | mmichelson | 2009-06-22 09:35:35 -0500 (Mon, 22 Jun 2009) | 38 lines

Merged revisions 202337 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

................
 r202337 | mmichelson | 2009-06-22 09:35:09 -0500 (Mon, 22 Jun 2009) | 31 lines
 
 Merged revisions 202336 via svnmerge from
 https://origsvn.digium.com/svn/asterisk/branches/1.4
 
 ........
   r202336 | mmichelson | 2009-06-22 09:34:05 -0500 (Mon, 22 Jun 2009) | 25 lines
   
   Fix a possible infinite loop in SDP parsing during glare situation.
   
   There was a while loop in get_ip_and_port_from_sdp which was controlled
   by a call to get_sdp_iterate. The loop would exit either if what we were
   searching for was found or if the return was NULL. The problem is that
   get_sdp_iterate never returns NULL. This means that if what we were searching
   for was not present, the loop would run infinitely. This modification of the
   loop fixes the problem.
   
   (closes issue ASTERISK-14217)
   Reported by: schmidts
   
   (closes issue ASTERISK-14332)
   Reported by: samy
   
   (closes issue ASTERISK-13569)
   Reported by: pj
   
   (closes issue ASTERISK-14328)
   Reported by: aragon
   Patches:
         sip_inf_loop.patch uploaded by mmichelson (license 60)
   Tested by: aragon
 ........
................

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=202338

By: Digium Subversion (svnbot) 2009-06-22 09:36:03

Repository: asterisk
Revision: 202339

_U  branches/1.6.1/
U   branches/1.6.1/channels/chan_sip.c

------------------------------------------------------------------------
r202339 | mmichelson | 2009-06-22 09:36:00 -0500 (Mon, 22 Jun 2009) | 38 lines

Merged revisions 202337 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

................
 r202337 | mmichelson | 2009-06-22 09:35:09 -0500 (Mon, 22 Jun 2009) | 31 lines
 
 Merged revisions 202336 via svnmerge from
 https://origsvn.digium.com/svn/asterisk/branches/1.4
 
 ........
   r202336 | mmichelson | 2009-06-22 09:34:05 -0500 (Mon, 22 Jun 2009) | 25 lines
   
   Fix a possible infinite loop in SDP parsing during glare situation.
   
   There was a while loop in get_ip_and_port_from_sdp which was controlled
   by a call to get_sdp_iterate. The loop would exit either if what we were
   searching for was found or if the return was NULL. The problem is that
   get_sdp_iterate never returns NULL. This means that if what we were searching
   for was not present, the loop would run infinitely. This modification of the
   loop fixes the problem.
   
   (closes issue ASTERISK-14217)
   Reported by: schmidts
   
   (closes issue ASTERISK-14332)
   Reported by: samy
   
   (closes issue ASTERISK-13569)
   Reported by: pj
   
   (closes issue ASTERISK-14328)
   Reported by: aragon
   Patches:
         sip_inf_loop.patch uploaded by mmichelson (license 60)
   Tested by: aragon
 ........
................

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=202339

By: Digium Subversion (svnbot) 2009-06-22 09:36:29

Repository: asterisk
Revision: 202340

_U  branches/1.6.2/
U   branches/1.6.2/channels/chan_sip.c

------------------------------------------------------------------------
r202340 | mmichelson | 2009-06-22 09:36:26 -0500 (Mon, 22 Jun 2009) | 38 lines

Merged revisions 202337 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

................
 r202337 | mmichelson | 2009-06-22 09:35:09 -0500 (Mon, 22 Jun 2009) | 31 lines
 
 Merged revisions 202336 via svnmerge from
 https://origsvn.digium.com/svn/asterisk/branches/1.4
 
 ........
   r202336 | mmichelson | 2009-06-22 09:34:05 -0500 (Mon, 22 Jun 2009) | 25 lines
   
   Fix a possible infinite loop in SDP parsing during glare situation.
   
   There was a while loop in get_ip_and_port_from_sdp which was controlled
   by a call to get_sdp_iterate. The loop would exit either if what we were
   searching for was found or if the return was NULL. The problem is that
   get_sdp_iterate never returns NULL. This means that if what we were searching
   for was not present, the loop would run infinitely. This modification of the
   loop fixes the problem.
   
   (closes issue ASTERISK-14217)
   Reported by: schmidts
   
   (closes issue ASTERISK-14332)
   Reported by: samy
   
   (closes issue ASTERISK-13569)
   Reported by: pj
   
   (closes issue ASTERISK-14328)
   Reported by: aragon
   Patches:
         sip_inf_loop.patch uploaded by mmichelson (license 60)
   Tested by: aragon
 ........
................

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=202340