[Home]

Summary:ASTERISK-16111: [patch] Asterisk deadlock
Reporter:aleksey2000 (aleksey2000)Labels:
Date Opened:2010-05-18 05:11:55Date Closed:2010-05-20 14:47:46
Priority:MajorRegression?No
Status:Closed/CompleteComponents:Channels/chan_sip/General
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) btfull_20100519.txt
( 1) btfull.txt
( 2) coreshowcalls_20100519.txt
( 3) coreshowlocks_20100519.txt
( 4) issue_17352.diff
( 5) threadapplyallbt_20100519.txt
( 6) threadapplyallbt.txt
( 7) top_asterproc_20100519.txt
Description:This asterisk used for transit traffic between MERA and unknown Cisco device. Protocol: SIP Full proxy. Accept alaw, g729 than convert to alaw. Timing: dahdi.
Usually,  after 2-3 hours asterisk going to deadlock. Average channels: 20 - 40.

Deadlock happened on 1.6.1.18, 1.6.1.19, 1.6.2.7. (with AGI (a2billing) and without)



****** ADDITIONAL INFORMATION ******

=======================================================================
=== Currently Held Locks ==============================================
=======================================================================
===
=== <pending> <lock#> (<file>): <lock type> <line num> <function> <lock name> <lock addr> (times locked)
===
=== Thread ID: 1094760768 (do_monitor           started at [22514] chan_sip.c restart_monitor())
=== ---> Lock #0 (chan_sip.c): MUTEX 21965 handle_request_do &netlock 0x2aaab60cc7c0 (1)
/usr/sbin/asterisk(ast_bt_get_addresses+0x1a) [0x4d6616]
/usr/lib/asterisk/modules/chan_sip.so [0x2aaab5e38d4c]
/usr/lib/asterisk/modules/chan_sip.so [0x2aaab5e99aed]
/usr/lib/asterisk/modules/chan_sip.so [0x2aaab5e998a0]
/usr/sbin/asterisk(ast_io_wait+0x1ba) [0x4cb1f0]
/usr/lib/asterisk/modules/chan_sip.so [0x2aaab5e9b511]
/usr/sbin/asterisk [0x559e81]
/lib64/libpthread.so.0 [0x3288e06617]
/lib64/libc.so.6(clone+0x6d) [0x32886d3c2d]
=== ---> Waiting for Lock #1 (chan_sip.c): MUTEX 7404 find_call sip_pvt_ptr 0x1f2c8060 (1)
/usr/sbin/asterisk(ast_bt_get_addresses+0x1a) [0x4d6616]
/usr/sbin/asterisk [0x44110d]
/usr/sbin/asterisk(_ao2_lock+0x53) [0x44145b]
/usr/lib/asterisk/modules/chan_sip.so [0x2aaab5e4f067]
/usr/lib/asterisk/modules/chan_sip.so [0x2aaab5e99b01]
/usr/lib/asterisk/modules/chan_sip.so [0x2aaab5e998a0]
/usr/sbin/asterisk(ast_io_wait+0x1ba) [0x4cb1f0]
/usr/lib/asterisk/modules/chan_sip.so [0x2aaab5e9b511]
/usr/sbin/asterisk [0x559e81]
/lib64/libpthread.so.0 [0x3288e06617]
/lib64/libc.so.6(clone+0x6d) [0x32886d3c2d]
=== --- ---> Locked Here: chan_sip.c line 7047 (sip_read)
=== -------------------------------------------------------------------
===
=== Thread ID: 1077872960 (pbx_thread           started at [ 4572] pbx.c ast_pbx_start())
=== ---> Lock #0 (channel.c): MUTEX 2644 __ast_read (channel lock) 0x1fb28d80 (1)
/usr/sbin/asterisk(ast_bt_get_addresses+0x1a) [0x4d6616]
/usr/sbin/asterisk [0x473458]
/usr/sbin/asterisk(__ast_channel_trylock+0xa4) [0x473153]
/usr/sbin/asterisk [0x466152]
/usr/sbin/asterisk(ast_read+0x1a) [0x46838f]
/usr/sbin/asterisk [0x46f1aa]
/usr/sbin/asterisk(ast_channel_bridge+0x1477) [0x471387]
/usr/sbin/asterisk(ast_bridge_call+0xc35) [0x4ab84c]
/usr/lib/asterisk/modules/app_dial.so [0x2aaab1083a83]
/usr/lib/asterisk/modules/app_dial.so [0x2aaab1084e47]
/usr/sbin/asterisk(pbx_exec+0x1d3) [0x4e9eb5]
/usr/lib/asterisk/modules/res_agi.so [0x2aaab000bdeb]
/usr/lib/asterisk/modules/res_agi.so [0x2aaab000f14f]
/usr/lib/asterisk/modules/res_agi.so [0x2aaab000fb41]
/usr/lib/asterisk/modules/res_agi.so [0x2aaab00110d9]
/usr/lib/asterisk/modules/res_agi.so [0x2aaab0011237]
/usr/sbin/asterisk(pbx_exec+0x1d3) [0x4e9eb5]
/usr/sbin/asterisk [0x4f33cb]
/usr/sbin/asterisk(ast_spawn_extension+0x64) [0x4f4b7b]
/usr/sbin/asterisk [0x4f52f8]
/usr/sbin/asterisk [0x4f725f]
/usr/sbin/asterisk [0x559e81]
/lib64/libpthread.so.0 [0x3288e06617]
/lib64/libc.so.6(clone+0x6d) [0x32886d3c2d]
=== ---> Lock #1 (chan_sip.c): MUTEX 7047 sip_read p 0x1f2c8060 (1)
/usr/sbin/asterisk(ast_bt_get_addresses+0x1a) [0x4d6616]
/usr/sbin/asterisk [0x44110d]
/usr/sbin/asterisk(_ao2_lock+0x53) [0x44145b]
/usr/lib/asterisk/modules/chan_sip.so [0x2aaab5e4cb1b]
/usr/sbin/asterisk [0x466960]
/usr/sbin/asterisk(ast_read+0x1a) [0x46838f]
/usr/sbin/asterisk [0x46f1aa]
/usr/sbin/asterisk(ast_channel_bridge+0x1477) [0x471387]
/usr/sbin/asterisk(ast_bridge_call+0xc35) [0x4ab84c]
/usr/lib/asterisk/modules/app_dial.so [0x2aaab1083a83]
/usr/lib/asterisk/modules/app_dial.so [0x2aaab1084e47]
/usr/sbin/asterisk(pbx_exec+0x1d3) [0x4e9eb5]
/usr/lib/asterisk/modules/res_agi.so [0x2aaab000bdeb]
/usr/lib/asterisk/modules/res_agi.so [0x2aaab000f14f]
/usr/lib/asterisk/modules/res_agi.so [0x2aaab000fb41]
/usr/lib/asterisk/modules/res_agi.so [0x2aaab00110d9]
/usr/lib/asterisk/modules/res_agi.so [0x2aaab0011237]
/usr/sbin/asterisk(pbx_exec+0x1d3) [0x4e9eb5]
/usr/sbin/asterisk [0x4f33cb]
/usr/sbin/asterisk(ast_spawn_extension+0x64) [0x4f4b7b]
/usr/sbin/asterisk [0x4f52f8]
/usr/sbin/asterisk [0x4f725f]
/usr/sbin/asterisk [0x559e81]
/lib64/libpthread.so.0 [0x3288e06617]
/lib64/libc.so.6(clone+0x6d) [0x32886d3c2d]
=== -------------------------------------------------------------------
===
=======================================================================
Comments:By: aleksey2000 (aleksey2000) 2010-05-18 05:16:25

Fogot: compiled with DONT_OPTIMIZE, DEBUG_THREADS, DEBUG_CHANNEL_LOCKS.

In attach "thread apply all bt" and "bt full", got it from attaching to locked PID

By: Leif Madsen (lmadsen) 2010-05-18 13:06:12

We're proceeding on this issue, however if you wish to test the latest 1.6.2.8-rc1 (or 1.6.2 branch directly) then that may be useful just to verify it has not already been resolved. Thanks for the info!

By: aleksey2000 (aleksey2000) 2010-05-18 13:13:06

Is it kown problem or probably you need more information?

By: David Vossel (dvossel) 2010-05-18 17:32:28

The 'core show locks' output is kind of odd.  It doesn't look like a classic deadlock. The second thread isn't waiting on any locks.

By: aleksey2000 (aleksey2000) 2010-05-19 01:16:44

Ok, it's probably not 'real' deadlock, but when i getting 'core show locks' asterisks process takes 100% processor time, asterisk not reacting on 'core restart now', does not accepting registries, not making/accepting calls. Let me know if you need any additional information.

By: aleksey2000 (aleksey2000) 2010-05-19 04:02:49

New stop. Uploaded new bt full, thread apply all bt, part from 'top', 'core show locks' and 'core show calls'.

By: David Vossel (dvossel) 2010-05-19 09:21:11

Thread 15 (Thread 0x41fe1940 (LWP 1031)):
#0  0x00000000005521be in decode_length (buf=0x2aaac8564938 "\200\022\032zÚgÞüó\036\027-¦ \217¦\202X\021Ú\236\220ZÖ-6\214¦\t\030ãð", limit=32,
   len=0x41fd87d0, pvalue=0x41fd8658) at udptl.c:221
#1  0x000000000055220a in decode_open_type (buf=0x2aaac8564938 "\200\022\032zÚgÞüó\036\027-¦ \217¦\202X\021Ú\236\220ZÖ-6\214¦\t\030ãð", limit=32,
   len=0x41fd87d0, p_object=0x41fd86f0, p_num_octets=0x41fd86b0) at udptl.c:235
#2  0x00000000005526a9 in udptl_rx_packet (s=0x2aaac85641f0, buf=0x2aaac8564938 "\200\022\032zÚgÞüó\036\027-¦ \217¦\202X\021Ú\236\220ZÖ-6\214¦\t\030ãð",
   len=32) at udptl.c:370
#3  0x00000000005540f0 in ast_udptl_read (udptl=0x2aaac85641f0) at udptl.c:716
#4  0x00002aaab5e4c5c2 in sip_rtp_read (ast=0x2aaac8378970, p=0x2aaac82fabc8, faxdetect=0x41fd8a04) at chan_sip.c:6988
ASTERISK-1  0x00002aaab5e4cb2c in sip_read (ast=0x2aaac8378970) at chan_sip.c:7048


It looks to me like it is this thread that is causing the problem.  Maybe we are never escaping the loop in decode_open_type().  Are you using fax?

By: aleksey2000 (aleksey2000) 2010-05-19 12:27:31

Yes. Both t.38 and not. This server for transit different kind of traffic.

By: David Vossel (dvossel) 2010-05-19 12:41:51

Ah, got it.  I uploaded a patch, please test it.

By: Digium Subversion (svnbot) 2010-05-19 15:30:33

Repository: asterisk
Revision: 264400

U   trunk/main/udptl.c

------------------------------------------------------------------------
r264400 | dvossel | 2010-05-19 15:30:33 -0500 (Wed, 19 May 2010) | 11 lines

fixes infinite loop during udptl.c's decode_open_type

When decode_length returns the length there is a check to see if that
length is negative, if so the decode loop breaks as this means the
limit has been reached.  The problem here is that length is an
unsigned int, so length can never be negative.  This resulted in
an infinite loop.

(issue ASTERISK-16111)


------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=264400

By: Digium Subversion (svnbot) 2010-05-19 15:31:35

Repository: asterisk
Revision: 264405

_U  branches/1.6.2/
U   branches/1.6.2/main/udptl.c

------------------------------------------------------------------------
r264405 | dvossel | 2010-05-19 15:31:35 -0500 (Wed, 19 May 2010) | 17 lines

Merged revisions 264400 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

........
 r264400 | dvossel | 2010-05-19 15:30:33 -0500 (Wed, 19 May 2010) | 11 lines
 
 fixes infinite loop during udptl.c's decode_open_type
 
 When decode_length returns the length there is a check to see if that
 length is negative, if so the decode loop breaks as this means the
 limit has been reached.  The problem here is that length is an
 unsigned int, so length can never be negative.  This resulted in
 an infinite loop.
 
 (issue ASTERISK-16111)
........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=264405

By: David Vossel (dvossel) 2010-05-19 15:36:17

That patch was correct, so I went ahead and committed it.  I'll leave this open for feedback for a while just to make sure it resolves the issue you are seeing.

By: aleksey2000 (aleksey2000) 2010-05-20 14:45:36

Results: 24 hours uptime, > 20k calls served, no deadlocks. Looks like problem solved.

By: David Vossel (dvossel) 2010-05-20 14:47:46

F7ESH!