[Home]

Summary:ASTERISK-17528: Lock on Asterisk 1.6.2.17-rc2
Reporter:Carlos Oliva (coliva)Labels:
Date Opened:2011-03-09 07:56:17.000-0600Date Closed:2011-05-25 03:51:57
Priority:MajorRegression?No
Status:Closed/CompleteComponents:PBX/General
Versions:1.6.2.17 Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) 14032011_092102-asterisk_showlocks.txt
( 1) 14032011_145202-asterisk_showlocks.txt
( 2) 14032011_152801-asterisk_showlocks.txt
( 3) asterisk_backtrace-14032011_101813.txt
( 4) asterisk_backtrace-17032011_161925.txt
( 5) asterisk_showlocks-14032011_101813.txt
( 6) asterisk_showlocks-17032011_161925.txt
( 7) asterisk-core-show-locks.txt
( 8) bt_threads_asterisk.txt
( 9) core_show_locks_04_04_2011.txt
(10) core_show_locks_14_03_2011.txt
(11) core_show_locks_31_03_2011.txt
(12) show_locks_10_03_2011.txt
(13) thread_appy_all_bt_10_03_2011.txt
Description:At random times, asterisk Stops to process sip packets, all udp packets are queued as we can see in:

netstat -anp |grep 5060
udp   124904     0 0.0.0.0:5060     0.0.0.0:*     15732/asterisk

It seems to be a lock. We had compiled asterisk with thread debugging, I will put the info of "core show locks" in additional information, and will attach a gdb backtracing of all threads.

After this lock, asterisk must be killed to recover the sip channel, we tried to reload sip and all modules using asteriskcli, but nothing works.
The S.O. is Debian Squeeze, but it seems to be the same in debian Lenny. I think the same problem is in Asterisk 1.6.2.14, but I can not confirm this because the machine with this version has no debuging enabled

****** ADDITIONAL INFORMATION ******

Connected to Asterisk 1.6.2.17-rc2 currently running on ivr7 (pid = 15732)
Verbosity is at least 5
ivr7*CLI> core show locks

=======================================================================
=== Currently Held Locks ==============================================
=======================================================================
===
=== <pending> <lock#> (<file>): <lock type> <line num> <function> <lock name> <lock addr> (times locked)
===
=== Thread ID: 140290879936256 (do_monitor           started at [22809] chan_sip.c restart_monitor())
=== ---> Lock #0 (chan_sip.c): MUTEX 22259 handle_request_do &netlock 0x7f97e780d420 (1)
       /usr/sbin/asterisk(ast_bt_get_addresses+0xe) [0x4fab1e]
       /usr/lib/asterisk/modules/chan_sip.so(+0x75093) [0x7f97e75df093]
       /usr/lib/asterisk/modules/chan_sip.so(+0x77633) [0x7f97e75e1633]
       /usr/sbin/asterisk(ast_io_wait+0x7a) [0x4f0e2a]
       /usr/lib/asterisk/modules/chan_sip.so(+0x86819) [0x7f97e75f0819]
       /usr/sbin/asterisk() [0x58f23d]
       /lib/libpthread.so.0(+0x68ba) [0x7f980653e8ba]
       /lib/libc.so.6(clone+0x6d) [0x7f9806d7702d]
=== ---> Lock #1 (chan_sip.c): MUTEX 6734 sip_new i 0x1f5afb0 (1)
       /usr/sbin/asterisk(ast_bt_get_addresses+0xe) [0x4fab1e]
       /usr/sbin/asterisk(_ao2_lock+0x247) [0x444247]
       /usr/lib/asterisk/modules/chan_sip.so(+0x1c855) [0x7f97e7586855]
       /usr/lib/asterisk/modules/chan_sip.so(+0x72219) [0x7f97e75dc219]
       /usr/lib/asterisk/modules/chan_sip.so(+0x7374f) [0x7f97e75dd74f]
       /usr/lib/asterisk/modules/chan_sip.so(+0x74d2f) [0x7f97e75ded2f]
       /usr/lib/asterisk/modules/chan_sip.so(+0x77633) [0x7f97e75e1633]
       /usr/sbin/asterisk(ast_io_wait+0x7a) [0x4f0e2a]
       /usr/lib/asterisk/modules/chan_sip.so(+0x86819) [0x7f97e75f0819]
       /usr/sbin/asterisk() [0x58f23d]
       /lib/libpthread.so.0(+0x68ba) [0x7f980653e8ba]
       /lib/libc.so.6(clone+0x6d) [0x7f9806d7702d]
=== ---> Lock #2 (channel.c): RDLOCK 1355 ast_channel_search_locked &(&channels)->lock 0x81df30 (1)
       /usr/sbin/asterisk(ast_bt_get_addresses+0xe) [0x4fab1e]
       /usr/sbin/asterisk(ast_channel_search_locked+0x2be) [0x47695e]
       /usr/sbin/asterisk(ast_pickup_call+0x18) [0x4bfd28]
       /usr/lib/asterisk/modules/chan_sip.so(+0x70e83) [0x7f97e75dae83]
       /usr/lib/asterisk/modules/chan_sip.so(+0x7374f) [0x7f97e75dd74f]
       /usr/lib/asterisk/modules/chan_sip.so(+0x74d2f) [0x7f97e75ded2f]
       /usr/lib/asterisk/modules/chan_sip.so(+0x77633) [0x7f97e75e1633]
       /usr/sbin/asterisk(ast_io_wait+0x7a) [0x4f0e2a]
       /usr/lib/asterisk/modules/chan_sip.so(+0x86819) [0x7f97e75f0819]
       /usr/sbin/asterisk() [0x58f23d]
       /lib/libpthread.so.0(+0x68ba) [0x7f980653e8ba]
       /lib/libc.so.6(clone+0x6d) [0x7f9806d7702d]
=== ---> Waiting for Lock #3 (channel.c): MUTEX 1357 ast_channel_search_locked (channel lock) 0x7f97d4399430 (1)
       /usr/sbin/asterisk(ast_bt_get_addresses+0xe) [0x4fab1e]
       /usr/sbin/asterisk() [0x470a46]
       /usr/sbin/asterisk(ast_channel_search_locked+0xd0) [0x476770]
       /usr/sbin/asterisk(ast_pickup_call+0x18) [0x4bfd28]
       /usr/lib/asterisk/modules/chan_sip.so(+0x70e83) [0x7f97e75dae83]
       /usr/lib/asterisk/modules/chan_sip.so(+0x7374f) [0x7f97e75dd74f]
       /usr/lib/asterisk/modules/chan_sip.so(+0x74d2f) [0x7f97e75ded2f]
       /usr/lib/asterisk/modules/chan_sip.so(+0x77633) [0x7f97e75e1633]
       /usr/sbin/asterisk(ast_io_wait+0x7a) [0x4f0e2a]
       /usr/lib/asterisk/modules/chan_sip.so(+0x86819) [0x7f97e75f0819]
       /usr/sbin/asterisk() [0x58f23d]
       /lib/libpthread.so.0(+0x68ba) [0x7f980653e8ba]
       /lib/libc.so.6(clone+0x6d) [0x7f9806d7702d]
=== --- ---> Locked Here: channel.c line 2744 (__ast_read)
=== -------------------------------------------------------------------
===
=== Thread ID: 140290298709760 (iax2_process_thread  started at [11765] chan_iax2.c start_network_thread())
=== ---> Waiting for Lock #0 (channel.c): WRLOCK 949 __ast_channel_alloc_ap &(&channels)->lock 0x81df30 (1)
       /usr/sbin/asterisk(ast_bt_get_addresses+0xe) [0x4fab1e]
       /usr/sbin/asterisk() [0x48146d]
       /usr/sbin/asterisk(__ast_channel_alloc+0xeb) [0x48209b]
       /usr/lib/asterisk/modules/chan_iax2.so(+0x17e7c) [0x7f97f192ce7c]
       /usr/lib/asterisk/modules/chan_iax2.so(+0x48a6e) [0x7f97f195da6e]
       /usr/lib/asterisk/modules/chan_iax2.so(+0x4a6a2) [0x7f97f195f6a2]
       /usr/sbin/asterisk() [0x58f23d]
       /lib/libpthread.so.0(+0x68ba) [0x7f980653e8ba]
       /lib/libc.so.6(clone+0x6d) [0x7f9806d7702d]
=== -------------------------------------------------------------------
===
=== Thread ID: 140290298201856 (iax2_process_thread  started at [11765] chan_iax2.c start_network_thread())
=== ---> Waiting for Lock #0 (channel.c): WRLOCK 949 __ast_channel_alloc_ap &(&channels)->lock 0x81df30 (1)
       /usr/sbin/asterisk(ast_bt_get_addresses+0xe) [0x4fab1e]
       /usr/sbin/asterisk() [0x48146d]
       /usr/sbin/asterisk(__ast_channel_alloc+0xeb) [0x48209b]
       /usr/lib/asterisk/modules/chan_iax2.so(+0x17e7c) [0x7f97f192ce7c]
       /usr/lib/asterisk/modules/chan_iax2.so(+0x48a6e) [0x7f97f195da6e]
       /usr/lib/asterisk/modules/chan_iax2.so(+0x4a6a2) [0x7f97f195f6a2]
       /usr/sbin/asterisk() [0x58f23d]
       /lib/libpthread.so.0(+0x68ba) [0x7f980653e8ba]
       /lib/libc.so.6(clone+0x6d) [0x7f9806d7702d]
=== -------------------------------------------------------------------
===
=== Thread ID: 140290286012160 (pbx_thread           started at [ 4627] pbx.c ast_pbx_start())
=== ---> Lock #0 (channel.c): MUTEX 2744 __ast_read (channel lock) 0x7f97d4399430 (1)
       /usr/sbin/asterisk(ast_bt_get_addresses+0xe) [0x4fab1e]
       /usr/sbin/asterisk() [0x47235e]
       /usr/sbin/asterisk() [0x47a8ba]
       /usr/lib/asterisk/modules/app_dial.so(+0x90d1) [0x7f97ef6930d1]
       /usr/lib/asterisk/modules/app_dial.so(+0xc716) [0x7f97ef696716]
       /usr/sbin/asterisk(pbx_exec+0x112) [0x51f1c2]
       /usr/sbin/asterisk() [0x535185]
       /usr/sbin/asterisk() [0x53734b]
       /usr/sbin/asterisk() [0x53854f]
       /usr/sbin/asterisk() [0x58f23d]
       /lib/libpthread.so.0(+0x68ba) [0x7f980653e8ba]
       /lib/libc.so.6(clone+0x6d) [0x7f9806d7702d]
=== -------------------------------------------------------------------
===
=== Thread ID: 140290284996352 (pbx_thread           started at [ 4627] pbx.c ast_pbx_start())
=== ---> Lock #0 (app_queue.c): MUTEX 3708 try_calling queues 0x7f97de3bca10 (1)
       /usr/sbin/asterisk(ast_bt_get_addresses+0xe) [0x4fab1e]
       /usr/sbin/asterisk(_ao2_lock+0x247) [0x444247]
       /usr/lib/asterisk/modules/app_queue.so(+0x1369b) [0x7f97e374469b]
       /usr/lib/asterisk/modules/app_queue.so(+0x1ac7a) [0x7f97e374bc7a]
       /usr/sbin/asterisk(pbx_exec+0x112) [0x51f1c2]
       /usr/sbin/asterisk() [0x535185]
       /usr/sbin/asterisk() [0x53734b]
       /usr/sbin/asterisk() [0x53854f]
       /usr/sbin/asterisk() [0x58f23d]
       /lib/libpthread.so.0(+0x68ba) [0x7f980653e8ba]
       /lib/libc.so.6(clone+0x6d) [0x7f9806d7702d]
=== ---> Waiting for Lock #1 (channel.c): WRLOCK 949 __ast_channel_alloc_ap &(&channels)->lock 0x81df30 (1)
       /usr/sbin/asterisk(ast_bt_get_addresses+0xe) [0x4fab1e]
       /usr/sbin/asterisk() [0x48146d]
       /usr/sbin/asterisk(__ast_channel_alloc+0xeb) [0x48209b]
       /usr/lib/asterisk/modules/chan_iax2.so(+0x17e7c) [0x7f97f192ce7c]
       /usr/lib/asterisk/modules/chan_iax2.so(+0x48a6e) [0x7f97f195da6e]
       /usr/lib/asterisk/modules/chan_iax2.so(+0x4a6a2) [0x7f97f195f6a2]
       /usr/sbin/asterisk() [0x58f23d]
       /lib/libpthread.so.0(+0x68ba) [0x7f980653e8ba]
       /lib/libc.so.6(clone+0x6d) [0x7f9806d7702d]
=== -------------------------------------------------------------------
===
=======================================================================
Comments:By: Alastair Battrick (ajbattrick) 2011-03-10 02:49:34.000-0600

I am having the exact same issue in Centos 5.something using Asterisk 1.6.2.17.

My core show locks is attached

By: Freddi Hansen (freddi_fonet) 2011-03-10 07:22:59.000-0600

I had the same problem with 2-4 deadlocks per day, installed the patch from issue 0018497 and haven't had deadlocks for more than 2 weeks now.

By: Carlos Oliva (coliva) 2011-03-10 08:34:35.000-0600

Thanks Freddi_Fonet, but this patch seems to be only for TLS, and I only use UDP packets. Both are locks, but I think the isues are not related.

By: wufan (wufan) 2011-03-10 08:36:43.000-0600

maybe related to my bug? 0018819

By: Carlos Oliva (coliva) 2011-03-10 08:53:14.000-0600

Maybe related Wufan, both are locks of chan_sip.c and channel.c but I don't have enough knowledge to understand a "core show locks" properly. I don't know

By: Carlos Oliva (coliva) 2011-03-10 09:30:59.000-0600

Another Lock of the same Asterisk, I Attach the core show locks and aplly_all_bt, but as I can see is very near to last I reported. Must be the same lock

By: Carlos Oliva (coliva) 2011-03-10 10:37:14.000-0600

Another Lock, the day is too hot :( The core show locks looks very near, but this time I attach too a gdb thread apply all bt, perhaps it gives a bit more info

By: Carlos Oliva (coliva) 2011-03-14 09:33:41

Another Lock today. Asking the users, they reported me that Asterisk locks when they are trying to capture a call on the same group, dialing *8. Looking in the logs, the last info I saw before the lock is:
[Mar 14 13:40:34] VERBOSE[22525] netsock.c:   == Using SIP RTP TOS bits 184
[Mar 14 13:40:34] VERBOSE[22525] netsock.c:   == Using SIP RTP CoS mark 5

In the proxy server logs (OpenSips) I can see an INVITE packet exactly at the same time:
Mar 14 13:40:34 Routing [INVITE] ruri: [sip:*8@172.16.78.37] May be related the issue to call capturing? I attach another "core show locks"

By: Sébastien Couture (sysreq) 2011-03-14 10:07:04

We've been experiencing the same issue since around 1.6.2.10. We're now using 1.6.2 SVN r308989 (from around march 3rd) and still having the same problem.

I'll include a backtrace of the running process as well as the output of a 'core show locks'.

By: JP Ramoul (jpramoul) 2011-03-14 10:10:44

Same thing here, asterisk 1.6.2.16.2 was used last week. I've upgraded to 1.6.2.17 and we still having the problem (three times just for today !). I've attached the core show locks files.

By: Sébastien Couture (sysreq) 2011-03-17 15:30:18

It happened once again today! I've attached a backtrace and output of 'core show locks'.

By: Carlos Oliva (coliva) 2011-03-31 05:22:55

Two locks again this morning. This time appears iax channels involved... I upload two core show locks too.

By: Carlos Oliva (coliva) 2011-04-04 10:15:11

Another lock today, this is maddening... I attach another core show locks.

By: Leif Madsen (lmadsen) 2011-04-05 16:39:16

There is plenty of data here and there is no need to attach the same data every time it happens. I am triaging the issue now and it will be dealt with as time and resources are available.

By: Carlos Oliva (coliva) 2011-04-06 02:18:36

Thank you very much Imadsen, I will only submit more info if somebody request it. I thought more info was better but you are right, repeat the same info all the time has no sense.

By: m.krenn (martin_kkk) 2011-05-02 10:45:45

Hi,

we encounter the described sip hangs too on 2 machines with debian squeeze.
2.6.32-5-686 ---- Asterisk 1.6.2.17.3 --> former 1.6.2.17.2 hung too.

but have a big stable installation with debian lenny with no problems...
2.6.26-2-amd64 ---- Asterisk 1.6.2.17.2

the 2 server were our first squeeze installations.

By: Sebastian Hegarty (hegars) 2011-05-02 21:06:01

Are you guys running Opterons?

I’m using Debian Squeeze v6.0.1 with kernel 2.6.32-5-686 running 1.8.3.3 on an AMD Opteron CPU and I’m getting 2-4 deadlocks a day as well (I’ve had this issue ever since 1.8 was installed on squeeze and have been stepping the versions hoping it would fix it).

I have other systems running only on exactly the same hardware running squeeze and *1.8.3.2 that don’t lock and the only difference I can see between these systems is the locking server runs app_queue and app_chanspy

By: cutzmf (cutzmf) 2011-05-03 02:24:19

Have random deadlocks (without load at night, with load at day)

* 1.6.2.17.2
Debian Squeeze 2.6.32-5-amd64
Quad-Core AMD Opteron(tm) Processor 2386 SE

Earlier i had no locks on this machine with * 1.6.2.12-rc1 (nothing changed in hw, only * version)



By: Alastair Battrick (ajbattrick) 2011-05-03 15:37:08

Not Opteron here; it is an Intel Atom:
Intel(R) Atom(TM) CPU  330   @ 1.60GHz


processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 28
model name      :          Intel(R) Atom(TM) CPU  330   @ 1.60GHz
stepping        : 2
cpu MHz         : 1600.116
cache size      : 512 KB
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 2
apicid          : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov                                                                                         pat clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc pni m                                                                                        onitor ds_cpl tm2 ssse3 cx16 xtpr lahf_lm
bogomips        : 3200.23
clflush size    : 64
cache_alignment : 64
address sizes   : 32 bits physical, 48 bits virtual
power management:

By: Carlos Oliva (coliva) 2011-05-04 02:51:39

My machine is a Intel too: Intel(R) Xeon(TM) CPU 3.00GHz
with Debian kernel: 2.6.32-5-amd64

I've switched to asterisk 1.6.2.13 over squeeze, It's more stable for my system. I know there are corrected bugs in newer versions, but 1.6.2.13 is more or less stable and is a nightmare for me to have a production system with 1 or 2 locks for day...
Nowadays, if somebody submits a patch I will test the release in the pre-production environment.

By: jamhed (jamhed) 2011-05-04 05:05:24

guys, here is a script to kill a deadlocked asterisk.

http://pastebin.com/JZTJWHiw

this is an extremly ugly solution, but it is better than to do it by hand.


it may need some tweaks though: packets in queue allowed (it seems 1000 is too low for a loaded system, and the poll time).



By: Carlos Oliva (coliva) 2011-05-04 05:35:56

Ummm thanks for your solution Jamhed, I use another method to do it, This is my solution:

In Asterisk I have a user that can only register, and in Nagios I've a check that uses sipsak utility to try registering in Asterisk. If register fails four times, I use nagios NRPE to send a kill -9 to asterisk and start it again. Nowadays this solution is very ugly too, because all calls in progress will hang when asterisk is killed but, as you say, is better than do it by hand...

Thanks for your help

By: jamhed (jamhed) 2011-05-04 06:23:17

> Thanks for your help

I have the same issue -- random deadlock in few days.

By: Alec Davis (alecdavis) 2011-05-19 05:20:34

This deadlock appears to be with the directed pickup code.

Please either try issue18654_v1.6.2.patch on ASTERISK-17264 or a later 1.6.2 checkout.

By: Alec Davis (alecdavis) 2011-05-25 03:51:55

from coliva's ~132899 using *8 pickup. This has been fixed in SVN, and the patch noted in ~135144 should also help if not prepared to go to 1.62svn.

Please reopen, if this is not the case.

By: Joao Carvalho (foxfire) 2011-06-27 03:41:02.967-0500

Has this issue been fixed in 1.6.2.18 ?
Because i still get random lockups,i did noticed that this mostly happens when i get an incoming call from another server using IAX2 to a queue or the local channel. When the remote call is hangup sometimes the local call is not.Then sometimes during a reload i get a lockup. I have changed the comunication between the servers to SIP to see if this does anything.