[Home]

Summary:ASTERISK-14787: [patch] segfault when transferring a queue caller
Reporter:fhackenberger (fhackenberger)Labels:
Date Opened:2009-09-08 03:55:33Date Closed:2009-11-09 16:23:25.000-0600
Priority:CriticalRegression?No
Status:Closed/CompleteComponents:Channels/General
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) bt.txt
( 1) bt2.txt
( 2) bt3.txt
( 3) digium_bug_0015848
Description:The segfault happens after transferring (SIP re-invite) a queue call.

#0  0xb7d5da04 in __pthread_mutex_unlock_usercnt () from /lib/tls/i686/cmov/libpthread.so.0
#0  0xb7d5da04 in __pthread_mutex_unlock_usercnt () from /lib/tls/i686/cmov/libpthread.so.0
#1  0xb61ef779 in sip_hangup (ast=0xb3902a20) at /tmp/buildd/asterisk-1.6.1.0~dfsg/include/asterisk/lock.h:1708
#2  0x08087eb9 in ast_do_masquerade (original=0x8722db8) at channel.c:4406
#3  0x0808b589 in ast_waitfor_nandfds (c=0xb41fa200, n=2, fds=0x0, nfds=0, exception=0x0, outfd=0x0, ms=0xb41fa214) at channel.c:2004
#4  0x0808bb4f in ast_waitfor_n (c=0xb41fa200, n=2, ms=0xb41fa214) at channel.c:2318
ASTERISK-1  0x0808e2d4 in ast_channel_bridge (c0=0x8722db8, c1=0x87407f8, config=0xb41fa61c, fo=0xb41fa2ec, rc=0xb41fa2e8) at channel.c:4733
ASTERISK-2  0x080b06b6 in ast_bridge_call (chan=0x8722db8, peer=0x87407f8, config=0xb41fa61c) at features.c:2538
ASTERISK-3  0xb5bfa2e9 in dial_exec_full (chan=0x8722db8, data=0xb41fd138, peerflags=0xb41fae90, continue_exec=0x0) at app_dial.c:1986
ASTERISK-4  0xb5bfc174 in dial_exec (chan=0x8722db8, data=0xb41fd138) at app_dial.c:2060
ASTERISK-5  0x080ddab5 in pbx_exec (c=0x8722db8, app=0x82b0d00, data=0xb41fd138) at pbx.c:960
ASTERISK-6 0x080e4632 in pbx_extension_helper (c=0x8722db8, con=0x0, context=0x8723028 "agents", exten=0x8723078 "972605572372450", priority=9,
   label=0x0, callerid=0x848d910 "0314224713906", action=E_SPAWN, found=0xb41ff244, combined_find_spawn=1) at pbx.c:3201
ASTERISK-7 0x080e6ecc in __ast_pbx_run (c=0x8722db8, args=0x0) at pbx.c:3745
ASTERISK-8 0x080e89d0 in pbx_thread (data=0x8722db8) at pbx.c:4032
ASTERISK-9 0x0811ca47 in dummy_start (data=0x86be7f8) at utils.c:968
ASTERISK-10 0xb7d5a4fb in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
ASTERISK-11 0xb6f7be5e in clone () from /lib/tls/i686/cmov/libc.so.6


****** ADDITIONAL INFORMATION ******

The asterisk version I'm using is r209515 from the 1.6.1 branch. I did not use the 1.6.1.2 version (which was the newest when I set up the system), because I needed the T.38 fixes which went in just after 1.6.1.2 was released. If necessary, I can recompile 1.6.1.6 and try to reproduce. But that would be quite some effort on the production system, especially as the segfault is not reproducible every time.
Comments:By: fhackenberger (fhackenberger) 2009-09-08 10:16:29

added 'bt; thread apply all bt' output

By: Leif Madsen (lmadsen) 2009-09-15 12:11:06

I'm seeing in your backtrace <value optimized out> which is indicative of not having DONT_OPTIMIZE enabled in the Compiler Flags section of menuselect. If you're able to recompile your current version of Asterisk with this flag enabled, then the backtrace will typically be a lot more useful for the developers.

Thanks!

By: fhackenberger (fhackenberger) 2009-09-30 04:42:47

Attached a new backtrace with a DONT_OPTIMIZE build.

By: fhackenberger (fhackenberger) 2009-09-30 07:29:19

I added another backtrace of the same segfault to aid with analysis.

By: Leif Madsen (lmadsen) 2009-09-30 10:39:20

Thanks!

By: triccyx (triccyx) 2009-10-10 03:47:09

Same problem with 1.6.1.6

By: triccyx (triccyx) 2009-10-10 03:48:39

#0  0xb7ce6a04 in __pthread_mutex_unlock_usercnt () from /lib/tls/i686/cmov/libpthread.so.0
(gdb) bt
#0  0xb7ce6a04 in __pthread_mutex_unlock_usercnt () from /lib/tls/i686/cmov/libpthread.so.0
#1  0xb7ebeee6 in pthread_mutex_unlock () from /lib/tls/i686/cmov/libc.so.6
#2  0xb7514228 in ast_mutex_unlock (pmutex=0x98) at /usr/src/asterisk_1.6.1.6/include/asterisk/lock.h:1708
#3  0xb752266a in sip_hangup (ast=0x853fd20) at chan_sip.c:5357
#4  0x08094c7d in ast_do_masquerade (original=0x84de880) at channel.c:4408
ASTERISK-1  0x0808dcd1 in ast_waitfor_nandfds (c=0xb6a14558, n=2, fds=0x0, nfds=0, exception=0x0, outfd=0x0, ms=0xb6a14554) at channel.c:2006
ASTERISK-2  0x0808e47c in ast_waitfor_n (c=0xb6a14558, n=2, ms=0xb6a14554) at channel.c:2320
ASTERISK-3  0x08095f58 in ast_generic_bridge (c0=0x84de880, c1=0x85c5658, config=0xb6a157e4, fo=0xb6a14a8c, rc=0xb6a14a88, bridge_end={tv_sec = 0, tv_usec = 0}) at channel.c:4735
ASTERISK-4  0x08097a26 in ast_channel_bridge (c0=0x84de880, c1=0x85c5658, config=0xb6a157e4, fo=0xb6a14a8c, rc=0xb6a14a88) at channel.c:5150
ASTERISK-5  0x080bcebf in ast_bridge_call (chan=0x84de880, peer=0x85c5658, config=0xb6a157e4) at features.c:2538
ASTERISK-6 0xb760ea5a in dial_exec_full (chan=0x84de880, data=0xb6a17cdc, peerflags=0xb6a15b68, continue_exec=0x0) at app_dial.c:1986
ASTERISK-7 0xb760f1da in dial_exec (chan=0x84de880, data=0xb6a17cdc) at app_dial.c:2060
ASTERISK-8 0x080e869c in pbx_exec (c=0x84de880, app=0x8247af8, data=0xb6a17cdc) at pbx.c:960
ASTERISK-9 0x080eee1c in pbx_extension_helper (c=0x84de880, con=0x0, context=0x84deaf0 "macro-call_generic_fork_tks", exten=0x84deb40 "s", priority=1, label=0x0,
   callerid=0x8598768 "5020000000697", action=E_SPAWN, found=0xb6a1a460, combined_find_spawn=1) at pbx.c:3201
ASTERISK-10 0x080f0071 in ast_spawn_extension (c=0x84de880, context=0x84deaf0 "macro-call_generic_fork_tks", exten=0x84deb40 "s", priority=1, callerid=0x8598768 "5020000000697",
   found=0xb6a1a460, combined_find_spawn=1) at pbx.c:3656
ASTERISK-11 0xb7503486 in _macro_exec (chan=0x84de880, data=0xb6a1ce8c, exclusive=0) at app_macro.c:342
ASTERISK-12 0xb750458f in macro_exec (chan=0x84de880, data=0xb6a1ce8c) at app_macro.c:505
ASTERISK-13 0x080e869c in pbx_exec (c=0x84de880, app=0x82289c8, data=0xb6a1ce8c) at pbx.c:960
ASTERISK-14 0x080eee1c in pbx_extension_helper (c=0x84de880, con=0x0, context=0x84deaf0 "macro-call_generic_fork_tks", exten=0x84deb40 "s", priority=2, label=0x0,
   callerid=0x8598768 "5020000000697", action=E_SPAWN, found=0xb6a1f2f0, combined_find_spawn=1) at pbx.c:3201
ASTERISK-15 0x080f0071 in ast_spawn_extension (c=0x84de880, context=0x84deaf0 "macro-call_generic_fork_tks", exten=0x84deb40 "s", priority=2, callerid=0x8598768 "5020000000697",
   found=0xb6a1f2f0, combined_find_spawn=1) at pbx.c:3656
ASTERISK-16 0x080f070c in __ast_pbx_run (c=0x84de880, args=0x0) at pbx.c:3745
ASTERISK-17 0x080f1b4c in pbx_thread (data=0x84de880) at pbx.c:4032
ASTERISK-18 0x0813db9e in dummy_start (data=0x81ec2a0) at utils.c:968
ASTERISK-19 0xb7ce34fb in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
ASTERISK-20 0xb7eb1e5e in clone () from /lib/tls/i686/cmov/libc.so.6

By: fhackenberger (fhackenberger) 2009-10-14 04:55:27

I should mention that we use attended transfer.

By: Thomas Schaefer (toms) 2009-10-14 05:06:27

I have the same problem with Asterisk 1.6.1.6 using attended transfer via SIP-Phones (snom m3)

kern.log:Oct  9 17:28:57 pbx kernel: [90654.872004] asterisk[4828]: segfault at a4 ip b7c0efd4 sp b68b8bd4 error 4 in libpthread-2.7.so[b7c06000+15000]
kern.log:Oct 12 17:19:45 pbx kernel: [349302.452573] asterisk[7202]: segfault at a4 ip b7ca0fd4 sp b6af7fc4 error 4 in libpthread-2.7.so[b7c98000+15000]
kern.log:Oct 13 09:15:39 pbx kernel: [22498.700417] asterisk[3445]: segfault at a4 ip b7c72fd4 sp b6c77fc4 error 4 in libpthread-2.7.so[b7c6a000+15000]

By: fhackenberger (fhackenberger) 2009-10-14 05:12:09

I'm going to test the attached patch digium_bug_0015848

By: fhackenberger (fhackenberger) 2009-10-16 10:24:12

No segfault for the last two days, looks good.

By: fhackenberger (fhackenberger) 2009-10-19 02:42:14

Still up and running, I'd consider the patch ready for inclusion, although a second opinion would be great. TomS or triccyx, would you be willing to test the patch?

By: Thomas Schaefer (toms) 2009-10-24 16:53:52

fhackenberger, thanks for the patch.
I installed the patch today on our 1.6.1.6-System.
Without the patch our last crashes where 10 days apart (1x Oct 13, 3x Oct 23)
So lets wait and see.

By: Michael J. Miller (shin-shoryuken) 2009-10-26 13:26:38

Same issue as documented above, with the following system:

OS: Debian Lenny (kernel 2.6.26-2-amd64)
Asterisk ver: 1.6.1.6
Sip hardware: Polycom 601 (firmware 2.2.2 and 3.1.2)

One or two segfaults a day, and I could cause them on command by making attended transfers back and forth to different queues/phones; since installing fhackenberger's patch on 2009-10-23, I haven't seen a single crash.  If you need a backtrace to confirm that it was the same issue, I can provide one.

By: Thomas Schaefer (toms) 2009-10-27 06:12:18

After 3 days with chan_sip-patch we had no more "transfer-crashes" yet (1.6.1.6 on lenny).

Today the asterisk didn't accept incomming SIP-connections anymore.
I hadn't the chance to check for cause of the problem (its running in an production environment so i had to reboot immediately, sorry).
So this could have been an unfortunate coincidence and may had nothing to do with the patch. But I thought its better to mention it.

By: Atis Lezdins (atis) 2009-10-27 06:38:06

TomS: create a script if it happens next time:

asterisk -rx "core show locks" > locks.txt
killall -6 asterisk

also make sure that asterisk is compiled with DONT_OPTIMIZE and DEBUG_THREADS

So, the "kill -6" will make asterisk dump to core and if necessary you may get some debug out of it.

By: Thomas Schaefer (toms) 2009-10-29 10:05:14

Today we had the problem again (no sip-connections). I wasn't available for debugging, so the pbx was restarted manually.
I'm sorry, but i have to abort the test. That installation isn't suitable for testing.

I just saw: after applying the patch, we had 3 segfaults in the last 3 days (none of the tranfer-segfault with libpthread):
kern.log:Oct 27 13:53:35 pbx kernel: [ 9552.763000] asterisk[3488]: segfault at 56 ip b7c84c15 sp b6c83c6c error 6 in libc-2.7.so[b7c0d000+155000]
kern.log:Oct 28 10:54:18 pbx kernel: [28417.334776] asterisk[3657]: segfault at 703a7069 ip 0808ebd0 sp b6ce5ff0 error 4 in asterisk[8048000+156000]
kern.log:Oct 29 12:59:51 pbx kernel: [35951.025626] asterisk[3564]: segfault at 430a0d38 ip 0809d210 sp b6cabbc8 error 4 in asterisk[8048000+156000]

I have to downgrade to asterisk 1.6.0.15 on this installation, sorry.

By: fhackenberger (fhackenberger) 2009-10-30 06:02:55

TomS: Judging from you comments, it seems that you upgraded to 1.6.1.6 in order to apply the patch. Unfortunately that makes your test results unreliable, because that upgrade pulled in a lot of other, unrelated, changes. Have you been unable to apply it on your current 1.6.0.15 installation? Can I help you with that?

By: Thomas Schaefer (toms) 2009-11-03 01:41:54.000-0600

fhackenberger: Our new problem with the lack of incomming SIP connections was not related with your patch. After downgrading from 1.6.1.6 (with your patch) to 1.6.0.15 and then to 1.6.0.13 (because of hint-issues with .15) I didnt apply your patch since there is no known issue with transfers on the 1.6.0-tree.
The new problem with no sip connections reapeard on the downgraded system too.
Yesterday I replaced the asterisk-hardware and switch and downgraded to 1.4.26.2 to get hopefully finally rid of all problems.

I would have liked to continue testing your patch which certenly changed the behavier of 1.6.1.6 on transfers in a positive way, but the productive system was to important for longer tests.
My other (testing) platform with 1.6.1.4 (without patch) hadn't any problems with transfers yet but on the other hand we use the transfer-function not very often there.

I just want to repeat: my problem with the missing incomming sip connections was not related to your patch.

By: Leif Madsen (lmadsen) 2009-11-03 09:09:30.000-0600

Thanks for clearing that up. Any other issues you have should be reported as separate bugs.

To clear this up for the developers, this issue is ready for review, and potentially ready for merging.

By: Leif Madsen (lmadsen) 2009-11-04 08:35:09.000-0600

Hmm, I just saw this same thing on the weekend. Perhaps I should try to reproduce the issue and test this patch today...

By: Alec Davis (alecdavis) 2009-11-05 19:54:16.000-0600

We may be seeing similar here. Twice today. No debug yet, just waiting for next crash, have recompiled with DON'T OPTIMIZE.

Asterisk SVN-branch-1.6.1-r226977

It's not reproducible, I've tried!
I believe the transfer is an attended trasnfer as you see the incomimg line (DAHDI/1-1) start music on hold, then the iaxtrunk, then they both stop music then seems to be all over.

console output is all I have so far.
<pre>!! Unknown IE 50 (cs5, Unknown Information Element)
   -- Executing [9340@incoming:1] NoOp("DAHDI/1-1", "Incoming call callingsubaddr= calledsubaddr=") in new stack
   -- Executing [9340@incoming:2] GotoIf("DAHDI/1-1", "1?internal,8801,1") in new stack
   -- Goto (internal,8801,1)
   -- Executing [8801@internal:1] NoOp("DAHDI/1-1", "queue 2 dialled callingsubaddr= calledsubaddr=") in new stack
   -- Executing [8801@internal:2] NoOp("DAHDI/1-1", "Queue_member logged=2 free=2") in new stack
   -- Executing [8801@internal:3] Ringing("DAHDI/1-1", "") in new stack
   -- Accepting call from '95551234' to '9340' on channel 0/1, span 1
   -- Executing [8801@internal:4] GotoIf("DAHDI/1-1", "0?nightservice") in new stack
   -- Executing [8801@internal:5] GotoIf("DAHDI/1-1", "1?noannounce") in new stack
   -- Goto (internal,8801,11)
   -- Executing [8801@internal:11] Queue("DAHDI/1-1", "queue2,rhH,,,90") in new stack
 == Using SIP RTP CoS mark 5
 == Using SIP RTP CoS mark 5
 == Extension Changed 8820[internal] new state Ringing for Notify User 9001
 == Extension Changed 8820[internal] new state Ringing for Notify User 9000
   -- SIP/9015-000003ef is ringing
   -- SIP/9001-000003f0 is ringing
 == Extension Changed 8820[internal] new state InUse for Notify User 9001
   -- SIP/9001-000003f0 answered DAHDI/1-1
 == Extension Changed 8820[internal] new state InUse for Notify User 9000
   -- Started music on hold, class 'default', on DAHDI/1-1
 == Using SIP RTP CoS mark 5
   -- Executing [5053@internal:1] Dial("SIP/9001-000003f1", "IAX2/siteb/5053,,r") in new stack
   -- Called siteb/5053
   -- Call accepted by 192.168.x.yy (format alaw)
   -- Format for call is alaw
   -- IAX2/siteb-21016 is proceeding passing it to SIP/9001-000003f1
   -- IAX2/siteb-21016 is ringing
   -- IAX2/siteb-21016 answered SIP/9001-000003f1
   -- Started music on hold, class 'default', on IAX2/siteb-21016
   -- Stopped music on hold on DAHDI/1-1
   -- Stopped music on hold on IAX2/siteb-21016
sitea*CLI>
Disconnected from Asterisk server
Executing last minute cleanups
Asterisk cleanly ending (0).</pre>



By: Digium Subversion (svnbot) 2009-11-09 16:23:22.000-0600

Repository: asterisk
Revision: 229012

U   branches/1.6.2/channels/chan_sip.c

------------------------------------------------------------------------
r229012 | dvossel | 2009-11-09 16:23:22 -0600 (Mon, 09 Nov 2009) | 12 lines

fixes segfault when transferring a queue caller

In sip_hangup we attempted to lock p->owner after we set it to NULL.
Thanks to fhackenberger for reporting the issue and submitting a patch.

(closes issue ASTERISK-14787)
Reported by: fhackenberger
Patches:
     digium_bug_0015848 uploaded by fhackenberger (license 592)
Tested by: fhackenberger, lmadsen, TomS, shin-shoryuken, dvossel


------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=229012