Summary: ASTERISK-12882: [patch] Asterisk sleeps forever in poll() when terminating both SIP endpoints of a bridged channel
Reporter: fhackenberger (fhackenberger)
Labels:
Date Opened: 2008-10-13 11:27:26
Date Closed: 2011-06-07 14:02:47
Priority: Major
Regression?: No
Status: Closed/Complete
Components: Core/Channels
Versions:
Frequency of Occurrence:
Related Issues:
Environment:
Attachments: ( 0) bridge_indefinite_poll
Description: As you can see from the following backtrace, the thread bridging the two SIP channels is blocked in poll(), which is called with -1 for the timeout parameter. The file descriptors it watches are the associated UDP sockets (which never see any I/O again, because the endpoints are terminated), as well as the alert and timing pipes. On my system this poll() never returns (I waited more than 30 minutes).

Thread 3 (Thread 0xb547bb90 (LWP 10734)):
#0  0xb7ef3410 in __kernel_vsyscall ()
#1  0xb7dc7c07 in poll () from /lib/tls/i686/cmov/libc.so.6
#2  0x08086f48 in ast_waitfor_nandfds (c=0xb54743f0, n=2, fds=0x0, nfds=0, exception=0x0, outfd=0x0, ms=0xb5474404) at channel.c:2026
#3  0x0808a6bd in ast_channel_bridge (c0=0xb5327e48, c1=0xb58a4be0, config=0xb5474c08, fo=0xb54744cc, rc=0xb54744c8) at channel.c:2088
#4  0xb73428cd in ast_bridge_call (chan=0xb5327e48, peer=0xb58a4be0, config=0xb5474c08) at res_features.c:1483
#5  0xb6cf9410 in try_calling (qe=0xb5476888, options=<value optimized out>, announceoverride=0xb54767f8 "", url=0xb54767f7 "", tries=0xb5476880,
   noption=0xb547687c, agi=0x0) at app_queue.c:3077
#6  0xb6cfd41f in queue_exec (chan=0xb5327e48, data=0xb5476b08) at app_queue.c:3940
#7  0x080c376e in pbx_exec (c=0xb5327e48, app=0x833d6e0, data=0xb5476b08) at pbx.c:532
#8  0xb6d82741 in realtime_exec (chan=0xb5327e48, context=0x835ee13 "queues", exten=0xb5328018 "9001", priority=5, callerid=0xb53bc5f0 "sipp",
   data=0x83640c1 "@") at pbx_realtime.c:216
#9  0x080cb897 in pbx_extension_helper (c=0xb5327e48, con=0x0, context=0x835ee13 "queues", exten=0xb5328018 "9001", priority=5, label=0x0,
   callerid=0xb53bc5f0 "sipp", action=E_SPAWN) at pbx.c:1862
#10 0x080cd5a9 in __ast_pbx_run (c=0xb5327e48) at pbx.c:2306
#11 0x080ce46e in pbx_thread (data=0xb5327e48) at pbx.c:2623
#12 0x080fe4bb in dummy_start (data=0xb5304090) at utils.c:852
#13 0xb7eb74fb in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
#14 0xb7dd1e5e in clone () from /lib/tls/i686/cmov/libc.so.6

Here is the full backtrace of all threads:
http://pastebin.com/fa7d0582

The attached workaround patch fixes this problem. It makes sure that the bridge_end.tv_sec value passed to ast_generic_bridge is always nonzero. If it is zero, ast_generic_bridge passes -1 to poll(), which means an infinite timeout.
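
The poll() timeout semantics involved here can be reproduced outside Asterisk. The following is a minimal sketch, not Asterisk code: the helper name poll_idle_fd is hypothetical, and a pipe that nobody writes to stands in for a UDP socket whose peer has gone away. With a finite timeout poll() returns 0 once the time is up; with -1 the same call would never return, which is why the helper refuses negative values.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper (not part of Asterisk): poll the read end of a pipe
 * that never receives data. Returns poll()'s result: 0 on timeout,
 * >0 if the descriptor became readable, -1 on error. */
int poll_idle_fd(int timeout_ms)
{
    /* A negative timeout means "wait forever" to poll(); on a descriptor
     * that never becomes ready the call would never return. */
    assert(timeout_ms >= 0);

    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    /* The write end stays open, so the read end sees neither data nor a hangup. */
    struct pollfd pfd = { .fd = fds[0], .events = POLLIN, .revents = 0 };
    int res = poll(&pfd, 1, timeout_ms);

    close(fds[0]);
    close(fds[1]);
    return res;
}
```

Calling poll_idle_fd(50) should return 0 after roughly 50 ms, since nothing ever becomes readable.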

****** ADDITIONAL INFORMATION ******

I tested this on 1.4.17, but the relevant code is unchanged as of #141156
Comments:

By: Leif Madsen (lmadsen) 2008-10-14 11:12:41

Thanks for the patch! I've given you karma already, but we're currently waiting for your license to be approved before we can move forward. Hopefully that only takes a day or so.

Thanks again for the contribution and hopefully it is reviewed soon.

By: fhackenberger (fhackenberger) 2008-10-15 18:31:43

Sorry, forget the patch. I misread the code. I'll prepare another one.

By: Leif Madsen (lmadsen) 2008-10-21 13:59:02

OK, should I delete what is there currently then?

By: fhackenberger (fhackenberger) 2008-10-22 01:02:14

Yes, the patch currently attached to this bug should not be applied. It may however serve as a hint to someone who would like to work on this issue.

By: Leif Madsen (lmadsen) 2008-10-22 09:42:49

OK, then I will leave it attached as a reference point. Thanks!

By: Leif Madsen (lmadsen) 2009-02-02 15:31:46.000-0600

fhackenberger:  any chance of getting that updated patch?

By: fhackenberger (fhackenberger) 2009-02-02 16:49:52.000-0600

It's not yet written :-)
I'm quite busy at the moment. But I will of course share the patch as soon as it is done. I'd be glad to mentor someone writing this patch BTW!

By: Joshua C. Colp (jcolp) 2009-02-10 13:55:37.000-0600

I don't believe this is actually the solution to the problem here. We need to figure out why either chan_sip did not queue a hangup frame and write to the alert pipe or why that was not acted upon properly in the bridging core.
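
For reference, the wake-up path described above can be illustrated in isolation. This is a simplified sketch under stated assumptions, not the actual bridging code: the function name alert_wakes_poll is hypothetical, plain pipes stand in for the media socket and the channel's alert pipe, and a single byte written to the alert pipe stands in for queuing a hangup frame. Because the alert descriptor is readable, poll() returns at once even with an infinite timeout.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical demo (not Asterisk code). Returns 1 if poll() was woken by
 * the alert pipe alone, 0 otherwise. */
int alert_wakes_poll(void)
{
    int media[2], alert[2];
    if (pipe(media) != 0)
        return 0;
    if (pipe(alert) != 0) {
        close(media[0]);
        close(media[1]);
        return 0;
    }

    /* Stand-in for "queue a hangup frame": make the alert pipe readable. */
    char blip = 1;
    ssize_t written = write(alert[1], &blip, sizeof(blip));
    assert(written == (ssize_t)sizeof(blip));

    struct pollfd pfds[2] = {
        { .fd = media[0], .events = POLLIN, .revents = 0 }, /* idle "socket" */
        { .fd = alert[0], .events = POLLIN, .revents = 0 }, /* alert pipe */
    };

    /* The infinite timeout is safe here only because the alert is already pending. */
    int res = poll(pfds, 2, -1);
    int woke = (res == 1)
        && (pfds[1].revents & POLLIN)
        && !(pfds[0].revents & POLLIN);

    close(media[0]);
    close(media[1]);
    close(alert[0]);
    close(alert[1]);
    return woke;
}
```

If the write to the alert pipe were skipped, the same poll() call would block indefinitely, which matches the symptom in the backtrace.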

By: Joshua C. Colp (jcolp) 2009-03-12 12:28:56

After looking at this issue more closely, I've come to the conclusion that it was caused by the handling of a SIP message, which left the channel never getting hung up. I've looked through the commits and found numerous ones that fix this. I'm closing this out because I strongly feel that these resolved the issue some time ago.

By: fhackenberger (fhackenberger) 2009-03-31 03:03:34

file: Could you please provide revision numbers? I'd like to backport the fix to our Asterisk system.

By: Joshua C. Colp (jcolp) 2009-03-31 07:43:00

167620 is one candidate, but I would feel more comfortable if you tested the latest version to confirm this, rather than having me collect all the possible revisions, which may not work.

If it still isn't really fixed then we need to figure out what is different about your environment because there have been no other reports of this happening.

By: fhackenberger (fhackenberger) 2009-03-31 08:17:55

The report is for the 1.4 branch; I'll try to reproduce with 1.4 HEAD. That may take a few weeks though, so please be patient.

By: Joshua C. Colp (jcolp) 2009-04-27 09:33:08

I'll be glad to look at this once you are able to get it up to date and reproduced.