[Home]

Summary:ASTERISK-10489: Crash in ast_queue_frame
Reporter:Chase Venters (chaseventers)Labels:
Date Opened:2007-10-10 12:38:56Date Closed:2007-11-05 14:12:57.000-0600
Priority:CriticalRegression?No
Status:Closed/CompleteComponents:Channels/chan_local
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:
Description:We've been getting this crash randomly with a frequency of one every day or two (sometimes, two in one day). We first saw this problem in 1.4.6, and we switched to 1.4.11, then 1.4.12.1 to see if the problem might have been fixed. On 1.4.12.1, we still have the problem.

This appears to be related to an issue, ASTERISK-9639694, that was reported in May, then closed in August due to a lack of further information. That reporter claims disabling compiler optimizations made his problem go away, but that sounds like he just changed the timing of his race condition.

What further information can I provide?

****** ADDITIONAL INFORMATION ******

#0  0x0807c249 in ast_queue_frame (chan=0xa109f58, fin=0xa1f18ac) at channel.c:934
934             AST_LIST_INSERT_TAIL(&chan->readq, f, frame_list);
(gdb) bt
#0  0x0807c249 in ast_queue_frame (chan=0xa109f58, fin=0xa1f18ac) at channel.c:934
#1  0x005fd51e in local_queue_frame (p=0xa19e020, isoutbound=1, f=0xa1f18ac, us=0xa084fb8) at chan_local.c:194
#2  0x005fd80c in local_write (ast=0xa084fb8, f=0xa1f18ac) at chan_local.c:292
#3  0x0808307f in ast_write (chan=0xa084fb8, fr=0xa1f18ac) at channel.c:2958
#4  0x08087358 in ast_channel_bridge (c0=0xa084fb8, c1=0xa093a88, config=0xb75aa470, fo=0xb75aa060, rc=0xb75aa064)
   at channel.c:4087
ASTERISK-1  0x0019496d in ast_bridge_call (chan=0xa084fb8, peer=0xa093a88, config=0xb75aa470) at res_features.c:1404
ASTERISK-2  0x00dc19a5 in dial_exec_full (chan=0xa084fb8, data=Variable "data" is not available.
) at app_dial.c:1656
ASTERISK-3  0x00dc46c4 in dial_exec (chan=0x1, data=0xb6e179b0) at app_dial.c:1710
ASTERISK-4  0x080ca371 in pbx_extension_helper (c=0xa084fb8, con=Variable "con" is not available.
) at pbx.c:532
ASTERISK-5  0x080ca94a in ast_spawn_extension (c=0x1, context=0xb6e179b0 "\002", exten=0xb6e179b0 "\002",
   priority=-1226737232, callerid=0xb6e179b0 "\002") at pbx.c:2288
ASTERISK-6 0x007a8ff6 in _macro_exec (chan=0xa084fb8, data=Variable "data" is not available.
) at app_macro.c:308
ASTERISK-7 0x080ca371 in pbx_extension_helper (c=0xa084fb8, con=Variable "con" is not available.
) at pbx.c:532
ASTERISK-8 0x080ca94a in ast_spawn_extension (c=0x1, context=0xb6e179b0 "\002", exten=0xb6e179b0 "\002",
   priority=-1226737232, callerid=0xb6e179b0 "\002") at pbx.c:2288
ASTERISK-9 0x007a8ff6 in _macro_exec (chan=0xa084fb8, data=Variable "data" is not available.
) at app_macro.c:308
ASTERISK-10 0x080ca371 in pbx_extension_helper (c=0xa084fb8, con=Variable "con" is not available.
) at pbx.c:532
ASTERISK-11 0x080cb6bf in __ast_pbx_run (c=0xa084fb8) at pbx.c:2288
ASTERISK-12 0x080cd63e in pbx_thread (data=0xa084fb8) at pbx.c:2603
ASTERISK-13 0x080fb4e5 in dummy_start (data=0xb6e179b0) at utils.c:793
ASTERISK-14 0x00cf93cc in start_thread () from /lib/tls/libpthread.so.0
ASTERISK-15 0x00c66c3e in clone () from /lib/tls/libc.so.6
Comments:By: Chase Venters (chaseventers) 2007-10-10 13:24:15

This might be helpful as well:

(gdb) frame 0
#0  0x0807c249 in ast_queue_frame (chan=0xa109f58, fin=0xa1f18ac) at channel.c:934
934             AST_LIST_INSERT_TAIL(&chan->readq, f, frame_list);
(gdb) print chan->readq
$1 = {first = 0xb6e18960, last = 0x0}

The macro is:

#define AST_LIST_INSERT_TAIL(head, elm, field) do {                     \
     if (!(head)->first) {                                             \
               (head)->first = (elm);                                  \
               (head)->last = (elm);                                   \
     } else {                                                          \
               (head)->last->field.next = (elm);                       \
               (head)->last = (elm);                                   \
     }                                                                 \
} while (0)

So we're probably blowing up on:

               (head)->last->field.next = (elm);                       \

trying to manipulate a corrupted linked list.

By: Private Name (falves11) 2007-10-14 09:13:17

I think this is related.

#0  0xb6f5ba88 in SQLNumResultCols () from /usr/lib/libodbc.so.1
#1  0xb6ecc13f in acf_odbc_read (chan=0xb647af18, cmd=0xb660bc60 "ODBC_ROUTING", s=0xb660bc70 "S208-1192367328.1397", buf=0xb660bd40 "", len=4096) at func_odbc.c:312
#2  0x080cc09a in ast_func_read (chan=0xb647af18,
   function=0xb660cd50 "MINIXEL_ROUTING(S208-1192367328.1397,209.237.252.94,15206421013,5206860418,<sip:5206860418@209.237.252.94>;party=calling;screen=yes;privacy=off)", workspace=0xb660bd40 "", len=4096)
   at pbx.c:1586
#3  0x080d2705 in pbx_substitute_variables_helper_full (c=0xb647af18, headp=0xb647b264, cp1=0x8248030 "ARRAY(ANI,X,Nohunt,Delay)=${MINIXEL_ROUTING(${UNIQUEID},${SIPIP},${EXTEN},${anix},${cnum})}",
   cp2=0xb6612faa "", count=8165) at pbx.c:1718
#4  0x080d678b in pbx_extension_helper (c=0xb647af18, con=Variable "con" is not available.
) at pbx.c:1810
ASTERISK-1  0x080d7dcf in __ast_pbx_run (c=0xb647af18) at pbx.c:2397
ASTERISK-2  0x080da29e in pbx_thread (data=0xb647af18) at pbx.c:2752
ASTERISK-3  0x0810f245 in dummy_start (data=0x1) at utils.c:807
ASTERISK-4  0xb7eaa3cc in start_thread () from /lib/tls/libpthread.so.0
ASTERISK-5  0xb71ffc3e in clone () from /lib/tls/libc.so.6

By: Digium Subversion (svnbot) 2007-11-01 14:29:03

Repository: asterisk
Revision: 88153

U   team/russell/readq-1.4/main/channel.c

------------------------------------------------------------------------
r88153 | russell | 2007-11-01 14:29:02 -0500 (Thu, 01 Nov 2007) | 15 lines

The readq handling in ast_do_masquerade() got broken when the code was converted
to use the AST_LIST macros.  Furthermore, the actual operation performed was
extremely bizarre.  I have re-written the readq handling in ast_do_masquerade()
to make it safe so that the readq list does not get corrupted, as well as
simplified and documented the code. There is also another fix for list handling
for channel datastores.

(related to issues ASTERISK-10489, ASTERISK-10193, ASTERISK-10012, and the 2nd backtrace of ASTERISK-10616)
(potentially related to issues ASTERISK-9737 and ASTERISK-10404)

For users involved with any of the bug reports I have listed, please give this
code a try:

$ svn co http://svn.digium.com/svn/asterisk/team/russell/readq-1.4

------------------------------------------------------------------------

By: Chase Venters (chaseventers) 2007-11-01 15:03:47

Looks good. We will apply the fix and monitor the system for further problems, after which we will indicate here what we saw. Thanks!

By: Digium Subversion (svnbot) 2007-11-05 14:10:19.000-0600

Repository: asterisk
Revision: 88709

U   branches/1.4/main/channel.c

------------------------------------------------------------------------
r88709 | russell | 2007-11-05 14:10:17 -0600 (Mon, 05 Nov 2007) | 20 lines

Merge the last bit of changes from asterisk/team/russell/readq-1.4

The issue here is that the channel frame readq handling got broken when the
code was converted to use the linked list macros.  It caused corruption of the
list head and tail pointers.  So, I fixed up the usage of the linked list
macros and in passing, simplified the code.  I also documented what the code
is doing, as it was a bit difficult to figure out at first.

This bug showed itself with crashes showing messed up head/tail pointers for
the readq.  However, there are a couple of crashes that aren't quite as obvious,
but I think may be related.  So, if your bug gets closed by this commit, but
you still have a problem, please reopen or create a new bug report.

(closes issue ASTERISK-10489)
(closes issue ASTERISK-10193)
(closes issue ASTERISK-10012)
(closes issue ASTERISK-10616)
(closes issue ASTERISK-9737)
(closes issue ASTERISK-10404)

------------------------------------------------------------------------

By: Digium Subversion (svnbot) 2007-11-05 14:12:57.000-0600

Repository: asterisk
Revision: 88710

_U  trunk/
U   trunk/main/channel.c

------------------------------------------------------------------------
r88710 | russell | 2007-11-05 14:12:56 -0600 (Mon, 05 Nov 2007) | 28 lines

Merged revisions 88709 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r88709 | russell | 2007-11-05 14:11:04 -0600 (Mon, 05 Nov 2007) | 20 lines

Merge the last bit of changes from asterisk/team/russell/readq-1.4

The issue here is that the channel frame readq handling got broken when the
code was converted to use the linked list macros.  It caused corruption of the
list head and tail pointers.  So, I fixed up the usage of the linked list
macros and in passing, simplified the code.  I also documented what the code
is doing, as it was a bit difficult to figure out at first.

This bug showed itself with crashes showing messed up head/tail pointers for
the readq.  However, there are a couple of crashes that aren't quite as obvious,
but I think may be related.  So, if your bug gets closed by this commit, but
you still have a problem, please reopen or create a new bug report.

(closes issue ASTERISK-10489)
(closes issue ASTERISK-10193)
(closes issue ASTERISK-10012)
(closes issue ASTERISK-10616)
(closes issue ASTERISK-9737)
(closes issue ASTERISK-10404)

........

------------------------------------------------------------------------