[Home]

Summary:ASTERISK-00556: Asterisk Crash
Reporter:denon (denon)Labels:
Date Opened:2003-11-21 09:34:39.000-0600Date Closed:2004-09-25 02:55:33
Priority:BlockerRegression?No
Status:Closed/CompleteComponents:Core/General
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) ast_queue_frame.patch
Description:Had a * core this morning .. second one recently, hopefully the below will be helpful to the powers that be.  I've obscured a callerid string to protect the guilty. pstn2 is on a T100P+adtran 750 FXO.

CVS-11/20/03-18:04:26
#0  ast_queue_frame (chan=0x8172840, fin=0x8121a0c, lock=0) at channel.c:368
#1  0x4069c3d2 in local_write (ast=0x813f328, f=0x8121a0c) at chan_local.c:111
#2  0x080586f7 in ast_write (chan=0x813f328, fr=0x8121a0c) at channel.c:1392
#3  0x4042fbe1 in wait_for_answer (in=0x815d8f0, outgoing=0x813f220, to=0xbd5feb2c, allowredir_in=0xbd5feb30,
   allowredir_out=0xbd5feb34, allowdisconnect=0xbd5feb38) at app_dial.c:319
#4  0x40430cbb in dial_exec (chan=0x815d8f0, data=0xbd5ff304) at app_dial.c:635
ASTERISK-1  0x08060bb0 in pbx_exec (c=0x815d8f0, app=0x8107a48, data=0xbd5ff304, newstack=1) at pbx.c:396
ASTERISK-2  0x08062bf3 in pbx_extension_helper (c=0x815d8f0, context=0x815da48 "pstn2", exten=0x815db3c "s", priority=10,
   callerid=0x815fxxx "\"SOME-CALLERID\" <555111222>", action=1) at pbx.c:1157
ASTERISK-3  0x0806392d in ast_pbx_run (c=0x815d8f0) at pbx.c:1641
ASTERISK-4  0x4062f62c in ss_thread (data=0x815d8f0) at chan_zap.c:4458
ASTERISK-5  0x400200ba in pthread_start_thread () from /lib/libpthread.so.0
Comments:By: jerjer (jerjer) 2003-11-21 10:00:50.000-0600

updated bug settings per denon's request

By: Brian West (bkw918) 2003-12-02 10:44:01.000-0600

Let me know if recent changes to chan_local fix this.

By: ww (ww) 2003-12-05 14:18:59.000-0600

i have added some parameter checking in channel.c
patch is attached. it should not core with this patch
but the real question is why chan->pvt is null.

perhaps the changes bkw refers to in chan_local
explain this?

By: tclark (tclark) 2003-12-06 10:00:30.000-0600

ww: from kram race condition here
look in chan_local local_queue_frame
see where it tried to obtain the channel lock
if (ast_mutex_trylock(&other->lock)) {
/* Failed to lock.  Release main lock and try again */
ast_mutex_unlock(&p->lock);

if the first attempt to get the lock failed
it did  goto retrylock;

that label 'retrylock' was *AFTER* the code
if (isoutbound) {
other = p->owner;
} else {
other = p->chan;
}

that channel assignment to 'other' was not attempted while in the locks atemmpts
so it is possible why waiting the mutex lock one of the channels could get destroyed (ie hang up)

the fix moved the label b4 that code & re-obtain the channel ref ..

By: jrollyson (jrollyson) 2004-01-11 00:49:18.000-0600

Is this still happening?

By: Brian West (bkw918) 2004-01-31 23:35:31.000-0600

Please confirm that this has been fixed if not please post comment and we will reopen.

Thanks,
Brian

By: Brian West (bkw918) 2004-02-17 15:36:51.000-0600

Yo Yo Yo... bring on the bug fixes... :P

By: jjanzer (jjanzer) 2004-02-17 15:44:09.000-0600

I noticed that I had nearly the exact same problem.
I was running safe_asterisk : CVS-02/16/04-15:40:46

It segfaulted with the last messages on CLI stating:
== Spawn extension (corporate, h, 1) exited non-zero on 'Zap/5-1'
   -- Hungup 'Zap/5-1'
   -- SIP/6031-f57b answered Local/6031@corporate-e9a4,2
   -- Stopped music on hold on Local/6031@corporate-e9a4,2
   -- Local/6031@corporate-e9a4,1 answered Local/6032@corporate-0fbf,2
   -- Stopped music on hold on Local/6032@corporate-0fbf,2
   -- Local/6032@corporate-0fbf,1 answered, waiting for '#' to acknowledge
 == Spawn extension (corporate, 6032, 1) exited non-zero on 'Local/6032@corporate-0fbf,2'
   -- Executing Hangup("Local/6032@corporate-0fbf,2", "") in new stack
 == Spawn extension (corporate, h, 1) exited non-zero on 'Local/6032@corporate-0fbf,2'
ast0*CLI>
Disconnected from Asterisk server



And I believe /var/log/asterisk/messages
Feb 17 12:15:08 WARNING[18219033]: No audio available on Local/6031@corporate-7e30,2??
Feb 17 12:15:08 NOTICE[17629205]: No one is answered queue CustomerService
Feb 17 12:15:26 WARNING[18251801]: No audio available on Local/6031@corporate-cf03,2??
Feb 17 12:15:26 NOTICE[17629205]: No one is answered queue CustomerService
Feb 17 12:15:45 WARNING[18300951]: No audio available on Local/6031@corporate-fc27,2??
Feb 17 12:15:45 NOTICE[17629205]: No one is answered queue CustomerService
Feb 17 12:16:03 WARNING[18333719]: No audio available on Local/6031@corporate-8abc,2??
Feb 17 12:16:03 NOTICE[17629205]: No one is answered queue CustomerService
Feb 17 12:16:21 WARNING[18366487]: No audio available on Local/6031@corporate-ac25,2??
Feb 17 12:16:21 NOTICE[17629205]: No one is answered queue CustomerService
Feb 17 12:16:40 WARNING[18399255]: No audio available on Local/6031@corporate-2530,2??
Feb 17 12:16:40 NOTICE[17629205]: No one is answered queue CustomerService
Feb 17 12:16:58 WARNING[18448408]: No audio available on Local/6031@corporate-4d06,2??
Feb 17 12:16:58 NOTICE[17629205]: No one is answered queue CustomerService
Feb 17 12:17:16 WARNING[18481176]: No audio available on Local/6031@corporate-ca13,2??
Feb 17 12:17:17 NOTICE[17629205]: No one is answered queue CustomerService
Feb 17 12:17:35 WARNING[18530329]: No audio available on Local/6031@corporate-3f88,2??
Feb 17 12:17:35 NOTICE[17629205]: No one is answered queue CustomerService
Feb 17 12:17:53 WARNING[18563097]: No audio available on Local/6031@corporate-8989,2??
Feb 17 12:17:53 NOTICE[17629205]: No one is answered queue CustomerService
Feb 17 12:18:11 WARNING[18595865]: No audio available on Local/6031@corporate-e0ac,2??
Feb 17 12:18:11 NOTICE[17629205]: No one is answered queue CustomerService
Feb 17 12:18:30 WARNING[18628633]: No audio available on Local/6031@corporate-16a0,2??
Feb 17 12:18:30 NOTICE[17629205]: No one is answered queue CustomerService
Feb 17 12:24:59 WARNING[16384]: Ignoring port for now
Feb 17 12:25:01 NOTICE[229391]: Unable to create channel of type 'Zap'
Feb 17 12:25:31 NOTICE[311314]: No one is answered queue CustomerService
Feb 17 12:25:32 NOTICE[311314]: No one is answered queue CustomerService

That jump from 12:18 -> 12:24 is when I caught it (I'm not 100% sure though), and reran safe_asterisk


Here is the full backtrace (note: I didn't run the backtrace on the machine that was running production, but I did copy the binary and core files to another):

...
Core was generated by `asterisk -vvvg -c'.
Program terminated with signal 11, Segmentation fault.
...
bt full
#0  ast_queue_frame (chan=0x81add10, fin=0xbc9feb54, lock=0) at channel.c:368
f = (struct ast_frame *) 0x8153590
prev = (struct ast_frame *) 0x0
cur = (struct ast_frame *) 0x0
blah = 1
qlen = 0
#1  0x0805bd6c in ast_channel_masquerade (original=0x81add10, clone=0x81bcba8) at channel.c:1915
clone = (struct ast_channel *) 0x81bcba8
null = {frametype = 5, subclass = 0, datalen = 0, samples = 0, mallocd = 0, offset = 0, src = 0x0, data = 0x0, prev = 0x0, next = 0x0}
#2  0x40375170 in ?? ()
No symbol table info available.
#3  0x40375ceb in ?? ()
No symbol table info available.
#4  0x080592d7 in ast_write (chan=0x81a5b70, fr=0x81c7e60) at channel.c:1397
chan = (struct ast_channel *) 0x80dc950
res = -1
f = (struct ast_frame *) 0x81c7e60
ASTERISK-1  0x0805b0ca in ast_channel_bridge (c0=0x81a5b70, c1=0x81bcba8, flags=3, fo=0xbc9fecf4, rc=0xbc9fecf8) at channel.c:2325
c0 = (struct ast_channel *) 0x81a5b70
c1 = (struct ast_channel *) 0x81bcba8
cs = {0x81a5b70, 0x81bcba8, 0x81a5b70}
to = -1
f = (struct ast_frame *) 0x81c7e60
who = (struct ast_channel *) 0x81bcba8
res = 136085088
nativefailed = 0
ASTERISK-2  0x4022479e in ?? ()
No symbol table info available.
ASTERISK-3  0x404df1b7 in ?? ()
No symbol table info available.
ASTERISK-4  0x08061c50 in pbx_exec (c=0x81a5b70, app=0x80db4a8, data=0xbc9ff77c, newstack=1) at pbx.c:396
res = 1076001004
stack = -1
execute = (int (*)()) 0x404ddee4
ASTERISK-5  0x08063cf3 in pbx_extension_helper (c=0x81a5b70, context=0x81a5cc8 "corporate", exten=0x81a5dbc "6031", priority=1, callerid=0x8120bd0 "9128982121", action=1)
   at pbx.c:1171
callerid = 0x81a5b70 "Local/6031@corporate-e9a4,2"
action = -1130367108
e = (struct ast_exten *) 0x0
---Type <return> to continue, or q <return> to quit---
app = (struct ast_app *) 0x81a5b70
sw = (struct ast_switch *) 0x0
data = 0x0
newstack = 1
res = 0
status = 4
incstack = {0x0 <repeats 20 times>, 0x9 <Address 0x9 out of bounds>, 0x8070f42 "\203Ä\020\205Àu\a\212\0043\210\0047GC\200<3", 0x80b25d0 "( )-.",
 0x31 <Address 0x31 out of bounds>, 0x0, 0x0, 0xbc9ff93c "9128982121", 0x81d4230 "9128982121", 0x0, 0x81d44f0 "1077045511.1989", 0x81d4230 "9128982121", 0x0,
 0xbc9ff98c "&ASTERISK-185;]\032\b\001", 0x8073413 "1À\215e\210[^_\211ì]Ã\220U\211å\203ìlWVS\213}\b Ø,\v\b\203Äü\213·<\027", 0x81d44f0 "1077045511.1989",
 0x81a731c "1077045511.1989", 0x1f <Address 0x1f out of bounds>, 0x40155b61 "¥", 0x4002dd14 "\024ì", 0xbc9ffbe0 "àû\237&ASTERISK-185;", 0x0, 0x4002dd14 "\024ì", 0x402140a0 "",
 0x81d4230 "9128982121", 0xbc9ff94c "", 0x40025d7a "\004ë®ÇF\b", 0x0, 0xbc9ff93c "9128982121", 0x38323139 <Address 0x38323139 out of bounds>,
 0x31323839 <Address 0x31323839 out of bounds>, 0x3132 <Address 0x3132 out of bounds>, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
 0x8073b0f "\215eè[^\211ì]ÃU\211å\203ì\024S\213]\b\205Û\017\204\225", 0x81d44b0 "\ai2@¬\177\005", 0x0, 0x0, 0x0, 0x81a5cc8 "corporate", 0x81a5dbc "6031"}
passdata = "SIP/6031|30|mtTr", '\0' <repeats 239 times>
stacklen = 0
tmp = "\e[1;36;40mDial\e[0;37;40m", '\0' <repeats 55 times>
tmp2 = "\e[1;35;40mLocal/6031@corporate-e9a4,2\e[0;37;40m", '\0' <repeats 32 times>
tmp3 = "\e[1;35;40mSIP/6031|30|mtTr\e[0;37;40m", '\0' <repeats 219 times>
ASTERISK-6 0x08064a2d in ast_pbx_run (c=0x81a5b70) at pbx.c:1655
digit = 0 '\0'
exten = '\0' <repeats 255 times>
pos = 0
waittime = -1130366260
res = 0
ASTERISK-7 0x0806af8e in pbx_thread (data=0x81a5b70) at pbx.c:1880
data = (void *) 0x0
ASTERISK-8 0x40024e51 in pthread_start_thread () from /lib/libpthread.so.0
No symbol table info available.


I am going to apply ww's patch after business hours today, which seems like a decent way to prevent the segfaults for now, let me know if I can be of more help.

By: jjanzer (jjanzer) 2004-02-17 18:02:41.000-0600

Here is a dump of "*chan"

print *chan
$2 = {name = ",D!@&ASTERISK-165;\f\036\b32@corporate-0fbf,1<ZOMBIE>", '\0' <repeats 44 times>, language = "en", '\0' <repeats 17 times>, type = 0x40376074 "Local", fds = {-1,
   -1, -1, -1, -1, -1, 117, 115}, musicclass = '\0' <repeats 19 times>, generatordata = 0x0, generator = 0x0, writeinterrupt = 0, bridge = 0x0, dialed = 0x0,
 dialing = 0x0, reversedialed = 0, masq = 0x81bcba8, masqr = 0x0, cdrflags = 0, blocking = 0, _softhangup = 1, zombie = 1, whentohangup = 0, blocker = 18661400,
 lock = {__m_reserved = 0, __m_count = 0, __m_owner = 0x0, __m_kind = 0, __m_lock = {__status = 1, __spinlock = 0}}, blockproc = 0x80ab938 "ast_waitfor_nandfds",
 appl = 0x40225cd4 <Address 0x40225cd4 out of bounds>, data = 0x81fdd30 "X@\025\bè=\022\b32@corporate-0fbf,2", exception = 0, fdno = 7, sched = 0x8122928,
 streamid = -1, stream = 0x0, vstreamid = 0, vstream = 0x0, oldwriteformat = 0, timingfd = 117, timingfunc = 0, timingdata = 0x0, _state = 6, rings = 0,
 stack = -1, nativeformats = 4, readformat = 4, writeformat = 4, dnid = 0x0, callerid = 0x8130130 "\b\a\022\b982121", ani = 0x0, rdnis = 0x0, restrictcid = 0,
 callingpres = 0, context = "corporate", '\0' <repeats 70 times>, macrocontext = '\0' <repeats 79 times>, macroexten = '\0' <repeats 79 times>, macropriority = 0,
 exten = "s", '\0' <repeats 78 times>, priority = 1, app = {0x0 <repeats 32 times>}, dtmfq = '\0' <repeats 79 times>, deferdtmf = 0, dtmff = {frametype = 0,
   subclass = 0, datalen = 0, samples = 0, mallocd = 0, offset = 0, src = 0x0, data = 0x0, prev = 0x0, next = 0x0}, pvt = 0x0, jmp = {{{__jmpbuf = {0, 0, 0, 0, 0,
         0}, __mask_was_saved = 0, __saved_mask = {__val = {0 <repeats 32 times>}}}} <repeats 32 times>}, pbx = 0x0, amaflags = 3,
 accountcode = '\0' <repeats 19 times>, cdr = 0x0, adsicpe = 0, call_forward = '\0' <repeats 79 times>, zone = 0x0, monitor = 0x0, insmpl = 0, outsmpl = 0,
 fin = 5, fout = 0, uniqueid = "1077045511.1988", '\0' <repeats 16 times>, hangupcause = 0, vars = 0x81d5418, varshead = {first = 0x0, lock = {__m_reserved = 0,
     __m_count = 0, __m_owner = 0x0, __m_kind = 0, __m_lock = {__status = 0, __spinlock = 0}}}, callgroup = 0, pickupgroup = 0, flag = 0, next = 0x81dc8d8}

By: Mark Spencer (markster) 2004-02-18 17:37:04.000-0600

Can this problem be duplicated?  Maybe I can ssh in and take a look?

By: jjanzer (jjanzer) 2004-02-20 12:05:26.000-0600

It can be duplicated but it seems to occur at random, I think bkw was right in suspecting chan_local.so as the problem. I switched our queues from static callbacklogins to the dynamic queue functions.

If you would like to I can reverse the process on some weekend and you can play around as root on the box.

By: Mark Spencer (markster) 2004-02-23 00:08:16.000-0600

This may be fixed with some recent changes to masquerade.

By: Mark Spencer (markster) 2004-02-23 00:11:19.000-0600

I believe this was fixed as part of my work on 1008.  Re-open if it isn't.