Summary: | ASTERISK-00556: Asterisk Crash | ||
Reporter: | denon (denon) | Labels: | |
Date Opened: | 2003-11-21 09:34:39.000-0600 | Date Closed: | 2004-09-25 02:55:33 |
Priority: | Blocker | Regression? | No |
Status: | Closed/Complete | Components: | Core/General |
Versions: | Frequency of Occurrence | ||
Related Issues: | |||
Environment: | Attachments: | ( 0) ast_queue_frame.patch | |
Description: | Had a * core this morning .. second one recently, hopefully the below will be helpful to the powers that be. I've obscured a callerid string to protect the guilty. pstn2 is on a T100P+adtran 750 FXO. CVS-11/20/03-18:04:26 #0 ast_queue_frame (chan=0x8172840, fin=0x8121a0c, lock=0) at channel.c:368 #1 0x4069c3d2 in local_write (ast=0x813f328, f=0x8121a0c) at chan_local.c:111 #2 0x080586f7 in ast_write (chan=0x813f328, fr=0x8121a0c) at channel.c:1392 #3 0x4042fbe1 in wait_for_answer (in=0x815d8f0, outgoing=0x813f220, to=0xbd5feb2c, allowredir_in=0xbd5feb30, allowredir_out=0xbd5feb34, allowdisconnect=0xbd5feb38) at app_dial.c:319 #4 0x40430cbb in dial_exec (chan=0x815d8f0, data=0xbd5ff304) at app_dial.c:635 ASTERISK-1 0x08060bb0 in pbx_exec (c=0x815d8f0, app=0x8107a48, data=0xbd5ff304, newstack=1) at pbx.c:396 ASTERISK-2 0x08062bf3 in pbx_extension_helper (c=0x815d8f0, context=0x815da48 "pstn2", exten=0x815db3c "s", priority=10, callerid=0x815fxxx "\"SOME-CALLERID\" <555111222>", action=1) at pbx.c:1157 ASTERISK-3 0x0806392d in ast_pbx_run (c=0x815d8f0) at pbx.c:1641 ASTERISK-4 0x4062f62c in ss_thread (data=0x815d8f0) at chan_zap.c:4458 ASTERISK-5 0x400200ba in pthread_start_thread () from /lib/libpthread.so.0 | ||
Comments: | By: jerjer (jerjer) 2003-11-21 10:00:50.000-0600 updated bug settings per denon's request By: Brian West (bkw918) 2003-12-02 10:44:01.000-0600 Let me know if recent changes to chan_local fix this. By: ww (ww) 2003-12-05 14:18:59.000-0600 i have added some parameter checking in channel.c patch is attached. it should not core with this patch but the real question is why chan->pvt is null. perhaps the changes bkw refers to in chan_local explain this? By: tclark (tclark) 2003-12-06 10:00:30.000-0600 ww: from kram race condition here look in chan_local local_queue_frame see where it tried to obtain the channel lock if (ast_mutex_trylock(&other->lock)) { /* Failed to lock. Release main lock and try again */ ast_mutex_unlock(&p->lock); if the first attempt to get the lock failed it did goto retrylock; that label 'retrylock' was *AFTER* the code if (isoutbound) { other = p->owner; } else { other = p->chan; } that channel assignment to 'other' was not attempted while in the locks atemmpts so it is possible why waiting the mutex lock one of the channels could get destroyed (ie hang up) the fix moved the label b4 that code & re-obtain the channel ref .. By: jrollyson (jrollyson) 2004-01-11 00:49:18.000-0600 Is this still happening? By: Brian West (bkw918) 2004-01-31 23:35:31.000-0600 Please confirm that this has been fixed if not please post comment and we will reopen. Thanks, Brian By: Brian West (bkw918) 2004-02-17 15:36:51.000-0600 Yo Yo Yo... bring on the bug fixes... :P By: jjanzer (jjanzer) 2004-02-17 15:44:09.000-0600 I noticed that I had nearly the exact same problem. I was running safe_asterisk : CVS-02/16/04-15:40:46 It segfaulted with the last messages on CLI stating: == Spawn extension (corporate, h, 1) exited non-zero on 'Zap/5-1' -- Hungup 'Zap/5-1' -- SIP/6031-f57b answered Local/6031@corporate-e9a4,2 -- Stopped music on hold on Local/6031@corporate-e9a4,2 -- Local/6031@corporate-e9a4,1 answered Local/6032@corporate-0fbf,2 -- Stopped music on hold on Local/6032@corporate-0fbf,2 -- Local/6032@corporate-0fbf,1 answered, waiting for '#' to acknowledge == Spawn extension (corporate, 6032, 1) exited non-zero on 'Local/6032@corporate-0fbf,2' -- Executing Hangup("Local/6032@corporate-0fbf,2", "") in new stack == Spawn extension (corporate, h, 1) exited non-zero on 'Local/6032@corporate-0fbf,2' ast0*CLI> Disconnected from Asterisk server And I believe /var/log/asterisk/messages Feb 17 12:15:08 WARNING[18219033]: No audio available on Local/6031@corporate-7e30,2?? Feb 17 12:15:08 NOTICE[17629205]: No one is answered queue CustomerService Feb 17 12:15:26 WARNING[18251801]: No audio available on Local/6031@corporate-cf03,2?? Feb 17 12:15:26 NOTICE[17629205]: No one is answered queue CustomerService Feb 17 12:15:45 WARNING[18300951]: No audio available on Local/6031@corporate-fc27,2?? Feb 17 12:15:45 NOTICE[17629205]: No one is answered queue CustomerService Feb 17 12:16:03 WARNING[18333719]: No audio available on Local/6031@corporate-8abc,2?? Feb 17 12:16:03 NOTICE[17629205]: No one is answered queue CustomerService Feb 17 12:16:21 WARNING[18366487]: No audio available on Local/6031@corporate-ac25,2?? Feb 17 12:16:21 NOTICE[17629205]: No one is answered queue CustomerService Feb 17 12:16:40 WARNING[18399255]: No audio available on Local/6031@corporate-2530,2?? Feb 17 12:16:40 NOTICE[17629205]: No one is answered queue CustomerService Feb 17 12:16:58 WARNING[18448408]: No audio available on Local/6031@corporate-4d06,2?? Feb 17 12:16:58 NOTICE[17629205]: No one is answered queue CustomerService Feb 17 12:17:16 WARNING[18481176]: No audio available on Local/6031@corporate-ca13,2?? Feb 17 12:17:17 NOTICE[17629205]: No one is answered queue CustomerService Feb 17 12:17:35 WARNING[18530329]: No audio available on Local/6031@corporate-3f88,2?? Feb 17 12:17:35 NOTICE[17629205]: No one is answered queue CustomerService Feb 17 12:17:53 WARNING[18563097]: No audio available on Local/6031@corporate-8989,2?? Feb 17 12:17:53 NOTICE[17629205]: No one is answered queue CustomerService Feb 17 12:18:11 WARNING[18595865]: No audio available on Local/6031@corporate-e0ac,2?? Feb 17 12:18:11 NOTICE[17629205]: No one is answered queue CustomerService Feb 17 12:18:30 WARNING[18628633]: No audio available on Local/6031@corporate-16a0,2?? Feb 17 12:18:30 NOTICE[17629205]: No one is answered queue CustomerService Feb 17 12:24:59 WARNING[16384]: Ignoring port for now Feb 17 12:25:01 NOTICE[229391]: Unable to create channel of type 'Zap' Feb 17 12:25:31 NOTICE[311314]: No one is answered queue CustomerService Feb 17 12:25:32 NOTICE[311314]: No one is answered queue CustomerService That jump from 12:18 -> 12:24 is when I caught it (I'm not 100% sure though), and reran safe_asterisk Here is the full backtrace (note: I didn't run the backtrace on the machine that was running production, but I did copy the binary and core files to another): ... Core was generated by `asterisk -vvvg -c'. Program terminated with signal 11, Segmentation fault. ... bt full #0 ast_queue_frame (chan=0x81add10, fin=0xbc9feb54, lock=0) at channel.c:368 f = (struct ast_frame *) 0x8153590 prev = (struct ast_frame *) 0x0 cur = (struct ast_frame *) 0x0 blah = 1 qlen = 0 #1 0x0805bd6c in ast_channel_masquerade (original=0x81add10, clone=0x81bcba8) at channel.c:1915 clone = (struct ast_channel *) 0x81bcba8 null = {frametype = 5, subclass = 0, datalen = 0, samples = 0, mallocd = 0, offset = 0, src = 0x0, data = 0x0, prev = 0x0, next = 0x0} #2 0x40375170 in ?? () No symbol table info available. #3 0x40375ceb in ?? () No symbol table info available. #4 0x080592d7 in ast_write (chan=0x81a5b70, fr=0x81c7e60) at channel.c:1397 chan = (struct ast_channel *) 0x80dc950 res = -1 f = (struct ast_frame *) 0x81c7e60 ASTERISK-1 0x0805b0ca in ast_channel_bridge (c0=0x81a5b70, c1=0x81bcba8, flags=3, fo=0xbc9fecf4, rc=0xbc9fecf8) at channel.c:2325 c0 = (struct ast_channel *) 0x81a5b70 c1 = (struct ast_channel *) 0x81bcba8 cs = {0x81a5b70, 0x81bcba8, 0x81a5b70} to = -1 f = (struct ast_frame *) 0x81c7e60 who = (struct ast_channel *) 0x81bcba8 res = 136085088 nativefailed = 0 ASTERISK-2 0x4022479e in ?? () No symbol table info available. ASTERISK-3 0x404df1b7 in ?? () No symbol table info available. ASTERISK-4 0x08061c50 in pbx_exec (c=0x81a5b70, app=0x80db4a8, data=0xbc9ff77c, newstack=1) at pbx.c:396 res = 1076001004 stack = -1 execute = (int (*)()) 0x404ddee4 ASTERISK-5 0x08063cf3 in pbx_extension_helper (c=0x81a5b70, context=0x81a5cc8 "corporate", exten=0x81a5dbc "6031", priority=1, callerid=0x8120bd0 "9128982121", action=1) at pbx.c:1171 callerid = 0x81a5b70 "Local/6031@corporate-e9a4,2" action = -1130367108 e = (struct ast_exten *) 0x0 ---Type <return> to continue, or q <return> to quit--- app = (struct ast_app *) 0x81a5b70 sw = (struct ast_switch *) 0x0 data = 0x0 newstack = 1 res = 0 status = 4 incstack = {0x0 <repeats 20 times>, 0x9 <Address 0x9 out of bounds>, 0x8070f42 "\203Ä\020\205Àu\a\212\0043\210\0047GC\200<3", 0x80b25d0 "( )-.", 0x31 <Address 0x31 out of bounds>, 0x0, 0x0, 0xbc9ff93c "9128982121", 0x81d4230 "9128982121", 0x0, 0x81d44f0 "1077045511.1989", 0x81d4230 "9128982121", 0x0, 0xbc9ff98c "&ASTERISK-185;]\032\b\001", 0x8073413 "1À\215e\210[^_\211ì]Ã\220U\211å\203ìlWVS\213}\b Ø,\v\b\203Äü\213·<\027", 0x81d44f0 "1077045511.1989", 0x81a731c "1077045511.1989", 0x1f <Address 0x1f out of bounds>, 0x40155b61 "¥", 0x4002dd14 "\024ì", 0xbc9ffbe0 "àû\237&ASTERISK-185;", 0x0, 0x4002dd14 "\024ì", 0x402140a0 "", 0x81d4230 "9128982121", 0xbc9ff94c "", 0x40025d7a "\004ë®ÇF\b", 0x0, 0xbc9ff93c "9128982121", 0x38323139 <Address 0x38323139 out of bounds>, 0x31323839 <Address 0x31323839 out of bounds>, 0x3132 <Address 0x3132 out of bounds>, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x8073b0f "\215eè[^\211ì]ÃU\211å\203ì\024S\213]\b\205Û\017\204\225", 0x81d44b0 "\ai2@¬\177\005", 0x0, 0x0, 0x0, 0x81a5cc8 "corporate", 0x81a5dbc "6031"} passdata = "SIP/6031|30|mtTr", '\0' <repeats 239 times> stacklen = 0 tmp = "\e[1;36;40mDial\e[0;37;40m", '\0' <repeats 55 times> tmp2 = "\e[1;35;40mLocal/6031@corporate-e9a4,2\e[0;37;40m", '\0' <repeats 32 times> tmp3 = "\e[1;35;40mSIP/6031|30|mtTr\e[0;37;40m", '\0' <repeats 219 times> ASTERISK-6 0x08064a2d in ast_pbx_run (c=0x81a5b70) at pbx.c:1655 digit = 0 '\0' exten = '\0' <repeats 255 times> pos = 0 waittime = -1130366260 res = 0 ASTERISK-7 0x0806af8e in pbx_thread (data=0x81a5b70) at pbx.c:1880 data = (void *) 0x0 ASTERISK-8 0x40024e51 in pthread_start_thread () from /lib/libpthread.so.0 No symbol table info available. I am going to apply ww's patch after business hours today, which seems like a decent way to prevent the segfaults for now, let me know if I can be of more help. By: jjanzer (jjanzer) 2004-02-17 18:02:41.000-0600 Here is a dump of "*chan" print *chan $2 = {name = ",D!@&ASTERISK-165;\f\036\b32@corporate-0fbf,1<ZOMBIE>", '\0' <repeats 44 times>, language = "en", '\0' <repeats 17 times>, type = 0x40376074 "Local", fds = {-1, -1, -1, -1, -1, -1, 117, 115}, musicclass = '\0' <repeats 19 times>, generatordata = 0x0, generator = 0x0, writeinterrupt = 0, bridge = 0x0, dialed = 0x0, dialing = 0x0, reversedialed = 0, masq = 0x81bcba8, masqr = 0x0, cdrflags = 0, blocking = 0, _softhangup = 1, zombie = 1, whentohangup = 0, blocker = 18661400, lock = {__m_reserved = 0, __m_count = 0, __m_owner = 0x0, __m_kind = 0, __m_lock = {__status = 1, __spinlock = 0}}, blockproc = 0x80ab938 "ast_waitfor_nandfds", appl = 0x40225cd4 <Address 0x40225cd4 out of bounds>, data = 0x81fdd30 "X@\025\bè=\022\b32@corporate-0fbf,2", exception = 0, fdno = 7, sched = 0x8122928, streamid = -1, stream = 0x0, vstreamid = 0, vstream = 0x0, oldwriteformat = 0, timingfd = 117, timingfunc = 0, timingdata = 0x0, _state = 6, rings = 0, stack = -1, nativeformats = 4, readformat = 4, writeformat = 4, dnid = 0x0, callerid = 0x8130130 "\b\a\022\b982121", ani = 0x0, rdnis = 0x0, restrictcid = 0, callingpres = 0, context = "corporate", '\0' <repeats 70 times>, macrocontext = '\0' <repeats 79 times>, macroexten = '\0' <repeats 79 times>, macropriority = 0, exten = "s", '\0' <repeats 78 times>, priority = 1, app = {0x0 <repeats 32 times>}, dtmfq = '\0' <repeats 79 times>, deferdtmf = 0, dtmff = {frametype = 0, subclass = 0, datalen = 0, samples = 0, mallocd = 0, offset = 0, src = 0x0, data = 0x0, prev = 0x0, next = 0x0}, pvt = 0x0, jmp = {{{__jmpbuf = {0, 0, 0, 0, 0, 0}, __mask_was_saved = 0, __saved_mask = {__val = {0 <repeats 32 times>}}}} <repeats 32 times>}, pbx = 0x0, amaflags = 3, accountcode = '\0' <repeats 19 times>, cdr = 0x0, adsicpe = 0, call_forward = '\0' <repeats 79 times>, zone = 0x0, monitor = 0x0, insmpl = 0, outsmpl = 0, fin = 5, fout = 0, uniqueid = "1077045511.1988", '\0' <repeats 16 times>, hangupcause = 0, vars = 0x81d5418, varshead = {first = 0x0, lock = {__m_reserved = 0, __m_count = 0, __m_owner = 0x0, __m_kind = 0, __m_lock = {__status = 0, __spinlock = 0}}}, callgroup = 0, pickupgroup = 0, flag = 0, next = 0x81dc8d8} By: Mark Spencer (markster) 2004-02-18 17:37:04.000-0600 Can this problem be duplicated? Maybe I can ssh in and take a look? By: jjanzer (jjanzer) 2004-02-20 12:05:26.000-0600 It can be duplicated but it seems to occur at random, I think bkw was right in suspecting chan_local.so as the problem. I switched our queues from static callbacklogins to the dynamic queue functions. If you would like to I can reverse the process on some weekend and you can play around as root on the box. By: Mark Spencer (markster) 2004-02-23 00:08:16.000-0600 This may be fixed with some recent changes to masquerade. By: Mark Spencer (markster) 2004-02-23 00:11:19.000-0600 I believe this was fixed as part of my work on 1008. Re-open if it isn't. |