Summary: | ASTERISK-01195: Potential lockup when read() from a pipe to mpg123 causes channel to become unusable and pbx turns call away | ||
Reporter: | jjanzer (jjanzer) | Labels: | |
Date Opened: | 2004-03-11 02:26:29.000-0600 | Date Closed: | 2008-01-15 14:49:36.000-0600 |
Priority: | Major | Regression? | No |
Status: | Closed/Complete | Components: | Core/General |
Versions: | Frequency of Occurrence | ||
Related Issues: | |||
Environment: | Attachments: | ( 0) 2.mp3 ( 1) bug1202.diff ( 2) threadApllyAllBTFull.log | |
Description: | Just to start I am unsure on the exact cause of this issue, but here goes. At some point during the run of asterisk a call gets transferred (I believe this is how it's triggered, but I'm still unsure) and somehow the person hangs up but asterisk doesn't fully hangup the call (as in it is still listed when doing show channels, if I do a 'show channel' it states: Blocking in: (Not Blocking) as opposed to a good channel which states: Blocking in: ast_waitfor_nandfds Ok so that wouldn't really be a big issue, we could just restart asterisk after all the calls are over, but, the real problem is that when someone calls into asterisk on that channel they get a fast busy or similar error. I talked to kram, and looked myself at the problem for about 5 hours tonight and I believe I have tracked it down. What I think is happening is that a PIPE from mpg123 is nuked triggering a SIG_PIPE call, which since not-too-long ago is now under SIG_IGN. From here it sits in a wait() or something similar. Thus the channel is never given a chance to fully hangup since it never terminates, and read never returns. Again this is just a theory, I am not sure if this is the case... The temporary solution seems to be to turn off music on hold as suggested by Mark. ****** ADDITIONAL INFORMATION ****** here is a bt full of one of the threads having the issue: 0x4002aaab in read () from /lib/libpthread.so.0 (gdb) (gdb) (gdb) bt full #0 0x4002aaab in read () from /lib/libpthread.so.0 No symbol table info available. #1 0x4021b7e4 in ?? () from /usr/lib/asterisk/modules/res_musiconhold.so No symbol table info available. #2 0x00000014 in ?? () No symbol table info available. #3 0x402191a8 in moh_generate (chan=0x81da8e8, data=0x40709390, len=160, samples=160) at res_musiconhold.c:449 data = (void *) 0xfffffe00 len = -512 f = {frametype = 136161728, subclass = 135855252, datalen = -1126175796, samples = 1073896826, mallocd = 136161744, offset = 0, src = 0x0, data = 0x40027570, prev = 0x81da8e8, next = 0x81da8e8} buf = {0 <repeats 32 times>, -3503, -4222, -2678, -583, 1457, 1148, -294, -535, 11, 628, 832, 496, -737, -1764, -2557, -1896, -1283, -1297, -1428, -1863, -2278, -929, 1230, 2579, 2901, 1346, -318, -772, 166, 472, 351, -743, -1844, -2041, -726, 1058, 2546, 2906, 2397, 2089, 2094, 2808, 3027, 2835, 2124, 1509, 955, 1084, 2019, 1745, 177, -915, -1457, -183, 1646, 2936, 3130, 1999, 1500, 1074, 1139, 512, 323, -487, -1096, -746, -854, -96, 334, 714, -152, -553, -559, 391, 966, 1072, 342, -1014, -1805, -1821, -1190, -1174, -1418, -2071, -2342, -1188, 409, 906, 1296, 1017, 182, -199, 100, 295, 542, 581, -250, -606, -958, -330, 208, 393, -479, -1026, -1395, -645, 475, 592, -207, -527, -1293, -1679, -918, -490, -500, -1108, -1961, -1628, 118, 1823, 2323, 1724, 954, 352, 657, 1739, 2130, 1765, 525, -43, -453, -592, -497, -200, -652, -998, -1059, -965, -267, 99, -319, -740, -1337, -1680, -1491, -932, -325, -447, -1162, -1016, 459, 1244, 1117, 316, -92, -223, 374, 669, 405, 335, 0, 0, 1, 0, 2, 0 <repeats 21 times>, -6820, -17185, -7356, -17185, -3649, 16400, 21117, 16416, 15392, 26995, 14960, 12344, 0, 0, 0...} res = 320 #4 0x08058a18 in ast_read (chan=0x81da8e8) at channel.c:1160 res = -512 chan = (struct ast_channel *) 0x81da8e8 f = (struct ast_frame *) 0x818fc94 blah = -1126175616 func = (int (*)()) 0x818fc94 data = (void *) 0x805afe2 null_frame = {frametype = 5, subclass = 0, datalen = 0, samples = 0, mallocd = 0, offset = 0, src = 0x0, data = 0x0, prev = 0x0, next = 0x0} ASTERISK-1 0x0805b01e in ast_channel_bridge (c0=0x81da8e8, c1=0x81ff190, flags=2, fo=0xbcdfecf4, rc=0xbcdfecf8) at channel.c:2268 c0 = (struct ast_channel *) 0x81da8e8 c1 = (struct ast_channel *) 0x81ff190 cs = {0x81ff190, 0x81da8e8, 0x81da8e8} to = -1 f = (struct ast_frame *) 0x0 who = (struct ast_channel *) 0x81da8e8 res = 0 nativefailed = 0 ASTERISK-2 0x4022479e in ast_bridge_call (chan=0x81da8e8, peer=0x81ff190, allowredirect_in=1, allowredirect_out=0, allowdisconnect=0) at res_parking.c:250 f = (struct ast_frame *) 0x40709390 who = (struct ast_channel *) 0x81ff190 newext = "\000=\002@K]\002@n\217\002@(���\022<\002@\000\000\000\00006!@\000\000\000\00006!@\024�\002@@\036\f\b\004\000\000\000L���z]\002@P\036\f\b\000\000\000\000��\002@\220�\037\b\220�\037\b\000\000\000\000\234���;�\005\b@\036\f\b\004\000\000\000|���z]\002@\220�\037\b\004\000\000\000\004�����\035\b\220�\037\b\004\000\000\000\004���;�\005\b@\036\f\b�@!@������\232;\000\000\000\000\000\000\000\000����\027\226\005\b\004\000\000\000\004\000\000\000����n\200\b\b��\035\b\220�\037\b\020\000\000\000��\232;"... ---Type <return> to continue, or q <return> to quit--- ptr = 0xffffffff <Address 0xffffffff out of bounds> res = 0 aoh = (struct ast_option_header *) 0x140 transferer = (struct ast_channel *) 0x0 transferee = (struct ast_channel *) 0x0 transferer_real_context = 0x0 ASTERISK-3 0x404df1b7 in dial_exec (chan=0x81da8e8, data=0xbcdff77c) at app_dial.c:697 data = (void *) 0x81ff190 res = 0 u = (struct localuser *) 0x8143800 info = "SIP\0006018\00030\000rt", '\0' <repeats 241 times> peers = 0x81ff190 "SIP/6018-a729" timeout = 0xbcdff465 "30" number = 0x81da8e8 "Zap/6-1" rest = 0x0 cur = 0x0 privdb = '\0' <repeats 255 times> s = 0x81da8e8 "Zap/6-1" announcemsg = '\0' <repeats 255 times> ann = 0x81ff190 "SIP/6018-a729" outgoing = (struct localuser *) 0x0 tmp = (struct localuser *) 0x81d05c8 peer = (struct ast_channel *) 0x81ff190 to = 23870 allowredir_in = 1 allowredir_out = 0 allowdisconnect = 0 privacy = 0 announce = 0 resetcdr = 0 numsubst = "6018", '\0' <repeats 75 times>, "�" restofit = '\0' <repeats 64 times>, "\\���k6\006\b\000\000\000\000\000\000\000" transfer = 0xbcdff468 "rt" newnum = 0x81ff190 "SIP/6018-a729" callerid = '\0' <repeats 20 times>, "\\���b���\177�\n\b����p0\006\b\\���4�\035\b�\000\000\000\000\000\000\000\005\000\000\000\000\000\000\000\\���", '\0' <repeats 28 times>, "��\035\b", '\0' <repeats 155 times> l = 0x0 n = 0x0 url = 0x0 current = (struct ast_var_t *) 0x81da8e8 newvar = (struct ast_var_t *) 0xbcdfe1a4 go_on = 0 ASTERISK-4 0x08061c50 in pbx_exec (c=0x81da8e8, app=0x812b958, data=0xbcdff77c, newstack=1) at pbx.c:396 res = 0 ---Type <return> to continue, or q <return> to quit--- stack = -1 execute = (int (*)()) 0x404ddee4 <dial_exec> ASTERISK-5 0x08063cf3 in pbx_extension_helper (c=0x81da8e8, context=0x81daa40 "corporate", exten=0x81dab34 "6018", priority=1, callerid=0x8140a98 "8013193151", action=1) at pbx.c:1171 callerid = 0x812b958 "Dial" action = -1126172804 e = (struct ast_exten *) 0x811b5b8 app = (struct ast_app *) 0x812b958 sw = (struct ast_switch *) 0x0 data = 0x0 newstack = 1 res = -512 status = 4 incstack = {0x0 <repeats 20 times>, 0x9 <Address 0x9 out of bounds>, 0x8070f42 "\203�\020\205�u\a\212\0043\210\0047GC\200<3", 0x80b25d0 "( )-.", 0x31 <Address 0x31 out of bounds>, 0x0, 0x0, 0xbcdff93c "8013193151", 0x81cead8 "8013193151", 0x0, 0x81ced98 "1078955259.3673", 0x81cead8 "8013193151", 0x0, 0xbcdff98c "4�\035\b\001", 0x8073413 "1�\215e\210[^_\211�]�\220U\211�\203�lWVS\213}\b��,\v\b\203��\213�<\027", 0x81ced98 "1078955259.3673", 0x81dc094 "1078955259.3673", 0x1f <Address 0x1f out of bounds>, 0x40155c18 "\213E�\203�\\\211E�\213U�\213M�\213Rh\211U�9�\017\204�", 0x4002dd14 "\024�", 0xbcdffbe0 "����", 0x0, 0x4002dd14 "\024�", 0x402140a0 "", 0x81cead8 "8013193151", 0xbcdff94c "", 0x40025d7a "��\215t&", 0x0, 0xbcdff93c "8013193151", 0x33313038 <Address 0x33313038 out of bounds>, 0x31333931 <Address 0x31333931 out of bounds>, 0x3135 <Address 0x3135 out of bounds>, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x8073b0f "\215e�[^\211�]�U\211�\203�\024S\213]\b\205�\017\204\225", 0x81ced58 "�\214O@u+\017", 0x0, 0x0, 0x0, 0x81daa40 "corporate", 0x81dab34 "6018"} passdata = "SIP/6018|30|rt", '\0' <repeats 241 times> stacklen = 0 tmp = "\e[1;36;40mDial\e[0;37;40m", '\0' <repeats 55 times> tmp2 = "\e[1;35;40mZap/6-1\e[0;37;40m", '\0' <repeats 52 times> tmp3 = "\e[1;35;40mSIP/6018|30|rt\e[0;37;40m\000m", '\0' <repeats 219 times> ASTERISK-6 0x08064a2d in ast_pbx_run (c=0x81da8e8) at pbx.c:1655 digit = 0 '\0' exten = '\0' <repeats 255 times> pos = 0 waittime = -1126171956 res = 0 ASTERISK-7 0x0806af8e in pbx_thread (data=0x81da8e8) at pbx.c:1880 data = (void *) 0xfffffe00 ASTERISK-8 0x40024e51 in pthread_start_thread () from /lib/libpthread.so.0 No symbol table info available. ASTERISK-9 0x401bb64a in clone () from /lib/libc.so.6 here are some vars from #3 print *(struct mohdata *)moh {pipe = {1634100580, 7629941}, origwfmt = 0, parent = 0x0, next = 0x0} print len -512 Unfortunately the mpg123 processes were terminated by Mark earlier today so I was unable to get any information from those processes. | ||
Comments: | By: jjanzer (jjanzer) 2004-03-11 02:29:14.000-0600 Btw, I tried to reproduce this in a variety of ways, but I was totally incapable of reproducing them after asterisk was restarted. Oh and your guess is as good as mine why len = -512 By: jjanzer (jjanzer) 2004-03-12 14:14:50.000-0600 I added threadApllyAllBTFull.log just now, since I was able to recreate the problem on a development box (this has no zap hardware). Note: I am running mpg123 with varaible bitrates (I think this may be causing mpg123 to act funkier than normal, which is what I want since I would like to find the root of the problem). I am not running ztdummy. ps aux --forest root 24469 0.0 0.8 2340 1052 pts/4 S 10:29 0:00 /bin/sh /usr/sbin/safe_asterisk root 24471 0.0 4.9 50740 6232 pts/4 S 10:29 0:00 \_ asterisk -vvvg -c root 24472 0.0 4.9 50740 6232 pts/4 S 10:29 0:00 \_ asterisk -vvvg -c root 24473 0.0 4.9 50740 6232 pts/4 S 10:29 0:00 \_ asterisk -vvvg -c root 24475 0.0 4.9 50740 6232 pts/4 S 10:29 0:00 \_ asterisk -vvvg -c root 24474 0.0 4.9 50740 6232 pts/4 S 10:29 0:00 \_ asterisk -vvvg -c root 24477 0.0 4.9 50740 6232 pts/4 S 10:29 0:00 \_ asterisk -vvvg -c root 24579 0.0 0.5 3768 700 pts/4 S 10:38 0:00 | \_ mpg123 -q -s --mono -r 8000 -b 2048 -f 8192 2.mp3 root 24580 0.0 0.3 3756 472 pts/4 S 10:38 0:00 | \_ mpg123 -q -s --mono -r 8000 -b 2048 -f 8192 2.mp3 root 24483 0.0 4.9 50740 6232 pts/4 S 10:29 0:00 \_ asterisk -vvvg -c root 24481 0.0 4.9 50740 6232 pts/4 S 10:29 0:00 \_ asterisk -vvvg -c root 24482 0.0 0.5 3768 700 pts/4 S 10:29 0:00 | \_ mpg123 -q -s --mono -r 8000 -b 2048 -f 8192 2.mp3 root 24484 0.0 0.3 3756 464 pts/4 S 10:29 0:00 | \_ mpg123 -q -s --mono -r 8000 -b 2048 -f 8192 2.mp3 root 24485 0.0 4.9 50740 6232 pts/4 S 10:29 0:00 \_ asterisk -vvvg -c root 24486 0.0 4.9 50740 6232 pts/4 S 10:29 0:00 \_ asterisk -vvvg -c root 24487 0.0 4.9 50740 6232 pts/4 S 10:29 0:00 \_ asterisk -vvvg -c root 24488 0.0 4.9 50740 6232 pts/4 S 10:29 0:00 \_ asterisk -vvvg -c root 24489 0.0 4.9 50740 6232 pts/4 S 10:29 0:00 \_ asterisk -vvvg -c root 24492 0.0 4.9 50740 6232 pts/4 S 10:29 0:00 \_ asterisk -vvvg -c root 24495 0.0 4.9 50740 6232 pts/4 S 10:29 0:00 \_ asterisk -vvvg -c root 24496 0.0 4.9 50740 6232 pts/4 S 10:29 0:00 \_ asterisk -vvvg -c root 24498 0.0 4.9 50740 6232 pts/4 S 10:29 0:00 \_ asterisk -vvvg -c root 24497 0.0 4.9 50740 6232 pts/4 S 10:29 0:00 \_ asterisk -vvvg -c root 24499 0.0 4.9 50740 6232 pts/4 S 10:29 0:00 \_ asterisk -vvvg -c root 24506 0.0 4.9 50740 6232 pts/4 S 10:29 0:00 \_ asterisk -vvvg -c root 24515 0.0 4.9 50740 6232 pts/4 S 10:29 0:00 \_ asterisk -vvvg -c root 24562 0.0 4.9 50740 6232 pts/4 S 10:36 0:00 \_ asterisk -vvvg -c 24562 is the thread waiting for a read() call to return, notice that there aren't any children... edited on: 03-12-04 13:03 By: jjanzer (jjanzer) 2004-03-17 20:21:31.000-0600 I think I have fixed this problem: read was being called with a pipe set to "0", which caused the read to lock. Instead of blindly calling read, I check to see if pipe[0] (the fd/pipe) is NULL, if it is, then we return 0, which will allow time for everything to readjust and right itself. I was able to reproduce the bug by playing the attached "2.mp3" in my musiconhold, and repeatedly hitting hold/resume on my phone after a 2nd call was made into a queue.... Logged into queue dynamically on phone A, called into queue from phone B. Pressed hold/resume (from phone a) a few times and hung up. Made 2nd call from phone B into queue. Pressed hold/resume very quickly until it caused the hang. After my simple patch "bug1202.diff" it catches the NULL pipe and is properly handled just fine. By: Mark Spencer (markster) 2004-03-19 03:35:33.000-0600 pipe[0] can't be "NULL" or "0". If it fails to open it should be -1. If it's being set to "0" that would suggest some very serious problem. Perhaps running under valgrind is in order? By: jjanzer (jjanzer) 2004-03-19 04:12:02.000-0600 I can do that tomorrow on my test enviroment. I will update later in the day with any info. By: Mark Spencer (markster) 2004-03-21 03:07:25.000-0600 How did it go? By: alric (alric) 2004-03-22 11:49:24.000-0600 Reminder sent to jjanzer Any update on this bug? How did your tests go in your test environment? Alric By: jjanzer (jjanzer) 2004-03-22 17:47:33.000-0600 Sorry I took so long. I am unable to reproduce this problem under valgrind. I will keep trying though. By: Mark Spencer (markster) 2004-04-02 02:45:55.000-0600 Any luck reproducing under valgrind? By: Mark Spencer (markster) 2004-04-02 16:59:12.000-0600 I believe I found the generator race that caused this bug for you and it should be fixed in CVS now. Thanks! By: Mark Spencer (markster) 2004-04-05 17:01:55 Feel free to reopen if you still have this problem but I believe it's been fixed. By: Digium Subversion (svnbot) 2008-01-15 14:49:36.000-0600 Repository: asterisk Revision: 2604 U trunk/channel.c ------------------------------------------------------------------------ r2604 | markster | 2008-01-15 14:49:35 -0600 (Tue, 15 Jan 2008) | 2 lines Fix generator race (bug ASTERISK-1195) ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=2604 |