[Home]

Summary:ASTERISK-01195: Potential lockup when read() from a pipe to mpg123 causes channel to become unusable and pbx turns call away
Reporter:jjanzer (jjanzer)Labels:
Date Opened:2004-03-11 02:26:29.000-0600Date Closed:2008-01-15 14:49:36.000-0600
Priority:MajorRegression?No
Status:Closed/CompleteComponents:Core/General
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) 2.mp3
( 1) bug1202.diff
( 2) threadApllyAllBTFull.log
Description:Just to start I am unsure on the exact cause of this issue, but here goes.
At some point during the run of asterisk a call gets transferred (I believe this is how it's triggered, but I'm still unsure) and somehow the person hangs up but asterisk doesn't fully hangup the call (as in it is still listed when doing show channels, if I do a 'show channel' it states:
   Blocking in: (Not Blocking)
as opposed to a good channel which states:
   Blocking in: ast_waitfor_nandfds

Ok so that wouldn't really be a big issue, we could just restart asterisk after all the calls are over, but, the real problem is that when someone calls into asterisk on that channel they get a fast busy or similar error.

I talked to kram, and looked myself at the problem for about 5 hours tonight and I believe I have tracked it down. What I think is happening is that a PIPE from mpg123 is nuked triggering a SIG_PIPE call, which since not-too-long ago is now under SIG_IGN. From here it sits in a wait() or something similar. Thus the channel is never given a chance to fully hangup since it never terminates, and read never returns.

Again this is just a theory, I am not sure if this is the case...


The temporary solution seems to be to turn off music on hold as suggested by Mark.



****** ADDITIONAL INFORMATION ******

here is a bt full of one of the threads having the issue:

0x4002aaab in read () from /lib/libpthread.so.0
(gdb)
(gdb)
(gdb) bt full
#0  0x4002aaab in read () from /lib/libpthread.so.0
No symbol table info available.
#1  0x4021b7e4 in ?? () from /usr/lib/asterisk/modules/res_musiconhold.so
No symbol table info available.
#2  0x00000014 in ?? ()
No symbol table info available.
#3  0x402191a8 in moh_generate (chan=0x81da8e8, data=0x40709390, len=160, samples=160) at res_musiconhold.c:449
       data = (void *) 0xfffffe00
       len = -512
       f = {frametype = 136161728, subclass = 135855252, datalen = -1126175796, samples = 1073896826, mallocd = 136161744, offset = 0, src = 0x0,
 data = 0x40027570, prev = 0x81da8e8, next = 0x81da8e8}
       buf = {0 <repeats 32 times>, -3503, -4222, -2678, -583, 1457, 1148, -294, -535, 11, 628, 832, 496, -737, -1764, -2557, -1896, -1283, -1297, -1428,
 -1863, -2278, -929, 1230, 2579, 2901, 1346, -318, -772, 166, 472, 351, -743, -1844, -2041, -726, 1058, 2546, 2906, 2397, 2089, 2094, 2808, 3027, 2835, 2124,
 1509, 955, 1084, 2019, 1745, 177, -915, -1457, -183, 1646, 2936, 3130, 1999, 1500, 1074, 1139, 512, 323, -487, -1096, -746, -854, -96, 334, 714, -152, -553,
 -559, 391, 966, 1072, 342, -1014, -1805, -1821, -1190, -1174, -1418, -2071, -2342, -1188, 409, 906, 1296, 1017, 182, -199, 100, 295, 542, 581, -250, -606,
 -958, -330, 208, 393, -479, -1026, -1395, -645, 475, 592, -207, -527, -1293, -1679, -918, -490, -500, -1108, -1961, -1628, 118, 1823, 2323, 1724, 954, 352,
 657, 1739, 2130, 1765, 525, -43, -453, -592, -497, -200, -652, -998, -1059, -965, -267, 99, -319, -740, -1337, -1680, -1491, -932, -325, -447, -1162, -1016,
 459, 1244, 1117, 316, -92, -223, 374, 669, 405, 335, 0, 0, 1, 0, 2, 0 <repeats 21 times>, -6820, -17185, -7356, -17185, -3649, 16400, 21117, 16416, 15392,
 26995, 14960, 12344, 0, 0, 0...}
       res = 320
#4  0x08058a18 in ast_read (chan=0x81da8e8) at channel.c:1160
       res = -512
       chan = (struct ast_channel *) 0x81da8e8
       f = (struct ast_frame *) 0x818fc94
       blah = -1126175616
       func = (int (*)()) 0x818fc94
       data = (void *) 0x805afe2
       null_frame = {frametype = 5, subclass = 0, datalen = 0, samples = 0, mallocd = 0, offset = 0, src = 0x0, data = 0x0, prev = 0x0, next = 0x0}
ASTERISK-1  0x0805b01e in ast_channel_bridge (c0=0x81da8e8, c1=0x81ff190, flags=2, fo=0xbcdfecf4, rc=0xbcdfecf8) at channel.c:2268
       c0 = (struct ast_channel *) 0x81da8e8
       c1 = (struct ast_channel *) 0x81ff190
       cs = {0x81ff190, 0x81da8e8, 0x81da8e8}
       to = -1
       f = (struct ast_frame *) 0x0
       who = (struct ast_channel *) 0x81da8e8
       res = 0
       nativefailed = 0
ASTERISK-2  0x4022479e in ast_bridge_call (chan=0x81da8e8, peer=0x81ff190, allowredirect_in=1, allowredirect_out=0, allowdisconnect=0) at res_parking.c:250
       f = (struct ast_frame *) 0x40709390
       who = (struct ast_channel *) 0x81ff190
       newext = "\000=\002@K]\002@n\217\002@(&#65533;&#65533;&#65533;\022<\002@\000\000\000\00006!@\000\000\000\00006!@\024&#65533;\002@@\036\f\b\004\000\000\000L&#65533;&#65533;&#65533;z]\002@P\036\f\b\000\000\000\000&#65533;&#65533;\002@\220&#65533;\037\b\220&#65533;\037\b\000\000\000\000\234&#65533;&#65533;&#65533;;&#65533;\005\b@\036\f\b\004\000\000\000|&#65533;&#65533;&#65533;z]\002@\220&#65533;\037\b\004\000\000\000\004&#65533;&#65533;&#65533;&#65533;&#65533;\035\b\220&#65533;\037\b\004\000\000\000\004&#65533;&#65533;&#65533;;&#65533;\005\b@\036\f\b&#65533;@!@&#65533;&#65533;&#65533;&#65533;&#65533;&#65533;\232;\000\000\000\000\000\000\000\000&#65533;&#65533;&#65533;&#65533;\027\226\005\b\004\000\000\000\004\000\000\000&#65533;&#65533;&#65533;&#65533;n\200\b\b&#65533;&#65533;\035\b\220&#65533;\037\b\020\000\000\000&#65533;&#65533;\232;"...
---Type <return> to continue, or q <return> to quit---
       ptr = 0xffffffff <Address 0xffffffff out of bounds>
       res = 0
       aoh = (struct ast_option_header *) 0x140
       transferer = (struct ast_channel *) 0x0
       transferee = (struct ast_channel *) 0x0
       transferer_real_context = 0x0
ASTERISK-3  0x404df1b7 in dial_exec (chan=0x81da8e8, data=0xbcdff77c) at app_dial.c:697
       data = (void *) 0x81ff190
       res = 0
       u = (struct localuser *) 0x8143800
       info = "SIP\0006018\00030\000rt", '\0' <repeats 241 times>
       peers = 0x81ff190 "SIP/6018-a729"
       timeout = 0xbcdff465 "30"
       number = 0x81da8e8 "Zap/6-1"
       rest = 0x0
       cur = 0x0
       privdb = '\0' <repeats 255 times>
       s = 0x81da8e8 "Zap/6-1"
       announcemsg = '\0' <repeats 255 times>
       ann = 0x81ff190 "SIP/6018-a729"
       outgoing = (struct localuser *) 0x0
       tmp = (struct localuser *) 0x81d05c8
       peer = (struct ast_channel *) 0x81ff190
       to = 23870
       allowredir_in = 1
       allowredir_out = 0
       allowdisconnect = 0
       privacy = 0
       announce = 0
       resetcdr = 0
       numsubst = "6018", '\0' <repeats 75 times>, "&#65533;"
       restofit = '\0' <repeats 64 times>, "\\&#65533;&#65533;&#65533;k6\006\b\000\000\000\000\000\000\000"
       transfer = 0xbcdff468 "rt"
       newnum = 0x81ff190 "SIP/6018-a729"
       callerid = '\0' <repeats 20 times>, "\\&#65533;&#65533;&#65533;b&#65533;&#65533;&#65533;\177&#65533;\n\b&#65533;&#65533;&#65533;&#65533;p0\006\b\\&#65533;&#65533;&#65533;4&#65533;\035\b&#65533;\000\000\000\000\000\000\000\005\000\000\000\000\000\000\000\\&#65533;&#65533;&#65533;", '\0' <repeats 28 times>, "&#65533;&#65533;\035\b", '\0' <repeats 155 times>
       l = 0x0
       n = 0x0
       url = 0x0
       current = (struct ast_var_t *) 0x81da8e8
       newvar = (struct ast_var_t *) 0xbcdfe1a4
       go_on = 0
ASTERISK-4  0x08061c50 in pbx_exec (c=0x81da8e8, app=0x812b958, data=0xbcdff77c, newstack=1) at pbx.c:396
       res = 0
---Type <return> to continue, or q <return> to quit---
       stack = -1
       execute = (int (*)()) 0x404ddee4 <dial_exec>
ASTERISK-5  0x08063cf3 in pbx_extension_helper (c=0x81da8e8, context=0x81daa40 "corporate", exten=0x81dab34 "6018", priority=1, callerid=0x8140a98 "8013193151",
   action=1) at pbx.c:1171
       callerid = 0x812b958 "Dial"
       action = -1126172804
       e = (struct ast_exten *) 0x811b5b8
       app = (struct ast_app *) 0x812b958
       sw = (struct ast_switch *) 0x0
       data = 0x0
       newstack = 1
       res = -512
       status = 4
       incstack = {0x0 <repeats 20 times>, 0x9 <Address 0x9 out of bounds>, 0x8070f42 "\203&#65533;\020\205&#65533;u\a\212\0043\210\0047GC\200<3", 0x80b25d0 "( )-.",
 0x31 <Address 0x31 out of bounds>, 0x0, 0x0, 0xbcdff93c "8013193151", 0x81cead8 "8013193151", 0x0, 0x81ced98 "1078955259.3673", 0x81cead8 "8013193151", 0x0,
 0xbcdff98c "4&#65533;\035\b\001", 0x8073413 "1&#65533;\215e\210[^_\211&#65533;]&#65533;\220U\211&#65533;\203&#65533;lWVS\213}\b&#65533;&#65533;,\v\b\203&#65533;&#65533;\213&#65533;<\027", 0x81ced98 "1078955259.3673",
 0x81dc094 "1078955259.3673", 0x1f <Address 0x1f out of bounds>, 0x40155c18 "\213E&#65533;\203&#65533;\\\211E&#65533;\213U&#65533;\213M&#65533;\213Rh\211U&#65533;9&#65533;\017\204&#65533;", 0x4002dd14 "\024&#65533;",
 0xbcdffbe0 "&#65533;&#65533;&#65533;&#65533;", 0x0, 0x4002dd14 "\024&#65533;", 0x402140a0 "", 0x81cead8 "8013193151", 0xbcdff94c "", 0x40025d7a "&#65533;&#65533;\215t&", 0x0, 0xbcdff93c "8013193151",
 0x33313038 <Address 0x33313038 out of bounds>, 0x31333931 <Address 0x31333931 out of bounds>, 0x3135 <Address 0x3135 out of bounds>, 0x0, 0x0, 0x0, 0x0,
 0x0, 0x0, 0x8073b0f "\215e&#65533;[^\211&#65533;]&#65533;U\211&#65533;\203&#65533;\024S\213]\b\205&#65533;\017\204\225", 0x81ced58 "&#65533;\214O@u+\017", 0x0, 0x0, 0x0, 0x81daa40 "corporate",
 0x81dab34 "6018"}
       passdata = "SIP/6018|30|rt", '\0' <repeats 241 times>
       stacklen = 0
       tmp = "\e[1;36;40mDial\e[0;37;40m", '\0' <repeats 55 times>
       tmp2 = "\e[1;35;40mZap/6-1\e[0;37;40m", '\0' <repeats 52 times>
       tmp3 = "\e[1;35;40mSIP/6018|30|rt\e[0;37;40m\000m", '\0' <repeats 219 times>
ASTERISK-6 0x08064a2d in ast_pbx_run (c=0x81da8e8) at pbx.c:1655
       digit = 0 '\0'
       exten = '\0' <repeats 255 times>
       pos = 0
       waittime = -1126171956
       res = 0
ASTERISK-7 0x0806af8e in pbx_thread (data=0x81da8e8) at pbx.c:1880
       data = (void *) 0xfffffe00
ASTERISK-8 0x40024e51 in pthread_start_thread () from /lib/libpthread.so.0
No symbol table info available.
ASTERISK-9 0x401bb64a in clone () from /lib/libc.so.6


here are some vars from #3
print *(struct mohdata *)moh
{pipe = {1634100580, 7629941}, origwfmt = 0, parent = 0x0, next = 0x0}

print len
-512


Unfortunately the mpg123 processes were terminated by Mark earlier today so I was unable to get any information from those processes.
Comments:By: jjanzer (jjanzer) 2004-03-11 02:29:14.000-0600

Btw, I tried to reproduce this in a variety of ways, but I was totally incapable of reproducing them after asterisk was restarted.

Oh and your guess is as good as mine why len = -512

By: jjanzer (jjanzer) 2004-03-12 14:14:50.000-0600

I added threadApllyAllBTFull.log just now, since I was able to recreate the problem on a development box (this has no zap hardware).

Note: I am running mpg123 with varaible bitrates (I think this may be causing mpg123 to act funkier than normal, which is what I want since I would like to find the root of the problem). I am not running ztdummy.


ps aux --forest

root     24469  0.0  0.8  2340 1052 pts/4    S    10:29   0:00 /bin/sh /usr/sbin/safe_asterisk
root     24471  0.0  4.9 50740 6232 pts/4    S    10:29   0:00  \_ asterisk -vvvg -c
root     24472  0.0  4.9 50740 6232 pts/4    S    10:29   0:00      \_ asterisk -vvvg -c
root     24473  0.0  4.9 50740 6232 pts/4    S    10:29   0:00          \_ asterisk -vvvg -c
root     24475  0.0  4.9 50740 6232 pts/4    S    10:29   0:00          \_ asterisk -vvvg -c
root     24474  0.0  4.9 50740 6232 pts/4    S    10:29   0:00          \_ asterisk -vvvg -c
root     24477  0.0  4.9 50740 6232 pts/4    S    10:29   0:00          \_ asterisk -vvvg -c
root     24579  0.0  0.5  3768  700 pts/4    S    10:38   0:00          |   \_ mpg123 -q -s --mono -r 8000 -b 2048 -f 8192 2.mp3
root     24580  0.0  0.3  3756  472 pts/4    S    10:38   0:00          |       \_ mpg123 -q -s --mono -r 8000 -b 2048 -f 8192 2.mp3
root     24483  0.0  4.9 50740 6232 pts/4    S    10:29   0:00          \_ asterisk -vvvg -c
root     24481  0.0  4.9 50740 6232 pts/4    S    10:29   0:00          \_ asterisk -vvvg -c
root     24482  0.0  0.5  3768  700 pts/4    S    10:29   0:00          |   \_ mpg123 -q -s --mono -r 8000 -b 2048 -f 8192 2.mp3
root     24484  0.0  0.3  3756  464 pts/4    S    10:29   0:00          |       \_ mpg123 -q -s --mono -r 8000 -b 2048 -f 8192 2.mp3
root     24485  0.0  4.9 50740 6232 pts/4    S    10:29   0:00          \_ asterisk -vvvg -c
root     24486  0.0  4.9 50740 6232 pts/4    S    10:29   0:00          \_ asterisk -vvvg -c
root     24487  0.0  4.9 50740 6232 pts/4    S    10:29   0:00          \_ asterisk -vvvg -c
root     24488  0.0  4.9 50740 6232 pts/4    S    10:29   0:00          \_ asterisk -vvvg -c
root     24489  0.0  4.9 50740 6232 pts/4    S    10:29   0:00          \_ asterisk -vvvg -c
root     24492  0.0  4.9 50740 6232 pts/4    S    10:29   0:00          \_ asterisk -vvvg -c
root     24495  0.0  4.9 50740 6232 pts/4    S    10:29   0:00          \_ asterisk -vvvg -c
root     24496  0.0  4.9 50740 6232 pts/4    S    10:29   0:00          \_ asterisk -vvvg -c
root     24498  0.0  4.9 50740 6232 pts/4    S    10:29   0:00          \_ asterisk -vvvg -c
root     24497  0.0  4.9 50740 6232 pts/4    S    10:29   0:00          \_ asterisk -vvvg -c
root     24499  0.0  4.9 50740 6232 pts/4    S    10:29   0:00          \_ asterisk -vvvg -c
root     24506  0.0  4.9 50740 6232 pts/4    S    10:29   0:00          \_ asterisk -vvvg -c
root     24515  0.0  4.9 50740 6232 pts/4    S    10:29   0:00          \_ asterisk -vvvg -c
root     24562  0.0  4.9 50740 6232 pts/4    S    10:36   0:00          \_ asterisk -vvvg -c


24562 is the thread waiting for a read() call to return, notice that there aren't any children...

edited on: 03-12-04 13:03

By: jjanzer (jjanzer) 2004-03-17 20:21:31.000-0600

I think I have fixed this problem:
read was being called with a pipe set to "0", which caused the read to lock. Instead of blindly calling read, I check to see if pipe[0] (the fd/pipe) is NULL, if it is, then we return 0, which will allow time for everything to readjust and right itself.

I was able to reproduce the bug by playing the attached "2.mp3" in my musiconhold, and repeatedly hitting hold/resume on my phone after a 2nd call was made into a queue....
Logged into queue dynamically on phone A, called into queue from phone B. Pressed hold/resume (from phone a) a few times and hung up. Made 2nd call from phone B into queue. Pressed hold/resume very quickly until it caused the hang.

After my simple patch "bug1202.diff" it catches the NULL pipe and is properly handled just fine.

By: Mark Spencer (markster) 2004-03-19 03:35:33.000-0600

pipe[0] can't be "NULL" or "0".  If it fails to open it should be -1.  If it's being set to "0" that would suggest some very serious problem.  Perhaps running under valgrind is in order?

By: jjanzer (jjanzer) 2004-03-19 04:12:02.000-0600

I can do that tomorrow on my test enviroment. I will update later in the day with any info.

By: Mark Spencer (markster) 2004-03-21 03:07:25.000-0600

How did it go?

By: alric (alric) 2004-03-22 11:49:24.000-0600

Reminder sent to jjanzer

Any update on this bug?  How did your tests go in your test environment?

Alric

By: jjanzer (jjanzer) 2004-03-22 17:47:33.000-0600

Sorry I took so long. I am unable to reproduce this problem under valgrind. I will keep trying though.

By: Mark Spencer (markster) 2004-04-02 02:45:55.000-0600

Any luck reproducing under valgrind?

By: Mark Spencer (markster) 2004-04-02 16:59:12.000-0600

I believe I found the generator race that caused this bug for you and it should be fixed in CVS now.  Thanks!

By: Mark Spencer (markster) 2004-04-05 17:01:55

Feel free to reopen if you still have this problem but I believe it's been fixed.

By: Digium Subversion (svnbot) 2008-01-15 14:49:36.000-0600

Repository: asterisk
Revision: 2604

U   trunk/channel.c

------------------------------------------------------------------------
r2604 | markster | 2008-01-15 14:49:35 -0600 (Tue, 15 Jan 2008) | 2 lines

Fix generator race (bug ASTERISK-1195)

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=2604