Summary: | ASTERISK-13507: Crash - maybe in filestream_destructor | ||
Reporter: | Matt Riddell (zx81) | Labels: | |
Date Opened: | 2009-02-03 16:28:20.000-0600 | Date Closed: | 2009-04-09 11:03:05 |
Priority: | Critical | Regression? | No |
Status: | Closed/Complete | Components: | Resources/res_musiconhold |
Versions: | Frequency of Occurrence | ||
Related Issues: | |||
Environment: | Attachments: | ( 0) 14395.patch | |
Description: | I'm really not sure what caused this crash, but here's the bt: (gdb) bt #0 0xb7f34410 in ?? () #1 0xb6a43efc in ?? () #2 0x00000006 in ?? () #3 0x000078b3 in ?? () #4 0xb7d90811 in raise () from /lib/tls/i686/cmov/libc.so.6 ASTERISK-1 0xb7d91fb9 in abort () from /lib/tls/i686/cmov/libc.so.6 ASTERISK-2 0xb7dc5dfa in __fsetlocking () from /lib/tls/i686/cmov/libc.so.6 ASTERISK-3 0xb7dcd68f in mallopt () from /lib/tls/i686/cmov/libc.so.6 ASTERISK-4 0xb7dcd732 in free () from /lib/tls/i686/cmov/libc.so.6 ASTERISK-5 0xb7dbcf01 in fclose () from /lib/tls/i686/cmov/libc.so.6 ASTERISK-6 0x080a386b in filestream_destructor (arg=0x825a150) at file.c:338 ASTERISK-7 0x0807433e in ao2_ref (user_data=0x825a150, delta=-1) at astobj2.c:229 ASTERISK-8 0x080a6258 in ast_filestream_frame_freed (fr=0x825a180) at file.c:1299 ASTERISK-9 0x080a7666 in ast_frame_free (fr=0x825a180, cache=1) at frame.c:329 ASTERISK-10 0xb7cac6f9 in moh_files_generator (chan=0xb6b0c7c8, data=0x826f238, len=160, samples=160) at res_musiconhold.c:287 ASTERISK-11 0x080859c9 in ast_read_generator_actions (chan=0xb6b0c7c8, f=0x81ad54c) at channel.c:1968 ASTERISK-12 0x080871ba in __ast_read (chan=0xb6b0c7c8, dropaudio=0) at channel.c:2400 ASTERISK-13 0x08087440 in ast_read (chan=0xb6b0c7c8) at channel.c:2434 ASTERISK-14 0x0808bd6d in ast_generic_bridge (c0=0xb6b0c7c8, c1=0x82f3fd0, config=0xb6a464fc, fo=0xb6a44b50, rc=0xb6a44b4c, bridge_end= {tv_sec = 0, tv_usec = 0}) at channel.c:3962 ASTERISK-15 0x0808d1e8 in ast_channel_bridge (c0=0xb6b0c7c8, c1=0x82f3fd0, config=0xb6a464fc, fo=0xb6a44b50, rc=0xb6a44b4c) at channel.c:4285 ASTERISK-16 0xb7c99bc9 in ast_bridge_call (chan=0xb6b0c7c8, peer=0x82f3fd0, config=0xb6a464fc) at res_features.c:1548 ASTERISK-17 0xb74d8167 in try_calling (qe=0xb6a46840, options=0xb6a467e9 "", announceoverride=0x0, url=0x0, tries=0xb6a469e0, noption=0xb6a469dc, agi=0x0) at app_queue.c:3161 ASTERISK-18 0xb74db2f8 in queue_exec (chan=0xb6b0c7c8, data=0xb6a46b1c) at app_queue.c:4005 ASTERISK-19 0x080c18b7 in pbx_exec (c=0xb6b0c7c8, app=0x8223750, data=0xb6a46b1c) at pbx.c:537 ASTERISK-20 0xb6e3b7ce in realtime_exec (chan=0xb6b0c7c8, context=0xb6b0c948 "routing", exten=0xb6b0c998 "queue-Telesales_All_Regions", priority=3, callerid=0x8273c80 "078702004", data=0x82f593d "routing@extensions") at pbx_realtime.c:216 ASTERISK-21 0x080c4ddb in pbx_extension_helper (c=0xb6b0c7c8, con=0x0, context=0xb6b0c948 "routing", exten=0xb6b0c998 "queue-Telesales_All_Regions", priority=3, label=0x0, callerid=0x8273c80 "078702004", action=E_SPAWN) at pbx.c:1874 ASTERISK-22 0x080c5df9 in ast_spawn_extension (c=0xb6b0c7c8, context=0xb6b0c948 "routing", exten=0xb6b0c998 "queue-Telesales_All_Regions", priority=3, callerid=0x8273c80 "078702004") at pbx.c:2318 ASTERISK-23 0x080c6217 in __ast_pbx_run (c=0xb6b0c7c8) at pbx.c:2407 ASTERISK-24 0x080c7001 in pbx_thread (data=0xb6b0c7c8) at pbx.c:2622 ASTERISK-25 0x08104cbf in dummy_start (data=0xb6b10770) at utils.c:856 ASTERISK-26 0xb7f18240 in start_thread () from /lib/tls/i686/cmov/libpthread.so.0 ASTERISK-27 0xb7e3349e in clone () from /lib/tls/i686/cmov/libc.so.6 | ||
Comments: | By: Mark Michelson (mmichelson) 2009-02-03 16:37:12.000-0600 I'm changing the category to res_musiconhold, although it's really a core file playback issue. Since the actual crash occurred because of res_musiconhold, though, it's about the closest I'm going to come with the categorization offered. By: Matt Riddell (zx81) 2009-02-03 16:42:17.000-0600 Music on hold is: moh show files Class: default File: /var/lib/asterisk/moh/fpm-world-mix File: /var/lib/asterisk/moh/fpm-sunshine File: /var/lib/asterisk/moh/fpm-calm-river ll /var/lib/asterisk/moh: -rw-r--r-- 1 root root 969875 2006-09-21 05:38 fpm-calm-river.alaw -rw-r--r-- 1 root root 200046 2006-09-21 05:38 fpm-calm-river.gsm -rw-r--r-- 1 root root 969875 2006-09-21 05:38 fpm-calm-river.ulaw -rw-r--r-- 1 root root 1939794 2006-09-21 05:38 fpm-calm-river.wav -rw-r--r-- 1 root root 1291076 2006-09-21 05:38 fpm-sunshine.alaw -rw-r--r-- 1 root root 266310 2006-09-21 05:38 fpm-sunshine.gsm -rw-r--r-- 1 root root 1291076 2006-09-21 05:38 fpm-sunshine.ulaw -rw-r--r-- 1 root root 2582196 2006-09-21 05:38 fpm-sunshine.wav -rw-r--r-- 1 root root 1108637 2006-09-21 05:38 fpm-world-mix.alaw -rw-r--r-- 1 root root 228657 2006-09-21 05:38 fpm-world-mix.gsm -rw-r--r-- 1 root root 1108637 2006-09-21 05:38 fpm-world-mix.ulaw -rw-r--r-- 1 root root 2217318 2006-09-21 05:38 fpm-world-mix.wav df -h: Filesystem Size Used Avail Use% Mounted on /dev/md0 111G 27G 79G 26% / tmpfs 506M 0 506M 0% /lib/init/rw udev 10M 60K 10M 1% /dev tmpfs 506M 0 506M 0% /dev/shm Nothing in syslog other than the usual - box been stable for two weeks - reasonably heavily loaded production machine. By: Matt Riddell (zx81) 2009-02-03 16:44:38.000-0600 Slightly higher than normal load as there was a pretty major power outage in Auckland - this means that all calls have been diverted to Dunedin. By: Matt Riddell (zx81) 2009-02-03 16:50:22.000-0600 Hmmm, the weird thing is that the queues are all called with options trn (transfer, ring instead of moh and no retries). Maybe it's not from the "queue-Telesales_All_Regions" By: Mark Michelson (mmichelson) 2009-02-03 17:07:25.000-0600 Actually it looks like music-on-hold was started after the call was bridged. Perhaps a party was placed on hold during the call? I found a suspicious spot in file.c, and I'll be uploading a potential fix shortly. By: Mark Michelson (mmichelson) 2009-02-03 17:11:12.000-0600 I've uploaded 14395.patch based on the fact that there is some pertinent code that "looks wrong" to me and might be the cause of this problem. I can't tell for certain though, based on the backtrace. I'd need valgrind output in order to be able to tell with more certainty. Anyway, there is no risk in using this uploaded patch, and it might even fix the crash, too. By: Matt Riddell (zx81) 2009-02-03 20:40:34.000-0600 Ok, I've applied the patch - will wait a couple of hours before restarting to make it live. We've got a public holiday on Friday (Wednesday afternoon here now), so I could run in valgrind with a few hundred call files if it would help. Should I just send both ends of the call to musiconhold app? Oh and by the way - thanks as always for your extremely fast response! By: Mark Michelson (mmichelson) 2009-02-03 21:00:32.000-0600 If you have downtime and don't mind running Asterisk under valgrind, that would be great. I think sending both sides of the call to the musiconhold app would do the trick. It seems like it's a matter of chance whether the problem will or won't occur, though. The problem actually occurs when musiconhold is finished, though. So a bunch of really short calls that have musiconhold started and then get hung up after a second or two would cause the issue to be more likely to happen. Oh, and just as a reminder (even though valgrind.txt says this), running under valgrind may actually cause Asterisk not to crash, but it may log useful output anyway. So if you run some tests with valgrind and don't get any crashes to happen, just remember that its output may still contain useful information. By: Matt Riddell (zx81) 2009-02-04 00:58:17.000-0600 Yeah, we actually run SmoothTorque under valgrind all the time, and if it ever crashes it uses mutt to send me the valgrind logs :) I'll run a campaign of a few calls a second, with an absolute timeout in extensions.conf followed by the musiconhold app. I'll post back here once I've run some tests (the patch is in place now and the machine restarted). By: Mark Michelson (mmichelson) 2009-02-11 16:16:02.000-0600 I was wondering how this was progressing. Any further problems? By: Matt Riddell (zx81) 2009-02-12 16:25:15.000-0600 Nah, seems to be fine now - didn't manage to get a campaign run as they ended up staying open for the public holiday, but this can be closed out - I'll find you if it has more problems :D By: Digium Subversion (svnbot) 2009-02-12 17:22:46.000-0600 Repository: asterisk Revision: 175407 U branches/1.4/main/file.c ------------------------------------------------------------------------ r175407 | mmichelson | 2009-02-12 17:22:46 -0600 (Thu, 12 Feb 2009) | 12 lines Fix a place where filestreams were not refcounted properly This section was already present in trunk and other branches, but did not exist in 1.4. (closes issue ASTERISK-13507) Reported by: ZX81 Patches: 14395.patch uploaded by putnopvut (license 60) Tested by: ZX81 ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=175407 By: Digium Subversion (svnbot) 2009-02-12 17:23:47.000-0600 Repository: asterisk Revision: 175408 _U trunk/ ------------------------------------------------------------------------ r175408 | mmichelson | 2009-02-12 17:23:47 -0600 (Thu, 12 Feb 2009) | 17 lines Blocked revisions 175407 via svnmerge ........ r175407 | mmichelson | 2009-02-12 17:22:44 -0600 (Thu, 12 Feb 2009) | 12 lines Fix a place where filestreams were not refcounted properly This section was already present in trunk and other branches, but did not exist in 1.4. (closes issue ASTERISK-13507) Reported by: ZX81 Patches: 14395.patch uploaded by putnopvut (license 60) Tested by: ZX81 ........ ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=175408 By: Digium Subversion (svnbot) 2009-02-12 17:24:35.000-0600 Repository: asterisk Revision: 175409 _U branches/1.6.1/ ------------------------------------------------------------------------ r175409 | mmichelson | 2009-02-12 17:24:35 -0600 (Thu, 12 Feb 2009) | 23 lines Blocked revisions 175408 via svnmerge ................ r175408 | mmichelson | 2009-02-12 17:23:47 -0600 (Thu, 12 Feb 2009) | 17 lines Blocked revisions 175407 via svnmerge ........ r175407 | mmichelson | 2009-02-12 17:22:44 -0600 (Thu, 12 Feb 2009) | 12 lines Fix a place where filestreams were not refcounted properly This section was already present in trunk and other branches, but did not exist in 1.4. (closes issue ASTERISK-13507) Reported by: ZX81 Patches: 14395.patch uploaded by putnopvut (license 60) Tested by: ZX81 ........ ................ ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=175409 |