Summary:ASTERISK-13507: Crash - maybe in filestream_destructor
Reporter:Matt Riddell (zx81)Labels:
Date Opened:2009-02-03 16:28:20.000-0600Date Closed:2009-04-09 11:03:05
Versions:Frequency of
Environment:Attachments:( 0) 14395.patch
Description:I'm really not sure what caused this crash, but here's the bt:

(gdb) bt
#0  0xb7f34410 in ?? ()
#1  0xb6a43efc in ?? ()
#2  0x00000006 in ?? ()
#3  0x000078b3 in ?? ()
#4  0xb7d90811 in raise () from /lib/tls/i686/cmov/libc.so.6
ASTERISK-1  0xb7d91fb9 in abort () from /lib/tls/i686/cmov/libc.so.6
ASTERISK-2  0xb7dc5dfa in __fsetlocking () from /lib/tls/i686/cmov/libc.so.6
ASTERISK-3  0xb7dcd68f in mallopt () from /lib/tls/i686/cmov/libc.so.6
ASTERISK-4  0xb7dcd732 in free () from /lib/tls/i686/cmov/libc.so.6
ASTERISK-5  0xb7dbcf01 in fclose () from /lib/tls/i686/cmov/libc.so.6
ASTERISK-6 0x080a386b in filestream_destructor (arg=0x825a150) at file.c:338
ASTERISK-7 0x0807433e in ao2_ref (user_data=0x825a150, delta=-1) at astobj2.c:229
ASTERISK-8 0x080a6258 in ast_filestream_frame_freed (fr=0x825a180) at file.c:1299
ASTERISK-9 0x080a7666 in ast_frame_free (fr=0x825a180, cache=1) at frame.c:329
ASTERISK-10 0xb7cac6f9 in moh_files_generator (chan=0xb6b0c7c8, data=0x826f238, len=160, samples=160) at res_musiconhold.c:287
ASTERISK-11 0x080859c9 in ast_read_generator_actions (chan=0xb6b0c7c8, f=0x81ad54c) at channel.c:1968
ASTERISK-12 0x080871ba in __ast_read (chan=0xb6b0c7c8, dropaudio=0) at channel.c:2400
ASTERISK-13 0x08087440 in ast_read (chan=0xb6b0c7c8) at channel.c:2434
ASTERISK-14 0x0808bd6d in ast_generic_bridge (c0=0xb6b0c7c8, c1=0x82f3fd0, config=0xb6a464fc, fo=0xb6a44b50, rc=0xb6a44b4c, bridge_end=
     {tv_sec = 0, tv_usec = 0}) at channel.c:3962
ASTERISK-15 0x0808d1e8 in ast_channel_bridge (c0=0xb6b0c7c8, c1=0x82f3fd0, config=0xb6a464fc, fo=0xb6a44b50, rc=0xb6a44b4c)
   at channel.c:4285
ASTERISK-16 0xb7c99bc9 in ast_bridge_call (chan=0xb6b0c7c8, peer=0x82f3fd0, config=0xb6a464fc) at res_features.c:1548
ASTERISK-17 0xb74d8167 in try_calling (qe=0xb6a46840, options=0xb6a467e9 "", announceoverride=0x0, url=0x0, tries=0xb6a469e0,
   noption=0xb6a469dc, agi=0x0) at app_queue.c:3161
ASTERISK-18 0xb74db2f8 in queue_exec (chan=0xb6b0c7c8, data=0xb6a46b1c) at app_queue.c:4005
ASTERISK-19 0x080c18b7 in pbx_exec (c=0xb6b0c7c8, app=0x8223750, data=0xb6a46b1c) at pbx.c:537
ASTERISK-20 0xb6e3b7ce in realtime_exec (chan=0xb6b0c7c8, context=0xb6b0c948 "routing", exten=0xb6b0c998 "queue-Telesales_All_Regions",
   priority=3, callerid=0x8273c80 "078702004", data=0x82f593d "routing@extensions") at pbx_realtime.c:216
ASTERISK-21 0x080c4ddb in pbx_extension_helper (c=0xb6b0c7c8, con=0x0, context=0xb6b0c948 "routing",
   exten=0xb6b0c998 "queue-Telesales_All_Regions", priority=3, label=0x0, callerid=0x8273c80 "078702004", action=E_SPAWN)
   at pbx.c:1874
ASTERISK-22 0x080c5df9 in ast_spawn_extension (c=0xb6b0c7c8, context=0xb6b0c948 "routing", exten=0xb6b0c998 "queue-Telesales_All_Regions",
   priority=3, callerid=0x8273c80 "078702004") at pbx.c:2318
ASTERISK-23 0x080c6217 in __ast_pbx_run (c=0xb6b0c7c8) at pbx.c:2407
ASTERISK-24 0x080c7001 in pbx_thread (data=0xb6b0c7c8) at pbx.c:2622
ASTERISK-25 0x08104cbf in dummy_start (data=0xb6b10770) at utils.c:856
ASTERISK-26 0xb7f18240 in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
ASTERISK-27 0xb7e3349e in clone () from /lib/tls/i686/cmov/libc.so.6
Comments:By: Mark Michelson (mmichelson) 2009-02-03 16:37:12.000-0600

I'm changing the category to res_musiconhold, although it's really a core file playback issue. Since the actual crash occurred because of res_musiconhold, though, it's about the closest I'm going to come with the categorization offered.

By: Matt Riddell (zx81) 2009-02-03 16:42:17.000-0600

Music on hold is:

moh show files
Class: default
       File: /var/lib/asterisk/moh/fpm-world-mix
       File: /var/lib/asterisk/moh/fpm-sunshine
       File: /var/lib/asterisk/moh/fpm-calm-river

ll /var/lib/asterisk/moh:

-rw-r--r-- 1 root root  969875 2006-09-21 05:38 fpm-calm-river.alaw
-rw-r--r-- 1 root root  200046 2006-09-21 05:38 fpm-calm-river.gsm
-rw-r--r-- 1 root root  969875 2006-09-21 05:38 fpm-calm-river.ulaw
-rw-r--r-- 1 root root 1939794 2006-09-21 05:38 fpm-calm-river.wav
-rw-r--r-- 1 root root 1291076 2006-09-21 05:38 fpm-sunshine.alaw
-rw-r--r-- 1 root root  266310 2006-09-21 05:38 fpm-sunshine.gsm
-rw-r--r-- 1 root root 1291076 2006-09-21 05:38 fpm-sunshine.ulaw
-rw-r--r-- 1 root root 2582196 2006-09-21 05:38 fpm-sunshine.wav
-rw-r--r-- 1 root root 1108637 2006-09-21 05:38 fpm-world-mix.alaw
-rw-r--r-- 1 root root  228657 2006-09-21 05:38 fpm-world-mix.gsm
-rw-r--r-- 1 root root 1108637 2006-09-21 05:38 fpm-world-mix.ulaw
-rw-r--r-- 1 root root 2217318 2006-09-21 05:38 fpm-world-mix.wav

df -h:

Filesystem            Size  Used Avail Use% Mounted on
/dev/md0              111G   27G   79G  26% /
tmpfs                 506M     0  506M   0% /lib/init/rw
udev                   10M   60K   10M   1% /dev
tmpfs                 506M     0  506M   0% /dev/shm

Nothing in syslog other than the usual - box been stable for two weeks - reasonably heavily loaded production machine.

By: Matt Riddell (zx81) 2009-02-03 16:44:38.000-0600

Slightly higher than normal load as there was a pretty major power outage in Auckland - this means that all calls have been diverted to Dunedin.

By: Matt Riddell (zx81) 2009-02-03 16:50:22.000-0600

Hmmm, the weird thing is that the queues are all called with options trn (transfer, ring instead of moh and no retries).

Maybe it's not from the "queue-Telesales_All_Regions"

By: Mark Michelson (mmichelson) 2009-02-03 17:07:25.000-0600

Actually it looks like music-on-hold was started after the call was bridged. Perhaps a party was placed on hold during the call?

I found a suspicious spot in file.c, and I'll be uploading a potential fix shortly.

By: Mark Michelson (mmichelson) 2009-02-03 17:11:12.000-0600

I've uploaded 14395.patch based on the fact that there is some pertinent code that "looks wrong" to me and might be the cause of this problem. I can't tell for certain though, based on the backtrace. I'd need valgrind output in order to be able to tell with more certainty.

Anyway, there is no risk in using this uploaded patch, and it might even fix the crash, too.

By: Matt Riddell (zx81) 2009-02-03 20:40:34.000-0600

Ok, I've applied the patch - will wait a couple of hours before restarting to make it live.

We've got a public holiday on Friday (Wednesday afternoon here now), so I could run in valgrind with a few hundred call files if it would help.

Should I just send both ends of the call to musiconhold app?

Oh and by the way - thanks as always for your extremely fast response!

By: Mark Michelson (mmichelson) 2009-02-03 21:00:32.000-0600

If you have downtime and don't mind running Asterisk under valgrind, that would be great.

I think sending both sides of the call to the musiconhold app would do the trick. It seems like it's a matter of chance whether the problem will or won't occur, though. The problem actually occurs when musiconhold is finished, though. So a bunch of really short calls that have musiconhold started and then get hung up after a second or two would cause the issue to be more likely to happen.

Oh, and just as a reminder (even though valgrind.txt says this), running under valgrind may actually cause Asterisk not to crash, but it may log useful output anyway. So if you run some tests with valgrind and don't get any crashes to happen, just remember that its output may still contain useful information.

By: Matt Riddell (zx81) 2009-02-04 00:58:17.000-0600

Yeah, we actually run SmoothTorque under valgrind all the time, and if it ever crashes it uses mutt to send me the valgrind logs :)

I'll run a campaign of a few calls a second, with an absolute timeout in extensions.conf followed by the musiconhold app.

I'll post back here once I've run some tests (the patch is in place now and the machine restarted).

By: Mark Michelson (mmichelson) 2009-02-11 16:16:02.000-0600

I was wondering how this was progressing. Any further problems?

By: Matt Riddell (zx81) 2009-02-12 16:25:15.000-0600

Nah, seems to be fine now - didn't manage to get a campaign run as they ended up staying open for the public holiday, but this can be closed out - I'll find you if it has more problems :D

By: Digium Subversion (svnbot) 2009-02-12 17:22:46.000-0600

Repository: asterisk
Revision: 175407

U   branches/1.4/main/file.c

r175407 | mmichelson | 2009-02-12 17:22:46 -0600 (Thu, 12 Feb 2009) | 12 lines

Fix a place where filestreams were not refcounted properly

This section was already present in trunk and other branches,
but did not exist in 1.4.

(closes issue ASTERISK-13507)
Reported by: ZX81
     14395.patch uploaded by putnopvut (license 60)
Tested by: ZX81



By: Digium Subversion (svnbot) 2009-02-12 17:23:47.000-0600

Repository: asterisk
Revision: 175408

_U  trunk/

r175408 | mmichelson | 2009-02-12 17:23:47 -0600 (Thu, 12 Feb 2009) | 17 lines

Blocked revisions 175407 via svnmerge

 r175407 | mmichelson | 2009-02-12 17:22:44 -0600 (Thu, 12 Feb 2009) | 12 lines
 Fix a place where filestreams were not refcounted properly
 This section was already present in trunk and other branches,
 but did not exist in 1.4.
 (closes issue ASTERISK-13507)
 Reported by: ZX81
       14395.patch uploaded by putnopvut (license 60)
 Tested by: ZX81



By: Digium Subversion (svnbot) 2009-02-12 17:24:35.000-0600

Repository: asterisk
Revision: 175409

_U  branches/1.6.1/

r175409 | mmichelson | 2009-02-12 17:24:35 -0600 (Thu, 12 Feb 2009) | 23 lines

Blocked revisions 175408 via svnmerge

 r175408 | mmichelson | 2009-02-12 17:23:47 -0600 (Thu, 12 Feb 2009) | 17 lines
 Blocked revisions 175407 via svnmerge
   r175407 | mmichelson | 2009-02-12 17:22:44 -0600 (Thu, 12 Feb 2009) | 12 lines
   Fix a place where filestreams were not refcounted properly
   This section was already present in trunk and other branches,
   but did not exist in 1.4.
   (closes issue ASTERISK-13507)
   Reported by: ZX81
         14395.patch uploaded by putnopvut (license 60)
   Tested by: ZX81