Summary: | ASTERISK-24604: res_rtp_asterisk: Crash during restart due to race condition in accessing codec in stored ast_frame and codec core | ||
Reporter: | Matt Jordan (mjordan) | Labels: | |
Date Opened: | 2014-12-10 13:14:14.000-0600 | Date Closed: | 2014-12-12 11:01:49.000-0600 |
Priority: | Major | Regression? | |
Status: | Closed/Complete | Components: | Core/CodecInterface Resources/res_rtp_asterisk |
Versions: | 13.0.2 | Frequency of Occurrence | |
Related Issues: | |||
Environment: | Attachments: | ||
Description: | A crash occurred while performing call recording and restarting Asterisk. A backtrace showed the following:
{noformat}
Core was generated by `/usr/sbin/asterisk -f -vvvg -c'.
Program terminated with signal 11, Segmentation fault.
#0 0x08106bf3 in ast_codec_samples_count (frame=0xb7361244) at codec.c:363
363 if (codec->samples_count) {
Missing separate debuginfos, use: debuginfo-install asterisk-6.0-429112.fl.i686
(gdb) bt full
#0 0x08106bf3 in ast_codec_samples_count (frame=0xb7361244) at codec.c:363
        codec = 0x0
        samples = 0
        __PRETTY_FUNCTION__ = "ast_codec_samples_count"
#1 0x06df6ca3 in ast_rtp_read (instance=0xb730e154, rtcp=0) at res_rtp_asterisk.c:4599
        rtp = 0xb7361240
        addr = {ss = {ss_family = 2, __ss_align = 33950218, __ss_padding = '\000' <repeats 16 times>, "\001\000\000\000@\000\000\000\240c9\000\340\213\356\tL!\200\n\000\000\000\000\000\017=\000\250\035\f\264\316}\t\b\000\000\000\000\310\001\000\000\256}\"\b*|\"\b\200\213\356\t\377\377\377\377L{\"\b|\320\031\n\244}\"\bM<&\b\310\035\f\264\240\213\356\t\200\213\356\t\000\000\000\000\000\000\000\000\001\000\000\000X\000\000"}, len = 16}
        res = 172
        hdrlen = 12
        version = 0
        payloadtype = 9
        padding = 0
        mark = 0
        ext = 0
        cc = <value optimized out>
        prev_seqno = 10932
        rtpheader = 0xb73612d0
        seqno = 10933
        ssrc = 10932
        timestamp = 110560
        payload = 0xb7367ea4
        remote_address = {ss = {ss_family = 2, __ss_align = 33950218, __ss_padding = '\000' <repeats 119 times>}, len = 16}
        frames = {first = 0x0, last = 0x0}
        __PRETTY_FUNCTION__ = "ast_rtp_read"
{noformat}

Further printing in {{gdb}} showed that while {{codec}} is NULL in {{ast_codec_samples_count}}, the {{frame}} passed in has a valid format with a valid {{codec}} object:

{noformat}
(gdb) p *frame
$4 = {frametype = AST_FRAME_VOICE, subclass = {integer = 0, format = 0x8dfe2fc, frame_ending = 0}, datalen = 160, samples = 320, mallocd = 0, mallocd_hdr_len = 0, offset = 76, src = 0x2857443 "RTP", data = {ptr = 0x98c1b9c, uint32 = 160177052, pad = "\234\033\214\t\000\000\000"}, delivery = {tv_sec = 1418236604, tv_usec = 395000}, frame_list = {next = 0x0}, flags = 1, ts = 26280, len = 20, seqno = 3737}
(gdb) p *frame->subclass.format
$5 = {name = 0x823d3ee "g722", codec = 0x8dfe2ac, attribute_data = 0x0, interface = 0x0}
{noformat}

The culprit is {{ast_codec_samples_count}}, which takes a very roundabout route to get the codec stored on a format. This is because the {{ast_format}} interface only exposes a way to get the codec ID, not the actual {{ast_codec}} object itself. We even have a {{BUGBUG}} for this:

{code}
/* BUGBUG - why not just get the codec pointer off the format? This is a bit roundabout */
codec = ast_codec_get_by_id(ast_format_get_codec_id(frame->subclass.format));
{code}

This causes a race condition during shutdown. While the frame's format still holds a valid {{struct ast_codec}}, the referenced codec has already been removed from the {{codecs}} container. Hence, {{ast_codec_get_by_id}} returns NULL, and the dereference on the next line crashes.

There are two bugs to fix here:
# {{ast_codec_samples_count}} needs to make sure {{codec}} is non-NULL before dereferencing it.
# The {{ast_format}} API should just expose the codec (bumping the reference when someone obtains it). That removes the silly {{ao2_callback}} we are currently doing via {{ast_codec_get_by_id}}. | ||
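The two fixes described above can be illustrated with a small standalone sketch. This is not the Asterisk code or the actual patch: every name here ({{demo_codec}}, {{demo_format}}, {{demo_frame}}, {{registry_get_by_id}}, {{run_example}}) is a hypothetical stand-in, the codec ID is arbitrary, and reference counting is left out entirely. It only shows why a lookup by ID can return NULL once the container has been emptied, and how reading the pointer off the format avoids the lookup.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real types; the real structs differ. */
struct demo_codec {
    unsigned int id;
    int (*samples_count)(int datalen);
};

struct demo_format {
    const char *name;
    struct demo_codec *codec;   /* the format keeps its own codec pointer */
};

struct demo_frame {
    struct demo_format *format;
    int datalen;
};

/* Toy registry standing in for the codecs container. */
static struct demo_codec *registry[4];

static struct demo_codec *registry_get_by_id(unsigned int id)
{
    size_t i;

    for (i = 0; i < sizeof(registry) / sizeof(registry[0]); i++) {
        if (registry[i] != NULL && registry[i]->id == id) {
            return registry[i];
        }
    }
    return NULL;    /* not found, e.g. the registry was emptied at shutdown */
}

/* 160 bytes of payload correspond to 320 samples in the ticket's frame dump. */
static int g722_samples(int datalen)
{
    return datalen * 2;
}

/* Fix 1: keep the lookup by ID, but check the result before using it. */
static int samples_count_via_lookup(const struct demo_frame *frame)
{
    struct demo_codec *codec = registry_get_by_id(frame->format->codec->id);

    if (codec == NULL || codec->samples_count == NULL) {
        return 0;
    }
    return codec->samples_count(frame->datalen);
}

/* Fix 2: read the codec straight off the format, with no registry lookup. */
static int samples_count_direct(const struct demo_frame *frame)
{
    const struct demo_codec *codec = frame->format->codec;

    if (codec == NULL || codec->samples_count == NULL) {
        return 0;
    }
    return codec->samples_count(frame->datalen);
}

/* Walks through the shutdown scenario; returns 0 if every check holds. */
int run_example(void)
{
    struct demo_codec g722 = { 1, g722_samples };   /* ID 1 is arbitrary */
    struct demo_format fmt = { "g722", &g722 };
    struct demo_frame frame = { &fmt, 160 };

    registry[0] = &g722;
    assert(samples_count_via_lookup(&frame) == 320);

    registry[0] = NULL;     /* simulate the codec being unregistered */
    assert(samples_count_via_lookup(&frame) == 0);      /* guarded, no crash */
    assert(samples_count_direct(&frame) == 320);        /* still resolves */
    return 0;
}
```

Calling {{run_example()}} from a {{main}} exercises both paths. In the real code the second approach would also need to take a reference on the codec when handing it out, as noted in the list above; the sketch skips that for brevity.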
Comments: |