Summary:ASTERISK-13204: [patch] Astrerisk crashes using the app_queue.c transfer datastores
Reporter:Kevin Scott Adams (nivek)Labels:
Date Opened:2008-12-11 07:33:11.000-0600Date Closed:2008-12-16 11:31:16.000-0600
Versions:Frequency of
Environment:Attachments:( 0) datastore_fixup.patch
( 1) datastore_fixup.patch.corrected
Description:When Asterisk crashes and we analyze the dump, Asterisk always crashes in main/channel.c at line 3562 of 1.4 SVN which states in the dump trace:
   if (ds->info->chan_fixup)

After a lot of debugging statements, we have found that right after the return from queue_transfer_fixup() it crashes on the above statement.

Variable ds seems to have a value but ds->info and ds->info->chan_fixup comeback as undefined.

Looking at the code, we believe (Marquis and I are colleagues) the ast_channel_datastore_free in the queue_transfer_fixup is causing the grief.

If I understand the code flow correctly, the ast_channel_datastore_remove unlinks the datastore from the linked list but the datastore still retains its values to point to the next data structure in the linked list.  The ast_channel_datastore_free, of course, adds the datastore allocated memory back into the heap for it to be allocated again.  We believe that the memory added back to the heap gets reallocated to another process and causes the crash.

This crash is very random.  I has happened once a day for three days then not again for a week.  If happened within 15 minutes of each other on one day.  Sometimes it took a week or so to happen.

I have included a patch to 1.4-SVN that we have used for a little over a month now without a crash (knock on wood).  The patch also includes a change to the time parameter passing and calculations that were using 'int' instead of 'long'.  We elected to use 'time_t' in case 'time_t' ever changed it could be more portable.  This is your call on that.


Our hardware is a Dell 2950 with 2 Intel 5150 processors (Xeon Dual Core 2.66GHz) with 4Gb memory.

Also, we elected not to change main/channel.c because other programs are using this function with out issues (I hope).  But, looking at the link list macros, the AST_LIST_TRAVERSE could be substituted by using AST_LIST_TRAVERSE_SAFE_BEGIN if you wish fixups to remove the datastores that call them.  Marquis and I went to the bar after coming up with this one and again agreed, that is your call.

Comments:By: Kevin Scott Adams (nivek) 2008-12-11 09:17:19.000-0600

Sorry uploaded the incorrect patch.  Please use datastore_fixup.patch.corrected.

By: Mark Michelson (mmichelson) 2008-12-11 10:18:04.000-0600

The patch certainly looks good by my eyes with one small exception. The final call to ast_channel_datastore_free should first make sure that transfer_ds is non-NULL. That's a trivial thing to add, though, so I'll just do it when I commit the patch.

Thank you very much for ... well for pretty much everything in this report: the time taken to find the problem, the good patch which follows coding guidelines, and the time taken to test it. I will get this committed shortly.

By: Digium Subversion (svnbot) 2008-12-11 10:24:39.000-0600

Repository: asterisk
Revision: 163080

U   branches/1.4/apps/app_queue.c

r163080 | mmichelson | 2008-12-11 10:24:38 -0600 (Thu, 11 Dec 2008) | 14 lines

Fix a potential crash due to unsafe datastore handling.

This patch also contains a conversion from using long to time_t
for representing times for a queue, as well as some whitespace

(closes issue ASTERISK-13204)
Reported by: nivek
     datastore_fixup.patch.corrected uploaded by nivek (license 636)
 with slight modification from me
Tested by: nivek