Summary: | ASTERISK-16044: [patch] The heap data structure can't cope with a removal and reinsert | ||
Reporter: | cappucinoking (cappucinoking) | Labels: | |
Date Opened: | 2010-05-03 12:42:51 | Date Closed: | 2010-05-06 09:15:58 |
Priority: | Blocker | Regression? | No |
Status: | Closed/Complete | Components: | Tests/General |
Versions: | Frequency of Occurrence | ||
Related Issues: | |||
Environment: | Attachments: | ( 0) heap-fix.diff ( 1) heap-fix.rev2.diff ( 2) test_heap.diff ( 3) test_heap.patch | |
Description: | Whilst looking into why qualify events are being scheduled out of sequence I started looking into why items on the maxheap weren't in the correct order. The qualify code is a different use case then the current tests allow for. When a response to a qualify request is made, the event is rescheduled via a remove and re-add to the maxheap structure. I have written a test to demonstrate the failure of the heap to keep the items in order when removing and re-inserting items. | ||
Comments: | By: cappucinoking (cappucinoking) 2010-05-03 12:54:49 The patch creates a heap with 1000 items. Randomly removes and re-inserts 500 items. Checks that items pop-off in order. Currently this test is failing. By: Russell Bryant (russell) 2010-05-03 16:15:28 Give this patch a try. By: cappucinoking (cappucinoking) 2010-05-03 16:56:54 This is now passing the test case. By: cappucinoking (cappucinoking) 2010-05-03 17:30:16 test_heap.diff is a slightly revised version of the test - and fixes a minor typo in test two as well. By: Russell Bryant (russell) 2010-05-04 13:52:36 Let me know how the testing in production goes today. By: Russell Bryant (russell) 2010-05-04 14:00:26 I attached a rev2 of the fix. It's the same code, I just changed the formatting of the bubble_up() loop to be a bit more readable. By: cappucinoking (cappucinoking) 2010-05-04 16:07:07 It seems to have fixed the issue I was having with the qualify timing. I haven't seen any adverse effects. By: Digium Subversion (svnbot) 2010-05-06 08:58:12 Repository: asterisk Revision: 261496 U trunk/main/heap.c ------------------------------------------------------------------------ r261496 | russell | 2010-05-06 08:58:07 -0500 (Thu, 06 May 2010) | 40 lines Fix handling of removing nodes from the middle of a heap. This bug surfaced in 1.6.2 and does not affect code in any other released version of Asterisk. It manifested itself as SIP qualify not happening when it should, causing peers to go unreachable. This was debugged down to scheduler entries sometimes not getting executed when they were supposed to, which was in turn caused by an error in the heap code. The problem only sometimes occurs, and it is due to the logic for removing an entry in the heap from an arbitrary location (not just popping off the top). The scheduler performs this operation frequently when entries are removed before they run (when ast_sched_del() is used). In a normal pop off of the top of the heap, a node is taken off the bottom, placed at the top, and then bubbled down until the max heap property is restored (see max_heapify()). This same logic was used for removing an arbitrary node from the middle of the heap. Unfortunately, that logic is full of fail. This patch fixes that by fully restoring the max heap property when a node is thrown into the middle of the heap. Instead of just pushing it down as appropriate, it first pushes it up as high as it will go, and _then_ pushes it down. Lastly, fix a minor problem in ast_heap_verify(), which is only used for debugging. If a parent and child node have the same value, that is not an error. The only error is if a parent's value is less than its children. A huge thanks goes out to cappucinoking for debugging this down to the scheduler, and then producing an ast_heap test case that demonstrated the breakage. That made it very easy for me to focus on the heap logic and produce a fix. Open source projects are awesome. (closes issue ASTERISK-15721) Reported by: ib2 Tested by: cappucinoking, crjw (closes issue ASTERISK-16044) Reported by: cappucinoking Patches: heap-fix.rev2.diff uploaded by russell (license 2) Tested by: cappucinoking, russell ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=261496 By: Digium Subversion (svnbot) 2010-05-06 09:02:25 Repository: asterisk Revision: 261497 _U branches/1.6.1/ U branches/1.6.1/main/heap.c ------------------------------------------------------------------------ r261497 | russell | 2010-05-06 09:02:22 -0500 (Thu, 06 May 2010) | 47 lines Merged revisions 261496 via svnmerge from https://origsvn.digium.com/svn/asterisk/trunk ........ r261496 | russell | 2010-05-06 08:58:07 -0500 (Thu, 06 May 2010) | 40 lines Fix handling of removing nodes from the middle of a heap. This bug surfaced in 1.6.2 and does not affect code in any other released version of Asterisk. It manifested itself as SIP qualify not happening when it should, causing peers to go unreachable. This was debugged down to scheduler entries sometimes not getting executed when they were supposed to, which was in turn caused by an error in the heap code. The problem only sometimes occurs, and it is due to the logic for removing an entry in the heap from an arbitrary location (not just popping off the top). The scheduler performs this operation frequently when entries are removed before they run (when ast_sched_del() is used). In a normal pop off of the top of the heap, a node is taken off the bottom, placed at the top, and then bubbled down until the max heap property is restored (see max_heapify()). This same logic was used for removing an arbitrary node from the middle of the heap. Unfortunately, that logic is full of fail. This patch fixes that by fully restoring the max heap property when a node is thrown into the middle of the heap. Instead of just pushing it down as appropriate, it first pushes it up as high as it will go, and _then_ pushes it down. Lastly, fix a minor problem in ast_heap_verify(), which is only used for debugging. If a parent and child node have the same value, that is not an error. The only error is if a parent's value is less than its children. A huge thanks goes out to cappucinoking for debugging this down to the scheduler, and then producing an ast_heap test case that demonstrated the breakage. That made it very easy for me to focus on the heap logic and produce a fix. Open source projects are awesome. (closes issue ASTERISK-15721) Reported by: ib2 Tested by: cappucinoking, crjw (closes issue ASTERISK-16044) Reported by: cappucinoking Patches: heap-fix.rev2.diff uploaded by russell (license 2) Tested by: cappucinoking, russell ........ ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=261497 By: Digium Subversion (svnbot) 2010-05-06 09:04:38 Repository: asterisk Revision: 261498 _U branches/1.6.2/ U branches/1.6.2/main/heap.c ------------------------------------------------------------------------ r261498 | russell | 2010-05-06 09:04:36 -0500 (Thu, 06 May 2010) | 47 lines Merged revisions 261496 via svnmerge from https://origsvn.digium.com/svn/asterisk/trunk ........ r261496 | russell | 2010-05-06 08:58:07 -0500 (Thu, 06 May 2010) | 40 lines Fix handling of removing nodes from the middle of a heap. This bug surfaced in 1.6.2 and does not affect code in any other released version of Asterisk. It manifested itself as SIP qualify not happening when it should, causing peers to go unreachable. This was debugged down to scheduler entries sometimes not getting executed when they were supposed to, which was in turn caused by an error in the heap code. The problem only sometimes occurs, and it is due to the logic for removing an entry in the heap from an arbitrary location (not just popping off the top). The scheduler performs this operation frequently when entries are removed before they run (when ast_sched_del() is used). In a normal pop off of the top of the heap, a node is taken off the bottom, placed at the top, and then bubbled down until the max heap property is restored (see max_heapify()). This same logic was used for removing an arbitrary node from the middle of the heap. Unfortunately, that logic is full of fail. This patch fixes that by fully restoring the max heap property when a node is thrown into the middle of the heap. Instead of just pushing it down as appropriate, it first pushes it up as high as it will go, and _then_ pushes it down. Lastly, fix a minor problem in ast_heap_verify(), which is only used for debugging. If a parent and child node have the same value, that is not an error. The only error is if a parent's value is less than its children. A huge thanks goes out to cappucinoking for debugging this down to the scheduler, and then producing an ast_heap test case that demonstrated the breakage. That made it very easy for me to focus on the heap logic and produce a fix. Open source projects are awesome. (closes issue ASTERISK-15721) Reported by: ib2 Tested by: cappucinoking, crjw (closes issue ASTERISK-16044) Reported by: cappucinoking Patches: heap-fix.rev2.diff uploaded by russell (license 2) Tested by: cappucinoking, russell ........ ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=261498 By: Digium Subversion (svnbot) 2010-05-06 09:07:52 Repository: asterisk Revision: 261499 U branches/1.6.2/tests/test_heap.c ------------------------------------------------------------------------ r261499 | russell | 2010-05-06 09:07:52 -0500 (Thu, 06 May 2010) | 8 lines Add test case that ensures the heap handles arbitrary removals properly. (issue ASTERISK-16044) Reported by: cappucinoking Patches: test_heap.diff uploaded by cappucinoking (license 1036) Tested by: cappucinoking, russell ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=261499 By: Digium Subversion (svnbot) 2010-05-06 09:15:58 Repository: asterisk Revision: 261500 U trunk/tests/test_heap.c ------------------------------------------------------------------------ r261500 | russell | 2010-05-06 09:15:57 -0500 (Thu, 06 May 2010) | 10 lines Add test case for removing random elements from a heap. I modified the original patch for trunk to use the unit test API. (issue ASTERISK-16044) Reported by: cappucinoking Patches: test_heap.diff uploaded by cappucinoking (license 1036) Tested by: cappucinoking, russell ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=261500 |