Created
March 18, 2014 18:19
-
-
Save btc/9626131 to your computer and use it in GitHub Desktop.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
NAME(s) | |
Brian Holder-Chow Lin On (brianhc) | |
Nick Hewitt (nhewitt) | |
DESIGN | |
====== | |
Overview: | |
We decided to use an explicit free list in the form of a hash table of linked | |
lists. | |
Hash Table (sc_arr global var): | |
We hash into the table by calculating the log2(requestedSize). Each bucket of | |
the hash table contains a doubly-linked list of free blocks. There are 32 | |
"size classes" in this implementation. The smallest size class is at index 3 | |
and and satisfies requests up to 7 bytes of memory. The hashing scheme is | |
summarized below: | |
Size classes: | |
* 0-2 - NULL | |
* 3 - [0 to 7] Bytes | |
* 4 - [8 to 15] Bytes | |
* ... - etc. | |
Linked List (in each bucket/size class): | |
We added the ability to sort the linked lists, but did not end up activating | |
this functionality in our optimized design. This list is doubly-linked. The | |
free block at the head of the list has a NULL prev field. The block at the | |
end of the list has NULL next field. | |
Storage used when ALLOC, FREE: | |
When allocated, we have 8 byte headers. The first four bytes store the size | |
and allocation status. Of the first four bytes, the first 29 bits store the | |
size. The LSB holds the status (1 for ALLOC, 0 for FREE). | |
When free, we use 12 of the 16 bytes (minimum allocated) to store the header | |
(size and status), prev, and next fields. To simplifying setting and getting, | |
we cast to a Node struct. | |
Algorithms: Retrieval: To satisfy a request, we index into the most | |
appropriate size class and traverse the linked list in search of a suitable | |
block. If one is not found at this size class, we increment to the next | |
highest size class and try to satisfy the request by finding a block from this | |
list. This process is repeated until a block is found or the end of the size | |
class array is reached. If no block is found, we submit a request for more | |
memory from the OS. This operation is O(n) where n is the number of blocks in | |
the array of linked lists. | |
Algorithms: Storage: When the client returns memory to the heap allocator, we | |
check the size of the memory (stored in the header), index into the | |
appropriate size class, and store this free block there (add to linked list at | |
head). This operation runs in constant time. | |
//TODO rm below | |
<Give an overview of your allocator implementation (data structures/algorithms)> | |
RATIONALE | |
Our decision to use a size class array at log base 2 intervals, was in order | |
to concentrate the quantity of size classes in the lower size range (which we | |
hypothesized to be more heavily used by clients) while maintaining our | |
alignment. This array allowed us to Hash primitively and increase the speed of | |
our searches for free blocks during placement and removal. | |
We chose to make the linked lists extending down from the size classes to be | |
bi-directional in order to speed our coalescing operation. When blocks are | |
selected for coalescing, we do not have to search in linear time for the | |
parent in order to patch the linked list. | |
We chose to make a 1 Byte header that contains the block's size and allocation | |
status so that we could place blocks back into a list of free memory, and | |
organize their placement by size. This header also facilitated right | |
coalescing, by allowing us to jump from block to block until we run into an | |
allocated block. | |
We chose to make the minimum block size 16 Bytes, and the minimum memory given | |
to the client for use to 8 Bytes. This allowed us to maintain a header for the | |
blocks with the block size and allocation status at all times, and leave room | |
for two pointers to link into the list of freed blocks. | |
We chose to have a Node struct in order to abstract the complex pointer | |
operations when manipulating blocks that are in the size class array. | |
We made a function to cut off the remainder before returning a block to the | |
client to minimize the internal fragmentation. This ensures the client never | |
gets more than the requested size (rounded up to 8). | |
OPTIMIZATION | |
============ | |
Number of pages requested: | |
We applied a constant multiplier to the number pages requested from the OS | |
to satisfy client requests in order to minimize the amount of time spent in a | |
costly operation. So if a request came in that would require 8 pages, we | |
retrieved 3 times (in our final submission) this amount. | |
Number of nodes searched (linked list): | |
If we allow the allocator to search the linked list for LL_SEARCH_LENGTH nodes | |
before trying a larger size class, we improve the utilization by several | |
percentage points. Throughput also improves as we find ourselves requesting | |
fewer pages from the OS: | |
static inline Node *LLFindAndRemoveNode(Node **ptrHead, size_t requestedSize) { | |
int i = 0; | |
for (Node *n = *ptrHead; n != NULL; n = n->next) { | |
if (i++ > LL_SEARCH_LENGTH) | |
return NULL; | |
if (GET_SIZE(CLIENT(n)) >= requestedSize) { | |
... | |
// remove node, return node | |
} | |
} | |
return NULL; | |
} | |
Inlining, compiler optimization, the macroizing: | |
We inline all static helper functions, use O3 optimization, and use macros | |
insetad of function calls for many low-level operations. The -O3 optimization | |
yields considerable throughput gains. | |
EVALUATION | |
========== | |
The primary strength of our design is that it allows for rapid removal and | |
addition of blocks to the size class array, and facilitates right coalescing | |
without sacrificing much throughput. The result is an allocator with good | |
throughput, and very little internal fragmentation. | |
Another strength is that we are able to retrieve suitable blocks from our | |
array of linked lists with approximately best-fit in a relatively short amount | |
of time. | |
The weakness of our design is its tendency for heavy external fragmentation. | |
This is the result of our inability to coalesce across the groups of pages | |
that were requested from the OS. Furthermore, we tried to improve utilization | |
by sorting and searching (different occasions) the linked lists for best-fit | |
blocks, but this sacrificed double digits of throughput for a 1% increase in | |
utilization. It just wasn't worth it. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment