NAME(s)
Brian Holder-Chow Lin On (brianhc)
Nick Hewitt (nhewitt)
DESIGN
======
Overview:
We decided to use an explicit free list in the form of a hash table of linked
lists.
Hash Table (sc_arr global var):
We hash into the table by calculating the log2(requestedSize). Each bucket of
the hash table contains a doubly-linked list of free blocks. There are 32
"size classes" in this implementation. The smallest size class is at index 3
and satisfies requests of up to 7 bytes of memory. The hashing scheme is
summarized below:
Size classes:
* 0-2 - NULL
* 3 - [0 to 7] Bytes
* 4 - [8 to 15] Bytes
* ... - etc.
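To illustrate the mapping, a size-class index along these lines could be
computed as sketched below (SizeClassIndex is an illustrative name, and the
rounding is our reading of the scheme above rather than a verbatim copy of
our code):
#include <stddef.h>

/* Sketch: bucket index = floor(log2(size)) + 1, clamped so that everything
 * below 8 bytes falls into class 3; classes 0-2 stay unused (NULL). */
static inline int SizeClassIndex(size_t requestedSize) {
    size_t size = requestedSize;
    int index = 0;
    while (size > 0) {          /* index becomes floor(log2(size)) + 1 */
        size >>= 1;
        index++;
    }
    return index < 3 ? 3 : index;
}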
Linked List (in each bucket/size class):
We added the ability to sort the linked lists, but did not end up activating
this functionality in our optimized design. This list is doubly-linked. The
free block at the head of the list has a NULL prev field. The block at the
end of the list has a NULL next field.
Storage used when ALLOC, FREE:
When allocated, we use 8-byte headers. The first four bytes store the size
and allocation status: 29 bits store the size, and the least-significant bit
holds the status (1 for ALLOC, 0 for FREE).
When free, we use 12 of the 16 bytes (minimum allocated) to store the header
(size and status), prev, and next fields. To simplify setting and getting,
we cast to a Node struct.
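A minimal sketch of that overlay is shown below. Field names are
illustrative, and the 12-byte figure assumes the 32-bit target this was
built for (4-byte pointers):
#include <stdint.h>

typedef struct node {
    uint32_t header;      /* 29-bit size plus allocation status in the LSB */
    struct node *prev;    /* previous free block in this size class (NULL at head) */
    struct node *next;    /* next free block in this size class (NULL at tail) */
} Node;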
Algorithms: Retrieval:
To satisfy a request, we index into the most
appropriate size class and traverse the linked list in search of a suitable
block. If one is not found at this size class, we increment to the next
highest size class and try to satisfy the request by finding a block from this
list. This process is repeated until a block is found or the end of the size
class array is reached. If no block is found, we submit a request for more
memory from the OS. This operation is O(n) where n is the number of blocks in
the array of linked lists.
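An illustrative sketch of that search, assuming sc_arr is an array of list
heads, reusing the SizeClassIndex sketch above and the LLFindAndRemoveNode
helper shown under OPTIMIZATION, with RequestMorePages standing in for the
OS fallback:
#define NUM_SIZE_CLASSES 32
extern Node *sc_arr[NUM_SIZE_CLASSES];   /* global size-class table */

static inline Node *FindFreeBlock(size_t requestedSize) {
    /* Walk buckets from the best-fitting size class upward. */
    for (int sc = SizeClassIndex(requestedSize); sc < NUM_SIZE_CLASSES; sc++) {
        Node *found = LLFindAndRemoveNode(&sc_arr[sc], requestedSize);
        if (found != NULL)
            return found;
    }
    /* Nothing suitable anywhere: fall back to requesting pages from the OS. */
    return RequestMorePages(requestedSize);
}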
Algorithms: Storage:
When the client returns memory to the heap allocator, we
check the size of the memory (stored in the header), index into the
appropriate size class, and store this free block there (add to linked list at
head). This operation runs in constant time.
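A sketch of that constant-time insertion at the head of the appropriate
list, with the same illustrative names as above (GET_SIZE and CLIENT are the
header/payload helpers referenced elsewhere in this writeup):
static inline void StoreFreeBlock(Node *block) {
    int sc = SizeClassIndex(GET_SIZE(CLIENT(block)));   /* size from the header */
    block->prev = NULL;                  /* new head has no predecessor */
    block->next = sc_arr[sc];
    if (sc_arr[sc] != NULL)
        sc_arr[sc]->prev = block;
    sc_arr[sc] = block;                  /* push onto the head of the list */
}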
RATIONALE
=========
Our decision to use a size class array at log-base-2 intervals was made in
order to concentrate the size classes in the lower size range (which we
hypothesized to be more heavily used by clients) while maintaining our
alignment. This array allowed us to hash with a simple computation and to
speed up our searches for free blocks during placement and removal.
We chose to make the linked lists hanging off the size classes doubly linked
in order to speed up our coalescing operation. When blocks are selected for
coalescing, we do not have to search in linear time for a block's predecessor
in order to patch the linked list.
We chose to give each block a header containing its size and allocation
status so that we could place blocks back into a list of free memory and
organize their placement by size. This header also facilitated right
coalescing by allowing us to jump from block to block until we run into an
allocated block.
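A rough sketch of that right-coalescing walk. Every name here (GET_SIZE,
SET_SIZE, CLIENT, IS_ALLOC, LLRemoveNode, group_end) is an assumption, and
we assume GET_SIZE reports the full block size, so treat this as an
illustration of the idea rather than our exact code:
static inline void CoalesceRight(Node *block, char *group_end) {
    /* Hop to the neighbor immediately to the right using the size stored in
     * the header, and absorb neighbors for as long as they are free. */
    Node *right = (Node *)((char *)block + GET_SIZE(CLIENT(block)));
    while ((char *)right < group_end && !IS_ALLOC(CLIENT(right))) {
        LLRemoveNode(right);    /* unlink it from its size class list */
        SET_SIZE(block, GET_SIZE(CLIENT(block)) + GET_SIZE(CLIENT(right)));
        right = (Node *)((char *)block + GET_SIZE(CLIENT(block)));
    }
}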
We chose to make the minimum block size 16 bytes and the minimum memory given
to the client 8 bytes. This allowed us to maintain a header with the block
size and allocation status at all times, and to leave room for two pointers to
link into the list of freed blocks.
We chose to have a Node struct in order to abstract the complex pointer
operations when manipulating blocks that are in the size class array.
We made a function to cut off the remainder before returning a block to the
client in order to minimize internal fragmentation. This ensures the client
never gets more than the requested size (rounded up to a multiple of 8).
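An illustrative sketch of that trimming step, assuming a SET_SIZE header
writer and reusing the StoreFreeBlock sketch above:
#define MIN_BLOCK_SIZE 16

static inline void SplitBlock(Node *block, size_t neededSize) {
    size_t total = GET_SIZE(CLIENT(block));
    /* Only split if the leftover piece can stand alone as a free block. */
    if (total - neededSize >= MIN_BLOCK_SIZE) {
        Node *remainder = (Node *)((char *)block + neededSize);
        SET_SIZE(remainder, total - neededSize);
        SET_SIZE(block, neededSize);
        StoreFreeBlock(remainder);   /* return the tail to its size class */
    }
}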
OPTIMIZATION
============
Number of pages requested:
We applied a constant multiplier to the number of pages requested from the OS
to satisfy client requests, in order to minimize the time spent in this costly
operation. So if a request came in that would require 8 pages, we retrieved
three times that amount (the multiplier used in our final submission).
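Sketched with stand-in names (PAGE_SIZE and the helper below are
illustrative; only the 3x multiplier reflects our final submission):
#define PAGE_SIZE        4096   /* assumed page size */
#define PAGE_MULTIPLIER  3      /* final submission multiplied requests by 3 */

static inline size_t PagesToRequest(size_t requestedSize) {
    /* Round the request up to whole pages, then over-request by the
     * multiplier so later requests can be served from the surplus. */
    size_t pagesNeeded = (requestedSize + PAGE_SIZE - 1) / PAGE_SIZE;
    return pagesNeeded * PAGE_MULTIPLIER;
}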
Number of nodes searched (linked list):
If we allow the allocator to search the linked list for LL_SEARCH_LENGTH nodes
before trying a larger size class, we improve the utilization by several
percentage points. Throughput also improves as we find ourselves requesting
fewer pages from the OS:
static inline Node *LLFindAndRemoveNode(Node **ptrHead, size_t requestedSize) {
    int i = 0;
    for (Node *n = *ptrHead; n != NULL; n = n->next) {
        if (i++ > LL_SEARCH_LENGTH)
            return NULL;    /* give up; caller moves on to a larger size class */
        if (GET_SIZE(CLIENT(n)) >= requestedSize) {
            /* remove node from the doubly-linked free list, return node */
            if (n->prev != NULL)
                n->prev->next = n->next;
            else
                *ptrHead = n->next;     /* n was the head of the list */
            if (n->next != NULL)
                n->next->prev = n->prev;
            return n;
        }
    }
    return NULL;
}
Inlining, compiler optimization, and macroizing:
We inline all static helper functions, compile with -O3 optimization, and use
macros instead of function calls for many low-level operations. The -O3
optimization yields considerable throughput gains.
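Illustrative versions of such macros, written against the header layout
described in DESIGN (8-byte header before the payload, 29-bit size, status
in the LSB); these sketch the idea rather than reproduce our exact macros:
#define STATUS_MASK  0x1u
#define SIZE_MASK    (~(uint32_t)0x7)   /* upper 29 bits of the header word */
#define HDR(clientp) (*(uint32_t *)((char *)(clientp) - 8))
#define GET_SIZE(clientp)  (HDR(clientp) & SIZE_MASK)
#define IS_ALLOC(clientp)  (HDR(clientp) & STATUS_MASK)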
EVALUATION
==========
The primary strength of our design is that it allows for rapid removal and
addition of blocks to the size class array, and facilitates right coalescing
without sacrificing much throughput. The result is an allocator with good
throughput, and very little internal fragmentation.
Another strength is that we are able to retrieve approximately best-fit
blocks from our array of linked lists in a relatively short amount of time.
The weakness of our design is its tendency for heavy external fragmentation.
This is the result of our inability to coalesce across the groups of pages
that were requested from the OS. Furthermore, we tried (on separate
occasions) to improve utilization by sorting the linked lists and by searching
them for best-fit blocks, but this sacrificed double-digit throughput for a 1%
increase in utilization. It just wasn't worth it.