NAME(s)
Brian Holder-Chow Lin On (brianhc)
Nick Hewitt (nhewitt)
DESIGN
======
Overview:
We decided to use an explicit free list in the form of a hash table of linked
lists.
Hash Table (sc_arr global var):
We hash into the table by calculating the log2(requestedSize). Each bucket of
the hash table contains a doubly-linked list of free blocks. There are 32
"size classes" in this implementation. The smallest size class is at index 3
and satisfies requests of up to 7 bytes of memory. The hashing scheme is
summarized below:
Size classes:
* 0-2 - NULL
* 3 - [0 to 7] Bytes
* 4 - [8 to 15] Bytes
* ... - etc.
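To illustrate the mapping, a size-class index along these lines could be
computed as sketched below (SizeClassIndex is an illustrative name, and the
rounding is our reading of the scheme above rather than a verbatim copy of
our code):
#include <stddef.h>

/* Sketch: bucket index = floor(log2(size)) + 1, clamped so that everything
 * below 8 bytes falls into class 3; classes 0-2 stay unused (NULL). */
static inline int SizeClassIndex(size_t requestedSize) {
    size_t size = requestedSize;
    int index = 0;
    while (size > 0) {          /* index becomes floor(log2(size)) + 1 */
        size >>= 1;
        index++;
    }
    return index < 3 ? 3 : index;
}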
Linked List (in each bucket/size class):
We added the ability to sort the linked lists, but did not end up activating
this functionality in our optimized design. This list is doubly-linked. The
free block at the head of the list has a NULL prev field. The block at the
end of the list has a NULL next field.
Storage used when ALLOC, FREE:
When allocated, we use 8-byte headers. The first four bytes store the size
and allocation status: 29 bits store the size, and the least-significant bit
holds the status (1 for ALLOC, 0 for FREE).
When free, we use 12 of the 16 bytes (minimum allocated) to store the header
(size and status), prev, and next fields. To simplify setting and getting,
we cast to a Node struct.
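A minimal sketch of that overlay is shown below. Field names are
illustrative, and the 12-byte figure assumes the 32-bit target this was
built for (4-byte pointers):
#include <stdint.h>

typedef struct node {
    uint32_t header;      /* 29-bit size plus allocation status in the LSB */
    struct node *prev;    /* previous free block in this size class (NULL at head) */
    struct node *next;    /* next free block in this size class (NULL at tail) */
} Node;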
Algorithms: Retrieval:
To satisfy a request, we index into the most
appropriate size class and traverse the linked list in search of a suitable
block. If one is not found at this size class, we increment to the next
highest size class and try to satisfy the request by finding a block from this
list. This process is repeated until a block is found or the end of the size
class array is reached. If no block is found, we submit a request for more
memory from the OS. This operation is O(n) where n is the number of blocks in
the array of linked lists.
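An illustrative sketch of that search, assuming sc_arr is an array of list
heads, reusing the SizeClassIndex sketch above and the LLFindAndRemoveNode
helper shown under OPTIMIZATION, with RequestMorePages standing in for the
OS fallback:
#define NUM_SIZE_CLASSES 32
extern Node *sc_arr[NUM_SIZE_CLASSES];   /* global size-class table */

static inline Node *FindFreeBlock(size_t requestedSize) {
    /* Walk buckets from the best-fitting size class upward. */
    for (int sc = SizeClassIndex(requestedSize); sc < NUM_SIZE_CLASSES; sc++) {
        Node *found = LLFindAndRemoveNode(&sc_arr[sc], requestedSize);
        if (found != NULL)
            return found;
    }
    /* Nothing suitable anywhere: fall back to requesting pages from the OS. */
    return RequestMorePages(requestedSize);
}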
Algorithms: Storage:
When the client returns memory to the heap allocator, we
check the size of the memory (stored in the header), index into the
appropriate size class, and store this free block there (add to linked list at
head). This operation runs in constant time.
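A sketch of that constant-time insertion at the head of the appropriate
list, with the same illustrative names as above (GET_SIZE and CLIENT are the
header/payload helpers referenced elsewhere in this writeup):
static inline void StoreFreeBlock(Node *block) {
    int sc = SizeClassIndex(GET_SIZE(CLIENT(block)));   /* size from the header */
    block->prev = NULL;                  /* new head has no predecessor */
    block->next = sc_arr[sc];
    if (sc_arr[sc] != NULL)
        sc_arr[sc]->prev = block;
    sc_arr[sc] = block;                  /* push onto the head of the list */
}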
RATIONALE
=========
Our decision to use a size class array at log-base-2 intervals was made in
order to concentrate the size classes in the lower size range (which we
hypothesized to be more heavily used by clients) while maintaining our
alignment. This array allowed us to hash with a simple computation and to
speed up our searches for free blocks during placement and removal.
We chose to make the linked lists hanging off the size classes doubly linked
in order to speed up our coalescing operation. When blocks are selected for
coalescing, we do not have to search in linear time for a block's predecessor
in order to patch the linked list.
We chose to give each block a header containing its size and allocation
status so that we could place blocks back into a list of free memory and
organize their placement by size. This header also facilitated right
coalescing by allowing us to jump from block to block until we run into an
allocated block.
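A rough sketch of that right-coalescing walk. Every name here (GET_SIZE,
SET_SIZE, CLIENT, IS_ALLOC, LLRemoveNode, group_end) is an assumption, and
we assume GET_SIZE reports the full block size, so treat this as an
illustration of the idea rather than our exact code:
static inline void CoalesceRight(Node *block, char *group_end) {
    /* Hop to the neighbor immediately to the right using the size stored in
     * the header, and absorb neighbors for as long as they are free. */
    Node *right = (Node *)((char *)block + GET_SIZE(CLIENT(block)));
    while ((char *)right < group_end && !IS_ALLOC(CLIENT(right))) {
        LLRemoveNode(right);    /* unlink it from its size class list */
        SET_SIZE(block, GET_SIZE(CLIENT(block)) + GET_SIZE(CLIENT(right)));
        right = (Node *)((char *)block + GET_SIZE(CLIENT(block)));
    }
}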
We chose to make the minimum block size 16 bytes and the minimum memory given
to the client 8 bytes. This allowed us to maintain a header with the block
size and allocation status at all times, and to leave room for two pointers to
link into the list of freed blocks.
We chose to have a Node struct in order to abstract the complex pointer
operations when manipulating blocks that are in the size class array.
We made a function to cut off the remainder before returning a block to the
client in order to minimize internal fragmentation. This ensures the client
never gets more than the requested size (rounded up to a multiple of 8).
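An illustrative sketch of that trimming step, assuming a SET_SIZE header
writer and reusing the StoreFreeBlock sketch above:
#define MIN_BLOCK_SIZE 16

static inline void SplitBlock(Node *block, size_t neededSize) {
    size_t total = GET_SIZE(CLIENT(block));
    /* Only split if the leftover piece can stand alone as a free block. */
    if (total - neededSize >= MIN_BLOCK_SIZE) {
        Node *remainder = (Node *)((char *)block + neededSize);
        SET_SIZE(remainder, total - neededSize);
        SET_SIZE(block, neededSize);
        StoreFreeBlock(remainder);   /* return the tail to its size class */
    }
}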
OPTIMIZATION
============
Number of pages requested:
We applied a constant multiplier to the number of pages requested from the OS
to satisfy client requests, in order to minimize the time spent in this costly
operation. So if a request came in that would require 8 pages, we retrieved
three times that amount (the multiplier used in our final submission).
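Sketched with stand-in names (PAGE_SIZE and the helper below are
illustrative; only the 3x multiplier reflects our final submission):
#define PAGE_SIZE        4096   /* assumed page size */
#define PAGE_MULTIPLIER  3      /* final submission multiplied requests by 3 */

static inline size_t PagesToRequest(size_t requestedSize) {
    /* Round the request up to whole pages, then over-request by the
     * multiplier so later requests can be served from the surplus. */
    size_t pagesNeeded = (requestedSize + PAGE_SIZE - 1) / PAGE_SIZE;
    return pagesNeeded * PAGE_MULTIPLIER;
}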
Number of nodes searched (linked list):
If we allow the allocator to search the linked list for LL_SEARCH_LENGTH nodes
before trying a larger size class, we improve the utilization by several
percentage points. Throughput also improves as we find ourselves requesting
fewer pages from the OS:
static inline Node *LLFindAndRemoveNode(Node **ptrHead, size_t requestedSize) {
    int i = 0;
    for (Node *n = *ptrHead; n != NULL; n = n->next) {
        if (i++ > LL_SEARCH_LENGTH)
            return NULL;    /* give up; caller moves on to a larger size class */
        if (GET_SIZE(CLIENT(n)) >= requestedSize) {
            /* remove node from the doubly-linked free list, return node */
            if (n->prev != NULL)
                n->prev->next = n->next;
            else
                *ptrHead = n->next;     /* n was the head of the list */
            if (n->next != NULL)
                n->next->prev = n->prev;
            return n;
        }
    }
    return NULL;
}
Inlining, compiler optimization, and macroizing:
We inline all static helper functions, compile with -O3 optimization, and use
macros instead of function calls for many low-level operations. The -O3
optimization yields considerable throughput gains.
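Illustrative versions of such macros, written against the header layout
described in DESIGN (8-byte header before the payload, 29-bit size, status
in the LSB); these sketch the idea rather than reproduce our exact macros:
#define STATUS_MASK  0x1u
#define SIZE_MASK    (~(uint32_t)0x7)   /* upper 29 bits of the header word */
#define HDR(clientp) (*(uint32_t *)((char *)(clientp) - 8))
#define GET_SIZE(clientp)  (HDR(clientp) & SIZE_MASK)
#define IS_ALLOC(clientp)  (HDR(clientp) & STATUS_MASK)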
EVALUATION
==========
The primary strength of our design is that it allows for rapid removal and
addition of blocks to the size class array, and facilitates right coalescing
without sacrificing much throughput. The result is an allocator with good
throughput, and very little internal fragmentation.
Another strength is that we are able to retrieve approximately best-fit
blocks from our array of linked lists in a relatively short amount of time.
The weakness of our design is its tendency for heavy external fragmentation.
This is the result of our inability to coalesce across the groups of pages
that were requested from the OS. Furthermore, we tried (on separate
occasions) to improve utilization by sorting the linked lists and by searching
them for best-fit blocks, but this sacrificed double-digit throughput for a 1%
increase in utilization. It just wasn't worth it.