Blog 2020/8/20
<- previous | index | next ->
Let's learn how to implement a hash table in C! (See also part 1).
One technique to resolve the problem of "slot collisions" is the concept of chaining, which is a fancy way of saying "use linked lists".
We'll call our linked list node an HTSlot
. It will contain:
- a key's hash
- a pointer to a value
- a pointer to the next
HTSlot
struct _HTSlot {
size_t keyHash;
void* valuep;
struct _HTSlot* nextp;
};
typedef struct _HTSlot HTSlot;
Our HashTable
needs to be updated.
struct _HashTable {
- void** slotspp;
+ HTSlot** slotspp;
size_t capacity;
size_t population;
};
HashTableFree
needs to walk and free each HTSlot
list.
void HashTableFree(HashTable* htp) {
+ for (size_t i=0; i < htp->capacity; i++) {
+ HTSlot* slotp = htp->slotspp[i];
+ while (slotp != NULL) {
+ HTSlot* tmp = slotp;
+ free(slotp);
+ slotp = tmp->nextp;
+ continue;
+ }
+ }
free(htp->slotspp);
free(htp);
}
HashTableGet
needs to walk each slot list and look for an exact key hash match.
void* HashTableGet(HashTable* htp, size_t keyHash) {
size_t slotIndex = keyHash % htp->capacity;
- return htp->slotspp[slotIndex];
+ HTSlot* cursorp = htp->slotspp[slotIndex];
+ while (cursorp != NULL) {
+ if (cursorp->keyHash == keyHash) {
+ return cursorp->valuep;
+ } else {
+ cursorp = cursorp->nextp;
+ continue;
+ }
+ }
+ return NULL;
}
HashTableSet
walks the slot list, looking for a match to update.
If it doesn't find one, it appends a new slot onto the list.
void HashTableSet(HashTable* htp, size_t keyHash, void* valuep) {
size_t slotIndex = keyHash % htp->capacity;
HTSlot* cursorp = htp->slotspp[slotIndex];
while (cursorp != NULL) {
if (cursorp->keyHash == keyHash) {
cursorp->valuep = valuep;
if (cursorp->valuep == NULL && valuep != NULL) {
htp->population++;
} else if (cursorp->valuep != NULL && valuep == NULL) {
htp->population--;
}
return;
} else {
cursorp = cursorp->nextp;
continue;
}
}
// no existing match? append a new slot to the list.
if (valuep != NULL) {
HTSlot* newSlotp = _HTSlotMake();
newSlotp->keyHash = keyHash;
newSlotp->valuep = valuep;
if (htp->slotspp[slotIndex] == NULL) {
htp->slotspp[slotIndex] = newSlotp;
} else {
HTSlot* lastp = htp->slotspp[slotIndex];
while (lastp->nextp != NULL) {
lastp = lastp->nextp;
continue;
}
lastp->nextp = newSlotp;
}
htp->population++;
}
}
With these changes, our slot collision test passes!
./tests
testMake... OK
testFree... OK
testSet... OK
testGet... OK
testSlotCollision... OK
However, there is still a problem with this hash table: it doesn't handle the case where two different keys happen to produce the same hash.
We can demonstrate this with a test.
This is an easy task for us, as terribleHash
is so simplistic
that intentionally creating a hash collision is as easy as rearranging the letters in a key.
void testHashCollision() {
printf("%s... ", __FUNCTION__);
size_t capacity = 1;
HashTable* htp = HashTableMake(capacity);
char* key1p = "ab";
char* value1p = "cd";
size_t key1Hash = terribleHash(key1p, strlen(key1p)+1);
HashTableSet(htp, key1Hash, value1p);
assert(htp->population == 1);
char* key2p = "ba";
char* value2p = "dc";
size_t key2Hash = terribleHash(key2p, strlen(key2p)+1);
HashTableSet(htp, key2Hash, value2p);
assert(htp->population == 2);
char* got1p = HashTableGet(htp, key1Hash);
assert(strcmp(value1p, got1p) == 0);
char* got2p = HashTableGet(htp, key2Hash);
assert(strcmp(value2p, got2p) == 0);
printf("OK\n");
}
tests: tests.c:90: testHashCollision: Assertion `htp->population == 2' failed.
In part 3, we account for hash collisions.