Skip to content

Instantly share code, notes, and snippets.

@nicknapoli82
Last active September 8, 2020 16:11
Show Gist options
  • Star 1 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save nicknapoli82/c7cb7484011b08a9806d732ea1a142a6 to your computer and use it in GitHub Desktop.
Save nicknapoli82/c7cb7484011b08a9806d732ea1a142a6 to your computer and use it in GitHub Desktop.
Very simple analysis of a couple hashing algorithms
This is just comparing my hash function against the djb2 function and xxHash.
The purpose is not only to observe speed, but also to observe dispersion into a hash table and collisions that occur
considering a full 64 bits. Maybe this is useful?
I was more focused on my own hash function which is a simple play on the djb2 algorithm.
You may find interesting the times involved based on optimization, and when it concerns unfilled buckets.
For filling buckets I am only using the low bits of the full number. So
table bucket = 64bit hash & 0xffff;
when I want to only use the first 16 bits.
The dictionaries used are the cs50 large dictionary for the speller pset, and words.txt which I pulled
from here https://github.com/dwyl/english-words
Time was calculated by iterating through the given dictionary 1000 times, and timing only the hash function used.
Time is additive for each hash function call and is timed within the nano second limitation of clock_gettime.
This is for fun, and a little explorative.
Results for my hash:
clang -O0 -o dictionary dictionary.c
./dictionary large
Time spent in hash = 6.752705
Using MY HASH and 65536 table buckets
There are 7409 zero buckets out of 143091 words
84964 collisions occured where 58127 buckets were used
The average bucket length was 3.326907
Considering the total 64 bit hash 0 collisions occured
./dictionary words.txt
Time spent in hash = 20.061957
Using MY HASH and 262144 table buckets
There are 44304 zero buckets out of 466551 words
248711 collisions occured where 217840 buckets were used
The average bucket length was 2.883202
Considering the total 64 bit hash 4 collisions occured
clang -O3 -o dictionary dictionary.c - Optimized version for speed up comparison
./dictionary words.txt
Time spent in hash = 13.446997
Using MY HASH and 262144 table buckets
There are 44304 zero buckets out of 466551 words
248711 collisions occured where 217840 buckets were used
The average bucket length was 2.883202
Considering the total 64 bit hash 4 collisions occured
So -O3 optimization results in 32.97 % speed up
Results for djb2:
clang -O0 -o dictionary dictionary.c
./dictionary large
Time spent in hash = 5.446917
Using djb2 and 65536 table buckets
There are 7331 zero buckets out of 143091 words
84886 collisions occured where 58205 buckets were used
The average bucket length was 1.577300
Considering the total 64 bit hash 19 collisions occured
./dictionary words.txt
Time spent in hash = 16.352637
Using djb2 and 262144 table buckets
There are 44262 zero buckets out of 466551 words
248669 collisions occured where 217882 buckets were used
The average bucket length was 1.592592
Considering the total 64 bit hash 268 collisions occured
clang -O3 -o dictionary dictionary.c
./dictionary words.txt
Time spent in hash = 11.092085
Using djb2 and 262144 table buckets
There are 44262 zero buckets out of 466551 words
248669 collisions occured where 217882 buckets were used
The average bucket length was 1.592592
Considering the total 64 bit hash 268 collisions occured
Results for xxHash (XXH3 64 bits):
clang -O0 -o dictionary dictionary.c ./xxHash-dev/xxhash.c
./dictionary large
Time spent in hash = 8.331802
Using xxHash and 65536 table buckets
There are 7373 zero buckets out of 143091 words
84928 collisions occured where 58163 buckets were used
The average bucket length was 2.694839
Considering the total 64 bit hash 0 collisions occured
./dictionary words.txt
Time spent in hash = 26.469801
Using xxHash and 262144 table buckets
There are 44042 zero buckets out of 466551 words
248449 collisions occured where 218102 buckets were used
The average bucket length was 3.851463
Considering the total 64 bit hash 0 collisions occured
clang -O3 -o dictionary dictionary.c ./xxHash-dev/xxhash.c
./dictionary words.txt
Time spent in hash = 9.565276
Using xxHash and 262144 table buckets
There are 44042 zero buckets out of 466551 words
248449 collisions occured where 218102 buckets were used
The average bucket length was 3.851463
Considering the total 64 bit hash 0 collisions occured
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment