function djb_hash(str) result(hash)
    implicit none
    character(len=*),intent(in) :: str
    integer :: hash   ! result type is declared here, so the function statement carries no type prefix
    integer :: i
    hash = 5381
    do i=1,len(str)
        ! hash*33 + character code, written as a left shift by 5 plus hash
        hash = (ishft(hash,5) + hash) + ichar(str(i:i))
    end do
end function djb_hash
I also find that you can get identical hash values for multiple strings:
B3O2 -814373157
ATO2 -814373157
Maybe this is rare enough, but it can throw a wrench into searches that rely on the hashed values.
The low-tech solution: search the hashes a posteriori for duplicates, then add another integer to one of the duplicate values.
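For what it is worth, this particular pair seems to collide because of the arithmetic itself rather than because of overflow: each step computes hash*33 + character code, and 33*66 + 51 (for "B3") equals 33*65 + 84 (for "AT"), so the shared suffix "O2" keeps the two hashes equal. Below is a minimal sketch that checks this. `djb_hash64` is a hypothetical 64-bit variant introduced only so that short keys cannot overflow, and ASCII character codes are assumed. The value it prints will not necessarily match the one quoted above, since that depends on integer width and on any trailing blanks in the strings.

```fortran
program djb_collision_demo
    use, intrinsic :: iso_fortran_env, only: int64
    implicit none

    ! "B3" and "AT" fold to the same value: 33*66 + 51 = 33*65 + 84 = 2229
    if (33*ichar('B') + ichar('3') /= 33*ichar('A') + ichar('T')) then
        error stop 'prefix values differ (non-ASCII character set?)'
    end if

    ! The identical suffix "O2" keeps the two hashes equal from then on
    if (djb_hash64('B3O2') /= djb_hash64('ATO2')) then
        error stop 'expected a collision'
    end if

    print *, 'B3O2 and ATO2 share the hash ', djb_hash64('B3O2')

contains

    ! Hypothetical 64-bit variant (assumption, not part of the gist):
    ! wide enough that keys of a few characters never overflow
    function djb_hash64(str) result(hash)
        character(len=*), intent(in) :: str
        integer(int64) :: hash
        integer :: i
        hash = 5381_int64
        do i = 1, len(str)
            ! hash*33 is the same quantity as ishft(hash,5) + hash
            hash = hash * 33_int64 + ichar(str(i:i))
        end do
    end function djb_hash64

end program djb_collision_demo
```

The program stops with an error if either check fails, so a clean run confirms the collision.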
The hash codes are similar because ishft is used instead of ishftc. ishft is not circular, so the leftmost bits are discarded as the shift is performed. My guess is that ishftc should be used instead. It is just one more letter:
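To make the difference between the two intrinsics concrete, here is a small sketch assuming a 32-bit integer kind (`int32` from `iso_fortran_env`). It shifts a value with only bit 30 set by five positions: `ishft` pushes the bit past position 31 and loses it, while `ishftc` rotates it around to position 3.

```fortran
program shift_demo
    use, intrinsic :: iso_fortran_env, only: int32
    implicit none
    integer(int32) :: x

    x = 1073741824_int32    ! 2**30: only bit 30 is set

    ! ishft: the bit moves past position 31 and is discarded
    if (ishft(x, 5) /= 0_int32) error stop 'ishft should discard the bit'

    ! ishftc: the bit wraps around to position 3, i.e. 2**3 = 8
    if (ishftc(x, 5) /= 8_int32) error stop 'ishftc should rotate the bit'

    print *, 'ishft: ', ishft(x, 5), '  ishftc: ', ishftc(x, 5)
end program shift_demo
```

Rotation preserves the high bits, but neither variant can rule out collisions, because arbitrarily many strings are mapped onto a fixed number of integer values.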
function djb_hash(str) result(hash)
    implicit none
    character(len=*),intent(in) :: str
    integer :: hash
    integer :: i
    hash = 5381
    do i=1,len(str)
        hash = (ishftc(hash,5) + hash) + ichar(str(i:i))
    end do
end function djb_hash
I actually tried this and you still get hash collisions under certain circumstances:
### hash of text "IO" : 615763273
### hash of text "INA": 615763273
I think that by the nature of hashing, there will always be cases where collisions occur. You will probably always need some kind of "tiebreaker" algorithm to resolve collisions.
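One common form of tiebreaker is to store the key next to its hash and compare the actual strings whenever two hashes land in the same place. The sketch below is only an illustration under stated assumptions: the table size `nslots`, the key length `keylen`, the `insert` helper, and the linear-probing strategy are all hypothetical choices, not anything taken from the gist. The hash is also a swapped-in variant that masks to 31 bits inside a 64-bit integer so that it never overflows. It uses "B3O2" and "ATO2", which hash to the same value under hash*33 + character code.

```fortran
program tiebreaker_demo
    use, intrinsic :: iso_fortran_env, only: int64
    implicit none
    integer, parameter :: nslots = 8, keylen = 8   ! hypothetical sizes
    character(len=keylen) :: keys(nslots)
    logical :: used(nslots)
    integer :: s1, s2, s3

    used = .false.
    s1 = insert('B3O2')
    s2 = insert('ATO2')   ! same hash as 'B3O2', so it must be moved elsewhere
    s3 = insert('B3O2')   ! an existing key must be found in its original slot

    if (s1 == s2) error stop 'colliding keys ended up in the same slot'
    if (s3 /= s1) error stop 'existing key was not found again'
    print *, 'B3O2 -> slot', s1, '  ATO2 -> slot', s2

contains

    ! djb-style hash kept within 31 bits so the arithmetic never overflows
    ! (a variant chosen for this sketch, not the gist's exact function)
    function djb_hash(str) result(hash)
        character(len=*), intent(in) :: str
        integer(int64) :: hash
        integer :: i
        hash = 5381_int64
        do i = 1, len(str)
            hash = iand(hash * 33_int64 + ichar(str(i:i)), 2147483647_int64)
        end do
    end function djb_hash

    ! Return the slot holding str, storing it first if it is not present.
    ! Keys longer than keylen would be truncated; the sketch assumes short keys.
    function insert(str) result(slot)
        character(len=*), intent(in) :: str
        integer :: slot, probe
        slot = int(mod(djb_hash(str), int(nslots, int64))) + 1
        do probe = 1, nslots
            if (.not. used(slot)) then
                used(slot) = .true.       ! free slot: store the key here
                keys(slot) = str
                return
            else if (keys(slot) == str) then
                return                    ! tiebreaker: the strings really match
            end if
            slot = mod(slot, nslots) + 1  ! occupied by another key: try the next slot
        end do
        error stop 'table is full'
    end function insert

end program tiebreaker_demo
```

Because the comparison is done on the strings themselves, the lookup stays correct no matter how many keys share a hash; collisions only cost extra probes.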
You are correct. Thank you!