Skip to content

Instantly share code, notes, and snippets.

@jshahbazi
Last active April 3, 2020 19:27
Show Gist options
  • Star 3 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save jshahbazi/5941740 to your computer and use it in GitHub Desktop.
Save jshahbazi/5941740 to your computer and use it in GitHub Desktop.
Daniel J. Bernstein's hash algorithm written in Fortran
integer function djb_hash(str) result(hash)
implicit none
character(len=*),intent(in) :: str
integer :: hash
integer :: i
hash = 5381
do i=1,len(str)
hash = (ishft(hash,5) + hash) + ichar(str(i:i))
end do
end function DJB_hash
@jacobwilliams
Copy link

This code has a bug. Putting the "=5381" in the variable declaration makes it an "implicit save" variable. This means that the next time djb_hash is called, hash will retain the value it had from the last time it was called. The correct version is below (with a few other modifications):

function djb_hash(str) result(hash)

    implicit none
    character(len=*),intent(in) :: str
    integer :: hash
    integer :: i

    hash = 5381

    do i=1,len(str)
        hash = (ishft(hash,5) + hash) + ichar(str(i:i))
    end do

end function DJB_hash

@jshahbazi
Copy link
Author

You are correct. Thank you!

@yantosca
Copy link

I also find that you can get identical hash values for multiple strings:

B3O2            -814373157
ATO2            -814373157

Maybe this is rare enough but this can throw a wrench into searches using the hashed values.

The low tech solution: search the hashes a-postieriori, and then add another integer to one of the duplicate values.

@francis36
Copy link

The hash codes are similar because ishft is used instead of ishftc. Indeed, ishft is not circular, and left bits are discarded as the shift is performed. My guess is that ishftc should be used instead. Just a letter more :

function djb_hash(str) result(hash)

implicit none
character(len=*),intent(in) :: str
integer :: hash
integer :: i

hash = 5381

do i=1,len(str)
    hash = (ishftc(hash,5) + hash) + ichar(str(i:i))
end do

end function DJB_hash

@yantosca
Copy link

yantosca commented Apr 3, 2020

I actually tried this and you still get hash collisions under certain circumstances:

 ### hash of text "IO" :    615763273
 ### hash of text "INA":    615763273

I think that by the nature of hashing, there will always be cases where collisions occur. You will probably always need some kind of "tiebreaker" algorithm to resolve collisions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment