Skip to content

Instantly share code, notes, and snippets.

@drslump
Last active April 11, 2016 10:56
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save drslump/647efb8204537f9faf2628eaba5a3577 to your computer and use it in GitHub Desktop.
Save drslump/647efb8204537f9faf2628eaba5a3577 to your computer and use it in GitHub Desktop.
proc djb2 { {input ""} {result 5381} } {
foreach c [split $input ""] {
set result [expr {($result << 5) + $result + [scan $c %c] & 0xFFFFFFFF}]
}
return $result
}
proc bench1 {} {djb2 "ANOLPMAP447568116268"}
proc bench2 {} {djb2 "447568116268" 2860308605}
# Full hash using the salt
puts [djb2 "ANOLPMAP447568116268"]
# Optimized hash with the precomputed salt
puts [djb2 "447568116268" 2860308605]
puts [time {bench1} 1000]
puts [time {bench2} 1000]
@drslump
Copy link
Author

drslump commented Apr 10, 2016

Another version optimized for digits only, about 3x faster:

proc djb2 { {input ""} {result 5381} } {

  foreach c [split $input ""] {
    switch $c {
      "0" { set ord 48 }
      "1" { set ord 49 }
      "2" { set ord 50 }
      "3" { set ord 51 }
      "4" { set ord 52 }
      "5" { set ord 53 }
      "6" { set ord 54 }
      "7" { set ord 55 }
      "8" { set ord 56 }
      "9" { set ord 57 }
      default { set ord [scan $c %c] }
    }

    set result [expr {($result << 5) + $result + $ord & 0xFFFFFFFF}]
  }

  return $result
}

@franmrl
Copy link

franmrl commented Apr 11, 2016

Does it make sense to replace the switch (which looks it might involve ugly string comparisons) with a Tcl list?

set t {48 49 50 51 52 53 54 55 56 57}
set ord [lindex $t $c]

Or, going one step ahead, "abusing" the fact that it's continuous:

set ord = $c + 48

might be slightly faster. I don't know jack about Tcl, but I'm used to the fact that switch is a terrible operation for the cpu as it's usually implemented as a bunch of branches that screw up with the branch predictor, so I followed the mantra to try to stay out of switch. Might be completely wrong in the context of Tcl. Or maybe it's not in the part consuming the most % of cpu and it's not relevant.

These will fail miserably if we don't have numbers, ofc, but I don't know if this is really an issue or not. I don't know if the + will be an issue neither.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment