Skip to content

Instantly share code, notes, and snippets.

@chr15m
Last active April 12, 2023 00:10
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save chr15m/fb257ae8bb9774245b0ec67d9c7d388b to your computer and use it in GitHub Desktop.
Save chr15m/fb257ae8bb9774245b0ec67d9c7d388b to your computer and use it in GitHub Desktop.
Hash map / dictionary / associative array implementation for bash
#!/bin/bash
# Bash hash-map which works in Bash 3.
# WARNING: this code is useless and you should not use it. See the comments.
# Hashing function from Adam Katz: https://stackoverflow.com/a/12945518
ht() {
local h=0 i
for (( i=0; i < ${#1}; i++ )); do
let "h=( (h<<5) - h ) + $(printf %d \'${1:$i:1})"
let "h |= h"
done
printf "$h"
}
# Hash map functions
# (collisions are handled)
map=()
function put {
local hash="$(ht "$1")"
map["$hash"]="$1=$2"
}
function get {
local hash="$(ht "$1")"
local value=""
for item in "${map[@]}"; do
if [ "${item%%=*}" == "$1" ]; then
value="${item#*=}"
break
fi
done
echo "$value"
}
function contains_key {
local hash="$(ht "$1")"
local result="false"
for item in "${map[@]}"; do
if [ "${item%%=*}" == "$1" ]; then
result="true"
break
fi
done
echo "$result"
}
# Example usage
put "key1" "value1"
put "key2" "value2"
value1=$(get "key1")
value2=$(get "key2")
echo "key1: $value1"
echo "key2: $value2"
contains_key_result1=$(contains_key "key1")
contains_key_result2=$(contains_key "key3")
echo "key1 exists: $contains_key_result1"
echo "key3 exists: $contains_key_result2"
@adamhotep
Copy link

This code neither uses hashes optimally nor avoids collisions.

get and contains_key both set a local $hash variable using the ht hashing function, but they then completely ignore it. Instead, they loop through the entire array. You might get lucky and short-circuit early, but this loses the primary reason to use a hash data type.

How does this avoid collisions if put has no logic to do anything different if ht returns a code that collides with another element? A standard collision avoidance implementation would generate the checksum, look it up to see if it is already in the hash, then verify if the key is identical. If the key is not identical, it would create a new entry somehow. One way to do this would be a nested data structure. For example, since you're already using = as a special character, you could build a value like key1=value1==key2=value2 and then split on == and loop through the results until the key matches.

@chr15m
Copy link
Author

chr15m commented Apr 12, 2023

Lol yes you are correct, this is a truly awful solution and nobody should use it.

My first impulse is to remove this gist but I think I am going to leave it up here to serve as a warning to others. It's incredibly embarrassing that a veteran software developer like me, with bash Git repos with hundreds of stars, could have made such an elementary mistake. The way this happened is quite simple. I trusted the output from a GPT large language model after running the code but without taking a few moments to properly read it. If I had done so instead of breathlessly posting the link on Stackoverflow I would have come to the same conclusion you did and not had my ass handed to me in public. This has been a very valuable lesson. In future I will be much more wary of the output from these GPTs.

Adam thank you for taking the time to comment on and utterly destroy this implementation, you have done me a great service.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment