-
-
Save totetmatt/20027dc562274579a07efc3773d0e5d1 to your computer and use it in GitHub Desktop.
#!/bin/bash | |
# set -x | |
KEY=$1 | |
FILE=$2 | |
# If file doesn't exists | |
if [ ! -f "$FILE" ] | |
then | |
touch $FILE | |
fi | |
# Check file size | |
FILE_SIZE=$(< $FILE wc -l) | |
# If empty file just insert | |
if [ "$FILE_SIZE" = "0" ] | |
then | |
$(echo $KEY > $FILE) | |
echo 1 | |
exit 0 | |
fi | |
# Binary Search & Insert | |
LOW=1 | |
HIGH=$FILE_SIZE | |
MID=$(($LOW + ($HIGH - $LOW)/2 )) | |
function lineEq () { | |
LINE="$(sed "$1!d" $FILE)" | |
if [ "$LINE" = "$2" ] # It's a match ! | |
then | |
echo "0" | |
elif [ "$LINE" \< "$2" ] # Search string is in higher ground (hello there) | |
then | |
echo "1" | |
else | |
echo "-1" # Search string is in lower ground | |
fi | |
} | |
while [ $LOW -le $HIGH ] | |
do | |
IS_MATCH=$(lineEq $MID $KEY) | |
if [[ "$IS_MATCH" -eq "0" ]] | |
then | |
echo $MID | |
exit 0 | |
elif [[ "$IS_MATCH" -eq "1" ]] | |
then | |
LOW=$(($MID+1)) | |
else | |
HIGH=$(($MID-1)) | |
fi | |
MID=$(($LOW + ($HIGH - $LOW)/2 )) | |
# echo $LOW $MID $HIGH | |
if [ "$MID" -gt "$HIGH" ] | |
then | |
break | |
fi | |
done | |
# Insert new value | |
if [ "$MID" -ge "$HIGH" ] #Special case were we insert at begining or end | |
then | |
if [ "$MID" == "1" ] | |
then | |
sed -i "$((MID))i$KEY" $FILE # Add at head | |
else | |
sed -i "$((MID-1))a$KEY" $FILE # Add at last | |
fi | |
else | |
sed -i "${MID}i$KEY" $FILE #Insert in the middle | |
fi | |
echo $MID |
Looks nice, but I wanted to avoid to use sort
again. I would agree it would be more readable.
My idea was to have a list of id that :
- If I visited I do nothing
- If I didn't visit I do something and flag it (so put it in the list)
So the idea was use the "miss" of a search to insert (as if you miss, you still know at which place it should be).
But i'm not sure of the cost of using sed
.
At the end that was also just ot practice Bash and algorithmic^^
Looks nice, but I wanted to avoid to use sort again. I would agree it would be more readable.
Cool. FWIW, I don't see a reason to avoid sort: the coreutils implementation is based on a merge sort, which is more efficient when the input is already sorted (best case complexity O(n)). So performance wise, it hardly gets any faster.
My idea was to have a list of id that :
- If I visited I do nothing
- If I didn't visit I do something and flag it (so put it in the list)
So the idea was use the "miss" of a search to insert (as if you miss, you still know at which place it should be).
You get the "miss" info out of the file size. If it has changed, you had not visited. Here is a version which handles that:
#!/bin/bash
# Create file if it does not exist
[ ! -f $2 ] && touch $2
# File size in bytes
SIZE=$(stat -c %s $2)
# Insert the new key if it is not already there
echo $1 | sort -u -o $2 $2 -
# Do something if the file size has changed
[ $(stat -c %s $2) != $SIZE ] && echo "Do something"
# Print the line of the key
grep -n "\<$1\>" $2 | cut -d: -f1
But i'm not sure of the cost of using sed .
Actually, your statement LINE="$(sed "$1!d" $FILE)"
of line 29 is causing your script to read the entire file up to the line $1
at every call of lineEq
, so you are reading half the file repeatedly.
At the end that was also just ot practice Bash and algorithmic^^
I get that, yeah. It's just that I saw your code and I wondered if I could do it in less lines :-)
How about this? Probably slightly less performant, but more readable to me (if I understood what your code does, that is).