@terrancesnyder
Last active January 31, 2020 02:38
Automated region merge for HBase 0.98+: a Bash script that invokes the HBase shell to query for a table's regions and then runs merge_region on them, but only within each regionserver (to avoid data copy/locality issues).
#!/bin/bash
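# Usage: ./merge.sh <table> <max-regions>
TABLE=$1
MAX=$2
if [[ -z $TABLE || -z $MAX ]]; then echo "Usage: $0 <table> <max-regions>" >&2; exit 1; fi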
echo "Examining Table $TABLE...."
echo "scan 'hbase:meta',{ COLUMNS => 'info:server', FILTER=>\"PrefixFilter('$TABLE')\"}" | hbase shell > "$TABLE.out" 2>&1
echo "Making splits $TABLE.splits"
# Truncate the file; echo "" would write a blank line and inflate the wc -l count by one.
: > "$TABLE.splits"
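# Group 1 is the encoded region name, group 2 the hosting server.
# [^[:space:]] replaces [^\s], which a POSIX bracket expression reads as "not backslash or s".
regex="$TABLE,[A-Z,0-9]+,[0-9]+\.([^[:space:]].+)\. column.+value=(.+)$"
while IFS= read -r p; do
    if [[ $p =~ $regex ]]
    then
        server="${BASH_REMATCH[2]}"
        region="${BASH_REMATCH[1]}"
        echo "$server,$region" >> "$TABLE.splits"
    fi
done < "$TABLE.out"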
splits=$(wc -l < "$TABLE.splits")
echo "... total splits $splits"
if [[ $MAX -ge $splits ]]
then
    echo "Region Size Threshold Already Met: $MAX vs $splits"
    exit 0
fi
echo "-------- Merge ------"
# Walk the scan output again and merge each consecutive pair of matching regions.
i=0
regex="$TABLE,[A-Z,0-9]+,[0-9]+\.([^[:space:]].+)\. column.+value=(.+)$"
while IFS= read -r p; do
    if [[ $p =~ $regex ]]
    then
        i=$((i+1))
        region="${BASH_REMATCH[1]}"
        if (( i % 2 == 0 ))
        then
            region2=$region
            echo "--- [+] $TABLE merge region $region1 <---> $region2"
            echo "merge_region '$region1', '$region2', true" | hbase shell > /dev/null 2>&1
        else
            region1=$region
        fi
    fi
done < "$TABLE.out"
# If the region count is still above the threshold, run again.
# Note: depending on the HBase version, you can reach a point
# where the regions physically cannot be merged any further than
# HBase allows. In that case this call recurses forever, so a
# fail-safe is still needed here.
# Assumes the script is saved as merge.sh in the current directory.
./merge.sh "$TABLE" "$MAX"
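
The closing comments in the script flag the risk that the re-run never terminates. One possible fail-safe, sketched here as a loop instead of a recursive call, stops when a pass makes no progress or when a pass cap is reached. The names `merge_until`, `count_regions` and `merge_pass` are hypothetical and not part of the original script, and the default cap of 10 passes is an arbitrary assumption.

```bash
#!/bin/bash
# Hypothetical fail-safe wrapper (not part of the original script).
# The caller must define two functions:
#   count_regions  - prints the current number of regions for the table
#   merge_pass     - performs one round of pairwise merges
# Returns 0 when the threshold is met, 1 when the pass cap is hit,
# and 2 when a pass does not reduce the region count.
merge_until() {
    local max_regions=$1
    local max_passes=${2:-10}   # arbitrary default cap (assumption)
    local pass=0 before after

    before=$(count_regions)
    while (( before > max_regions )); do
        if (( pass >= max_passes )); then
            echo "stopping: reached $max_passes passes with $before regions" >&2
            return 1
        fi
        merge_pass
        after=$(count_regions)
        if (( after >= before )); then
            echo "stopping: no progress ($before -> $after regions)" >&2
            return 2
        fi
        before=$after
        pass=$((pass + 1))
    done
    return 0
}
```

In the script above, `count_regions` could wrap the scan and `wc -l` step and `merge_pass` could wrap the merge loop, with `merge_until "$MAX"` replacing the final `./merge.sh` call. The no-progress check is what covers the case where HBase refuses to merge any further.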