@eidosam
Last active July 9, 2020 15:25
Inspect parquet files stored in S3 using `parquet-tools`
#!/usr/bin/env bash
set -o errexit
### ------ Preparation ------ ###
# brew tap adoptopenjdk/openjdk && brew install --cask adoptopenjdk8
# command -v mvn > /dev/null || brew install maven
# command -v parquet-tools > /dev/null || brew install parquet-tools
# brew list --cask --versions osxfuse > /dev/null || brew install --cask osxfuse
# command -v goofys > /dev/null || brew install goofys
### ------ -------------- --- ###
profile=default
cmd=head
while [ -n "$1" ]; do
    case $1 in
        --profile )
            shift
            profile=$1
            ;;
        --command )
            shift
            cmd=$1
            ;;
        * )
            inputfile=$1
            ;;
    esac
    shift
done
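# Strip a leading s3:// scheme, then split the rest into bucket and object key.
inputfile=${inputfile#s3://}
key=${inputfile#*/}
bucket=${inputfile%%/*}
# Without a slash, ${inputfile#*/} leaves the key equal to the whole path,
# so the slash has to be checked explicitly.
if [[ "$inputfile" != */* ]] || [[ -z "$key" ]] || [[ -z "$bucket" ]]
then
    echo "Invalid S3 file path" >&2
    exit 1
fi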
prefix=$(dirname "${key}")
file=$(basename "${key}")
mountingpoint=~/.mounting-points/${prefix}
mkdir -p "${mountingpoint}"
# Unmount on exit, whether the script succeeds or fails.
function _umount() {
    mount | grep -qF "${mountingpoint}" && umount "${mountingpoint}" || true
}
trap _umount EXIT
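# Mount only the key's prefix from the bucket, then read the file locally.
goofys \
    --profile "${profile}" \
    "${bucket}:${prefix}" \
    "${mountingpoint}"
# ${cmd} is left unquoted so a value such as "head -n 5" splits into separate arguments.
parquet-tools ${cmd} "${mountingpoint}/${file}" 2> /dev/null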
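
The path handling in the script can be tried on its own, without mounting anything. Below is a minimal sketch that assumes the same splitting rules as the script; `split_s3_path` is a hypothetical helper name and the bucket and key are made-up examples, neither is part of the original gist.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the original script): splits an S3 path into
# bucket, prefix, and file, and returns 1 if the bucket or the key is missing.
split_s3_path() {
    local path=${1#s3://}         # drop a leading s3:// scheme, if present
    case "$path" in
        */*) ;;                   # at least one slash: bucket/key
        *) return 1 ;;            # no slash: there is no object key
    esac
    bucket=${path%%/*}            # text before the first slash
    key=${path#*/}                # text after the first slash
    [ -n "$bucket" ] && [ -n "$key" ] || return 1
    prefix=$(dirname "$key")      # directory part of the key (mounted by goofys)
    file=$(basename "$key")       # file name passed to parquet-tools
}

# Example with a made-up bucket and key.
split_s3_path "s3://example-bucket/data/2020/part-0000.parquet"
echo "bucket=$bucket prefix=$prefix file=$file"
```

For a key that sits directly at the bucket root, `dirname` yields `.` as the prefix, which is worth checking before passing it to `goofys`.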