mountpoint-for-s3
Gist by @verdimrc, last active April 11, 2024.
#!/bin/bash
################################################################################
# NOTE for Slurm users: when Slurm is configured to use cgroup process
# tracking, Slurm will kill the mount-s3 process upon job completion. This
# causes the on-access error "Transport endpoint is not connected".
#
# [20240404] In the practical sense, running this script under srun will:
# - not work on pcluster-3.9.0 (ProctrackType=proctrack/cgroup)
# - probably work on SageMaker HyperPod (ProctrackType=proctrack/linuxproc)
#
# Below is a workaround for the Slurm+cgroup constraint:
#
# scontrol show nodes -o | awk -F'[ |=]' '{print $2}' | xargs -n1 -I {} ssh {} $(pwd)/mount-s3.sh
#
# # Mass unmount (NOTE: use sudo if necessary, usually when the mount-s3
# # process was already killed but the mountpoint is still active).
# scontrol show nodes -o | awk -F'[ |=]' '{print $2}' | xargs -n1 -I {} ssh {} umount /tmp/haha
################################################################################
set -euo pipefail
BUCKET=xxxx
REGION=us-west-2
PREFIX=mountpoint-for-s3/ # Must not start with /, but must end with /
MOUNTDIR=/tmp/haha
declare -a ARGS=(
    "$BUCKET"
    "$MOUNTDIR"
    --region "$REGION"
    --prefix "$PREFIX"
    --allow-delete
    --allow-overwrite
    #--auto-unmount
)
umount "$MOUNTDIR" || true
mkdir -p "$MOUNTDIR"
MSG=$(mount-s3 "${ARGS[@]}" 2>&1)
INSTANCE_ID=$(cat /sys/devices/virtual/dmi/id/board_asset_tag)
echo "$(hostname) ${INSTANCE_ID}:
$MSG"
# Verify mount
mount | grep -qF "${MOUNTDIR} type fuse" \
    || { echo "Error: failed to mount" ; exit 1 ; }
# Emit useful information to S3 (via mountpoint) for immediate eyeballing
mkdir -p "${MOUNTDIR}/mount-s3-logs"
echo "$MSG" > "${MOUNTDIR}/mount-s3-logs/$(hostname)-${INSTANCE_ID}.txt"
sync ; sync
echo "
\$ aws s3 cp s3://$BUCKET/${PREFIX}mount-s3-logs/$(hostname)-$INSTANCE_ID.txt -"
aws s3 cp s3://$BUCKET/${PREFIX}mount-s3-logs/$(hostname)-$INSTANCE_ID.txt -
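Whether the Slurm+cgroup caveat in the header applies depends on the cluster's `ProctrackType`. A minimal sketch for branching on it; the `proctrack_line` value below is a stand-in sample, since on a real cluster you would obtain it with `scontrol show config | grep -i ProctrackType`:

```shell
#!/bin/bash
# Sketch: decide whether an srun-launched mount-s3 will survive job completion.
# ASSUMPTION: the sample line below stands in for real `scontrol show config`
# output (this sample matches pcluster-3.9.0 per the note above).
proctrack_line="ProctrackType           = proctrack/cgroup"
case "$proctrack_line" in
    *proctrack/cgroup*)    echo "cgroup tracking: mount-s3 started under srun is killed at job end" ;;
    *proctrack/linuxproc*) echo "linuxproc tracking: mount-s3 should outlive the job" ;;
    *)                     echo "unrecognized ProctrackType: $proctrack_line" ;;
esac
```

In the cgroup case, fall back to the ssh fan-out workaround shown in the header instead of srun.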
#!/bin/bash
################################################################################
# Ultra defensive: multiple checks to make sure the mount-s3 mount point is
# healthy.
################################################################################
mount | grep '^mountpoint-s3' || { echo "Not mounted..." ; exit 1 ; }
# Make sure the mount-s3 process still runs, otherwise fs operations on the
# mount point will fail with "Transport endpoint is not connected".
ps hf -C mount-s3 || { echo "mount-s3 process not running..." ; exit 2 ; }
mount | grep '^mountpoint-s3' | cut -d' ' -f3 | while read -r LINE; do
    #echo -e "\n${LINE}:"
    #find "$LINE" -type f | head -3
    tree "$LINE" | head -5
done
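The loop above parses `mount` output inline. The same parsing can be factored into a small function, shown here as a sketch exercised on a sample mount(8) line rather than a live mount (field 3 of `mountpoint-s3 on /dir type fuse ...` is the mount directory):

```shell
#!/bin/bash
# Sketch: extract the mount directories of all mountpoint-s3 mounts from
# mount(8) output supplied on stdin.
list_s3_mountdirs() {
    grep '^mountpoint-s3' | cut -d' ' -f3
}

# Sample line standing in for live output; real usage: mount | list_s3_mountdirs
printf 'mountpoint-s3 on /tmp/haha type fuse (rw,nosuid,nodev,noatime)\n' \
    | list_s3_mountdirs
# prints: /tmp/haha
```

Feeding the result into `umount` (with `sudo umount -l` as a fallback) gives the per-node half of the mass-unmount one-liner in the first script's header.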