Snowflake Reconnaissance

About

This page will assist you in retrieving important information from a so-called Snowflake Server: a machine that exists in your infrastructure unversioned and with unknown status. The basic idea is that system files on a modern system are under package-manager control, which enables us to do the following:

  • verify files, looking for changed and missing files and other diversions
  • identify files that do not belong to the system
  • enable you to do a proper backup of only the important (changed) files
  • effectively transform a snowflake into part of your fleet (IaC)

Still, we have to take care of caches, temporary files, and other package managers on the target system.

Remarks

  • Debian: we ignore packages in rc-state (package uninstalled, configs remain)

  • We just do file-based backup (directories are ignored, yet we do keep a list of directories)

  • We ignore the following for backup

    • HOME directory is ignored except for non-system package lists
    • Cache Files
    • Packages Installed by Python, GO, Rust, NPM, Yarn, Docker, K8s, Snap, Flatpak
    • Temporary Files
  • Shell examples are optimized for parallelism

Todo

  • Get repositories and origin of packages

  • RPM: Do we have something like package diversions here?

Utilities required

  • fd - parallel find done with ❤ in Rust
  • GNU parallel - Perl scripts around semaphores
  • ripgrep - parallel grep with PCRE2 regexes done with ❤ in Rust
  • csvq - query CSV files with SQL done with ❤ in Go
  • neofetch - system information tool
  • rewrite - a tool like moreutils/sponge that buffers the content of a pipe and replaces a file

I encourage the use of New Minimalist software following the Minimal Software manifesto (Minifesto): minimalist programs built in duct-tape UNIX fashion that take advantage of 21st-century terminal capabilities and modern programming languages like Go and Rust, with high parallelism and cloud deployment (static binaries) in mind.

If you are an advanced Linux user, please look at the suckless philosophy and its links. You may translate the fd and rg commands here to find and grep.
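As a rough translation guide on toy data (illustrative only; fd/rg defaults such as skipping hidden files are not replicated here):

```shell
# fd . -t f --full-path DIR   ≈   find DIR -type f
tmp=$(mktemp -d)
touch "$tmp/one.txt" "$tmp/two.log"
find "$tmp" -type f | sort

# rg -N -v 'pattern' FILE     ≈   grep -v 'pattern' FILE
printf 'keep\ndrop\n' | grep -v 'drop'

rm -rf "$tmp"
```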

Preparation

Binaries

  1. A package with 64-bit ELF binaries is provided here; otherwise see the appendix.

  2. Extract static-tools to either /usr/local (system-wide) or $HOME/.local/ and ensure PATH="$PATH:$HOME/.local/bin"

Excluded Files

Place the following as excluded-files into your working directory:

/boot
/proc
/dev
/sys
/run
/home
/root/
/usr/**/go/**
/usr/lib/python
/usr/lib/node_modules
/usr/lib/yarn_modules
/usr/lib/google-cloud-sdk
/usr/local/gocache
/var/log
/var/lib/apt
/var/lib/jvm
/var/lib/snapd
/var/cache
/var/mail
/var/spool
/var/tmp
/var/lib/yum
/snap
/usr/**/python*/*-packages*/**
/var/lib/snap
/var/lib/docker
/tmp
/var/lib/flatpak
__pycache__

Common Information Recon

Root: make sure you are root on the target system, since otherwise you may not be able to retrieve all information.

# Dump system information
neofetch --stdout > system-information.list

# Disk layout
lsblk  -o NAME,FSTYPE,SIZE,TYPE,RO,RM,LABEL,MODEL,MOUNTPOINT,UUID > disk-layout.list

# Service status
service --status-all > services-status.list

# SystemD status
systemctl > systemd-status.list

# IP
ip rule > ip.rule.list
ip addr > ip.addr.list
ip route > ip.route.list

# Resolver status
resolvectl > resolve-status.list

# We export mountpoints to a file since we don't want them in the backup
findmnt -l | cut -d' ' -f 1 | grep -v '^/$' > sys.mountpoints.list

# We scan filesystem and exclude some common variable folders
fd . --color=never -t f --full-path / --ignore-file sys.mountpoints.list --ignore-file excluded-files \
	| sort  > sys.all-files.list
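If fd is not available, the exclusion scan can be crudely approximated with find and grep -F (sketch on toy data; note this is plain substring matching, not the gitignore-style semantics fd applies to --ignore-file):

```shell
tmp=$(mktemp -d)
ex=$(mktemp)
mkdir -p "$tmp/etc" "$tmp/var/cache"
touch "$tmp/etc/motd" "$tmp/var/cache/junk"
printf '%s\n' "$tmp/var/cache" > "$ex"
# drop every path containing an excluded prefix
find "$tmp" -type f | grep -v -F -f "$ex" | sort
rm -rf "$tmp" "$ex"
```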

RPM-based Package Recon

# Export a list of RPM packages
rpm -qa > rpm.packages.list

# Export a list of all files packages would install
rpm -qa | xargs -I {} echo rpm -ql {}  | parallel -j 16 | sort > rpm.all-files.list

# Export a list of files with diverging status
rpm -qa | xargs -I {} echo rpm -V {}  | parallel -j 16 | sort > rpm.files-status.list

# For testing only: if we do not have access to all files we filter out 'permission denied'
rg -N -v -i 'permission denied' rpm.files-status.list | rewrite rpm.files-status.list

# We extract a list of missing files. These files are expected to exist for some reason.
cat rpm.files-status.list | rg '^missing' | cut -c13- > rpm.missing-files.list

# We extract a list of changed files. These files diverge from their package versions.
cat rpm.files-status.list | rg '^[\S]+[\s]*c[\s]' | rg -v '^missing' | cut -c14- > rpm.changed-files.list


# We create a list of files to backup
comm -23 sys.all-files.list rpm.all-files.list > backup-files.list

# And add our changed files
cat rpm.changed-files.list >> backup-files.list
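comm requires both inputs to be sorted (which the sort in the pipelines above guarantees); a minimal self-contained illustration of the -23 selection on toy data:

```shell
tmp=$(mktemp -d)
# all files found on disk
printf '%s\n' /etc/a /etc/b /opt/x | sort > "$tmp/sys.list"
# all files owned by packages
printf '%s\n' /etc/a /etc/b | sort > "$tmp/pkg.list"
# -23 keeps only lines unique to the first file: files no package claims
comm -23 "$tmp/sys.list" "$tmp/pkg.list"
rm -rf "$tmp"
```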

Debian-based Package Recon

dpkg -l > sys.packages.list
cat sys.packages.list | grep '^ii' | awk '{ print $2 }' > sys.packages.installed.list
# Verify package checksums, with a progress meter
SIZE=$(wc -l < sys.packages.installed.list)
cat sys.packages.installed.list | pv -l -s $SIZE | xargs -n 32 | xargs -I@ echo "debsums -c @" | parallel -u -N 1 1>debsum.tmp 2> >(tee -a debsum.tmp)
# or, without the progress meter
cat sys.packages.installed.list | xargs -n 48 | xargs -I@ echo "debsums -c @" | parallel -N 1 >debsum.tmp 2>&1

cat debsum.tmp | rg --color=never -o -e 'missing file (.*) \(from\s([^\s]*)' -r '$1' > sys.files.missing.list
cat debsum.tmp | rg --color=never -e '^/.+' -o > sys.files.changed.list
dpkg-divert --list | rg 'diversion of (.+) to (.+) by (.+)' --replace "\$3: \$2 to \$1" > sys.files.diversion.list


cat sys.packages.installed.list | xargs -n 48 | xargs -I@ echo "dpkg -L @" | parallel -N 1 | sort | uniq | rg -v '^/.$' > dpkg.all-files.list
comm -23 sys.all-files.list dpkg.all-files.list > backup.list

# Optional: show missing files together with the package they belong to
cat debsum.tmp | rg --color=never -o -e 'missing file (.*) \(from\s([^\s]*)' -r '$2: $1'
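The two extraction patterns above can be checked against sample lines modeled on debsums output (changed files printed as bare paths, missing files as a diagnostic line); here with portable grep/sed in place of rg:

```shell
tmp=$(mktemp -d)
cat > "$tmp/debsum.tmp" <<'EOF'
/etc/ssh/sshd_config
debsums: missing file /usr/share/doc/foo/copyright (from foo package)
/etc/fstab
EOF
# changed files: bare absolute paths
grep '^/' "$tmp/debsum.tmp"
# missing files, prefixed with their package
sed -nE 's/.*missing file (.*) \(from ([^ ]*) package\).*/\2: \1/p' "$tmp/debsum.tmp"
rm -rf "$tmp"
```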

Other System Package Managers

No plans to cover these. Perhaps you want to try out Arch, Clear Linux or NixOS?

Secondary Package Managers

# Export list(s) of Python packages (all Python versions)
compgen -c | rg '^(python([-0-9\.]+)?)$' --replace '$1' | sort | xargs -I {} sh -c "{} -m pip freeze > {}-packages.list"

# Export a list of global NPM modules
sudo npm -g list | rg '^([└├]─[┬─] )(.+)$' --replace '$2' > $BACKUP_PATH/npm.root.list
npm -g list | rg '^([└├]─[┬─] )(.+)$' --replace '$2' > $BACKUP_PATH/npm.$USER.list

# Export a list of global Yarn modules 
yarn global list > $BACKUP_PATH/yarn.$USER.list
sudo yarn global list > $BACKUP_PATH/yarn.root.list

# Export a list of installed GO packages
sudo bash -c 'cd $GOPATH && go list ...' > go-packages.root.list
( cd $GOPATH && go list ... ) > go-packages.$USER.list

# Export a list of installed Rust crates
sudo cargo install --list > rust-packages.root.list
cargo install --list > rust-packages.$USER.list

# Export a list of flatpaks
flatpak list -d > flatpak.list

# Export a list of snaps
snap list > snap.list

Analysing Results

# Prefilter files and directories for comparison
# - otherwise du would scan whole directories, including files not targeted for backup!
cat backup.list | while IFS= read -r f; do [[ -f "$f" ]] && echo "$f"; done  > backup.files.list
cat backup.list | while IFS= read -r f; do [[ -d "$f" ]] && echo "$f"; done  > backup.dirs.list

# Prepare some stats on directories
cat backup.files.list | xargs -d \\n du -bs | rg -v '^0' | rg '(/etc|/srv|(/opt|/srv|/var/lib|/var/local|/usr/local|/usr|/[^/]+)(/[^/]*)?).*/[^/]+$' --replace '${1}' > directory-sizes.csv

# Top 50 directories taking up disk space
csvq --no-header -N -D '\t' -f CSV -d '\t' -q 'SELECT SUM(c1), c2 FROM `directory-sizes.csv` GROUP BY c2 ORDER BY SUM(c1) DESC' | head -n 50 | awk '{ split( "KB MB GB TB" , v ); s=1; while( $1>1024 ){ $1/=1024; s++ } print int($1) v[s]"\t"$2 }'
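If csvq is not at hand, the same GROUP BY/SUM can be approximated with awk (toy size<TAB>directory pairs in the shape of directory-sizes.csv):

```shell
printf '%s\t%s\n' 100 /etc 200 /srv 50 /etc \
  | awk -F'\t' '{ s[$2] += $1 } END { for (d in s) printf "%d\t%s\n", s[d], d }' \
  | sort -rn
```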

# Total size of files from backup.list
cat backup.files.list | perl -ne 'chomp(); if (-f $_) {print "$_\n"}' | xargs -d \\n du -bs | awk '{i+=$1} END {print i}' | numfmt --to=iec-i --suffix=B --format="%3f"

# Number of files in backup.list
cat backup.files.list | perl -ne 'chomp(); if (-f $_) {print "$_\n"}' | wc -l
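The awk/numfmt summation above can be sanity-checked on fixed byte counts:

```shell
# sum raw byte counts, then render human-readable (IEC units)
printf '%s\n' 1024 2048 4096 \
  | awk '{ i += $1 } END { print i }' \
  | numfmt --to=iec-i --suffix=B
```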

Actual Backup by Filelist

Be sure nothing is writing to files! Just one oldskool way…

tar -czvpf backup-files.tar.gz -T backup.files.list

# or if available you could use pigz
tar --use-compress-program="pigz" -cf archive.tar.gz YourData

# or maybe borg backup?
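A quick roundtrip check of the tar -T invocation on throwaway data (list a file, archive it, inspect the archive):

```shell
tmp=$(mktemp -d)
cd "$tmp"
mkdir -p etc && echo hello > etc/motd
printf '%s\n' etc/motd > backup.files.list
tar -czpf backup-files.tar.gz -T backup.files.list
tar -tzf backup-files.tar.gz
cd / && rm -rf "$tmp"
```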

Appendix

Enable bash line editor

Good for pasting and editing multiline content (press Ctrl-X Ctrl-E to enter!)

set -o vi
export VISUAL="micro"
bind -m vi-insert '"\C-x\C-e": edit-and-execute-command'

ZSH filtering

ZSH can do simple filtering for directories and files using the following commands:

printf '%s\n' **/*(D/)   # directories
printf '%s\n' **/*(D-/)  # directories, or symlinks to directories
printf '%s\n' ***/*(D/)  # directories, traversing symlinks
printf '%s\n' ***/*(D-/) # directories or symlinks to directories,
                         # traversing symlinks
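Where zsh is not available, the closest portable equivalents use find (-L follows symlinks):

```shell
tmp=$(mktemp -d)
mkdir -p "$tmp/a/b"
ln -s "$tmp/a" "$tmp/link"
find "$tmp" -type d | sort      # directories only, symlinks not followed
find -L "$tmp" -type d | sort   # symlinked directories included and traversed
rm -rf "$tmp"
```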

Strategy: Cloning Packages

Possible by apt https://wiki.ubuntuusers.de/apt-clone/

apt-clone clone .

Simpler than apt-get, apt, apt-file, apt-search – wajig!

https://wiki.debian.org/Wajig

Installing Tools

##
# Transfer static-tools to remote system

# system wide installation
PREFIX="/usr/local" 

# user installation
PREFIX="$HOME/.local/bin"

# or maybe cache before copying to server (SSH_HOST to your host)
PREFIX="$HOME/.cache/raclette/static-tools/x64"

# Ensure ~/.local/bin is within BASH PATH env
ssh $SSH_HOST 'if ! grep -qE "PATH.+/.local/bin" $HOME/.bashrc; then echo '\''PATH="$PATH:$HOME/.local/bin"'\'' >> $HOME/.bashrc; fi'

# Ensure ~/.local exists
ssh $SSH_HOST 'if [ ! -e .local ]; then mkdir .local; fi'

# Rsync
rsync -zavh $HOME/.cache/raclette/static-tools/x64/ $SSH_HOST:.local/

# Tarpipe
tar -C $HOME/.cache/raclette/static-tools/x64/ -czvpf - . | ssh $SSH_HOST "tar -C ~/.local -zxf -"
# with progress added
tar -C $HOME/.cache/raclette/static-tools/x64/ -czvpf /tmp/static-tools.tar.gz .
pv -S /tmp/static-tools.tar.gz | ssh $SSH_HOST "tar -C ~/.local -zxf -"

Compiling Static Tools

###
# Compile static tools

TMP_PATH="$(mktemp -d -t "static-tools-XXXX")"
# PREFIX="$HOME/Incoming/static-tools/x64" # don't forget!
mkdir -p "$PREFIX/bin"
mkdir -p "$PREFIX/share/man/man1"

##
# fd 
# A simple, fast and user-friendly alternative to 'find' 
# https://github.com/sharkdp/fd
wget -O $TMP_PATH/fd.tar.gz https://github.com/sharkdp/fd/releases/download/v7.3.0/fd-v7.3.0-x86_64-unknown-linux-musl.tar.gz
tar -C $PREFIX/bin --strip-components=1 -zxvf $TMP_PATH/fd.tar.gz --wildcards --no-anchored 'fd'
tar -C $PREFIX/share/man/man1 --strip-components=1 -zxvf $TMP_PATH/fd.tar.gz  --wildcards --no-anchored 'fd.1'

##
# RipGrep
# Ripgrep recursively searches directories for a regex pattern (alternative to grep)
# https://github.com/BurntSushi/ripgrep
wget -O $TMP_PATH/rg.tar.gz https://github.com/BurntSushi/ripgrep/releases/download/11.0.1/ripgrep-11.0.1-x86_64-unknown-linux-musl.tar.gz
tar -C "$PREFIX/bin" --strip-components=1 -zxvf $TMP_PATH/rg.tar.gz --wildcards --no-anchored '*/rg'

##
# Neofetch 
# A command-line system information tool written in bash 3.2+
# https://github.com/dylanaraps/neofetch
wget -O $TMP_PATH/neofetch.tar.gz https://github.com/dylanaraps/neofetch/archive/6.0.0.tar.gz
tar -C "$PREFIX/bin" --strip-components=1 -zxvf $TMP_PATH/neofetch.tar.gz --wildcards --no-anchored 'neofetch'
tar -C "$PREFIX/share/man/man1" --strip-components=1 -zxvf $TMP_PATH/neofetch.tar.gz --wildcards --no-anchored 'neofetch.1'

##
# GNU parallel
# a shell tool for executing jobs in parallel using one or more computers
# https://www.gnu.org/software/parallel/
wget -O $TMP_PATH/parallel.tar.bz2 http://ftp.gnu.org/gnu/parallel/parallel-20190422.tar.bz2
tar -C "$PREFIX/bin" --strip-components=2 -jxvf $TMP_PATH/parallel.tar.bz2 --wildcards --no-anchored 'parcat' 'parset' 'niceload' 'sql' 'sem' 'env_parallel' 'env_parallel.bash' 'env_parallel.zsh' 'parallel'
tar -C "$PREFIX/share/man/man1" --strip-components=2 -jxvf $TMP_PATH/parallel.tar.bz2 --wildcards --no-anchored 'parcat.1' 'parset.1' 'niceload.1' 'sql.1' 'sem.1' 'parallel.1'

##
# Rewrite
# An in-place file rewrite utility, useful for redirecting output to the same file as its source (alternative to moreutils/sponge)
# https://github.com/neosmart/rewrite
if docker ps -a --format '{{.Names}}' | grep -q build; then
	docker rm -f build
fi
docker create --name build -v $(pwd):/share -w /share rust:1.34 sleep 3600
docker start build
docker exec build rustup target add x86_64-unknown-linux-musl
docker exec build cargo install --force --target x86_64-unknown-linux-musl rewrite
docker exec build bash -c 'cp $(which rewrite) /share/rewrite'
docker exec build chown $(id -u):$(id -g) /share/rewrite
docker rm -f build
cp rewrite "$PREFIX/bin"


##
# csvq
# SQL-like query language for csv
# https://github.com/mithrandie/csvq
wget -O $TMP_PATH/csvq.tar.gz https://github.com/mithrandie/csvq/releases/download/v1.10.6/csvq-v1.10.6-linux-amd64.tar.gz
tar -C "$PREFIX/bin" --strip-components=1 -zxvf $TMP_PATH/csvq.tar.gz --wildcards --no-anchored 'csvq'

##
# ncdu 
# NCurses Disk Usage (easily clean up space)
# https://dev.yorhel.nl/ncdu
wget -O $TMP_PATH/ncdu.tar.gz https://dev.yorhel.nl/download/ncdu-linux-i486-1.14.tar.gz
tar -C $PREFIX/bin -xzvf  $TMP_PATH/ncdu.tar.gz
				
##
# dutree 
# A tool to analyze file system usage written in Rust
# https://github.com/nachoparker/dutree
if docker ps -a --format '{{.Names}}' | grep -q build; then
	docker rm -f build
fi
docker create --name build -v $(pwd):/share -w /share rust:1.34 sleep 3600
docker start build
docker exec build rustup target add x86_64-unknown-linux-musl
docker exec build cargo install --force --target x86_64-unknown-linux-musl dutree
docker exec build bash -c 'cp $(which dutree) /share/dutree'
docker exec build chown $(id -u):$(id -g) /share/dutree
docker rm -f build
cp dutree "$PREFIX/bin"

##
# GNU parallel in Rust (beta)
# https://github.com/mmstick/parallel
if docker ps -a --format '{{.Names}}' | grep -q build; then
	docker rm -f build
fi
docker create --name build -v $(pwd):/share -w /share rust:1.34 sleep 3600
docker start build
docker exec build rustup target add x86_64-unknown-linux-musl
docker exec build apt update
docker exec build apt -y install musl-tools
docker exec build cargo install --force --target x86_64-unknown-linux-musl parallel
docker exec build bash -c 'cp $(which parallel) /share/parallel'
docker exec build chown $(id -u):$(id -g) /share/parallel
docker rm -f build
cp parallel "$PREFIX/bin/parallel-rust"

##
# Amber
# A code search / replace tool (alternative to sed)
# https://github.com/dalance/amber
wget -O "$TMP_PATH/ambr.zip" https://github.com/dalance/amber/releases/download/v0.5.1/amber-v0.5.1-x86_64-lnx.zip
unzip -j -o -d "$TMP_PATH" "$TMP_PATH/ambr.zip"
mv "$TMP_PATH/ambr" "$PREFIX/bin/ambr"
mv "$TMP_PATH/ambs" "$PREFIX/bin/ambs"

##
# Micro
# A modern and intuitive terminal-based text editor (alternative to nano)
# https://github.com/zyedidia/micro
wget -O micro.tar.gz https://github.com/zyedidia/micro/releases/download/v1.4.1/micro-1.4.1-linux64.tar.gz
tar -C "$PREFIX/bin" --strip-components=1 -zxvf micro.tar.gz --wildcards --no-anchored '*/micro'

##
# lnav 
# Log File Navigator
# https://github.com/tstack/lnav
wget -O "$TMP_PATH/lnav.zip" https://github.com/tstack/lnav/releases/download/v0.8.4/lnav-0.8.4-linux-64bit.zip
unzip -j -o -d "$TMP_PATH" "$TMP_PATH/lnav.zip" "*/lnav"
mv "$TMP_PATH/lnav" "$PREFIX/bin/lnav"

###
# Pipe Viewer
# Terminal-based tool for monitoring the progress of data through a pipeline
# http://www.ivarch.com/programs/pv.shtml
if docker ps -a --format '{{.Names}}' | grep -q build; then
    docker rm -f build
fi
wget -O $TMP_PATH/pv.tar.gz http://www.ivarch.com/programs/sources/pv-1.6.6.tar.gz
tar -C "$TMP_PATH" --strip-components=1 -zxvf $TMP_PATH/pv.tar.gz
docker create --name build -v $TMP_PATH:/share -w /share alpine sleep 3600
docker start build
docker exec build apk add bash make gcc libc-dev ncurses-static ncurses-dev ncurses pcre2-dev
docker exec build env CPPFLAGS=-static\ -static-libgcc LDFLAGS=-s ./configure
docker exec build make clean
docker exec build make pv-static
docker exec build chown $(id -u):$(id -g) pv-static
docker rm -f build
cp $TMP_PATH/pv-static "$PREFIX/bin/pv"
cp $TMP_PATH/doc/quickref.1.in "$PREFIX/share/man/man1/pv.1"

###
# Pigz
# Parallel implementation of gzip
# https://zlib.net/pigz/
if docker ps -a --format '{{.Names}}' | grep -q build; then
    docker rm -f build
fi
wget -O $TMP_PATH/pigz.tar.gz https://zlib.net/pigz/pigz-2.4.tar.gz
tar -C "$TMP_PATH" --strip-components=1 -zxvf $TMP_PATH/pigz.tar.gz
sed -r -i 's/^CFLAGS.+/CFLAGS=-static -O3 -Wall -Wextra -Wno-unknown-pragmas/' "$TMP_PATH/Makefile"
sed -r -i 's/^LDFLAGS.+/LDFLAGS=-static -s/' "$TMP_PATH/Makefile"
docker create --name build -v $TMP_PATH:/share -w /share alpine sleep 3600
docker start build
# static zlib is required for a static pigz
docker exec build apk add bash make gcc libc-dev zlib-dev zlib-static
docker exec build make pigz
docker exec build chown $(id -u):$(id -g) pigz
docker rm -f build
cp $TMP_PATH/pigz "$PREFIX/bin/pigz"

##
# up
wget -O "$PREFIX/bin/up" https://github.com/akavel/up/releases/download/v0.3.2/up
chmod a+x "$PREFIX/bin/up"
	