Skip to content

Instantly share code, notes, and snippets.

@dusenberrymw
dusenberrymw / spark_tips_and_tricks.md
Last active February 8, 2023 05:11
Tips and tricks for Apache Spark.

Spark Tips & Tricks

Misc. Tips & Tricks

  • If values are integers in [0, 255], Parquet will automatically compress to use 1 byte unsigned integers, thus decreasing the size of saved DataFrame by a factor of 8.
  • Partition DataFrames to have evenly-distributed, ~128MB partition sizes (empirical finding). Always err on the higher side w.r.t. number of partitions.
  • Pay particular attention to the number of partitions when using flatMap, especially if the following operation will result in high memory usage. The flatMap op usually results in a DataFrame with a [much] larger number of rows, yet the number of partitions will remain the same. Thus, if a subsequent op causes a large expansion of memory usage (i.e. converting a DataFrame of indices to a DataFrame of large Vectors), the memory usage per partition may become too high. In this case, it is beneficial to repartition the output of flatMap to a number of partitions that will safely allow for appropriate partition memory sizes, based upon the
@matthewmueller
matthewmueller / osx-for-hackers.sh
Last active April 21, 2024 03:30
OSX for Hackers (Mavericks/Yosemite)
# OSX for Hackers (Mavericks/Yosemite)
#
# Source: https://gist.github.com/brandonb927/3195465
#!/bin/sh
# Some things taken from here
# https://github.com/mathiasbynens/dotfiles/blob/master/.osx
# Ask for the administrator password upfront
#!/bin/bash
set -euo pipefail
if [ $# -eq 0 ]; then
sleep 0.2 # xmonad holds keyboard; race condition fix
scrot -s -e "$0 \$f"
exit
fi
@SlexAxton
SlexAxton / .zshrc
Last active April 25, 2023 03:57
My gif workflow
gifify() {
if [[ -n "$1" ]]; then
if [[ $2 == '--good' ]]; then
ffmpeg -i $1 -r 10 -vcodec png out-static-%05d.png
time convert -verbose +dither -layers Optimize -resize 600x600\> out-static*.png GIF:- | gifsicle --colors 128 --delay=5 --loop --optimize=3 --multifile - > $1.gif
rm out-static*.png
else
ffmpeg -i $1 -s 600x400 -pix_fmt rgb24 -r 10 -f gif - | gifsicle --optimize=3 --delay=3 > $1.gif
fi
else
@brandonb927
brandonb927 / osx-for-hackers.sh
Last active May 2, 2024 03:13
OSX for Hackers: Yosemite/El Capitan Edition. This script tries not to be *too* opinionated and any major changes to your system require a prompt. You've been warned.
#!/bin/sh
###
# SOME COMMANDS WILL NOT WORK ON macOS (Sierra or newer)
# For Sierra or newer, see https://github.com/mathiasbynens/dotfiles/blob/master/.macos
###
# Alot of these configs have been taken from the various places
# on the web, most from here
# https://github.com/mathiasbynens/dotfiles/blob/5b3c8418ed42d93af2e647dc9d122f25cc034871/.osx