@JohannesBuchner
JohannesBuchner / logdeduplicator.py
Created April 6, 2022 10:11
Strips duplicated and repeated lines from stdin (such as log output). Can also handle multiline repeats, up to a configurable memory limit.
import os
import sys
max_memory = int(os.environ.get('MAX_MEMORY', '10'))
recent_lines = []
for line in sys.stdin:
    if line not in recent_lines:
        sys.stdout.write(line)
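The preview above stops before `recent_lines` is ever updated. As a minimal sketch of how a bounded memory of recent lines could work, the generator below keeps the last `max_memory` lines in a `collections.deque`. The function name `dedup` and the choice to remember every line seen (not only the printed ones) are assumptions for illustration, not necessarily what the full gist does.

```python
from collections import deque


def dedup(lines, max_memory=10):
    """Yield each line unless it occurred among the last max_memory lines."""
    # A deque with maxlen drops its oldest entry automatically when full.
    recent = deque(maxlen=max_memory)
    for line in lines:
        if line not in recent:
            yield line
        # Remember the line whether or not it was printed, so a block that
        # keeps repeating stays suppressed.
        recent.append(line)
```

With this sketch, filtering standard input would be `sys.stdout.writelines(dedup(sys.stdin, max_memory))`. A repeating multi-line block is suppressed as long as the block is no longer than `max_memory` lines.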
JohannesBuchner / shrink-folder.sh
Created March 18, 2022 23:05
Delete files until folder is smaller than 1GB
find "$FOLDER" -maxdepth 1 -type f -printf '%s\t%p\n' |
{ S=0; while IFS=$'\t' read -r s l; do
    ((S+=s)); [[ $S -gt 1000000000 ]] && rm -v "$l";
done; }
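The loop above adds up file sizes in the order `find` lists them and removes every file from the point where the running total passes 1 GB. A dry-run sketch of that selection logic, assuming a hypothetical helper `files_over_budget` that only returns the paths and deletes nothing:

```python
def files_over_budget(entries, limit=1_000_000_000):
    """Return the paths seen after the running size total has passed limit.

    entries is an iterable of (size_in_bytes, path) pairs, in listing order.
    """
    total = 0
    doomed = []
    for size, path in entries:
        total += size
        if total > limit:
            # Same condition as [[ $S -gt 1000000000 ]] in the shell loop.
            doomed.append(path)
    return doomed
```

Printing the returned list before deleting anything makes it easy to check which files the listing order would sacrifice.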
JohannesBuchner / convert_setup_py_to_pyproject_toml.py
Created March 17, 2022 16:42
Convert setup.py to pyproject.toml (WIP)
# Help create a pyproject.toml from a setup.py file
#
# USAGE:
# 1)
# replace "from [a-z.]* import setup" in your setup.py
# with "from convert_setup_py_to_pyproject_toml import setup"
# 2)
# run the resulting script with python, with this script in the PYTHONPATH
#
# The above can be achieved on Linux, for example, with:
JohannesBuchner / supertar.sh
Created September 22, 2021 12:15
Better tar file compression by sorting similar files together
# Compression can be improved when files with the same or similar content
# are next to each other in the file list.
#
# This command sorts by reversed filenames, which places files
# together by file extension, filename and path, in that order.
# identify all files
find mypath/ -type f |
rev | sort | rev |
tar --no-recursion --files-from=- -cvzf myarchive.tar.gz
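The `rev | sort | rev` step can be sketched in Python as a sort keyed on the reversed path. The function name is hypothetical, and Python compares by code point whereas `sort` follows the locale, so the two orderings may differ for non-ASCII names.

```python
def sort_by_reversed_name(paths):
    """Order paths by their reversed text: extension, then filename, then path."""
    return sorted(paths, key=lambda p: p[::-1])
```

For example, `["a/x.txt", "b/y.py", "c/z.txt"]` is reordered so that the two `.txt` files end up adjacent.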
JohannesBuchner / cmdcache.py
Created March 1, 2021 10:26
Cache/Memoize any command line program. Keeps stdout, stderr and exit code, env aware.
import sys, os
import joblib
import subprocess
mem = joblib.Memory('.', verbose=False)
@mem.cache
def run_cmd(args, env):
    process = subprocess.run(args, capture_output=True, text=True)
    return process.stdout, process.stderr, process.returncode
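In the excerpt above, `env` is accepted by `run_cmd` but not forwarded to `subprocess.run`; as a function argument it becomes part of what `joblib` hashes, which is presumably what makes the cache "env aware". The sketch below shows the same idea without `joblib`, using an in-memory dictionary. The names `cache_key` and `run_cached`, and the decision to pass `env` on to the child process, are assumptions for illustration.

```python
import hashlib
import json
import os
import subprocess
import sys

_cache = {}  # in-memory only; the gist persists results to disk via joblib


def cache_key(args, env):
    """Hash the command line together with the environment."""
    # Sorting the items makes the key independent of dictionary order.
    payload = json.dumps([list(args), sorted(env.items())]).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def run_cached(args, env):
    """Run args once per (args, env) pair and replay the stored result."""
    key = cache_key(args, env)
    if key not in _cache:
        process = subprocess.run(args, env=env, capture_output=True, text=True)
        _cache[key] = (process.stdout, process.stderr, process.returncode)
    return _cache[key]
```

A call such as `run_cached([sys.executable, "-c", "print('hi')"], dict(os.environ))` runs the command the first time and returns the stored tuple on later calls with the same arguments and environment.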
JohannesBuchner / joss_make_latex.sh
Created January 23, 2021 22:30
Make LaTeX and PDF from JOSS markdown papers
#!/bin/bash
# you need to install:
# pip install openbases
# sudo apt install texlive-xetex pandoc pandoc-citeproc
PDF_INFILE=paper.md
PDF_LOGO=logo.png
PDF_OUTFILE=paper.pdf
TEX_OUTFILE=paper.tex
JohannesBuchner / xray_opt_gif.sh
Created December 18, 2020 12:14
Make a gif flipping between an X-ray and optical image at some coordinate
#!/bin/bash
# example usage:
# bash xray_opt_gif.sh 155.87737 +19.86508
RA=$1
DEC=$2
wget -nc "https://alasky.unistra.fr/hips-thumbnails/thumbnail?ra=${RA}&dec=${DEC}&fov=0.21750486127986932&width=500&height=500&hips_kw=CDS%2FP%2FSDSS9%2Fcolor" -O opt.jpg
wget -nc "https://alasky.unistra.fr/hips-thumbnails/thumbnail?ra=${RA}&dec=${DEC}&fov=0.21750486127986932&width=500&height=500&hips_kw=xcatdb%2FP%2FXMM%2FPN%2Fcolor" -O xmm.jpg
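The two thumbnail URLs above differ only in the `hips_kw` survey name. A small sketch of building them from parameters, assuming a hypothetical helper `thumbnail_url`; it only constructs the URL and downloads nothing. Note that `urlencode` percent-encodes a leading `+` in the declination as `%2B`, whereas the shell script passes it unescaped; the source does not say whether the service treats both forms the same.

```python
from urllib.parse import urlencode

BASE = "https://alasky.unistra.fr/hips-thumbnails/thumbnail"


def thumbnail_url(ra, dec, hips, fov="0.21750486127986932", size=500):
    """Build a thumbnail URL for one survey at the given coordinates."""
    query = urlencode({
        "ra": ra,
        "dec": dec,
        "fov": fov,
        "width": size,
        "height": size,
        "hips_kw": hips,  # e.g. "CDS/P/SDSS9/color"; slashes become %2F
    })
    return f"{BASE}?{query}"
```

Calling it once with `"CDS/P/SDSS9/color"` and once with `"xcatdb/P/XMM/PN/color"` gives the optical and X-ray URLs for the same position.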
JohannesBuchner / benchmark.sh
Last active January 1, 2021 07:54
awk solutions for simple groupby in https://h2oai.github.io/db-benchmark/
# columns: id1,id2,id3,id4,id5,id6,v1,v2,v3
f=G1_1e7_1e2_0_0.csv
awk="time mawk"
# groupby simple
$awk -F, 'NR>1 { a[$1] += $7 } END {for (i in a) print i, a[i]}' $f >/dev/null
$awk -F, 'NR>1 { a[$1,$2] += $7 } END { for (comb in a) { split(comb,sep,SUBSEP); print sep[1], sep[2], a[sep[1],sep[2]]; }}' $f >/dev/null
$awk -F, 'NR>1 { a[$3] += $7; n[$3]++; b[$3] += $9; } END {for (i in a) print i, a[i], b[i]/n[i];}' $f >/dev/null
$awk -F, 'NR>1 { a[$4] += $7; n[$4]++; b[$4] += $8; } END {for (i in a) print i, a[i]/n[i], b[i]/n[i];}' $f >/dev/null
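For comparison, the first awk one-liner (sum of `v1` grouped by `id1`) can be sketched in Python with `csv.DictReader` and a `defaultdict`. The helper name `sum_by` is hypothetical, and no claim is made here about how its speed compares with `mawk`.

```python
import csv
import io
from collections import defaultdict


def sum_by(rows, key, value):
    """Sum the numeric column `value` for each distinct entry of `key`."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += float(row[value])
    return dict(totals)


# Tiny in-memory sample with the benchmark's column names.
sample = io.StringIO(
    "id1,id2,id3,id4,id5,id6,v1,v2,v3\n"
    "id001,a,x,1,2,3,5,1,0.5\n"
    "id002,b,y,1,2,3,2,1,0.5\n"
    "id001,a,x,1,2,3,4,1,0.5\n"
)
totals = sum_by(csv.DictReader(sample), "id1", "v1")
```

`DictReader` consumes the header row itself, which plays the role of the `NR>1` guard in the awk versions.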
JohannesBuchner / cachestan.py
Created May 18, 2020 11:32
Build and cache Stan models smartly (ignoring changes in comments and whitespace)
import re
import pystan
import hashlib
import pickle
import os
def build_model(code):
    lines = code.split("\n")
    lines = [re.sub('//.*$', '', line).strip() for line in lines]
    lines = [line.replace('  ', ' ').replace('  ', ' ').replace('  ', ' ')
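The preview is cut off before the hashing step. As a sketch of the idea in the description, the code below normalizes a model's source and hashes the result, so that edits to comments or spacing map to the same cache key. The names `normalize` and `model_hash`, the regex-based whitespace collapse, and the use of SHA-1 are assumptions; `/* ... */` block comments and `//` inside string literals are not handled.

```python
import hashlib
import re


def normalize(code):
    """Drop // comments, collapse runs of whitespace, skip empty lines."""
    lines = (re.sub(r"//.*$", "", line) for line in code.split("\n"))
    lines = (re.sub(r"\s+", " ", line).strip() for line in lines)
    return "\n".join(line for line in lines if line)


def model_hash(code):
    """Stable identifier for a model, usable as a cache file name."""
    return hashlib.sha1(normalize(code).encode("utf-8")).hexdigest()
```

A cache built on this would compile the model only when no pickle named after `model_hash(code)` exists yet.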
JohannesBuchner / run_processes_when_idle.sh
Created May 12, 2020 08:56
Stop/resume processes when the user is active/inactive in Linux/Xorg
#!/bin/bash
# how to identify the processes to STOP/CONT
# this is any part of process command line
PROCESS="myscript.py"
# run this in one terminal:
# it tries to resume the processes every so often
while sleep 10m; do
    pgrep -f "${PROCESS}" | xargs -rt kill -CONT