Josh Levy-Kramer joshlk

  • London, UK
@joshlk
joshlk / inference_pretraining_data.ipynb
Last active January 9, 2024 16:47
Inference with pretraining style data for LLM
joshlk / vocab_sharding.py
Created October 24, 2023 09:29
Vocab sharding using DTensors
from math import ceil
from typing import Optional, Tuple, Union
import torch
from torch import Tensor, nn
from torch.distributed._tensor import DTensor, Replicate, Shard, distribute_module, distribute_tensor
from torch.distributed._tensor.op_schema import OpSchema, OutputSharding
from torch.distributed._tensor.ops.embedding_ops import embedding_rules
from torch.distributed._tensor.ops.utils import register_prop_rule
from torch.distributed._tensor.placement_types import DTensorSpec, _Partial
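A minimal plain-Python sketch of the general idea behind vocab sharding, not the DTensor-based code of this gist: the embedding table is split row-wise across ranks, each rank looks up only the token ids it owns and returns zeros for the rest, and the partial results are summed. The helper names (`shard_rows`, `local_lookup`, `all_reduce_sum`) are hypothetical, and the sum stands in for a real all-reduce collective.

```python
from math import ceil

def shard_rows(table, world_size):
    """Split embedding rows into contiguous shards of ceil(V / world_size) rows."""
    shard_size = ceil(len(table) / world_size)
    return [table[r * shard_size:(r + 1) * shard_size] for r in range(world_size)]

def local_lookup(shard, rank, shard_size, token_ids, dim):
    """Return this rank's rows for tokens it owns and zero vectors otherwise."""
    start = rank * shard_size
    out = []
    for t in token_ids:
        local = t - start
        if 0 <= local < len(shard):
            out.append(list(shard[local]))
        else:
            out.append([0.0] * dim)
    return out

def all_reduce_sum(partials):
    """Element-wise sum across ranks (stand-in for a collective all-reduce)."""
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*partials)]

# Toy setup: 5 tokens, embedding dim 2, 2 simulated ranks.
vocab, dim, world_size = 5, 2, 2
table = [[float(v), float(v) * 10] for v in range(vocab)]
shards = shard_rows(table, world_size)
shard_size = ceil(vocab / world_size)
token_ids = [4, 0, 3]
partials = [
    local_lookup(shards[r], r, shard_size, token_ids, dim)
    for r in range(world_size)
]
result = all_reduce_sum(partials)
```

Because every token id is owned by exactly one rank, the summed result equals an ordinary lookup in the unsharded table.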
joshlk / CANNON_MP490_RaspberryPi_CUPS.md
Last active March 24, 2023 13:13
How to add a Canon MP490 printer to a Raspberry Pi CUPS server
  1. Install the Canon printer drivers: `sudo apt update; sudo apt install software-properties-common cups-backend-bjnp`
  2. Make sure the user has a password (set one with `sudo passwd $user`)
  3. Follow the instructions to install the print server: https://pimylifeup.com/raspberry-pi-print-server/
  4. Adding a printer did not work in Chrome but worked in Safari
joshlk / stackoverflow_survey.ipynb
Last active January 3, 2023 15:13
Stack Overflow Developer Survey 2022: analysis of Docker usage among data scientists and DevOps engineers
joshlk / conda_total_downloads.ipynb
Last active January 3, 2023 13:24
Obtain total download data for Conda forge
joshlk / peak_memory.bash
Last active July 25, 2023 12:49
Measure the peak memory of the system while a command is running (this only includes physical memory and not virtual memory)
#!/usr/bin/env bash
# peak_memory [command ...]
# Run the given command line in the background and kill if script exits
trap 'kill $(jobs -p) 2> /dev/null' EXIT
"$@" &
pid=$! phys_peak=0 not_avail_peak=0
# while command still running
while ps -p $pid &>/dev/null; do
    sleep 1
    mem_total="$(cat /proc/meminfo | grep 'MemTotal:' | grep -oe '\([0-9.]*\)')"
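A rough Python sketch of the same polling approach, for illustration only. It assumes a Linux system with `/proc/meminfo`, and it takes "memory in use" to be `MemTotal - MemAvailable`, which is an assumption of this sketch rather than something the script above confirms. The function names are hypothetical.

```python
import subprocess
import time

def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field: integer value} dict (kB for most fields)."""
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info

def used_kb(info):
    """System-wide memory in use, assumed here to be MemTotal - MemAvailable."""
    return info["MemTotal"] - info["MemAvailable"]

def peak_memory(cmd, interval=1.0):
    """Run cmd (a list of arguments) and return (exit_code, peak used memory in kB)."""
    proc = subprocess.Popen(cmd)
    peak = 0
    try:
        while True:
            with open("/proc/meminfo") as f:
                peak = max(peak, used_kb(parse_meminfo(f.read())))
            if proc.poll() is not None:
                break
            time.sleep(interval)
    finally:
        # Mirror the trap in the shell script: never leave the child running.
        if proc.poll() is None:
            proc.kill()
            proc.wait()
    return proc.returncode, peak

# Parsing can be checked without touching /proc:
SAMPLE = (
    "MemTotal:       16000000 kB\n"
    "MemFree:         2000000 kB\n"
    "MemAvailable:   10000000 kB\n"
)
sample_used = used_kb(parse_meminfo(SAMPLE))
```

On a Linux machine, `peak_memory(["sleep", "3"])` would sample memory roughly once per second while the command runs. Like the shell version, this measures the whole system, so other processes affect the reading.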
joshlk / pandas_excel_formater.py
Last active June 14, 2021 08:38
Format the output of an Excel file written by pandas: auto-size, auto-filter, number formats, date-time formats, freeze panes
import xlsxwriter
def excel_formatter(writer, sheet, df, num_format = None, auto_filter = False, auto_fit = False):
    workbook = writer.book
    worksheet = writer.sheets[sheet]
    cols = list(df.columns)
    max_row, max_col = df.shape
    num_format = num_format if num_format else {}
    num_format_types = {
joshlk / confidence_intervals.md
Last active April 21, 2021 09:01
Confidence intervals and quantile regression

Confidence/prediction intervals to quantiles

quantile_higher = (interval_width / 2) + 0.5
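A small sketch of this conversion, assuming a symmetric (central) interval so that the lower quantile mirrors the upper one around the median. The function name `interval_to_quantiles` is hypothetical.

```python
def interval_to_quantiles(interval_width):
    """Map a central interval width in (0, 1) to its (lower, upper) quantiles."""
    if not 0.0 < interval_width < 1.0:
        raise ValueError("interval_width must be strictly between 0 and 1")
    half = interval_width / 2
    # The upper quantile matches the formula above; the lower one is its mirror image.
    return 0.5 - half, 0.5 + half

# A 50% interval spans the 25th to the 75th percentile.
lower, upper = interval_to_quantiles(0.5)
```

For a 90% interval the same function gives quantiles of about 0.05 and 0.95, which are the two targets a quantile regression would be fitted to.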

Quantile metrics:

joshlk / 0_nvidia_benchmark.md
Last active March 29, 2024 06:06
Benchmark bandwidth and latency of P2P NVIDIA GPUs (NVLINK vs PCI)

NVIDIA GPU P2P Benchmark bandwidth/throughput and latency

Using https://github.com/NVIDIA/cuda-samples

You can also view the GPU topology using `nvidia-smi topo -m`

  1. Download the repo: `git clone https://github.com/NVIDIA/cuda-samples.git`
  2. Check out the tag that corresponds to your CUDA version, e.g. `git checkout tags/v11.1`
  3. You might need to install some additional packages: `sudo apt-get install freeglut3-dev build-essential libx11-dev libxmu-dev libxi-dev libgl1-mesa-glx libglu1-mesa libglu1-mesa-dev libglfw3-dev libgles2-mesa-dev`
  4. Either build everything by running `make` in the root directory, or build only this sample: `cd Samples/p2pBandwidthLatencyTest; make`
joshlk / Remote_debugging_with_pycharm.md
Created March 9, 2021 13:23
Remote debugging with PyCharm

Steps to debug a program on a remote machine without using remote deployment.

  1. Start the debug server in PyCharm and specify a port such as 21000
  2. Remote-forward the port over SSH, e.g. `ssh host -R 21000:localhost:21000`
  3. Start a Python process and insert the following line (first `pip install pydevd-pycharm`):
import pydevd_pycharm; pydevd_pycharm.settrace('localhost', port=21000, stdoutToServer=True, stderrToServer=True)
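
One possible way to avoid editing the code each time is to make the call conditional. The sketch below attaches only when an environment variable holds the forwarded port. The helper name `maybe_attach_debugger` and the variable `PYCHARM_DEBUG_PORT` are hypothetical, and the `settrace` arguments are the same ones used in the line above.

```python
import os

def maybe_attach_debugger(env_var="PYCHARM_DEBUG_PORT", host="localhost"):
    """Connect to a PyCharm debug server if env_var holds a port; return True if attached."""
    port = os.environ.get(env_var)
    if not port:
        # Variable unset or empty: run normally without the debugger.
        return False
    # Imported lazily so the package is only needed when debugging.
    import pydevd_pycharm  # requires: pip install pydevd-pycharm
    pydevd_pycharm.settrace(
        host, port=int(port), stdoutToServer=True, stderrToServer=True
    )
    return True
```

Calling `maybe_attach_debugger()` at the top of the program and launching it with `PYCHARM_DEBUG_PORT=21000 python script.py` would then connect through the forwarded port, while a plain `python script.py` runs without the debugger.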