Cheng Li (cli99)
#!/bin/sh
# requirements
# - wget
# prepare directory
mkdir -p ~/.local/bin
mkdir -p ~/.config
if ! [ -x "$(command -v tmux)" ]; then
@cli99
cli99 / profile_inception_v4.py
Last active June 25, 2022 02:37
profile_inception_v4
import timm
from timm.data import resolve_data_config
from timm.data.transforms_factory import create_transform
import torch
from deepspeed.profiling.flops_profiler import get_model_profile
model = timm.create_model('inception_v4', pretrained=True)
model.eval()
config = resolve_data_config({}, model=model)
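The preview stops before the profiler call. A minimal sketch of how it likely continues, assuming the gist feeds the timm data config's input size to DeepSpeed's get_model_profile (the batch size and print options are assumptions):

batch_size = 1
input_shape = (batch_size, *config['input_size'])  # e.g. (1, 3, 299, 299) for inception_v4
flops, macs, params = get_model_profile(
    model=model,
    input_shape=input_shape,
    print_profile=True,  # print the per-module breakdown
    as_string=True,      # return human-readable strings
)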
@cli99
cli99 / bet-base-flops.txt
Created June 24, 2022 22:24
bert-base-flops
-------------------------- DeepSpeed Flops Profiler --------------------------
Profile Summary at step 1:
Notations:
data parallel size (dp_size), model parallel size(mp_size),
number of parameters (params), number of multiply-accumulate operations(MACs),
number of floating-point operations (flops), floating-point operations per second (FLOPS),
fwd latency (forward propagation latency), bwd latency (backward propagation latency),
step (weights update latency), iter latency (sum of fwd, bwd and step latency)
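Only the profile header survives in the preview. A hedged sketch of how a BERT-base profile like this can be produced with the DeepSpeed flops profiler; the model name, batch size, and sequence length are assumptions, not taken from the gist:

import torch
from transformers import AutoConfig, AutoModel
from deepspeed.profiling.flops_profiler import get_model_profile

config = AutoConfig.from_pretrained('bert-base-uncased')
model = AutoModel.from_config(config)  # randomly initialized weights suffice for FLOPs counting
batch_size, seq_len = 1, 128
inputs = {
    'input_ids': torch.zeros(batch_size, seq_len, dtype=torch.long),
    'attention_mask': torch.ones(batch_size, seq_len, dtype=torch.long),
}
flops, macs, params = get_model_profile(model=model, kwargs=inputs, print_profile=True)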
@cli99
cli99 / get_hf_models.py
Created April 12, 2022 08:33
query huggingface models
from transformers import CONFIG_MAPPING
from huggingface_hub import HfApi
api = HfApi()
def _get_all_hf_model_types():
    return list(CONFIG_MAPPING.keys())

def _get_total_downloads_by_model_type(model_type):
    models = api.list_models(filter=model_type)
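The preview ends inside _get_total_downloads_by_model_type. A hedged completion; whether ModelInfo exposes a downloads attribute depends on the huggingface_hub version, so that access is an assumption:

    # Sum per-model download counts, treating a missing attribute as zero (assumed field).
    return sum(getattr(m, 'downloads', 0) or 0 for m in models)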
@cli99
cli99 / test_pt_profiler.py
Last active September 9, 2021 01:07
layer-by-layer profiling using record_function
import torch
import torch.nn as nn
from torch.profiler import profile, record_function, ProfilerActivity
import torchprof
class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
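The class body is cut off above. A minimal sketch of the layer-by-layer pattern the title describes, using only torch.profiler from the imports; the layer sizes and input shape are assumptions:

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 512), nn.ReLU(), nn.Linear(512, 10))
x = torch.randn(1, 1, 28, 28)
with profile(activities=[ProfilerActivity.CPU]) as prof:
    out = x
    for name, layer in model.named_children():
        with record_function(f'layer_{name}'):  # label each layer's ops in the trace
            out = layer(out)
print(prof.key_averages().table(sort_by='cpu_time_total'))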
@cli99
cli99 / pr.md
Created May 13, 2021 18:16 — forked from piscisaureus/pr.md
Checkout github pull requests locally

Locate the section for your GitHub remote in the .git/config file. It looks like this:

[remote "origin"]
	fetch = +refs/heads/*:refs/remotes/origin/*
	url = git@github.com:joyent/node.git

Now add the line fetch = +refs/pull/*/head:refs/remotes/origin/pr/* to this section. Change the GitHub URL to match your project's URL. It ends up looking like this:
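
[remote "origin"]
	fetch = +refs/heads/*:refs/remotes/origin/*
	url = git@github.com:joyent/node.git
	fetch = +refs/pull/*/head:refs/remotes/origin/pr/*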

@cli99
cli99 / busy.py
Last active February 20, 2022 06:46
Keep GPU busy
import argparse
import torch
import torch.distributed as dist
parser = argparse.ArgumentParser()
parser.add_argument("--local_rank", type=int)
args = parser.parse_args()
device = args.local_rank
dist.init_process_group('nccl')
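The loop body is cut off. A hedged sketch of what "keep GPU busy" presumably does after initialization: pin each rank to its device and run matmuls endlessly. The matrix size and the infinite loop are assumptions:

torch.cuda.set_device(device)
a = torch.randn(4096, 4096, device=device)
while True:
    b = a @ a  # dense matmul keeps the GPU's compute units saturated
    torch.cuda.synchronize()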
@cli99
cli99 / bash
Created January 25, 2021 22:50 — forked from panchicore/bash
solve perl: warning: Setting locale failed.
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
LANGUAGE = (unset),
LC_ALL = (unset),
LC_CTYPE = "UTF-8",
LANG = "en_US.UTF-8"
are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
@cli99
cli99 / megatron-lm.txt
Created November 2, 2020 21:37
flops counter example output
deepspeed --num_nodes 1 --num_gpus 16 pretrain_gpt2.py \
  --model-parallel-size 1 --num-layers 4 --hidden-size 512 --num-attention-heads 8 \
  --seq-length 512 --max-position-embeddings 4096 --batch-size 1 --train-iters 10 \
  --lr-decay-iters 320000 --data-path data/webtext/webtext_text_document \
  --vocab-file data/gpt2-vocab.json --merge-file data/gpt2-merges.txt --data-impl mmap \
  --split 949,50,1 --distributed-backend nccl --lr 1.5e-4 --lr-decay-style cosine \
  --min-lr 1.0e-5 --weight-decay 1e-2 --clip-grad 1.0 --warmup 0.01 --log-interval 100 \
  --save-interval 10000 --eval-interval 1000 --eval-iters 10 --fp16 --hysteresis 2 \
  --num-workers 0 --deepspeed \
  --deepspeed_config /home/chengli1/projects/Megatron-LM/examples/ds_zero_stage_2_config.json \
  --zero-stage 0 --zero-reduce-bucket-size 50000000 --zero-allgather-bucket-size 5000000000 \
  --zero-contigious-gradients --zero-reduce-scatter --checkpoint-activations \
  --checkpoint-num-layers 1 --partition-activations --synchronize-each-layer \
  --contigious-checkpointing
[2020-10-30 23:43:3
@cli99
cli99 / get_python_cmake_flags.py
Created September 8, 2018 22:12
get_python_cmake_flags.py
## @package get_python_cmake_flags
# Module scripts.get_python_cmake_flags
##############################################################################
# Use this script to find your preferred python installation.
##############################################################################
#
# You can use the following to build with your preferred version of python
# if your installation is not being properly detected by CMake.
#
# mkdir -p build && cd build
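# (assumed continuation of the truncated comment block: pass this script's
# output as flags to the CMake invocation)
# cmake $(python ../scripts/get_python_cmake_flags.py) ..
# make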