Liutong Zhou LiutongZhou

LiutongZhou / docker_tips.md
Last active May 1, 2024 00:36
Docker Tips

Move default docker storage to another location

nano /etc/docker/daemon.json

## Add this config
{
  "data-root": "/newlocation"
}
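If `daemon.json` already holds other settings, the new key has to be merged in rather than pasted over them. A minimal sketch of that merge follows; the helper name `with_data_root` and the sample `log-driver` entry are illustrative assumptions, not part of the original tip.

```python
import json


def with_data_root(existing_config_text: str, new_root: str) -> str:
    """Return daemon.json text with "data-root" set, keeping any other keys.

    `with_data_root` is a hypothetical helper used only for illustration.
    """
    # An empty or missing file is treated as an empty configuration.
    config = json.loads(existing_config_text) if existing_config_text.strip() else {}
    config["data-root"] = new_root
    return json.dumps(config, indent=2)


# Example: an existing config with one unrelated setting (assumed for the demo).
print(with_data_root('{"log-driver": "json-file"}', "/newlocation"))
```

The Docker daemon only reads this file at startup, so it has to be restarted (for example with `sudo systemctl restart docker` on systemd hosts) before the new location takes effect. Data already stored under the old location is not moved automatically.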
LiutongZhou / remote_jupyter_setup.md
Last active April 7, 2024 16:50
Setup Cloud9 on EC2 and remote Jupyter

Create an EC2 instance on AWS

  • Launch: t3.2xlarge ($0.33/h) / m5d.4xlarge ($0.904/h) / g4dn.4xlarge ($1.2/h) / p3.2xlarge ($3.02/h)
  • Image: Deep Learning AMI (Ubuntu 22.04)
  • Configure Security Group:
    • open a custom TCP rule on port 9999
    • open HTTPS, HTTP to anywhere
  • Attach an Elastic IP to the instance
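The security group settings above can be written down as data. The sketch below builds the three inbound rules as dictionaries in the `IpPermissions` layout that boto3's `authorize_security_group_ingress` accepts; nothing is sent to AWS here. The function names are hypothetical, and because the list does not say which addresses may reach port 9999, that range is left as a required argument.

```python
from typing import Any, Dict, List

ANYWHERE = "0.0.0.0/0"  # IPv4 "anywhere", as used for HTTP and HTTPS above


def ingress_rule(port: int, cidr: str) -> Dict[str, Any]:
    """One inbound TCP rule for a single port (hypothetical helper)."""
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "IpRanges": [{"CidrIp": cidr}],
    }


def jupyter_security_group_rules(jupyter_cidr: str) -> List[Dict[str, Any]]:
    """Rules from the list above: custom TCP 9999, plus HTTPS and HTTP to anywhere."""
    return [
        ingress_rule(9999, jupyter_cidr),  # custom TCP for the remote Jupyter port
        ingress_rule(443, ANYWHERE),       # HTTPS
        ingress_rule(80, ANYWHERE),        # HTTP
    ]


# The CIDR below is a documentation-only example address range.
for rule in jupyter_security_group_rules("203.0.113.7/32"):
    print(rule["FromPort"], rule["IpRanges"][0]["CidrIp"])
```

Restricting port 9999 to a known address range, instead of opening it to anywhere, keeps the notebook server off the public internet.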

ssh into EC2 from MobaXterm and run

LiutongZhou / git_tips.md
Last active January 26, 2024 13:38
Git Tips
  1. Squash commits into a single commit and rebase feature branch onto upstream/develop

    git fetch upstream && git rebase -i $(git merge-base feature_name upstream/develop)
  2. Clean up a git repository aggressively

    Use BFG Repo-Cleaner: https://rtyley.github.io/bfg-repo-cleaner/

    java -jar bfg.jar --delete-files your_unwanted_files
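import functools
from typing import Any, Mapping

def singleton(cls):
    '''A class decorator that wraps a class definition so that only one instance of the class can exist.'''
    existing_instances = dict()

    @functools.wraps(cls)
    def singleton_class(*args, **kwargs):
        if cls not in existing_instances:
            # Assumed completion: create the instance on first call, then reuse it.
            existing_instances[cls] = cls(*args, **kwargs)
        return existing_instances[cls]

    return singleton_class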
LiutongZhou / memory_efficient_training.md
Last active July 11, 2023 15:37
Memory Efficient Training of LLMs
"""Data Structures that extend OrderedDict"""
from collections import Counter, OrderedDict
from typing import Any, Hashable, Optional, Tuple, List
from hypothesis import given, strategies as st
__all__ = ["OrderedDefaultDict", "MinMaxCounter"]
class OrderedDefaultDict(OrderedDict):
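The preview stops before the class body. As a rough illustration of the idea named in the docstring, the sketch below combines `OrderedDict` with `defaultdict`-style behavior through `__missing__`. It is not the author's implementation, and the class name `OrderedDefaultDictSketch` is made up to keep the two apart.

```python
from collections import OrderedDict
from typing import Any, Callable, Hashable, Optional


class OrderedDefaultDictSketch(OrderedDict):
    """An OrderedDict that creates missing values with a default factory (illustrative)."""

    def __init__(self, default_factory: Optional[Callable[[], Any]] = None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.default_factory = default_factory

    def __missing__(self, key: Hashable) -> Any:
        # dict.__getitem__ calls __missing__ when the key is absent.
        if self.default_factory is None:
            raise KeyError(key)
        value = self[key] = self.default_factory()
        return value


groups = OrderedDefaultDictSketch(list)
groups["b"].append(1)
groups["a"].append(2)
groups.move_to_end("b")  # OrderedDict methods remain available
print(list(groups.items()))
```

Subclassing `OrderedDict` rather than `defaultdict` keeps order-specific methods such as `move_to_end` and `popitem(last=False)`.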
LiutongZhou / dist-train.md
Last active February 10, 2023 15:55
Large-Scale Distributed Data and Model Parallel Training

Data Streaming

FastFile Mode

sagemaker.inputs.TrainingInput(S3_INPUT_FOLDER, input_mode='FastFile') 
LiutongZhou / document_project.md
Last active August 18, 2022 19:52
Document a Project

How to document a project

How to update the docs and publish to the Home Page?

Prerequisites: one-time installation of dependencies
python3 -m pip install -U jupyter-book ghp-import
LiutongZhou / heap.py
Last active June 6, 2022 01:08
Priority queues
"""MinHeap and MaxHeap (Optimized Implementation)"""
from abc import ABC, abstractmethod
from collections import Counter, UserList
from functools import singledispatchmethod
from heapq import (
    _heapify_max,
    _heappop_max,
    _heapreplace_max,
    _siftdown,
    _siftdown_max,
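The preview ends inside the import list, so the heap classes themselves are not shown. The gist imports private `heapq` helpers; the sketch below deliberately swaps in a different, simpler technique that uses only the public `heapq` API and stores negated values to get max-heap behavior. It therefore works for numeric items only, and the class name `MaxHeapSketch` is hypothetical.

```python
import heapq
from typing import Iterable, List


class MaxHeapSketch:
    """Max-heap of numbers built on heapq's min-heap by storing negated values."""

    def __init__(self, items: Iterable[float] = ()):
        self._data: List[float] = [-x for x in items]
        heapq.heapify(self._data)  # O(n) build

    def push(self, item: float) -> None:
        heapq.heappush(self._data, -item)

    def pop(self) -> float:
        """Remove and return the largest item."""
        return -heapq.heappop(self._data)

    def peek(self) -> float:
        """Return the largest item without removing it."""
        return -self._data[0]

    def __len__(self) -> int:
        return len(self._data)


heap = MaxHeapSketch([3, 1, 4])
heap.push(5)
print(heap.pop(), heap.peek(), len(heap))
```

Negation avoids relying on underscore-prefixed functions, which are not part of the documented `heapq` interface, at the cost of supporting only values that can be negated.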
"""UnionFind (Disjoint Sets)"""
from typing import Optional, Iterable, Hashable, Any


class UnionFind:
    def __init__(
        self, initial_disjoint_items: Optional[Iterable[Hashable]] = None
    ):
        """Initialize a UnionFind of disjoint sets"""
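Only the constructor signature is visible in the preview. For orientation, here is a minimal sketch of a disjoint-set structure with the same constructor argument, using path compression and union by size. The class name `UnionFindSketch`, the method names, and the internal dictionaries are assumptions for illustration and may differ from the gist.

```python
from typing import Dict, Hashable, Iterable, Optional


class UnionFindSketch:
    """Disjoint sets with path compression and union by size (illustrative)."""

    def __init__(self, initial_disjoint_items: Optional[Iterable[Hashable]] = None):
        self.parent: Dict[Hashable, Hashable] = {}
        self.size: Dict[Hashable, int] = {}
        for item in initial_disjoint_items or ():
            self.add(item)

    def add(self, item: Hashable) -> None:
        """Register item as its own singleton set if it is new."""
        if item not in self.parent:
            self.parent[item] = item
            self.size[item] = 1

    def find(self, item: Hashable) -> Hashable:
        """Return the representative of item's set, compressing the path."""
        self.add(item)
        root = item
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[item] != root:  # point every node on the path at root
            next_item = self.parent[item]
            self.parent[item] = root
            item = next_item
        return root

    def union(self, a: Hashable, b: Hashable) -> bool:
        """Merge the sets of a and b; return False if already merged."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a == root_b:
            return False
        if self.size[root_a] < self.size[root_b]:
            root_a, root_b = root_b, root_a
        self.parent[root_b] = root_a  # attach the smaller tree under the larger
        self.size[root_a] += self.size[root_b]
        return True

    def connected(self, a: Hashable, b: Hashable) -> bool:
        return self.find(a) == self.find(b)


uf = UnionFindSketch(range(5))
uf.union(0, 1)
uf.union(1, 2)
print(uf.connected(0, 2), uf.connected(0, 3))
```

Items are stored in dictionaries rather than a list so that any hashable value, not just integers, can be a set member.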