Injae Kang abcinje

@abcinje
abcinje / cluster.md
Last active May 18, 2020 09:52
bufferring cluster configuration

Host

  1. Install Docker
$ sudo apt update
$ sudo apt install -y docker.io
$ sudo usermod -a -G docker $USER
$ sudo service docker restart
$ echo "Log out and back in to use docker without sudo"
$ logout
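After logging back in, it can help to confirm that the new group membership is active before running any docker command. The following is a minimal sketch, not part of the original recipe: `group_listed` is a hypothetical helper name, and the check only inspects the group names reported by `id -nG` for the current session.

```shell
#!/bin/sh
# group_listed NAME GROUP...: succeed if NAME appears among the remaining arguments.
# (Hypothetical helper, introduced here for illustration only.)
group_listed() {
    wanted=$1
    shift
    for g in "$@"; do
        [ "$g" = "$wanted" ] && return 0
    done
    return 1
}

# id -nG prints the group names of the current session, separated by spaces.
# The substitution is left unquoted on purpose so each name becomes one argument.
if group_listed docker $(id -nG); then
    echo "docker group is active in this session"
else
    echo "docker group is not active yet; log out and back in"
fi
```

If the second message appears after a fresh login, the `usermod` step probably did not apply to the intended user.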
abcinje / image.md
Last active May 23, 2020 06:50
bufferring image recipe
  1. Generate a container
$ docker run -it --name bufferring --network=host nvidia/cuda:10.2-devel /bin/bash
  2. Install packages
# apt update
# apt install -y build-essential git openssh-server python3 python3-pip vim wget

Computation Thread

computation_thread()

for each epoch {
  for each step {
    for each parameter in propagation_process {
      grad = compute_gradient(parameter);
      handle = asynchronous_allreduce(grad);
      handles[grad] = handle;
abcinje / ckptfs.md
Last active February 17, 2022 10:42

CkptFS

Running drainer

# If the environment variables CKPT, BB, and PFS are already set
build/bin/drainer
# Otherwise, set them on the command line
CKPT= BB= PFS= build/bin/drainer
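Because the drainer depends on all three variables, a small pre-flight check can catch a missing one before launch. This is a sketch under stated assumptions: `require_env` is a hypothetical helper that is not part of CkptFS, and it treats an empty value as missing, which the source does not specify.

```shell
#!/bin/sh
# require_env NAME...: succeed only if every named variable is set and non-empty.
# (Hypothetical helper; treating empty values as missing is an assumption.)
require_env() {
    missing=0
    for name in "$@"; do
        # Indirect lookup of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing environment variable: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

if require_env CKPT BB PFS; then
    echo "CKPT, BB, and PFS are set; ready to run build/bin/drainer"
else
    echo "set CKPT, BB, and PFS before running build/bin/drainer"
fi
```

The check only reports status; it deliberately does not launch the drainer, so it can be run safely from any directory.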