Arturo Ghinassi (santurini)

  • University of Rome, La Sapienza
  • Rome
  • LinkedIn in/santurini
@santurini
santurini / neural-speed-installation.md
Last active August 8, 2024 06:26
Step-by-step installation procedure for Intel Neural Speed
  1. Set up WSL
    • Install WSL: wsl --install -d Ubuntu

    • Run PowerShell as Administrator and enter:

      dism.exe /online /enable-feature /featurename:Microsoft-Windows-Subsystem-Linux /all /norestart
      dism.exe /online /enable-feature /featurename:VirtualMachinePlatform /all /norestart
      wsl --set-default-version 2
      
@santurini
santurini / ffmpeg-cheatsheet.md
Created June 4, 2023 17:49
Processing videos with ffmpeg.
Compress a video (re-encode with libx264 at a chosen CRF, resample to $fps frames per second, and drop the audio track):
ffmpeg -i "$input_video" -vf fps=$fps -c:v libx264 -an -preset veryslow -crf $crf "$output_video"
Trim a video to its first 20 seconds:
ffmpeg -i "$input_video" -ss "00:00:00" -t "00:00:20" "$output_video"
@santurini
santurini / multinode-fastmoe.md
Created June 1, 2023 17:09
Tutorial to set up data- and model-parallel training with FastMoE.

One of the main reasons Mixture of Experts models are getting so much attention is their high degree of parallelization, which makes it possible to scale the number of parameters enormously. Usually this requires a lot of complex code and deep knowledge of distributed systems, but we can get it essentially for free with the FastMoE library.

First of all, we need to define our experts and specify, through the expert_dp_comm attribute, which type of gradient reduction we would like to use:

  • dp: gradients are reduced across the data-parallel group, which means they are not synchronized within the model-parallel group.
  • world: gradients are synchronized across all workers, regardless of their model- or data-parallel group. This is especially useful for shared layers like the gate.

Let's define our MoE layer, opting for synchronization across all workers:

from fmoe.layers import FMoE
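
The preview stops at the import; as a rough sketch of how the layer definition might continue (the Expert class, dimensions and expert count below are illustrative assumptions, and mark_parallel_comm is used here as FastMoE's way of setting expert_dp_comm):

import torch.distributed as dist
import torch.nn as nn
from fmoe.layers import FMoE

# Illustrative expert: a small feed-forward block over the model dimension.
class Expert(nn.Module):
    def __init__(self, d_model):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )

    def forward(self, x, fwd_expert_count=None):
        return self.net(x)

# 4 experts per worker; assumes the default process group is already initialized.
moe = FMoE(num_expert=4, d_model=512, world_size=dist.get_world_size(), expert=Expert)

# Synchronize expert gradients across all workers ("world").
moe.mark_parallel_comm(expert_dp_comm="world")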
@santurini
santurini / helloworld-fastmoe.md
Last active June 1, 2023 17:09
Simple tutorial to get started with the FastMoE library.

In this tutorial we consider a simple model and replace its MLP with a MoE layer. The starting model is defined like this:

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
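
The preview cuts the model off here; as a rough sketch of the idea, the fully connected head of such a network could be swapped for FastMoE's FMoETransformerMLP (the expert count, hidden size and output head below are illustrative assumptions, not the gist's exact code):

import torch
import torch.nn as nn
import torch.nn.functional as F
from fmoe import FMoETransformerMLP

class MoENet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # MoE feed-forward layer in place of the plain MLP head.
        # d_model must match the flattened feature size (16 * 5 * 5 for 32x32 inputs).
        self.moe = FMoETransformerMLP(num_expert=4, d_model=16 * 5 * 5, d_hidden=128)
        self.fc_out = nn.Linear(16 * 5 * 5, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = torch.flatten(x, 1)
        x = self.moe(x)
        return self.fc_out(x)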
@santurini
santurini / fastmoe-installation.md
Last active May 22, 2023 08:31
Step-by-step tutorial for FastMoE installation

Step-by-step tutorial to install FastMoE on your local machine:

  1. First of all, check your torch and NCCL versions and make sure your CUDA version is compatible with the one torch was compiled against (in general, the latest torch release also works with the latest CUDA):
# in a terminal, run this command; the output should look something like this:

python -c  'import torch; print(torch.__version__); print(torch.cuda.nccl.version())'
>>> 2.0.1+cu117
>>> (2, 14, 3)  # -> this means version 2.14.3
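
The same check can also be written as a slightly more complete standalone script; this is just a convenience sketch, with extra lines reporting the CUDA version torch was built against and whether a GPU is visible at runtime:

import torch

# Versions FastMoE will build against.
print("torch version:", torch.__version__)
print("CUDA version torch was built with:", torch.version.cuda)
print("NCCL version bundled with torch:", torch.cuda.nccl.version())
print("CUDA available at runtime:", torch.cuda.is_available())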
@santurini
santurini / torch-ddp.md
Last active May 22, 2023 08:32
Tutorial to set up Distributed Data Parallel training in torch using mpirun instead of torchrun

To launch distributed training in torch with mpirun we have to:

  1. Configure a passwordless ssh connection with the nodes
  2. Set up the distributed environment inside the training script, in this case train.py (a sketch follows the ssh setup below)
  3. Launch the training from the MASTER node with mpirun

For the first step, this is the pipeline:

# generate a public/private ssh key and make sure to NOT insert a passphrase

ssh-keygen -t rsa
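
For the second step, a minimal sketch (not the gist's exact code) of what the setup inside train.py could look like when the script is launched through OpenMPI; the master address and port are placeholders to adapt to your cluster:

import os
import torch
import torch.distributed as dist

# OpenMPI exposes rank and world size through these environment variables.
rank = int(os.environ["OMPI_COMM_WORLD_RANK"])
world_size = int(os.environ["OMPI_COMM_WORLD_SIZE"])
local_rank = int(os.environ["OMPI_COMM_WORLD_LOCAL_RANK"])

# torch's env:// rendezvous still needs MASTER_ADDR / MASTER_PORT;
# placeholder values, normally exported by mpirun or set here.
os.environ.setdefault("MASTER_ADDR", "10.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")

dist.init_process_group(backend="nccl", init_method="env://", rank=rank, world_size=world_size)
torch.cuda.set_device(local_rank)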
@santurini
santurini / deepspeed-ddp.md
Last active May 22, 2023 08:32
DeepSpeed Multi-node Training Setup

In this tutorial we assume we are launching distributed training on 2 nodes using DeepSpeed with the OpenMPI launcher.

  1. First of all, DeepSpeed needs a passwordless ssh connection to all the nodes, MASTER included:
# generate a public/private ssh key and make sure to NOT insert a passphrase

ssh-keygen -t rsa

# copy the public key 'id_rsa.pub' to the MASTER and SLAVE
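
Inside the training script, DeepSpeed then takes care of process-group setup and model wrapping; a minimal sketch, assuming a placeholder model and a DeepSpeed config file named ds_config.json (both are illustrative, not part of the gist):

import deepspeed
import torch.nn as nn

model = nn.Linear(512, 512)  # placeholder model

# Initialize the distributed backend and wrap the model with the DeepSpeed engine;
# here the optimizer is assumed to be defined in the "optimizer" section of ds_config.json.
deepspeed.init_distributed(dist_backend="nccl")
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config="ds_config.json",
)

The run itself would then typically be started from the MASTER node with the deepspeed launcher, pointing it at a hostfile listing the nodes and selecting the OpenMPI launcher.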