- Setup WSL
-
Install WSL:
wsl --install -d Ubuntu
-
Run PowerShell as Administrator and enter:
dism.exe /online /enable-feature /featurename:Microsoft-Windows-Subsystem-Linux /all /norestart
dism.exe /online /enable-feature /featurename:VirtualMachinePlatform /all /norestart
wsl --set-default-version 2
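After a restart you can verify the setup; the following command lists the installed distributions and the WSL version they run on:
# Ubuntu should appear in the list and run as WSL 2
wsl -l -v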
-
One of the main reasons Mixture of Experts (MoE) models are gaining so much attention is their high degree of parallelization, which allows the number of parameters to be scaled up massively. Usually this requires a lot of complex code and deep knowledge of distributed systems, but we can get it for free with the FastMoE library.
First of all, we need to define our experts and specify, through the expert_dp_comm attribute, which type of gradient reduction we would like to use:
- dp: gradients are reduced across the data-parallel group, which means they are not synchronized within the model-parallel group.
- world: gradients are synchronized across all workers, regardless of their model- or data-parallel group. This is extremely useful for shared layers like the gate.
Let's define our MoE layer by opting for synchronization across all workers:
from fmoe.layers import FMoE
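As a minimal sketch of what this can look like (the sizes below are illustrative; FMoETransformerMLP is FastMoE's ready-made feed-forward MoE built on top of the FMoE class imported above, and it passes expert_dp_comm on to mark_parallel_comm):
from fmoe.transformer import FMoETransformerMLP

# expert_dp_comm="world" marks the expert parameters so that their gradients
# are synchronized across all workers; "dp" would restrict the reduction to
# the data-parallel group.
moe_layer = FMoETransformerMLP(
    num_expert=4,       # experts hosted on each worker
    d_model=400,        # input/output feature size
    d_hidden=512,       # hidden size inside each expert's MLP
    expert_dp_comm="world",
)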
In this tutorial we consider a simple model in which we replace the MLP with a MoE. The starting model is defined like this:
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)   # 3 input channels, 6 output channels, 5x5 kernel
        self.pool = nn.MaxPool2d(2, 2)    # 2x2 max pooling
        self.conv2 = nn.Conv2d(6, 16, 5)  # 6 input channels, 16 output channels, 5x5 kernel
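With the convolutional part in place, a hedged sketch of the final model, with the MLP head swapped for the MoE layer defined above, could look like this (assuming 32x32 RGB inputs, so the flattened feature size is 16 * 5 * 5 = 400; the 10-class output head is illustrative):
import torch
import torch.nn as nn
import torch.nn.functional as F
from fmoe.transformer import FMoETransformerMLP

class MoENet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # The MoE block replaces the plain MLP; expert gradients are
        # synchronized across all workers ("world").
        self.moe = FMoETransformerMLP(num_expert=4, d_model=16 * 5 * 5,
                                      d_hidden=512, expert_dp_comm="world")
        self.fc_out = nn.Linear(16 * 5 * 5, 10)  # illustrative 10-class head

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))   # -> (batch, 6, 14, 14)
        x = self.pool(F.relu(self.conv2(x)))   # -> (batch, 16, 5, 5)
        x = torch.flatten(x, 1)                # -> (batch, 400)
        x = self.moe(x)                        # MoE keeps the feature size (d_model)
        return self.fc_out(x)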
Step-by-step tutorial to install FastMoE on your local machine:
- First of all you'll need to check your torch and NCCL versions, and make sure your installed CUDA version is compatible with the one torch was compiled against (in general, the latest torch version also works with the latest CUDA):
# run this command in a terminal; the output should look something like this:
python -c 'import torch; print(torch.__version__); print(torch.cuda.nccl.version())'
>>> 2.0.1+cu117
>>> (2, 14, 3) # -> this means version 2.14.3
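Once the torch/CUDA/NCCL stack checks out, FastMoE itself is installed from source; this is a sketch based on FastMoE's public README (USE_NCCL=1 is only needed if you want the distributed expert feature, which requires NCCL with P2P support):
# clone the repository and build/install the extension
git clone https://github.com/laekov/fastmoe.git
cd fastmoe
USE_NCCL=1 python setup.py install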
To launch distributed training in torch with mpirun we have to:
- Configure a passwordless ssh connection with the nodes
- Set up the distributed environment inside the training script, in this case train.py (a sketch follows the SSH step below)
- Launch the training from the MASTER node with mpirun
For the first step, this is the pipeline:
# generate a public/private ssh key and make sure to NOT insert a passphrase
ssh-keygen -t rsa
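For the second step, here is a minimal sketch of what the setup inside train.py could look like, assuming mpirun is OpenMPI (which exposes OMPI_COMM_WORLD_RANK, OMPI_COMM_WORLD_SIZE and OMPI_COMM_WORLD_LOCAL_RANK) and using placeholder values for the MASTER address and port:
import os
import torch
import torch.distributed as dist

def setup_distributed():
    # OpenMPI exposes rank and world size through these environment variables
    rank = int(os.environ["OMPI_COMM_WORLD_RANK"])
    world_size = int(os.environ["OMPI_COMM_WORLD_SIZE"])
    local_rank = int(os.environ["OMPI_COMM_WORLD_LOCAL_RANK"])

    # rendezvous point: address/port of the MASTER node (placeholders here)
    os.environ.setdefault("MASTER_ADDR", "192.168.1.1")
    os.environ.setdefault("MASTER_PORT", "29500")

    dist.init_process_group(backend="nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(local_rank)
    return rank, world_size, local_rank

For the third step, the job would then be launched from the MASTER node with something like mpirun -np 2 -H node1,node2 python train.py (one process per node; hostnames and process counts are placeholders).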
In this tutorial we assume we are launching a distributed training job on 2 nodes using DeepSpeed with the OpenMPI launcher.
- First of all, DeepSpeed needs a passwordless ssh connection to all the nodes, MASTER included:
# generate a public/private ssh key and make sure to NOT insert a passphrase
ssh-keygen -t rsa
# copy the public key 'id_rsa.pub' to the MASTER and SLAVE
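The copy itself can be done with ssh-copy-id (the user and hostnames below are placeholders):
# appends id_rsa.pub to ~/.ssh/authorized_keys on each node
ssh-copy-id user@master-node
ssh-copy-id user@slave-node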