- Name: Bayanagari Vara Lakshmi
- Organization: Python Software Foundation
- Sub-Organization: DIPY
- Project: DIPY - Synthetic MRI generation
- Human Brain MRI preprocessing function
- MRI reconstruction using VQVAE
- Implement & train Diffusion Model on VQVAE latents
- Implement conditional Diffusion Model
- Generate conditional synthetic MRI
- Evaluate synthetic generations in DIPY
- 2D VQVAE on MNIST data
- 2D unconditional DDPM-based LDM on MNIST data
- 3D VQVAE based on MONAI's PyTorch implementation
- 3D unconditional LDM based on MONAI's PyTorch implementation
- Conducted a literature review on the limited existing diffusion-modeling work in medical imaging
- Current literature [1, 2] utilizes VQGAN & DDPM models on the MRNet, ADNI, Breast Cancer MRI & lung CT datasets
- MONAI is the latest open-source platform with repositories on deep learning applications for BraTS & other medical imaging datasets, implemented in PyTorch
- Our project serves as an easy, understandable & accessible implementation of anatomical MRI generation using unconditional Diffusion Modelling in TensorFlow
- Implemented a 2D VQVAE & a 2D DDPM-based Latent Diffusion Model (LDM) on the MNIST dataset & achieved high-quality generations
- Worked on the CC359 & NFBS datasets; both consist of T1-weighted human brain MRI with 359 & 125 samples respectively. Preprocessed each input volume following the 3 steps below -
- Skull-stripping the dataset, if required, using the masks provided with the dataset.
- Pre-processing with the `transform_img` function - perform voxel resizing & an affine transformation to obtain a final (128, 128, 128, 1) volume shape & (1, 1, 1) voxel size, and neutralize background voxels to 0 using the respective masks.
- MinMax normalization to rescale intensities to (0, 1).
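The three steps above can be sketched roughly as follows. This is an illustrative stand-in for the actual `transform_img` function - the real helper performs a proper affine resampling, which is simplified here to a zoom-based voxel resize:

```python
import numpy as np
from scipy.ndimage import zoom

def transform_img(volume, mask, target_shape=(128, 128, 128)):
    """Sketch of the 3-step preprocessing: resize to the target voxel
    grid, zero out background voxels with the brain mask, and
    MinMax-normalize intensities to (0, 1)."""
    # Step 1: voxel resizing to the target (128, 128, 128) grid
    factors = [t / s for t, s in zip(target_shape, volume.shape)]
    vol = zoom(volume, factors, order=1)       # linear interpolation
    msk = zoom(mask, factors, order=0) > 0     # nearest-neighbor for masks

    # Step 2: neutralize background voxels to 0 using the mask
    vol = np.where(msk, vol, 0.0)

    # Step 3: MinMax normalization to rescale intensities to (0, 1)
    vmin, vmax = vol.min(), vol.max()
    vol = (vol - vmin) / (vmax - vmin + 1e-8)
    return vol[..., np.newaxis]                # final shape (128, 128, 128, 1)
```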
- Implemented 3D versions of the above repositories from scratch
VQVAE3D
- The encoder & decoder of the 3D VQVAE are symmetrical, with 3 convolutional & 3 transpose-convolutional layers respectively, each followed by non-linear `relu` units
- The Vector Quantizer trains a learnable embedding matrix and identifies the closest latent for a given input based on the L2 distance
- VQVAE gave superior results over VAE, as shown in the original VQVAE paper, owing to the fact that the quantizer addresses the 'Posterior Collapse' problem seen in traditional VAEs
- Trained the model for approximately 100 epochs using the Adam optimizer with lr=1e-4, minimizing the reconstruction & quantizer losses jointly
- Test dataset reconstructions:
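The quantization step described above can be illustrated with a minimal numpy sketch. The function name is hypothetical, and the codebook training losses (codebook & commitment terms) are omitted:

```python
import numpy as np

def vector_quantize(latents, codebook):
    """Sketch of the VQ lookup: for each latent vector, find the
    codebook entry with the smallest L2 distance and replace the
    latent with that entry. `latents` is (N, D), `codebook` is (K, D)."""
    # Squared L2 distance between every latent and every codebook entry
    d = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    idx = d.argmin(axis=1)          # index of the closest code per latent
    return codebook[idx], idx
```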
3D LDM
- Built an unconditional Latent Diffusion Model (LDM) combining the DDPM & Stable Diffusion implementations
- The U-Net of the reverse process consists of 3 downsampling & 3 upsampling layers, each consisting of 2 residual blocks and an optional attention layer
- Trained the model using a linear (forward) variance schedule and various numbers of diffusion steps - 200 & 300
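The linear variance schedule and the forward (noising) process can be sketched as below. The `beta_start`/`beta_end` values are common DDPM defaults assumed for illustration, not values taken from this report:

```python
import numpy as np

def linear_beta_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Linear (forward) variance scaling over T diffusion steps."""
    return np.linspace(beta_start, beta_end, T)

def q_sample(x0, t, alphas_cumprod, noise):
    """Forward process: x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * eps."""
    a_bar = alphas_cumprod[t]
    return np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * noise

T = 200                                     # one of the step counts tried
betas = linear_beta_schedule(T)
alphas_cumprod = np.cumprod(1.0 - betas)    # a_bar_t, used by q_sample
```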
- Adopted algorithm 4 for sampling synthetic generations at 200 & 300 diffusion steps:

<img src="https://github.com/dipy/dipy/blob/master/doc/_static/dm3d-reconst-D200-D300.png" alt="3D LDM synthetic generations" width="800">
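For reference, a generic DDPM-style ancestral sampling loop looks roughly like this. It is a sketch only - the exact algorithm 4 referenced above may differ in detail, and `eps_model` stands in for the trained U-Net noise predictor:

```python
import numpy as np

def ddpm_sample(eps_model, shape, betas, rng):
    """Generic DDPM ancestral sampling: start from Gaussian noise and
    iteratively denoise for T steps using the noise-prediction model."""
    alphas = 1.0 - betas
    a_bar = np.cumprod(alphas)
    x = rng.standard_normal(shape)                  # x_T ~ N(0, I)
    for t in range(len(betas) - 1, -1, -1):
        eps = eps_model(x, t)                       # predicted noise at step t
        mean = (x - betas[t] / np.sqrt(1.0 - a_bar[t]) * eps) / np.sqrt(alphas[t])
        z = rng.standard_normal(shape) if t > 0 else 0.0
        x = mean + np.sqrt(betas[t]) * z            # add noise except at t=0
    return x
```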
- Adopted MONAI's implementation
- Replaced the VQVAE encoder & decoder with a slightly more complex architecture that includes residual connections alternating with convolutions
- Carried out experiments with the same training parameters while varying the batch size, and also used both datasets in a single experiment
- The training curves clearly show that the larger the batch size & dataset, the more stable the training metric at learning rate 1e-4
- Plotted reconstructions for the top two experiments - (batch size=12, both datasets) & (batch size=5, NFBS dataset)
- The existing diffusion model was trained on these new latents to check their efficacy for synthetic image generation
- The training curves converged quickly, but the sampled generations are still pure noise
- To summarize, we've stretched the capability of our VQVAE model despite its low complexity, with only `num_res_channels=(32, 64)`. We consistently achieved improved reconstruction results with every experiment. Our latest experiments were trained using a weighted loss function that assigns a smaller weight to background voxels, owing to their higher count. This led to capturing not just the outer structure of the human brain but also volumetric details resembling microstructural information inside the brain - a major improvement over all previous trainings.
- For future work, we should look into two things - debugging the Diffusion Model and scaling up the VQVAE model.
- As a first priority, we could analyze the reason for the pure-noise output in the DM3D generations; this would help us rule out any implementation errors in the sampling process.
- As a second step, we could try scaling up both the VQVAE and the Diffusion Model in terms of complexity, such as increasing the intermediate channel dimensions from 64 to 128 or 256. This may help us achieve state-of-the-art results on the NFBS & CC359 datasets.
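The weighted reconstruction loss mentioned in the summary can be sketched as below; `bg_weight` is an illustrative value, not the one used in training:

```python
import numpy as np

def weighted_mse(x, x_rec, mask, bg_weight=0.1):
    """Weighted reconstruction loss: background voxels (mask == 0)
    contribute with a smaller weight, since they vastly outnumber
    brain voxels in a (128, 128, 128) volume."""
    w = np.where(mask > 0, 1.0, bg_weight)
    return float((w * (x - x_rec) ** 2).sum() / w.sum())
```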
- The unconditional LDM hasn't shown any progress in generations yet. Increasing model complexity with a larger number of intermediate channels & increasing the diffusion steps to 1000 is a direction for improvement
- Implementing a cross-attention module as part of the U-Net will accommodate conditional training on attributes such as tumor type, tumor location & brain age
- Implementing evaluation metrics such as FID (Frechet Inception Distance) & IS (Inception Score) will be useful for estimating the generative capabilities of our models
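For reference, FID between two sets of feature vectors (normally Inception-v3 activations of real and generated images) can be sketched as:

```python
import numpy as np
from scipy.linalg import sqrtm

def fid(feats_real, feats_fake):
    """Frechet Inception Distance between two (N, D) feature sets:
    ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 * (S1 @ S2)^(1/2))."""
    mu1, mu2 = feats_real.mean(0), feats_fake.mean(0)
    s1 = np.cov(feats_real, rowvar=False)
    s2 = np.cov(feats_fake, rowvar=False)
    covmean = sqrtm(s1 @ s2)
    if np.iscomplexobj(covmean):        # drop tiny imaginary parts (numerical noise)
        covmean = covmean.real
    return float(((mu1 - mu2) ** 2).sum() + np.trace(s1 + s2 - 2.0 * covmean))
```

Identical feature distributions give an FID near 0; larger values indicate a bigger gap between real and generated images.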
Date | Description | Blog Post Link |
---|---|---|
Week 0 (19-05-2023) | Journey of GSOC application & acceptance | DIPY |
Week 1 (29-05-2023) | Community bonding and Project kickstart | DIPY |
Week 2 (05-06-2023) | Deep Dive into VQVAE | DIPY |
Week 3 (12-06-2023) | VQVAE results and study on Diffusion models | DIPY |
Week 4 (19-06-2023) | Diffusion research continues | DIPY |
Week 5 (26-06-2023) | Carbonate HPC Account Setup, Experiment, Debug and Repeat | DIPY |
Week 6 & Week 7 (10-07-2023) | Diffusion Model results on pre-trained VQVAE latents of NFBS MRI Dataset | DIPY |
Week 8 & Week 9 (24-07-2023) | VQVAE MONAI models & checkerboard artifacts | DIPY |
Week 10 & Week 11 (07-08-2023) | HPC issues, GPU availability, Tensorflow errors: Week 10 & Week 11 | DIPY |
Week 12 & Week 13 (21-08-2023) | Finalized experiments using both datasets | DIPY |
Great! Just a few comments.
'Current literature ~' is duplicated
'Skull stripping using STAPLE ~' sounds like you were the one doing STAPLE. We should say the dataset provided it. Also, was CC359 STAPLE? I thought NFBS was STAPLE. Can you double check?
If you say 'Carbonate' it does not ring a bell for anyone outside of IU. It should say something like HPC (High Performance Computing) systems or just GPU
It says on the timeline only NFBS dataset. When did you work on CC359?
What happened to the link on week8&9 and 12&13 on the timeline?
Some say VQ-VAE and some VQVAE
Proposed objective should include implementing it in DIPY.
Overall, I think it's super, but be careful using terms that we are used to but others might not be (e.g. what is the transform_img function?). Some minor errors need cleaning up (e.g. the date in the timeline is the same for weeks 6-13, why is relu grey-boxed, typos). Try checking for typos in places such as Word, as shillipi suggested.
Also I think you have some place holders (e.g. --epochs). Don't forget to fill them out!