- Name: Bayanagari Vara Lakshmi
- Organization: Python Software Foundation
- Sub-Organization: DIPY
- Project: DIPY - Synthetic MRI generation
- Human Brain MRI preprocessing function
- MRI reconstruction using VQVAE
- Implement & train Diffusion Model on VQVAE latents
- Implement conditional Diffusion Model
- Generate conditional synthetic MRI
- Evaluate synthetic generations in DIPY
- 2D VQVAE on MNIST data
- 2D unconditional DDPM-based LDM on MNIST data
- 3D VQVAE based on MONAI's PyTorch implementation
- 3D unconditional LDM based on MONAI's PyTorch implementation
- Conducted a literature review on the limited existing diffusion-modeling work in medical imaging
- Current literature [1, 2] utilizes VQGAN & DDPM models on the MRNet, ADNI, Breast Cancer MRI & lung CT datasets
- MONAI is the latest open-source platform with repositories on deep learning applications for BraTS & other medical imaging datasets, implemented in PyTorch
- Our project serves as an easy, understandable & accessible implementation of anatomical MRI generation using unconditional Diffusion Modelling in TensorFlow
- Implemented a 2D VQVAE & a 2D DDPM-based Latent Diffusion Model (LDM) on the MNIST dataset & achieved high-quality generations
- Worked on the CC359 & NFBS datasets; both consist of T1-weighted human brain MRI with 359 & 125 samples respectively. Preprocessed each input volume following the 3 steps below -
- Skull-stripping the dataset, if required, using the masks provided with the dataset.
- Pre-processing with the `transform_img` function - perform voxel resizing & an affine transformation to obtain a final (128, 128, 128, 1) volume shape & (1, 1, 1) voxel size, and neutralize background voxels to 0 using the respective masks.
- MinMax normalization to rescale intensities to (0, 1).
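The three steps above can be sketched roughly as follows. This is an illustrative stand-in for the actual `transform_img` function - the real helper performs a proper affine resampling, which is simplified here to a zoom-based voxel resize:

```python
import numpy as np
from scipy.ndimage import zoom

def transform_img(volume, mask, target_shape=(128, 128, 128)):
    """Sketch of the 3-step preprocessing: resize to the target voxel
    grid, zero out background voxels with the brain mask, and
    MinMax-normalize intensities to (0, 1)."""
    # Step 1: voxel resizing to the target (128, 128, 128) grid
    factors = [t / s for t, s in zip(target_shape, volume.shape)]
    vol = zoom(volume, factors, order=1)       # linear interpolation
    msk = zoom(mask, factors, order=0) > 0     # nearest-neighbor for masks

    # Step 2: neutralize background voxels to 0 using the mask
    vol = np.where(msk, vol, 0.0)

    # Step 3: MinMax normalization to rescale intensities to (0, 1)
    vmin, vmax = vol.min(), vol.max()
    vol = (vol - vmin) / (vmax - vmin + 1e-8)
    return vol[..., np.newaxis]                # final shape (128, 128, 128, 1)
```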
- Implemented 3D versions of the above repositories from scratch
VQVAE3D
- The encoder & decoder of the 3D VQVAE are symmetrical, with 3 convolutional & 3 transpose-convolutional layers respectively, each followed by non-linear `relu` units
- The Vector Quantizer trains a learnable embedding matrix and identifies the closest latent for a given input based on the L2 distance
- VQVAE gave superior results over VAE, as shown in the original VQVAE paper, owing to the fact that the quantizer addresses the 'Posterior Collapse' problem seen in traditional VAEs
- Trained the model for approximately 100 epochs using the Adam optimizer with lr=1e-4, minimizing the reconstruction & quantizer losses jointly
- Test dataset reconstructions:
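The quantization step described above can be illustrated with a minimal numpy sketch. The function name is hypothetical, and the codebook training losses (codebook & commitment terms) are omitted:

```python
import numpy as np

def vector_quantize(latents, codebook):
    """Sketch of the VQ lookup: for each latent vector, find the
    codebook entry with the smallest L2 distance and replace the
    latent with that entry. `latents` is (N, D), `codebook` is (K, D)."""
    # Squared L2 distance between every latent and every codebook entry
    d = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    idx = d.argmin(axis=1)          # index of the closest code per latent
    return codebook[idx], idx
```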
3D LDM
- Built an unconditional Latent Diffusion Model (LDM) combining the DDPM & Stable Diffusion implementations
- The U-Net of the reverse process consists of 3 downsampling & 3 upsampling layers, each consisting of 2 residual blocks and an optional attention layer
- Trained the model using a linear (forward) variance schedule and various numbers of diffusion steps - 200 & 300
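The linear variance schedule and the forward (noising) process can be sketched as below. The `beta_start`/`beta_end` values are common DDPM defaults assumed for illustration, not values taken from this report:

```python
import numpy as np

def linear_beta_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Linear (forward) variance scaling over T diffusion steps."""
    return np.linspace(beta_start, beta_end, T)

def q_sample(x0, t, alphas_cumprod, noise):
    """Forward process: x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * eps."""
    a_bar = alphas_cumprod[t]
    return np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * noise

T = 200                                     # one of the step counts tried
betas = linear_beta_schedule(T)
alphas_cumprod = np.cumprod(1.0 - betas)    # a_bar_t, used by q_sample
```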
- Adopted algorithm 4 for sampling synthetic generations at 200 & 300 diffusion steps:

<img src="https://github.com/dipy/dipy/blob/master/doc/_static/dm3d-reconst-D200-D300.png" alt="3D LDM synthetic generations" width="800">
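For reference, a generic DDPM-style ancestral sampling loop looks roughly like this. It is a sketch only - the exact algorithm 4 referenced above may differ in detail, and `eps_model` stands in for the trained U-Net noise predictor:

```python
import numpy as np

def ddpm_sample(eps_model, shape, betas, rng):
    """Generic DDPM ancestral sampling: start from Gaussian noise and
    iteratively denoise for T steps using the noise-prediction model."""
    alphas = 1.0 - betas
    a_bar = np.cumprod(alphas)
    x = rng.standard_normal(shape)                  # x_T ~ N(0, I)
    for t in range(len(betas) - 1, -1, -1):
        eps = eps_model(x, t)                       # predicted noise at step t
        mean = (x - betas[t] / np.sqrt(1.0 - a_bar[t]) * eps) / np.sqrt(alphas[t])
        z = rng.standard_normal(shape) if t > 0 else 0.0
        x = mean + np.sqrt(betas[t]) * z            # add noise except at t=0
    return x
```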
- Adopted MONAI's implementation
- Replaced the VQVAE encoder & decoder with a slightly more complex architecture that includes residual connections alternating with convolutions
- Carried out experiments with the same training parameters while varying the batch size, and also used both datasets in a single experiment
- The training curves clearly show that the larger the batch size & dataset, the more stable the training metric at learning rate 1e-4
- Plotted reconstructions for the top two experiments - (batch size=12, both datasets) & (batch size=5, NFBS dataset)
- The existing diffusion model was trained on these new latents to check their efficacy for synthetic image generation
- The training curves converged quickly, but the sampled generations are still pure noise
- To summarize, we've stretched the capability of our VQVAE model despite its low complexity, with only `num_res_channels=(32, 64)`. We consistently achieved improved reconstruction results with every experiment. Our latest experiments were trained using a weighted loss function that assigns a smaller weight to background voxels, owing to their higher count. This led to capturing not just the outer structure of the human brain but also volumetric details resembling microstructural information inside the brain - a major improvement over all previous trainings.
- For future work, we should look into two things - debugging the Diffusion Model and scaling up the VQVAE model.
- As a first priority, we could analyze the reason for the pure-noise output in the DM3D generations; this would help us rule out any implementation errors in the sampling process.
- As a second step, we could try scaling up both the VQVAE and the Diffusion Model in terms of complexity, such as increasing the intermediate channel dimensions from 64 to 128 or 256. This may help us achieve state-of-the-art results on the NFBS & CC359 datasets.
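The weighted reconstruction loss mentioned in the summary can be sketched as below; `bg_weight` is an illustrative value, not the one used in training:

```python
import numpy as np

def weighted_mse(x, x_rec, mask, bg_weight=0.1):
    """Weighted reconstruction loss: background voxels (mask == 0)
    contribute with a smaller weight, since they vastly outnumber
    brain voxels in a (128, 128, 128) volume."""
    w = np.where(mask > 0, 1.0, bg_weight)
    return float((w * (x - x_rec) ** 2).sum() / w.sum())
```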
- The unconditional LDM hasn't shown any progress in generations yet. Increasing model complexity with a larger number of intermediate channels & increasing the diffusion steps to 1000 is a direction for improvement
- Implementing a cross-attention module as part of the U-Net will accommodate conditional training on attributes such as tumor type, tumor location & brain age
- Implementing evaluation metrics such as FID (Frechet Inception Distance) & IS (Inception Score) will be useful for estimating the generative capabilities of our models
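For reference, FID between two sets of feature vectors (normally Inception-v3 activations of real and generated images) can be sketched as:

```python
import numpy as np
from scipy.linalg import sqrtm

def fid(feats_real, feats_fake):
    """Frechet Inception Distance between two (N, D) feature sets:
    ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 * (S1 @ S2)^(1/2))."""
    mu1, mu2 = feats_real.mean(0), feats_fake.mean(0)
    s1 = np.cov(feats_real, rowvar=False)
    s2 = np.cov(feats_fake, rowvar=False)
    covmean = sqrtm(s1 @ s2)
    if np.iscomplexobj(covmean):        # drop tiny imaginary parts (numerical noise)
        covmean = covmean.real
    return float(((mu1 - mu2) ** 2).sum() + np.trace(s1 + s2 - 2.0 * covmean))
```

Identical feature distributions give an FID near 0; larger values indicate a bigger gap between real and generated images.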
Date | Description | Blog Post Link |
---|---|---|
Week 0 (19-05-2023) | Journey of GSOC application & acceptance | DIPY |
Week 1 (29-05-2023) | Community bonding and Project kickstart | DIPY |
Week 2 (05-06-2023) | Deep Dive into VQVAE | DIPY |
Week 3 (12-06-2023) | VQVAE results and study on Diffusion models | DIPY |
Week 4 (19-06-2023) | Diffusion research continues | DIPY |
Week 5 (26-06-2023) | Carbonate HPC Account Setup, Experiment, Debug and Repeat | DIPY |
Week 6 & Week 7 (10-07-2023) | Diffusion Model results on pre-trained VQVAE latents of NFBS MRI Dataset | DIPY |
Week 8 & Week 9 (24-07-2023) | VQVAE MONAI models & checkerboard artifacts | DIPY |
Week 10 & Week 11 (07-08-2023) | HPC issues, GPU availability, Tensorflow errors: Week 10 & Week 11 | DIPY |
Week 12 & Week 13 (21-08-2023) | Finalized experiments using both datasets | DIPY |
Great! Just a few comments.
'Current literature ~' is duplicated
'Skull stripping using STAPLE ~' sounds like you were the one doing STAPLE. We should say the dataset provided it. Also, was CC359 STAPLE? I thought NFBS was STAPLE. Can you double check?
If you say 'Carbonate' it does not ring a bell for anyone outside of IU. It should say something like HPC (High Performance Computing) systems or just GPU
It says on the timeline only NFBS dataset. When did you work on CC359?
What happened to the link on week8&9 and 12&13 on the timeline?
Some say VQ-VAE and some VQVAE
Proposed objective should include implementing it in DIPY.
Overall, I think it's super, but be careful using terms that we are used to but others might not be (e.g. what is the transform_img function?). Some minor errors need cleaning up (e.g. the date in the timeline is the same for weeks 6-13, why is relu grey-boxed, typos). Try checking for typos in places such as Word, as shillipi suggested.
Also I think you have some place holders (e.g. --epochs). Don't forget to fill them out!