Skip to content

Instantly share code, notes, and snippets.

@andesappal
andesappal / high-resolution multi-modality model.py
Created November 12, 2023 02:44
High-resolution multi-modality model
"""
The provided code is a generic example to illustrate how one might structure a multi-modal model using a combination
of a pre-trained vision model (ResNet50) for image encoding and a transformer-based text encoder. OtterHD, on the other
hand, is a more sophisticated and specialized model designed for high-resolution multi-modality tasks, as discussed in
the paper.
"""
import torch
import torch.nn as nn
import torchvision.models as models