
@tezansahu
Created February 12, 2022 19:27
from transformers import AutoFeatureExtractor, AutoTokenizer

# `MultimodalCollator`, `MultimodalVQAModel`, and `device` are defined in the
# accompanying snippets of this series.

def createMultimodalVQACollatorAndModel(text='bert-base-uncased', image='google/vit-base-patch16-224-in21k'):
    # Initialize the correct text tokenizer and image feature extractor, and use them to create the collator
    tokenizer = AutoTokenizer.from_pretrained(text)
    preprocessor = AutoFeatureExtractor.from_pretrained(image)
    multimodal_collator = MultimodalCollator(tokenizer=tokenizer, preprocessor=preprocessor)

    # Initialize the multimodal model with the appropriate weights from the pretrained text and image models
    multimodal_model = MultimodalVQAModel(pretrained_text_name=text, pretrained_image_name=image).to(device)

    return multimodal_collator, multimodal_model
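The factory above pairs a text tokenizer with an image feature extractor inside a single collator that produces one batch dict for the model. A minimal, self-contained sketch of that collator pattern (the class name and the stand-in tokenizer/preprocessor callables here are hypothetical; the real `MultimodalCollator` is defined elsewhere in the post and wraps Hugging Face components):

```python
class SimpleMultimodalCollator:
    """Merges tokenized questions and extracted image features into one batch dict."""

    def __init__(self, tokenizer, preprocessor):
        self.tokenizer = tokenizer
        self.preprocessor = preprocessor

    def __call__(self, batch):
        # Split the raw examples into their text and image parts
        questions = [item["question"] for item in batch]
        images = [item["image"] for item in batch]
        # Run each modality through its own preprocessor...
        encoded_text = self.tokenizer(questions)      # e.g. {"input_ids": ...}
        encoded_imgs = self.preprocessor(images)      # e.g. {"pixel_values": ...}
        # ...and merge the results into a single dict of model inputs
        return {**encoded_text, **encoded_imgs}


# Stand-in tokenizer/preprocessor for illustration only (hypothetical):
fake_tokenizer = lambda texts: {"input_ids": [[len(t)] for t in texts]}
fake_preprocessor = lambda imgs: {"pixel_values": imgs}

collator = SimpleMultimodalCollator(fake_tokenizer, fake_preprocessor)
batch = collator([
    {"question": "what is this?", "image": [0.1, 0.2]},
    {"question": "how many?", "image": [0.3, 0.4]},
])
```

The upshot of this design is that the `Trainer` (or any `DataLoader`) only ever sees one callable, so swapping the text or image backbone only changes the names passed to the factory, not the training loop.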