skrish13/multimodal_llms.md

## multimodal_llms.md

      
Models

LLMs, LVMs, and VLMs that can take images as input, can be instruction-tuned, etc.
Open


LLaVA (https://github.com/haotian-liu/LLaVA)
Qwen-VL (https://huggingface.co/Qwen/Qwen-VL)
CogVLM (https://github.com/THUDM/CogVLM)
BLIP family of models (BLIP, BLIP-2, X-InstructBLIP)

Tiny ones


Moondream (https://github.com/vikhyat/moondream)
Bunny (https://github.com/BAAI-DCAI/Bunny)

Closed


Yasa (reka.ai)
GPT-4 Vision, Gemini Ultra

Companies


reka.ai
OpenAI, Google