Multimodal LLMs, LVMs, and VLMs that accept images as input and can be instruction-tuned:
- LLaVA (https://github.com/haotian-liu/LLaVA)
- Qwen-VL (https://huggingface.co/Qwen/Qwen-VL)
- CogVLM (https://github.com/THUDM/CogVLM)
- BLIP family of models (BLIP, BLIP-2, X-InstructBLIP)
- Moondream (https://github.com/vikhyat/moondream)
- Bunny (https://github.com/BAAI-DCAI/Bunny)
- Yasa (reka.ai)
- GPT-4 Vision, Gemini Ultra
- reka.ai
- OpenAI, Google