4 items • Updated • 14
Model
llava-phi-3-mini is a LLaVA model fine-tuned from microsoft/Phi-3-mini-4k-instruct and CLIP-ViT-Large-patch14-336 with ShareGPT4V-PT and InternVL-SFT by XTuner.
Note: This model is in official LLaVA format.
Resources:
- GitHub: xtuner
- HuggingFace LLaVA format model: xtuner/llava-phi-3-mini-hf
- GGUF LLaVA model: xtuner/llava-phi-3-mini-gguf
- XTuner LLaVA format model: xtuner/llava-phi-3-mini-xtuner
Details
| Model | Visual Encoder | Projector | Resolution | Pretraining Strategy | Fine-tuning Strategy | Pretrain Dataset | Fine-tune Dataset | Pretrain Epoch | Fine-tune Epoch |
|---|---|---|---|---|---|---|---|---|---|
| LLaVA-v1.5-7B | CLIP-L | MLP | 336 | Frozen LLM, Frozen ViT | Full LLM, Frozen ViT | LLaVA-PT (558K) | LLaVA-Mix (665K) | 1 | 1 |
| LLaVA-Llama-3-8B | CLIP-L | MLP | 336 | Frozen LLM, Frozen ViT | Full LLM, LoRA ViT | LLaVA-PT (558K) | LLaVA-Mix (665K) | 1 | 1 |
| LLaVA-Llama-3-8B-v1.1 | CLIP-L | MLP | 336 | Frozen LLM, Frozen ViT | Full LLM, LoRA ViT | ShareGPT4V-PT (1246K) | InternVL-SFT (1268K) | 1 | 1 |
| LLaVA-Phi-3-mini | CLIP-L | MLP | 336 | Frozen LLM, Frozen ViT | Full LLM, Full ViT | ShareGPT4V-PT (1246K) | InternVL-SFT (1268K) | 1 | 2 |
Results
| Model | MMBench Test (EN) | MMMU Val | SEED-IMG | AI2D Test | ScienceQA Test | HallusionBench aAcc | POPE | GQA | TextVQA | MME | MMStar |
|---|---|---|---|---|---|---|---|---|---|---|---|
| LLaVA-v1.5-7B | 66.5 | 35.3 | 60.5 | 54.8 | 70.4 | 44.9 | 85.9 | 62.0 | 58.2 | 1511/348 | 30.3 |
| LLaVA-Llama-3-8B | 68.9 | 36.8 | 69.8 | 60.9 | 73.3 | 47.3 | 87.2 | 63.5 | 58.0 | 1506/295 | 38.2 |
| LLaVA-Llama-3-8B-v1.1 | 72.3 | 37.1 | 70.1 | 70.0 | 72.9 | 47.7 | 86.4 | 62.6 | 59.0 | 1469/349 | 45.1 |
| LLaVA-Phi-3-mini | 69.2 | 41.4 | 70.0 | 69.3 | 73.7 | 49.8 | 87.3 | 61.5 | 57.8 | 1477/313 | 43.7 |
Quickstart
Chat by LLaVA official library
- Install official LLaVA library
pip install git+https://github.com/haotian-liu/LLaVA.git
- Chat by below script
python ./cli.py --model-path xtuner/llava-phi-3-mini --image-file https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg --load-4bit
Reproduce
Please refer to docs.
Citation
@misc{2023xtuner,
title={XTuner: A Toolkit for Efficiently Fine-tuning LLM},
author={XTuner Contributors},
howpublished = {\url{https://github.com/InternLM/xtuner}},
year={2023}
}
- Downloads last month
- 19
Safetensors
Model size
4B params
Tensor type
F32
·
F16 ·
