URL: https://huggingface.co/nvidia/Eagle2-9B

Eagle-2

Introduction

We are thrilled to release our latest Eagle2 series Vision-Language Model. Open-source Vision-Language Models (VLMs) have made significant strides in narrowing the gap with proprietary models. However, critical details about data strategies and implementation are often missing, limiting reproducibility and innovation. In this project, we focus on VLM post-training from a data-centric perspective, sharing insights into building effective data strategies from scratch. By combining these strategies with robust training recipes and model design, we introduce Eagle2, a family of performant VLMs. Our work aims to empower the open-source community to develop competitive VLMs with transparent processes.

In this repo, we are open-sourcing Eagle2-9B, the largest model in the family, which balances benchmark performance against inference speed.

Model Zoo

We provide the following models:

| Model name | LLM | Vision | Max Length | HF Link |
| --- | --- | --- | --- | --- |
| Eagle2-1B | Qwen2.5-0.5B-Instruct | Siglip | 16K | 🤗 link |
| Eagle2-2B | Qwen2.5-1.5B-Instruct | Siglip | 16K | 🤗 link |
| Eagle2-9B | Qwen2.5-7B-Instruct | Siglip+ConvNext | 16K | 🤗 link |

Benchmark Results

| Benchmark | MiniCPM-Llama3-V-2_5 | InternVL-Chat-V1-5 | InternVL2-8B | QwenVL2-7B | Eagle2-9B |
| --- | --- | --- | --- | --- | --- |
| Model Size | 8.5B | 25.5B | 8.1B | 8.3B | 8.9B |
| DocVQA (test) | 84.8 | 90.9 | 91.6 | 94.5 | 92.6 |
| ChartQA (test) | - | 83.8 | 83.3 | 83.0 | 86.4 |
| InfoVQA (test) | - | 72.5 | 74.8 | 74.3 | 77.2 |
| TextVQA (val) | 76.6 | 80.6 | 77.4 | 84.3 | 83.0 |
| OCRBench | 725 | 724 | 794 | 845 | 868 |
| MME (sum) | 2024.6 | 2187.8 | 2210.3 | 2326.8 | 2260 |
| RealWorldQA | 63.5 | 66.0 | 64.4 | 70.1 | 69.3 |
| AI2D (test) | 78.4 | 80.7 | 83.8 | - | 83.9 |
| MMMU (val) | 45.8 | 45.2 / 46.8 | 49.3 / 51.8 | 54.1 | 56.1 |
| MMBench_V11 (test) | - | - | 79.5 | 79.4 | 80.6 |
| MMVet (GPT-4-Turbo) | 52.8 | 55.4 | 54.2 | 62.0 | 62.2 |
| SEED-Image | 72.3 | 76.0 | 76.2 | - | 77.1 |
| HallBench (avg) | 42.4 | 49.3 | 45.2 | 50.6 | 49.3 |
| MathVista (testmini) | 54.3 | 53.5 | 58.3 | 58.2 | 63.8 |
| MMstar | - | - | 60.9 | 60.7 | 62.6 |

Quick Start

We provide a demo inference script to help you quickly start using the model. We support different input types:

  • pure text input
  • single image input
  • multiple image input
  • video input

0. Install the dependencies

pip install transformers==4.37.2
pip install flash-attn

Note: The latest version of transformers is not compatible with the model, so install the pinned version shown above.
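Because the model depends on a pinned transformers release, it can help to check the environment before loading anything. The following is a minimal sketch using only the Python standard library; the helper names `is_pinned_version` and `installed_transformers_version` are hypothetical and are not part of the Eagle2 repository.

```python
from importlib import metadata

# Version pinned in the install step of this model card.
REQUIRED_TRANSFORMERS = "4.37.2"


def is_pinned_version(installed, required=REQUIRED_TRANSFORMERS):
    """Return True when the installed version string matches the pin exactly."""
    return installed.strip() == required


def installed_transformers_version():
    """Return the installed transformers version, or None if it is not installed."""
    try:
        return metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_transformers_version()
    if found is None:
        print("transformers is not installed")
    elif is_pinned_version(found):
        print(f"transformers {found} matches the pinned version")
    else:
        print(f"transformers {found} found; this model card pins {REQUIRED_TRANSFORMERS}")
```

Running the check inside a fresh virtual environment makes it easier to keep the pin from conflicting with other projects.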

1. Prepare the Model worker

2. Prepare the Prompt

  • Single image input

prompt = [
    {'role': 'system', 'content': 'You are a helpful assistant.'},
    {
        'role': 'user',
        'content': 'Describe this image in detail.',
        'image': [
            {'url': 'https://www.nvidia.com/content/dam/en-zz/Solutions/about-nvidia/logo-and-brand/01-nvidia-logo-vert-500x200-2c50-d@2x.png'},
        ],
    },
]

  • Multiple image input

prompt = [
    {'role': 'system', 'content': 'You are a helpful assistant.'},
    {
        'role': 'user',
        'content': 'Describe these two images in detail.',
        'image': [
            {'url': 'https://www.nvidia.com/content/dam/en-zz/Solutions/about-nvidia/logo-and-brand/01-nvidia-logo-vert-500x200-2c50-d@2x.png'},
            {'url': 'https://www.nvidia.com/content/dam/en-zz/Solutions/about-nvidia/logo-and-brand/01-nvidia-logo-vert-500x200-2c50-d@2x.png'},
        ],
    },
]

  • Video input

prompt = [
    {'role': 'system', 'content': 'You are a helpful assistant.'},
    {
        'role': 'user',
        'content': 'Describe this video in detail.',
        'video': [
            'path/to/your/video.mp4',
        ],
    },
]
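The three prompt shapes above differ only in which media key the user message carries, so they can be produced by one small helper. This is a sketch, not code from the Eagle2 repository: `build_prompt` is a hypothetical name, and the text-only form (omitting both `image` and `video`) is an assumption, since the card lists pure text input as supported but does not show its prompt.

```python
def build_prompt(text, image_urls=None, video_paths=None,
                 system='You are a helpful assistant.'):
    """Build a prompt list in the message format shown in this model card.

    image_urls:  optional list of image URLs, stored as {'url': ...} dicts.
    video_paths: optional list of local video paths, stored as plain strings.
    """
    user = {'role': 'user', 'content': text}
    if image_urls:
        user['image'] = [{'url': url} for url in image_urls]
    if video_paths:
        user['video'] = list(video_paths)
    return [{'role': 'system', 'content': system}, user]


# Usage: one call per input type.
text_prompt = build_prompt('Summarize the Eagle2 model family.')  # assumed text-only form
image_prompt = build_prompt('Describe this image in detail.',
                            image_urls=['https://example.com/picture.png'])  # placeholder URL
video_prompt = build_prompt('Describe this video in detail.',
                            video_paths=['path/to/your/video.mp4'])
```

Keeping the construction in one place avoids small differences in key names between single-image, multi-image, and video requests.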

3. Generate the response

params = {
    'prompt': prompt,
    'max_input_tiles': 24,
    'temperature': 0.7,
    'top_p': 1.0,
    'max_new_tokens': 4096,
    'repetition_penalty': 1.0,
}
worker.generate(params)
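When the generation settings are changed per request, a small wrapper can start from the defaults shown above and reject obviously invalid values before they reach the worker. The sketch below is illustrative only: `build_params` and `DEFAULT_PARAMS` are hypothetical names, and the range checks reflect the usual meaning of these sampling parameters rather than any validation done by the model worker itself.

```python
# Defaults copied from the example in this model card.
DEFAULT_PARAMS = {
    'max_input_tiles': 24,
    'temperature': 0.7,
    'top_p': 1.0,
    'max_new_tokens': 4096,
    'repetition_penalty': 1.0,
}


def build_params(prompt, **overrides):
    """Merge overrides into the default settings and run basic sanity checks."""
    unknown = set(overrides) - set(DEFAULT_PARAMS)
    if unknown:
        raise KeyError(f'unknown generation parameters: {sorted(unknown)}')

    params = {'prompt': prompt, **DEFAULT_PARAMS, **overrides}

    # Assumed valid ranges, based on common sampling conventions.
    if params['temperature'] < 0:
        raise ValueError('temperature must be non-negative')
    if not 0 < params['top_p'] <= 1:
        raise ValueError('top_p must be in (0, 1]')
    if params['max_new_tokens'] < 1 or params['max_input_tiles'] < 1:
        raise ValueError('max_new_tokens and max_input_tiles must be at least 1')
    return params


# Usage: lower the temperature for a more deterministic answer.
example = build_params([{'role': 'user', 'content': 'Hello'}], temperature=0.2)
```

The resulting dictionary has the same keys as the example above and could be passed to `worker.generate` in the same way.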

TODO

  • Support vLLM Inference
  • Provide AWQ Quantization Weights
  • Provide fine-tuning scripts

License/Terms of Use

Citation

Ethical Considerations

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.

Please report security vulnerabilities or NVIDIA AI Concerns here.

Model size: 9B params (Safetensors)
Tensor type: BF16
