
URL: https://huggingface.co/InternRobotics/InternVLA-M1-Pretrain-RT-1-Bridge



Model Card for InternVLA-M1-Pretrain-RT-1-Bridge

Description:

InternVLA-M1 is an open-source, end-to-end vision–language–action (VLA) framework for building and researching generalist robot policies. The checkpoints in this repository were trained on the RT-1 and Bridge datasets.


Quick Start

# ===== system2 demo =====
from io import BytesIO

import requests
import torch  # needed for the CUDA check in the predict_action demo
from PIL import Image

from InternVLA.model.framework.M1 import InternVLA_M1

def load_image_from_url(url: str) -> Image.Image:
    """Download an image and return it as an RGB PIL image."""
    resp = requests.get(url, timeout=15)
    resp.raise_for_status()  # fail loudly on HTTP errors
    return Image.open(BytesIO(resp.content)).convert("RGB")


# Path to a locally downloaded checkpoint; replace /PATH with your own directory.
saved_model_path = "/PATH/checkpoints/steps_50000_pytorch_model.pt"
internVLA_M1 = InternVLA_M1.from_pretrained(saved_model_path)

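# "?raw=true" serves the image file itself; the plain GitHub "blob" URL returns an HTML page.
image_url = "https://github.com/InternRobotics/InternVLA-M1/blob/InternVLA-M1/assets/table.jpeg?raw=true"
image = load_image_from_url(image_url)
question = "give the bbox for the apple."
response = internVLA_M1.chat_with_M1(image, question)
print(response)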
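The model card does not document the format of `response`. The sketch below assumes, hypothetically, that the reply contains the box as a bracketed list of four numbers `[x1, y1, x2, y2]` (for example inside a JSON `bbox_2d` field). `parse_bbox` is an illustrative helper, not part of the InternVLA-M1 API, and whether the coordinates are pixels or normalized values should be checked against real outputs.

```python
import re
from typing import List, Optional

# One signed integer or decimal number, captured as a group.
_NUM = r"(-?\d+(?:\.\d+)?)"
# A bracketed list of exactly four numbers, e.g. "[120, 45, 210, 160]".
_BBOX_RE = re.compile(r"\[\s*" + r"\s*,\s*".join([_NUM] * 4) + r"\s*\]")


def parse_bbox(response: str) -> Optional[List[float]]:
    """Return the first [x1, y1, x2, y2] list found in the reply, or None.

    Hypothetical helper: assumes the box appears as four comma-separated
    numbers inside square brackets somewhere in the text.
    """
    match = _BBOX_RE.search(response)
    if match is None:
        return None
    return [float(value) for value in match.groups()]


print(parse_bbox('{"bbox_2d": [120, 45, 210, 160], "label": "apple"}'))
```

If the parsed box turns out to be in pixel coordinates, `image.crop(tuple(int(v) for v in box))` would cut out the detected region with PIL.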

# ===== predict_action demo =====
# Construct the input: batch size = 1, two camera views per sample
view1 = load_image_from_url(image_url)
view2 = view1.copy()  # stand-in for a second camera view
batch_images = [[view1, view2]]  # List[List[PIL.Image]]: one inner list of views per sample
instructions = ["pick up the apple and place it on the plate."]

# Move the model to the GPU if one is available
if torch.cuda.is_available():
    internVLA_M1 = internVLA_M1.to("cuda")

# Predict a sequence of T actions per sample using DDIM sampling
pred = internVLA_M1.predict_action(
    batch_images=batch_images,
    instructions=instructions,
    cfg_scale=1.5,
    use_ddim=True,
    num_ddim_steps=10,
)
normalized_actions = pred["normalized_actions"]  # [B, T, action_dim]
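The model card does not define `cfg_scale`; judging by the name, it most likely sets the strength of classifier-free guidance in the action sampler (an assumption). In standard classifier-free guidance, the conditional and unconditional predictions are blended as `uncond + s * (cond - uncond)`, so `s = 1.0` reproduces the conditional prediction and `s > 1` pushes further toward the instruction-conditioned output. The pure-Python sketch below shows only that blending rule on flat lists; `classifier_free_guidance` is a hypothetical helper, not the model's internal code.

```python
from typing import List


def classifier_free_guidance(
    cond: List[float], uncond: List[float], cfg_scale: float
) -> List[float]:
    """Blend conditional and unconditional predictions element by element.

    Standard classifier-free guidance: uncond + cfg_scale * (cond - uncond).
    """
    if len(cond) != len(uncond):
        raise ValueError("cond and uncond must have the same length")
    return [u + cfg_scale * (c - u) for c, u in zip(cond, uncond)]


# cfg_scale=1.5 as in the demo: moves 50% further past the conditional value.
print(classifier_free_guidance([1.0, 0.0], [0.0, 0.0], 1.5))
```

In a diffusion sampler this blend is typically applied at every denoising step, so with `use_ddim=True` and `num_ddim_steps=10` it would run once per step.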
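The key name `normalized_actions` suggests the outputs must be mapped back to robot units before execution, but the model card does not state the normalization scheme. The sketch below assumes a common convention in which each action dimension was scaled to [-1, 1] from per-dimension bounds `low` and `high` (for example, the 1st and 99th percentiles of the training data). `unnormalize_actions` and both bound vectors are hypothetical; the InternVLA-M1 repository should be checked for the actual statistics and for any dimension (such as the gripper) that is handled differently.

```python
from typing import List, Sequence

Chunk = List[List[float]]  # [T, action_dim]


def unnormalize_actions(
    normalized: List[Chunk],  # [B, T, action_dim], values assumed in [-1, 1]
    low: Sequence[float],     # hypothetical per-dimension lower bounds
    high: Sequence[float],    # hypothetical per-dimension upper bounds
) -> List[Chunk]:
    """Invert n = 2 * (a - low) / (high - low) - 1 for every action dimension."""
    if len(low) != len(high):
        raise ValueError("low and high must have the same length")
    result: List[Chunk] = []
    for chunk in normalized:  # one sequence of T actions per batch element
        steps: Chunk = []
        for step in chunk:
            if len(step) != len(low):
                raise ValueError("action_dim does not match the bounds")
            steps.append(
                [0.5 * (x + 1.0) * (hi - lo) + lo for x, lo, hi in zip(step, low, high)]
            )
        result.append(steps)
    return result


print(unnormalize_actions([[[-1.0, 0.0, 1.0]]], low=[0.0, -2.0, 10.0], high=[1.0, 2.0, 20.0]))
```

If `pred["normalized_actions"]` is a NumPy array or PyTorch tensor, `.tolist()` converts it into the nested-list form this sketch expects.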

Citation

@misc{internvla2024,
  title        = {InternVLA-M1: A Spatially Guided Vision-Language-Action Framework for Generalist Robot Policy},
  author       = {InternVLA-M1 Contributors},
  year         = {2025},
  howpublished = {arXiv},
}