
URL: https://huggingface.co/vectionlabs/Salience-1-9B



Salience 1 (9B)

๐Ÿ‘ VectionLabs Maestro 1 Banner

A 9B multimodal reasoning model, sharpened for code and agentic work, that can also see.

Vection Labs



Abstract

Salience 1 (9B) is a dense, 9-billion-parameter vision-language model built for hard, practical work: writing and debugging real code, driving tools and agents, multi-step mathematical reasoning, and visual understanding over images and video, all inside a single model with a context window of up to 1M tokens.

It is the successor to Maestro1-9B, engineered around a single goal: push the axis users ask for most (code and agentic/tool use) without giving up the deep reasoning, vision, and million-token context the family is known for.

It is designed for people who care less about chat pleasantries and more about whether the model can do the thing: ship the function, find the bug, call the right tool, read the diagram, finish the proof.

Highlights

  • Code & agentic first. Built with a coding/DevOps donor model on top of a reasoning core; tuned to produce runnable code and well-formed tool calls.
  • Reasoning that shows its work. Structured, inspectable chains of thought for math, logic, and code.
  • Genuinely multimodal. Images and video are first-class inputs, not bolted-on captioning.
  • Long context. Up to 1M tokens via interleaved multimodal RoPE: whole repos, long papers, or long videos in a single prompt.
  • Fast on modest hardware. Runs on 2× T4 without needing GGUF (fp16 sharded across both cards, or 4-bit on a single T4), with lossless speculative decoding and hybrid-thinking latency control.
  • Open weights. Apache-2.0, transformers-native, single-file deployment.
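As a rough sanity check on the hardware claims above, weight memory is parameter count times bytes per parameter. The sketch below counts weights only (no activations, KV cache, CUDA context, or quantization scale overhead), uses the card's nominal 9B figure, and assumes an even split when sharding; the function name is illustrative.

```python
def weight_memory_gib(n_params: float, bits_per_param: float, n_gpus: int = 1) -> float:
    """Approximate per-GPU weight memory in GiB, assuming an even split across GPUs.

    Counts weights only: activations, KV cache, CUDA context and
    quantization metadata (scales, zero points) are ignored.
    """
    total_bytes = n_params * bits_per_param / 8
    return total_bytes / n_gpus / 2**30


if __name__ == "__main__":
    n = 9e9  # nominal parameter count from the model card
    print(f"fp16, 1 GPU : {weight_memory_gib(n, 16):.1f} GiB")
    print(f"fp16, 2 GPUs: {weight_memory_gib(n, 16, n_gpus=2):.1f} GiB per GPU")
    print(f"4-bit, 1 GPU: {weight_memory_gib(n, 4):.1f} GiB")
```

At roughly 16.8 GiB, fp16 weights alone exceed a single 16 GB T4, which is why the fp16 path shards across two cards, while 4-bit weights (about 4.2 GiB) leave headroom on one.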

Model overview

Parameters: 9B (dense)
Modalities: text, image, video → text
Context window: up to 1,000,000 tokens (interleaved multimodal RoPE)
Precision: bfloat16 master weights
Architecture: Qwen3-VL (Qwen3-8B language model, 36 layers) + native vision encoder
License: Apache-2.0
Library: 🤗 transformers (AutoModelForImageTextToText)

Architecture & capabilities

Salience 1 is a dense Qwen3-VL model: a 36-layer Qwen3-8B language model coupled to a native vision encoder, with interleaved multimodal RoPE extending the context window from 256K up to 1M tokens.
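For readers unfamiliar with multimodal RoPE, the sketch below shows how 3D (temporal, height, width) position ids can be laid out for a text-image-text sequence, following the M-RoPE scheme introduced with Qwen2-VL. It is an illustration under assumptions, not Salience 1's code: it handles still images only, the segment format is hypothetical, and it does not model how rotary frequencies are allocated across the three axes (the part the word "interleaved" refers to).

```python
from typing import List, Tuple, Union

# Hypothetical segment encoding: ("text", n_tokens) or ("image", grid_h, grid_w),
# where grid_h x grid_w is the number of vision tokens after patch merging.
Segment = Union[Tuple[str, int], Tuple[str, int, int]]
Position = Tuple[int, int, int]  # (temporal, height, width)


def mrope_position_ids(segments: List[Segment]) -> List[Position]:
    """Assign a (t, h, w) position triple to every token in the sequence."""
    positions: List[Position] = []
    next_pos = 0
    for seg in segments:
        if seg[0] == "text":
            n = seg[1]
            # Text tokens use the same index on all three axes, which reduces
            # to ordinary 1D RoPE positions.
            positions.extend((next_pos + i,) * 3 for i in range(n))
            next_pos += n
        elif seg[0] == "image":
            grid_h, grid_w = seg[1], seg[2]
            # A still image is a single frame: t is constant, h and w follow the grid.
            for r in range(grid_h):
                for c in range(grid_w):
                    positions.append((next_pos, next_pos + r, next_pos + c))
            # The next segment starts just past the largest index used so far.
            next_pos += max(grid_h, grid_w)
        else:
            raise ValueError(f"unknown segment type: {seg[0]!r}")
    return positions
```

Note that an image advances the position counter by max(grid_h, grid_w) rather than by its token count, so large images consume relatively little of the positional range.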

Its capability profile is built around three pillars:

  • Code & agentic execution: runnable code, repo-scale edits, and well-formed tool calls.
  • Deep reasoning: structured, inspectable chains of thought for math and logic.
  • Multimodal perception: images and video as first-class inputs, not bolted-on captioning.

The vision pathway and long-context behavior are preserved end to end, so the same reasoning that solves an olympiad problem also reads a chart, a UI screenshot, or a short clip.

Intended use

Salience 1 targets technical assistance, coding agents, and research:

  • Code generation, explanation, debugging, review, and repo-scale tasks.
  • Agentic / tool-using workflows that emit structured calls.
  • Step-by-step math and quantitative reasoning.
  • Visual question answering and document/diagram/chart understanding.
  • Video understanding over short clips, and long-document / long-context analysis.
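Workflows that emit structured calls should validate them before executing anything. The card does not specify a tool-call format, so this sketch assumes a hypothetical {"name": ..., "arguments": {...}} JSON object and checks only the tool name and required argument names; a production agent would also validate argument types against a full JSON Schema.

```python
import json
from typing import Dict, Set


def parse_tool_call(raw: str, tools: Dict[str, Set[str]]) -> dict:
    """Parse a model-emitted tool call and check it against a minimal schema.

    `tools` maps each tool name to the set of argument names it requires.
    Raises ValueError (json.JSONDecodeError is a subclass) on any violation.
    """
    call = json.loads(raw)
    if not isinstance(call, dict):
        raise ValueError("tool call must be a JSON object")
    name = call.get("name")
    if not isinstance(name, str) or name not in tools:
        raise ValueError(f"unknown tool: {name!r}")
    args = call.get("arguments")
    if not isinstance(args, dict):
        raise ValueError("'arguments' must be a JSON object")
    missing = tools[name] - set(args)
    if missing:
        raise ValueError(f"missing required arguments: {sorted(missing)}")
    return call
```

Rejecting malformed calls with an exception gives the agent loop a clear point to re-prompt the model instead of running a half-formed action.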

It is not intended for high-stakes decisions without human review, nor as a source of truth for medical, legal, or financial advice.

Benchmarks

All results use a single reproducible evaluation harness with greedy decoding and chain-of-thought (CoT) prompting as listed in each row's setting; the Maestro1-9B column was run under the identical protocol for a like-for-like comparison.

Reasoning, math & code

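Benchmark | Setting | Maestro1-9B | Salience-1-9B
GSM8K | 0-shot CoT, exact match | not reported | not reported
MATH-500 | 0-shot CoT, exact match | not reported | not reported
HumanEval | 0-shot, pass@1 | not reported | not reported
MBPP | 3-shot, pass@1 | not reported | not reported
MMLU | 0-shot | not reported | not reported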
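The code rows report pass@1. For reference, the standard unbiased pass@k estimator from the HumanEval paper (Chen et al., 2021) is sketched below for n sampled completions of which c pass the tests. The card does not say whether the harness samples several completions or scores one greedy completion; in the greedy case pass@1 is simply the fraction of problems whose single completion passes. Function names are illustrative.

```python
from math import comb
from typing import List, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c of them correct."""
    if k > n:
        raise ValueError("k cannot exceed the number of samples n")
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: List[Tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, given (n, c) per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```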

Multimodal

Benchmark | Setting | Maestro1-9B | Salience-1-9B
MMMU (val) | 0-shot | not reported | not reported
MathVista (testmini) | 0-shot | not reported | not reported
DocVQA (val) | 0-shot, ANLS | not reported | not reported
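DocVQA is scored with ANLS (Average Normalized Levenshtein Similarity): each answer earns 1 minus its normalized edit distance to the closest reference, or 0 if that distance reaches the threshold tau = 0.5. The sketch below assumes lowercase, whitespace-stripped comparison, which is common practice; the official scorer's exact normalization may differ.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via one rolling DP row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def anls(predictions: List[str], gold_answers: List[List[str]], tau: float = 0.5) -> float:
    """Mean over questions of the best per-reference similarity score."""
    scores = []
    for pred, golds in zip(predictions, gold_answers):
        p = pred.strip().lower()
        best = 0.0
        for g in golds:
            g = g.strip().lower()
            denom = max(len(p), len(g))
            nl = levenshtein(p, g) / denom if denom else 0.0
            best = max(best, 1.0 - nl if nl < tau else 0.0)
        scores.append(best)
    return sum(scores) / len(scores) if scores else 0.0
```

The threshold means near-misses such as OCR slips still earn partial credit, while answers that differ from the reference by half their length or more score zero.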

The evaluation protocol, prompts, and answer-extraction logic are fixed and reproducible end-to-end.
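The harness's answer-extraction logic is not published in this card. As an illustration only, a common approach for GSM8K-style exact match takes the last number in the model's output, drops thousands separators, and compares numerically with the reference; both helpers below are hypothetical.

```python
import re
from typing import Optional

# An optional minus sign, digits possibly grouped with commas, and an optional decimal part.
_NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def extract_final_number(text: str) -> Optional[str]:
    """Return the last number in `text` with commas removed, or None if there is none."""
    matches = _NUMBER.findall(text)
    if not matches:
        return None
    return matches[-1].replace(",", "")


def is_exact_match(output: str, gold: str) -> bool:
    """Numeric comparison so that '18.0' and '18' count as the same answer."""
    pred = extract_final_number(output)
    if pred is None:
        return False
    return float(pred) == float(gold.replace(",", ""))
```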

Quickstart

from transformers import AutoModelForImageTextToText, AutoProcessor
from qwen_vl_utils import process_vision_info  # pip install qwen-vl-utils
import torch

model_id = "vectionlabs/Salience-1-9B"
proc = AutoProcessor.from_pretrained(model_id)
# `dtype` needs a recent transformers release; older releases call it `torch_dtype`.
model = AutoModelForImageTextToText.from_pretrained(
    model_id, dtype=torch.bfloat16, device_map="auto"
)

# Put the image before the question in the message content.
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": "https://example.com/diagram.png"},
        {"type": "text", "text": "Explain what this diagram proves, step by step."},
    ],
}]

# Render the chat template to a prompt string, then load the referenced media.
text = proc.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
imgs, vids = process_vision_info(messages)
inputs = proc(text=[text], images=imgs, videos=vids, return_tensors="pt").to(model.device)

out = model.generate(**inputs, max_new_tokens=1024)
# Decode only the newly generated tokens, skipping the echoed prompt.
print(proc.batch_decode(out[:, inputs.input_ids.shape[1]:], skip_special_tokens=True)[0])

Text-only prompts work the same way: use a message whose "content" list holds only {"type": "text", ...} entries.
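To keep the media-before-question ordering consistent across text-only, image, and video prompts, a small helper can assemble the message list used in the Quickstart. The helper name and signature are hypothetical; the "image" and "video" content entries follow the format that process_vision_info from qwen_vl_utils reads.

```python
from typing import Dict, List, Optional


def build_messages(
    prompt: str, image: Optional[str] = None, video: Optional[str] = None
) -> List[Dict]:
    """Build a single-turn user message, placing any media before the text."""
    content: List[Dict] = []
    if image is not None:
        content.append({"type": "image", "image": image})
    if video is not None:
        content.append({"type": "video", "video": video})
    content.append({"type": "text", "text": prompt})
    return [{"role": "user", "content": content}]
```

The returned list can be passed unchanged to proc.apply_chat_template and process_vision_info exactly as in the Quickstart.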

Speed & efficiency

Salience 1 is built to be fast in production, not just accurate:

  • Speculative decoding delivers a 1.5–2.5× speedup on code and structured text with no change to outputs: a lightweight draft model proposes tokens and the full model verifies them in a single forward pass. It is supported natively in transformers (assistant_model=) and in vLLM (--speculative-model; newer vLLM releases use --speculative-config instead).
  • Adaptive thinking. Append /no_think for fast, direct answers, or /think to enable deep step-by-step reasoning on hard math and multi-step agentic planning, so you pay the latency cost only when the task warrants it.
  • Runs on consumer hardware. 4-bit quantization brings the full model onto a single consumer GPU; bf16/fp16 serves comfortably on one modern accelerator with room for long context.
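The speculative-decoding speedup comes from verifying several cheap guesses per expensive step. The toy sketch below shows the greedy variant, with plain Python functions standing in for models (all names are illustrative): the draft proposes k tokens, the target keeps the longest prefix it agrees with plus one token of its own, and the result is token-for-token identical to greedy decoding with the target alone. Sampling-based speculative decoding uses a rejection-sampling acceptance rule instead, which is not shown. With real models in transformers, the equivalent is passing a small draft model as assistant_model= to model.generate.

```python
from typing import Callable, List, Tuple

# A "model" here is any function mapping a token sequence to its greedy next token.
NextToken = Callable[[List[int]], int]


def greedy_decode(target: NextToken, prompt: List[int], n_new: int) -> List[int]:
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(target(seq))
    return seq


def speculative_decode(
    target: NextToken, draft: NextToken, prompt: List[int], n_new: int, k: int = 4
) -> Tuple[List[int], int]:
    """Greedy speculative decoding; returns (sequence, number of verification rounds)."""
    seq = list(prompt)
    rounds = 0
    while len(seq) - len(prompt) < n_new:
        # 1. The cheap draft model proposes k tokens autoregressively.
        proposal, ctx = [], list(seq)
        for _ in range(k):
            tok = draft(ctx)
            proposal.append(tok)
            ctx.append(tok)
        # 2. The target checks every proposed position. A real implementation scores
        #    all k positions in one batched forward pass; here we call it per position.
        rounds += 1
        ctx = list(seq)
        for tok in proposal:
            expected = target(ctx)
            ctx.append(expected)  # keep the target's token (a correction on mismatch)
            if expected != tok:
                break
        else:
            ctx.append(target(ctx))  # all k accepted: the verify pass yields one bonus token
        seq = ctx
    return seq[: len(prompt) + n_new], rounds
```

When the draft always agrees and k=4, twenty new tokens need only 4 verification rounds instead of 20 sequential steps; a draft that often disagrees degrades toward one token per round but never changes the output.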

Prompting tips

  • Code: specify language, constraints ("no external libraries"), and the exact I/O contract.
  • Agentic / tools: give the tool schema and ask for the call as strict JSON.
  • Math/logic: ask it to reason step by step; it is tuned to externalize its work.
  • Vision: put the image/video before the question in the message content.
  • Sampling (Qwen3 family): thinking → temperature=0.6, top_p=0.95, top_k=20; direct answers → temperature=0.7, top_p=0.8, top_k=20.
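One way to keep the thinking switch and the sampling settings in sync is to derive both from a single flag. The helper names below are hypothetical, and pairing /think with the thinking preset is a suggestion based on the tips above rather than a documented requirement; the returned keys are standard transformers generate arguments, and do_sample=True is what makes temperature, top_p, and top_k take effect.

```python
def generation_kwargs(thinking: bool) -> dict:
    """Sampling presets from the prompting tips, as keyword arguments for `generate`."""
    if thinking:
        return {"do_sample": True, "temperature": 0.6, "top_p": 0.95, "top_k": 20}
    return {"do_sample": True, "temperature": 0.7, "top_p": 0.8, "top_k": 20}


def add_thinking_switch(prompt: str, thinking: bool) -> str:
    """Append the /think or /no_think soft switch to a user prompt."""
    return f"{prompt} {'/think' if thinking else '/no_think'}"
```

In the Quickstart, this would look like model.generate(**inputs, max_new_tokens=1024, **generation_kwargs(thinking=True)) after building the prompt with add_thinking_switch(question, thinking=True).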

Deployment

  • Single-GPU: loads in bf16/fp16 with device_map="auto" on one modern accelerator; 4-bit quantization fits the model on a single consumer GPU.
  • Serving: integrates with standard transformers generation and vision-capable serving stacks such as vLLM (with optional speculative decoding) for high-throughput production use.
  • Quantized formats: GGUF and other community quantizations are supported.

Limitations & responsible use

  • Salience 1 can be confidently wrong. Verify mathematical and factual claims.
  • Generated code may be insecure or incorrect: review it before running, and never execute untrusted output.
  • Long-context and long-video inputs increase latency and memory substantially.
  • It inherits the licenses, biases, and failure modes of all source models. Do not use it for surveillance, manipulation, or any use that violates applicable law or the Apache-2.0 terms.
  • No audio modality.
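To put the long-context memory cost in numbers, the KV cache grows linearly with sequence length. Only the 36-layer count comes from this card; the defaults of 8 key/value heads with head dimension 128 (grouped-query attention) are assumed from the Qwen3-8B backbone's published configuration, so check the released config.json. Vision tokens count toward n_tokens like text tokens.

```python
def kv_cache_bytes(
    n_tokens: int,
    n_layers: int = 36,
    n_kv_heads: int = 8,     # assumed from the Qwen3-8B config
    head_dim: int = 128,     # assumed from the Qwen3-8B config
    bytes_per_value: int = 2,  # bf16 / fp16
    batch_size: int = 1,
) -> int:
    """Bytes needed to cache keys and values for n_tokens (the factor 2 covers K and V)."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value * n_tokens * batch_size


if __name__ == "__main__":
    for n in (32_000, 256_000, 1_000_000):
        print(f"{n:>9,} tokens: {kv_cache_bytes(n) / 2**30:6.1f} GiB")
```

Under these assumptions a full 1M-token context needs about 137 GiB of KV cache in bf16 on top of the weights, which is why million-token inputs call for multi-GPU serving or KV-cache quantization.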

Citation

@misc{vectionlabs2026salience1,
 title = {Salience 1 (9B): A Multimodal Reasoning and Coding Model},
 author = {Vection Labs},
 year = {2026},
 url = {https://huggingface.co/vectionlabs/Salience-1-9B}
}

© 2026 Vection Labs · Apache-2.0