URL: https://huggingface.co/knowledgator/retrico-lm-4b


retrico-lm-4b

retrico-lm-4b is a 4B-parameter language model built for universal structured information extraction. Give it any text and a JSON template and it returns a JSON object that follows the template, typically with no post-processing required (96.0% of outputs parse as valid JSON on the held-out evaluation reported in this card).

The model handles the full spectrum of extraction tasks from a single interface: plain text, Markdown, HTML, and XML as input; flat facts, deeply nested objects, typed arrays, NER, and open relation extraction as output. There is no need to switch between specialized models — one template drives all extraction modes.

Built on Qwen3.5-4B, retrico-lm-4b is designed for production use and works best served via vLLM.


Key Features

  • Universal input — plain text, Markdown documents, HTML pages, XML feeds
  • Universal output — flat facts, nested objects, typed arrays, entity lists, relation triplets
  • Template-driven — define any JSON schema and the model populates it from the input
  • Typed fields — respects string, integer, float, nested objects, arrays of objects
  • Null-safe — missing values return as null or [], never hallucinated
  • Production-ready — optimized for vLLM with language_model_only=True
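
As an illustration of the template-driven and typed-field points above, a template can be written as a plain Python structure and serialized to JSON. The field names below are hypothetical, and using "integer" and "float" as placeholder strings is an assumption extrapolated from the "string" placeholders in this card's relation-extraction template, not a documented format. The leaf_paths helper is also hypothetical; it only lists which typed leaves a template asks for.

```python
import json

# Hypothetical template: field names are illustrative only. The type
# placeholders ("string", "integer", "float") are an assumed convention.
template = {
    "mission": {
        "name": "string",
        "launch_year": "integer",
        "cost_usd_billions": "float",
    },
    "agencies": ["string"],
    "instruments": [{"name": "string", "wavelength_range": "string"}],
}


def leaf_paths(node, prefix=""):
    """Yield (path, placeholder) pairs for every leaf in a template."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from leaf_paths(value, f"{prefix}.{key}" if prefix else key)
    elif isinstance(node, list):
        # A one-element list describes the shape of every array item.
        for item in node:
            yield from leaf_paths(item, f"{prefix}[]")
    else:
        yield prefix, node


# Serialized the same way the Usage code serializes dict templates.
template_json = json.dumps(template, indent=1, ensure_ascii=False)
paths = dict(leaf_paths(template))
```

Printing paths gives a flat overview of the requested fields, which can help when reviewing large nested templates before sending them to the model.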

Training

The model was trained in two stages:

  1. Supervised fine-tuning on synthetic data — training examples were generated using a large teacher LLM across a diverse set of domains and schema types
  2. Post-training on human-annotated data — further refined on a high-quality human-annotated dataset to improve precision, grounding, and schema adherence

Usage

The model uses a hybrid attention architecture and requires language_model_only=True and trust_remote_code=True.

from vllm import LLM, SamplingParams
from transformers import AutoTokenizer
import json

model_name = "knowledgator/retrico-lm-4b"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

llm = LLM(
    model=model_name,
    language_model_only=True,
    gpu_memory_utilization=0.85,
    max_model_len=65536,
    trust_remote_code=True,
    dtype="bfloat16",
    enforce_eager=True,
)

# Greedy decoding (temperature 0) with room for long JSON outputs.
sampling_params = SamplingParams(max_tokens=4096, temperature=0.0)

def build_prompt(text, template):
    # Accept the template either as a Python object or as a JSON string.
    if isinstance(template, (dict, list)):
        template = json.dumps(template, indent=1, ensure_ascii=False)
    content = (
        "/no_think\n"
        "Extract information from the following text according to the JSON template.\n\n"
        "Important rules:\n"
        "- If a field's value is not mentioned or cannot be found in the text, set it to null.\n"
        "- Do not infer, guess, or hallucinate values that are not explicitly stated.\n"
        "- If the template is completely unrelated to the text, return all fields as null.\n"
        "- For list fields with no values found, return [] not [null].\n"
        "- For dict/object fields with no values found, return {} not null.\n\n"
        f"Template:\n{template}\n\nText:\n{text}\n\n"
        "Return only the extracted JSON, nothing else."
    )
    # Wrap the instruction in the model's chat format with thinking disabled.
    return tokenizer.apply_chat_template(
        [{"role": "user", "content": content}],
        tokenize=False,
        add_generation_prompt=True,
        enable_thinking=False,
    )

def extract(text, template):
    prompt = build_prompt(text, template)
    output = llm.generate([prompt], sampling_params)[0]
    raw = output.outputs[0].text.strip()
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # Keep the raw text so unparseable outputs can be inspected.
        return {"__raw__": raw}
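
The extract function above falls back to {"__raw__": raw} when parsing fails. A slightly more forgiving parser can be sketched as follows. It assumes that some failures come from a Markdown code fence or stray text around the JSON object, which this card does not confirm, and parse_model_json is a hypothetical helper rather than part of the model's tooling.

```python
import json
import re

FENCE = "`" * 3  # Markdown code-fence marker
FENCED = re.compile(rf"^{FENCE}(?:json)?\s*(.*?)\s*{FENCE}$", re.DOTALL)


def parse_model_json(raw):
    """Parse model output as JSON, tolerating fences and surrounding text."""
    text = raw.strip()
    # Unwrap a surrounding Markdown fence if the whole output is fenced.
    fenced = FENCED.match(text)
    if fenced:
        text = fenced.group(1)
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    # Fall back to the span between the first "{" and the last "}".
    start, end = text.find("{"), text.rfind("}")
    if start != -1 and end > start:
        try:
            return json.loads(text[start:end + 1])
        except json.JSONDecodeError:
            pass
    # Same sentinel shape as extract() so callers can handle both alike.
    return {"__raw__": raw}
```

Outputs that are truncated mid-object still end up under "__raw__"; the sketch does not attempt to repair them.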

Examples

Plain Text Extraction

Structured fact extraction from prose. The model handles deeply nested schemas, typed numeric fields, arrays of objects, and null-safe output for fields absent in the source.


James Webb Space Telescope — deeply nested schema with typed fields, array of instrument objects, and multi-agency list


NVIDIA FY2024 Financials — financial report with null-safe segment data: growth percentages absent in the source correctly return as null


Moderna mRNA-1273 Clinical Trial — dense numerical extraction: efficacy stats, confidence intervals, demographic breakdowns, and adverse event arrays from a single paragraph


Markdown Extraction

ML Reading List — repeated hierarchical entries: ### heading + bullet list blocks parsed into a uniform array of structured paper objects


HTML Extraction

NeurIPS 2024 — structured data from HTML markup: speaker objects from div.speaker elements, numeric attendance from span tags, nested objects


XML Extraction

Apollo Program — XML attributes mapped to JSON fields: <period start="1961" end="1972"/> and <member role="commander"> resolved into typed schema fields


Relation Extraction

Open-domain NER and relation extraction. No predefined label sets — entity types and relation types are inferred directly from the text.


ARM Holdings — multi-hop corporate chain: acquisition, licensing, blocked deal, IPO, and CEO succession all extracted from one paragraph


Higgs boson / CERN — diverse entity types (particle, facility, experiment) across institutions, experiments, and individuals


OpenAI — dense organizational history: founding, funding, role transitions, board conflict, and resignations into 20 entities and 19 relations from a multi-paragraph document


Constrained Relation Extraction

The open-domain template shown above infers entity and relation types freely from the text. For benchmarking and production pipelines where label sets are fixed, you can constrain the model by injecting allowed types directly into the prompt:

TEMPLATE = json.dumps({
    "entities": [{"entity": "string", "type": "string"}],
    "relations": [{"head": "string", "relation": "string", "tail": "string"}]
}, indent=1)

# Note: unlike build_prompt in Usage, this returns only the user-message
# content and does not apply the chat template.
def build_prompt(text: str, entity_types: list[str], relation_types: list[str]) -> str:
    # Render the allowed label sets as comma-separated lists.
    et_str = ", ".join(entity_types)
    rt_str = ", ".join(relation_types)
    return (
        "/no_think\n"
        "Extract entities and relations from the following text according to the JSON template.\n\n"
        "Important rules:\n"
        "- If a field's value is not mentioned or cannot be found in the text, set it to null.\n"
        "- Do not infer, guess, or hallucinate values that are not explicitly stated.\n"
        "- For list fields with no values found, return [] not [null].\n"
        "- Entity text must be exact substrings from the input text.\n"
        f"- Entity types must be one of: {et_str}\n"
        f"- Relation types must be one of: {rt_str}\n\n"
        f"Template:\n{TEMPLATE}\n\n"
        f"Text:\n{text}\n\n"
        "Return only the extracted JSON, nothing else."
    )

This is the setup used to produce the benchmark results below.
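
Prompt-level constraints are instructions rather than guarantees, so a pipeline may want to check the output against the same label sets. The sketch below assumes the parsed output has the entities/relations shape of TEMPLATE. filter_constrained is a hypothetical helper, and dropping relations whose head or tail is not a kept entity is an extra assumption that the prompt itself does not state.

```python
def filter_constrained(result, text, entity_types, relation_types):
    """Drop entities and relations that break the prompt's constraints."""
    entities = [
        e for e in (result.get("entities") or [])
        if isinstance(e, dict)
        and isinstance(e.get("entity"), str)
        and e["entity"]                      # skip empty strings
        and e["entity"] in text              # exact-substring rule
        and e.get("type") in entity_types    # allowed entity types only
    ]
    names = [e["entity"] for e in entities]
    relations = [
        r for r in (result.get("relations") or [])
        if isinstance(r, dict)
        and r.get("relation") in relation_types  # allowed relation types only
        and r.get("head") in names               # assumed: endpoints must be
        and r.get("tail") in names               # entities that survived
    ]
    return {"entities": entities, "relations": relations}


# Hypothetical model output for a one-sentence input.
sample_text = "Arm was acquired by SoftBank in 2016."
sample_result = {
    "entities": [
        {"entity": "Arm", "type": "organization"},
        {"entity": "SoftBank", "type": "organization"},
        {"entity": "Nvidia", "type": "organization"},  # not in the text
        {"entity": "2016", "type": "date"},            # type not allowed
    ],
    "relations": [
        {"head": "SoftBank", "relation": "acquired", "tail": "Arm"},
        {"head": "Nvidia", "relation": "acquired", "tail": "Arm"},
        {"head": "SoftBank", "relation": "founded", "tail": "Arm"},
    ],
}
cleaned = filter_constrained(sample_result, sample_text, ["organization"], ["acquired"])
```

Here cleaned keeps the two organizations that appear in the text and the single "acquired" relation between them.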


Benchmarks

All benchmarks are zero-shot — the model was not trained on any of these datasets.

Benchmark Charts

[Bar charts: WL Graph F1, ROUGE-L, and Valid JSON rate, overall and by input-length bucket (256–1023, 1024–3999, ≥4000 tokens), for retrico-lm-4b, gpt-oss-120b, Llama-3.3-70B, DeepSeek-V3.1, Qwen3-32B, and NuExtract3; Micro-F1 on CrossRE and DocRED for retrico-lm-4b and gliner2-large. The same values are tabulated in the sections that follow.]

Relation Extraction

Evaluated on two standard RE benchmarks with constrained entity and relation type sets (see prompt format above).

CrossRE — cross-domain relation extraction. Evaluated on the test split, domains: ai, news, science.

DocRED — document-level relation extraction from Wikipedia and Wikidata. Evaluated on the validation split.

Dataset   Model                      Micro-F1   Macro-F1   Precision   Recall
CrossRE   retrico-lm-4b              8.5        7.3        7.7         9.6
CrossRE   fastino/gliner2-large-v1   2.0        2.0        2.3         1.7
DocRED    retrico-lm-4b              14.5       6.7        13.3        15.9
DocRED    fastino/gliner2-large-v1   13.8       6.9        13.0        14.6
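
As a sanity check on the table, Micro-F1 should equal the harmonic mean of precision and recall, assuming the Precision and Recall columns are micro-averaged (the card does not say so explicitly). The rows below are copied from the table. Because the inputs are already rounded to one decimal, agreement is only expected to within rounding.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (dataset, model, reported Micro-F1, precision, recall) from the table.
rows = [
    ("CrossRE", "retrico-lm-4b", 8.5, 7.7, 9.6),
    ("CrossRE", "fastino/gliner2-large-v1", 2.0, 2.3, 1.7),
    ("DocRED", "retrico-lm-4b", 14.5, 13.3, 15.9),
    ("DocRED", "fastino/gliner2-large-v1", 13.8, 13.0, 14.6),
]

for dataset, model, reported, p, r in rows:
    print(f"{dataset:8s} {model:26s} reported={reported:5.1f} "
          f"recomputed={f1(p, r):5.2f}")
```

All four recomputed values land within 0.1 of the reported Micro-F1, which is consistent with the assumption above.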

Comparison with Large Language Models — Human-Annotated Eval Split

Evaluated on an internal held-out set with human-annotated ground truth. Metrics:

  • WL Graph F1 — graph-based metric that converts predicted and reference JSON into trees, computes semantic node embeddings, and propagates via Weisfeiler-Leman message passing. Captures both structural correctness and semantic similarity of extracted values.
  • ROUGE-L — longest common subsequence overlap between predicted and reference JSON strings.
  • Valid JSON Rate — fraction of outputs that parse as valid JSON.

Model                         WL Graph F1   ROUGE-L   Valid JSON Rate
retrico-lm-4b                 0.7606        0.5323    96.0%
openai/gpt-oss-120b           0.7868        0.5204    98.6%
Meta-Llama-3.3-70B-Instruct   0.7837        0.5503    98.9%
DeepSeek-V3.1                 0.7821        0.5253    96.8%
Qwen3-32B                     0.7329        0.4852    93.9%
numind/NuExtract3             0.7302        0.3747    92.5%
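
The two simpler metrics, Valid JSON Rate and ROUGE-L, can be sketched in a few lines. This is an illustration, not the evaluation code behind the table: it assumes whitespace tokenization and the F-measure variant of ROUGE-L with equal weight on precision and recall, neither of which this card specifies. WL Graph F1 is left out because its embedding model and propagation details are not given.

```python
import json


def valid_json_rate(outputs):
    """Fraction of output strings that parse as JSON."""
    if not outputs:
        return 0.0
    ok = 0
    for raw in outputs:
        try:
            json.loads(raw)
            ok += 1
        except json.JSONDecodeError:
            pass
    return ok / len(outputs)


def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(prediction, reference):
    """ROUGE-L F-measure over whitespace tokens (assumed tokenization)."""
    pred, ref = prediction.split(), reference.split()
    lcs = lcs_length(pred, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(pred), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)
```

Because ROUGE-L here compares serialized JSON strings, key order and indentation affect the score, so predictions and references should be serialized the same way before comparison.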

Valid JSON Rate by input length:

Token bucket   gpt-oss-120b   Llama-3.3-70B   DeepSeek-V3.1   retrico-lm-4b   Qwen3-32B   NuExtract3
256–1023       100%           100%            100%            100%            100%        98.8%
1024–3999      100%           98.7%           97.3%           98.0%           97.1%       92.1%
≥4000          93.0%          66.7%           20.0%           33.3%           66.7%       33.3%

WL Graph F1 by input length:

Token bucket   gpt-oss-120b   Llama-3.3-70B   DeepSeek-V3.1   retrico-lm-4b   Qwen3-32B   NuExtract3
256–1023       34.3%          74.5%           76.2%           74.8%           61.2%       78.3%
1024–3999      82.2%          83.6%           83.5%           82.5%           76.7%       75.9%
≥4000          76.4%          25.1%           7.9%            34.1%           53.8%       7.5%
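
The length buckets used in both tables can be reproduced with a small helper. The sketch assumes that bucket membership is decided by the tokenized input length and that inputs shorter than 256 tokens were not part of the evaluation, since no such bucket is reported. Which tokenizer was used for counting is not stated, and both function names are hypothetical.

```python
from collections import defaultdict

# Bucket labels as they appear in the tables, shortest first.
BUCKET_LABELS = ("256–1023", "1024–3999", "≥4000")


def token_bucket(num_tokens):
    """Map an input length in tokens to its bucket label, or None."""
    if num_tokens >= 4000:
        return BUCKET_LABELS[2]
    if num_tokens >= 1024:
        return BUCKET_LABELS[1]
    if num_tokens >= 256:
        return BUCKET_LABELS[0]
    return None  # below the shortest reported bucket


def mean_by_bucket(samples):
    """Average per-sample scores within each bucket.

    samples: iterable of (num_tokens, score) pairs.
    """
    grouped = defaultdict(list)
    for num_tokens, score in samples:
        label = token_bucket(num_tokens)
        if label is not None:
            grouped[label].append(score)
    return {label: sum(v) / len(v) for label, v in grouped.items()}
```

Reporting the sample count per bucket alongside the mean is worthwhile, since values such as 33.3% and 66.7% in the ≥4000 rows suggest that bucket may contain very few examples.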

Citation

@misc{knowledgator2025retrico,
 title={retrico-lm: Schema-Guided Structured Information Extraction},
 author={Knowledgator Engineering},
 year={2025},
 url={https://huggingface.co/knowledgator}
}