retrico-lm-4b
retrico-lm-4b is a 4B-parameter language model built for universal structured information extraction. Give it any text and a JSON schema, and it returns a JSON object that follows the schema, in most cases with no post-processing required (see Valid JSON Rate under Benchmarks).
The model handles the full spectrum of extraction tasks from a single interface: plain text, Markdown, HTML, and XML as input; flat facts, deeply nested objects, typed arrays, NER, and open relation extraction as output. There is no need to switch between specialized models — one template drives all extraction modes.
Built on Qwen3.5-4B, retrico-lm-4b is designed for production use and works best served via vLLM.
Key Features
- Universal input — plain text, Markdown documents, HTML pages, XML feeds
- Universal output — flat facts, nested objects, typed arrays, entity lists, relation triplets
- Template-driven — define any JSON schema and the model populates it from the input
- Typed fields — respects string, integer, float, nested objects, arrays of objects
- Null-safe — missing values return as null or [], never hallucinated
- Production-ready — optimized for vLLM with language_model_only=True
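To make the typed-field and null-safe conventions concrete, here is a minimal sketch of a checker that compares a parsed output against a template. The `conforms` helper and the sample template are hypothetical and not part of the model's tooling. The sketch assumes that templates spell leaf types as the strings "string", "integer", and "float", and that a one-element list describes the shape of each array item.

```python
def conforms(value, template):
    """Return True if `value` matches the shape described by `template`."""
    if isinstance(template, dict):
        # Object field: must be a dict; {} is accepted as "nothing found".
        if not isinstance(value, dict):
            return False
        if not value:
            return True
        return all(k in value and conforms(value[k], t) for k, t in template.items())
    if isinstance(template, list):
        # Array field: must be a list; [null] is rejected, [] is fine.
        if not isinstance(value, list):
            return False
        if not template:
            return True  # no item shape given, accept any list
        return all(item is not None and conforms(item, template[0]) for item in value)
    if value is None:
        return True  # null marks a missing scalar value
    if template == "string":
        return isinstance(value, str)
    if template == "integer":
        return isinstance(value, int) and not isinstance(value, bool)
    if template == "float":
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    return False  # unknown leaf type in the template


# Hypothetical template and outputs, for illustration only.
template = {
    "name": "string",
    "launch_year": "integer",
    "instruments": [{"name": "string", "mass_kg": "float"}],
}
good = {"name": "Example Probe", "launch_year": None, "instruments": []}
bad = {"name": "Example Probe", "launch_year": "2021", "instruments": [None]}
print(conforms(good, template), conforms(bad, template))
```

A check like this is useful as a guard in a pipeline: outputs that fail it can be logged or retried instead of being passed downstream.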
Training
The model was trained in two stages:
- Supervised fine-tuning on synthetic data — training examples were generated using a large teacher LLM across a diverse set of domains and schema types
- Post-training on human-annotated data — further refined on a high-quality human-annotated dataset to improve precision, grounding, and schema adherence
Usage
The model uses a hybrid attention architecture and requires language_model_only=True and trust_remote_code=True.
from vllm import LLM, SamplingParams
from transformers import AutoTokenizer
import json

model_name = "knowledgator/retrico-lm-4b"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

llm = LLM(
    model=model_name,
    language_model_only=True,
    gpu_memory_utilization=0.85,
    max_model_len=65536,
    trust_remote_code=True,
    dtype="bfloat16",
    enforce_eager=True,  # run without CUDA graph capture
)

# temperature=0.0 selects greedy decoding, so outputs are deterministic.
sampling_params = SamplingParams(max_tokens=4096, temperature=0.0)


def build_prompt(text, template):
    # Accept the template either as a Python object or as a JSON string.
    if isinstance(template, (dict, list)):
        template = json.dumps(template, indent=1, ensure_ascii=False)
    content = (
        "/no_think\n"
        "Extract information from the following text according to the JSON template.\n\n"
        "Important rules:\n"
        "- If a field's value is not mentioned or cannot be found in the text, set it to null.\n"
        "- Do not infer, guess, or hallucinate values that are not explicitly stated.\n"
        "- If the template is completely unrelated to the text, return all fields as null.\n"
        "- For list fields with no values found, return [] not [null].\n"
        "- For dict/object fields with no values found, return {} not null.\n\n"
        f"Template:\n{template}\n\nText:\n{text}\n\n"
        "Return only the extracted JSON, nothing else."
    )
    # Wrap the instruction in the model's chat format with thinking disabled.
    return tokenizer.apply_chat_template(
        [{"role": "user", "content": content}],
        tokenize=False,
        add_generation_prompt=True,
        enable_thinking=False,
    )


def extract(text, template):
    prompt = build_prompt(text, template)
    output = llm.generate([prompt], sampling_params)[0]
    raw = output.outputs[0].text.strip()
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # Keep the unparsed text so the caller can inspect or repair it.
        return {"__raw__": raw}
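When parsing fails, `extract` hands back the raw text under `__raw__`. The following is a minimal sketch of a more tolerant parser that a caller might try on that raw text. `parse_lenient` is a hypothetical helper, not something shipped with the model. It assumes the most common failure modes are a Markdown code fence around the JSON or stray text before and after it, and it relies only on the standard library's `json.JSONDecoder.raw_decode`.

```python
import json

FENCE = "`" * 3  # Markdown code fence marker


def parse_lenient(raw):
    """Try to recover a JSON value from model output; return None on failure."""
    text = raw.strip()
    # Drop a surrounding Markdown code fence if one is present.
    if text.startswith(FENCE):
        lines = text.splitlines()[1:]  # skip the opening fence line
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]
        text = "\n".join(lines).strip()
    decoder = json.JSONDecoder()
    # Try every position where an object or array could start.
    for i, ch in enumerate(text):
        if ch in "{[":
            try:
                value, _end = decoder.raw_decode(text, i)
                return value
            except json.JSONDecodeError:
                continue
    return None


print(parse_lenient('Here is the result: {"a": null} done'))
```

Truncated output, for example an object cut off at the token limit, still returns `None` here. Repairing truncated JSON is a separate problem that this sketch does not attempt.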
Examples
Plain Text Extraction
Structured fact extraction from prose. The model handles deeply nested schemas, typed numeric fields, arrays of objects, and null-safe output for fields absent in the source.
James Webb Space Telescope — deeply nested schema with typed fields, array of instrument objects, and multi-agency list
NVIDIA FY2024 Financials — financial report with null-safe segment data: growth percentages absent in the source correctly return as null
Moderna mRNA-1273 Clinical Trial — dense numerical extraction: efficacy stats, confidence intervals, demographic breakdowns, and adverse event arrays from a single paragraph
Markdown Extraction
ML Reading List — repeated hierarchical entries: ### heading + bullet list blocks parsed into a uniform array of structured paper objects
HTML Extraction
NeurIPS 2024 — structured data from HTML markup: speaker objects from div.speaker elements, numeric attendance from span tags, nested objects
XML Extraction
Apollo Program — XML attributes mapped to JSON fields: <period start="1961" end="1972"/> and <member role="commander"> resolved into typed schema fields
Relation Extraction
Open-domain NER and relation extraction. No predefined label sets — entity types and relation types are inferred directly from the text.
ARM Holdings — multi-hop corporate chain: acquisition, licensing, blocked deal, IPO, and CEO succession all extracted from one paragraph
Higgs boson / CERN — diverse entity types (particle, facility, experiment) across institutions, experiments, and individuals
OpenAI — dense organizational history: founding, funding, role transitions, board conflict, and resignations into 20 entities and 19 relations from a multi-paragraph document
Constrained Relation Extraction
The open-domain template shown above infers entity and relation types freely from the text. For benchmarking and production pipelines where label sets are fixed, you can constrain the model by injecting allowed types directly into the prompt:
TEMPLATE = json.dumps({
    "entities": [{"entity": "string", "type": "string"}],
    "relations": [{"head": "string", "relation": "string", "tail": "string"}]
}, indent=1)


# Note: this redefines build_prompt from the Usage section with a different
# signature, and it returns the user-message text without applying the chat template.
def build_prompt(text: str, entity_types: list[str], relation_types: list[str]) -> str:
    et_str = ", ".join(entity_types)
    rt_str = ", ".join(relation_types)
    return (
        "/no_think\n"
        "Extract entities and relations from the following text according to the JSON template.\n\n"
        "Important rules:\n"
        "- If a field's value is not mentioned or cannot be found in the text, set it to null.\n"
        "- Do not infer, guess, or hallucinate values that are not explicitly stated.\n"
        "- For list fields with no values found, return [] not [null].\n"
        "- Entity text must be exact substrings from the input text.\n"
        f"- Entity types must be one of: {et_str}\n"
        f"- Relation types must be one of: {rt_str}\n\n"
        f"Template:\n{TEMPLATE}\n\n"
        f"Text:\n{text}\n\n"
        "Return only the extracted JSON, nothing else."
    )
This is the setup used to produce the benchmark results below.
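The prompt states the constraints, but nothing in the code above enforces them on the output. A minimal sketch of a post-hoc filter is shown here. `filter_constrained` is a hypothetical helper, and requiring every relation's head and tail to appear among the kept entities is an extra assumption of this sketch, not a rule stated in the prompt. It expects the parsed output to be a dict whose list items are dicts.

```python
def filter_constrained(result, text, entity_types, relation_types):
    """Drop entities and relations that break the constraints stated in the prompt."""
    allowed_entities = set(entity_types)
    allowed_relations = set(relation_types)
    entities = [
        e for e in result.get("entities") or []
        if isinstance(e.get("entity"), str)
        and e["entity"] in text                 # exact-substring rule
        and e.get("type") in allowed_entities   # closed entity label set
    ]
    names = {e["entity"] for e in entities}
    relations = [
        r for r in result.get("relations") or []
        if r.get("relation") in allowed_relations  # closed relation label set
        and r.get("head") in names                 # assumption: endpoints must be kept entities
        and r.get("tail") in names
    ]
    return {"entities": entities, "relations": relations}


# Made-up text and output, for illustration only.
sample_text = "Ada Lovelace worked with Charles Babbage in London."
sample_result = {
    "entities": [
        {"entity": "Ada Lovelace", "type": "person"},
        {"entity": "Charles Babbage", "type": "person"},
        {"entity": "Paris", "type": "location"},   # not in the text
        {"entity": "London", "type": "city"},      # type not allowed
    ],
    "relations": [
        {"head": "Ada Lovelace", "relation": "collaborated_with", "tail": "Charles Babbage"},
        {"head": "Ada Lovelace", "relation": "lives_in", "tail": "London"},
    ],
}
cleaned = filter_constrained(
    sample_result, sample_text, ["person", "location"], ["collaborated_with"]
)
print(cleaned)
```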
Benchmarks
All benchmarks are zero-shot — the model was not trained on any of these datasets.
Relation Extraction
Evaluated on two standard RE benchmarks with constrained entity and relation type sets (see prompt format above).
CrossRE — cross-domain relation extraction. Evaluated on the test split, domains: ai, news, science.
DocRED — document-level relation extraction from Wikipedia and Wikidata. Evaluated on the validation split.
| Dataset | Model | Micro-F1 (%) | Macro-F1 (%) | Precision (%) | Recall (%) |
|---|---|---|---|---|---|
| CrossRE | retrico-lm-4b | 8.5 | 7.3 | 7.7 | 9.6 |
| CrossRE | fastino/gliner2-large-v1 | 2.0 | 2.0 | 2.3 | 1.7 |
| DocRED | retrico-lm-4b | 14.5 | 6.7 | 13.3 | 15.9 |
| DocRED | fastino/gliner2-large-v1 | 13.8 | 6.9 | 13.0 | 14.6 |
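The card does not describe the scoring script, so the following is only a sketch of how micro and macro scores of this kind are commonly computed. It assumes exact matching on (head, relation, tail) triples and a macro average taken over relation types. The function names are hypothetical.

```python
from collections import Counter


def prf(tp, fp, fn):
    """Precision, recall, and F1 from raw counts, with 0.0 for empty denominators."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def relation_scores(gold, pred):
    """gold, pred: sets of (head, relation, tail) triples."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for triple in pred:
        (tp if triple in gold else fp)[triple[1]] += 1
    for triple in gold - pred:
        fn[triple[1]] += 1
    labels = set(tp) | set(fp) | set(fn)
    # Micro: pool counts over all relation types, then score once.
    precision, recall, micro_f1 = prf(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: score each relation type separately, then average the F1 values.
    per_label = [prf(tp[l], fp[l], fn[l])[2] for l in labels]
    macro_f1 = sum(per_label) / len(per_label) if per_label else 0.0
    return {"precision": precision, "recall": recall,
            "micro_f1": micro_f1, "macro_f1": macro_f1}


# Toy triples, for illustration only.
gold = {("a", "r1", "b"), ("c", "r1", "d"), ("e", "r2", "f")}
pred = {("a", "r1", "b"), ("x", "r1", "y"), ("e", "r2", "f")}
scores = relation_scores(gold, pred)
print(scores)
```

When scoring several documents at once, the triples would need a document identifier so that identical triples from different documents are not merged. The sketch leaves that out for brevity.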
Comparison with Large Language Models — Human-Annotated Eval Split
Evaluated on an internal held-out set with human-annotated ground truth. Metrics:
- WL Graph F1 — graph-based metric that converts predicted and reference JSON into trees, computes semantic node embeddings, and propagates via Weisfeiler-Leman message passing. Captures both structural correctness and semantic similarity of extracted values.
- ROUGE-L — longest common subsequence overlap between predicted and reference JSON strings.
- Valid JSON Rate — fraction of outputs that parse as valid JSON.
| Model | WL Graph F1 | ROUGE-L | Valid JSON Rate |
|---|---|---|---|
| retrico-lm-4b | 0.7606 | 0.5323 | 96.0% |
| openai/gpt-oss-120b | 0.7868 | 0.5204 | 98.6% |
| Meta-Llama-3.3-70B-Instruct | 0.7837 | 0.5503 | 98.9% |
| DeepSeek-V3.1 | 0.7821 | 0.5253 | 96.8% |
| Qwen3-32B | 0.7329 | 0.4852 | 93.9% |
| numind/NuExtract3 | 0.7302 | 0.3747 | 92.5% |
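As a reference for the ROUGE-L column, here is a minimal sketch of the metric computed from the longest common subsequence. It assumes whitespace tokenization and the balanced F-measure. The card does not say which tokenizer or weighting was used, so the numbers from this sketch need not match the table.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(prediction, reference):
    """ROUGE-L F1 between two strings, using whitespace tokens (an assumption)."""
    pred, ref = prediction.split(), reference.split()
    if not pred or not ref:
        return 0.0
    lcs = lcs_length(pred, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(pred), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


print(rouge_l_f1('{"a": 1, "b": 2}', '{"a": 1, "b": 3}'))
```

Because the metric works on the serialized strings, key order and whitespace in the JSON affect the score. Serializing both sides with the same `json.dumps` settings before comparison keeps the comparison fair.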
Valid JSON Rate by input length:
| Token bucket | gpt-oss-120b | Llama-3.3-70B | DeepSeek-V3.1 | retrico-lm-4b | Qwen3-32B | NuExtract3 |
|---|---|---|---|---|---|---|
| 256–1023 | 100% | 100% | 100% | 100% | 100% | 98.8% |
| 1024–3999 | 100% | 98.7% | 97.3% | 98.0% | 97.1% | 92.1% |
| ≥4000 | 93.0% | 66.7% | 20.0% | 33.3% | 66.7% | 33.3% |
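A per-bucket breakdown like the one above can be produced with a few lines of standard-library code. This is a sketch under assumptions: the bucket edges are read off the table, the helper names are hypothetical, and the input token counts are taken as given, since the card does not say which tokenizer was used to measure input length.

```python
import json
from collections import defaultdict

# (lower bound inclusive, upper bound exclusive or None, label)
BUCKETS = [(256, 1024, "256-1023"), (1024, 4000, "1024-3999"), (4000, None, ">=4000")]


def bucket_for(n_tokens):
    """Return the bucket label for an input length, or None if it fits no bucket."""
    for low, high, label in BUCKETS:
        if n_tokens >= low and (high is None or n_tokens < high):
            return label
    return None


def is_valid_json(raw):
    try:
        json.loads(raw)
        return True
    except (json.JSONDecodeError, TypeError):
        return False


def valid_rate_by_bucket(samples):
    """samples: iterable of (input_token_count, raw_model_output) pairs."""
    totals, valid = defaultdict(int), defaultdict(int)
    for n_tokens, raw in samples:
        label = bucket_for(n_tokens)
        if label is None:
            continue  # input shorter than the smallest bucket
        totals[label] += 1
        valid[label] += is_valid_json(raw)
    return {label: valid[label] / totals[label] for label in totals}


# Made-up samples, for illustration only.
samples = [(300, '{"a": 1}'), (500, '{"a":'), (2000, "[]"), (5000, "oops"), (100, "{}")]
rates = valid_rate_by_bucket(samples)
print(rates)
```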
WL Graph F1 by input length:
| Token bucket | gpt-oss-120b | Llama-3.3-70B | DeepSeek-V3.1 | retrico-lm-4b | Qwen3-32B | NuExtract3 |
|---|---|---|---|---|---|---|
| 256–1023 | 34.3% | 74.5% | 76.2% | 74.8% | 61.2% | 78.3% |
| 1024–3999 | 82.2% | 83.6% | 83.5% | 82.5% | 76.7% | 75.9% |
| ≥4000 | 76.4% | 25.1% | 7.9% | 34.1% | 53.8% | 7.5% |
Citation
@misc{knowledgator2025retrico,
  title={retrico-lm: Schema-Guided Structured Information Extraction},
  author={Knowledgator Engineering},
  year={2025},
  url={https://huggingface.co/knowledgator}
}