retrico-lm-4b

retrico-lm-4b is a 4B-parameter language model built for universal structured information extraction. Give it any text and a JSON template describing the target schema, and it returns a schema-conformant JSON object, in most cases with no post-processing required.

The model handles the full spectrum of extraction tasks from a single interface: plain text, Markdown, HTML, and XML as input; flat facts, deeply nested objects, typed arrays, NER, and open relation extraction as output. There is no need to switch between specialized models — one template drives all extraction modes.

Built on Qwen3.5-4B, retrico-lm-4b is designed for production use and works best served via vLLM.

Key Features

Universal input — plain text, Markdown documents, HTML pages, XML feeds
Universal output — flat facts, nested objects, typed arrays, entity lists, relation triplets
Template-driven — define any JSON schema and the model populates it from the input
Typed fields — respects string, integer, float, nested objects, arrays of objects
Null-safe — missing values return as null or [], never hallucinated
Production-ready — optimized for vLLM with language_model_only=True
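The typed-field and null-safe guarantees above can be checked on the client side. The sketch below is an illustration, not part of the model's tooling. It assumes templates use placeholder strings such as "string", "integer", and "float" as type hints (only "string" appears in a template shown in this card, so the rest of the convention is an assumption). It checks that a result has the template's keys, that scalars have the expected type or are null, that empty list fields are [] rather than [null], and that empty object fields may be {}. The helper name `conforms` is hypothetical.

```python
from typing import Any

# Hypothetical placeholder names mapped to Python types (an assumption about template style).
PLACEHOLDER_TYPES = {
    "string": str,
    "integer": int,
    "float": (int, float),
}

def conforms(value: Any, template: Any) -> bool:
    """Return True if `value` matches the shape and placeholder types of `template`."""
    # A missing value may always be null.
    if value is None:
        return True
    if isinstance(template, dict):
        if not isinstance(value, dict):
            return False
        if not value:
            return True  # {} is the documented empty form for object fields
        if set(value) != set(template):
            return False  # every template key must be present, and no extra keys
        return all(conforms(value[k], template[k]) for k in template)
    if isinstance(template, list):
        if not isinstance(value, list):
            return False
        if any(item is None for item in value):
            return False  # [null] is not allowed; empty lists must be []
        if not template:
            return True
        return all(conforms(item, template[0]) for item in value)
    expected = PLACEHOLDER_TYPES.get(template)
    if expected is None:
        # Unknown placeholder: accept any scalar.
        return not isinstance(value, (dict, list))
    if isinstance(value, bool):
        return False  # bool is a subclass of int in Python; reject it explicitly
    return isinstance(value, expected)
```

Checking exact key sets is a deliberate choice here: the prompt asks the model to return every template field, so a missing key is treated as a schema violation rather than an implicit null.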

Training

The model was trained in two stages:

Supervised fine-tuning on synthetic data — training examples were generated using a large teacher LLM across a diverse set of domains and schema types
Post-training on human-annotated data — further refined on a high-quality human-annotated dataset to improve precision, grounding, and schema adherence

Usage

The model uses a hybrid attention architecture and requires language_model_only=True and trust_remote_code=True.

```python
from vllm import LLM, SamplingParams
from transformers import AutoTokenizer
import json

model_name = "knowledgator/retrico-lm-4b"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

# language_model_only=True and trust_remote_code=True are required for this model.
llm = LLM(
    model=model_name,
    language_model_only=True,
    gpu_memory_utilization=0.85,
    max_model_len=65536,
    trust_remote_code=True,
    dtype="bfloat16",
    enforce_eager=True,
)

# temperature=0.0 gives greedy (deterministic) decoding.
sampling_params = SamplingParams(max_tokens=4096, temperature=0.0)

def build_prompt(text, template):
    # Accept the template as a dict/list or as an already-serialized JSON string.
    if isinstance(template, (dict, list)):
        template = json.dumps(template, indent=1, ensure_ascii=False)
    content = (
        "/no_think\n"
        "Extract information from the following text according to the JSON template.\n\n"
        "Important rules:\n"
        "- If a field's value is not mentioned or cannot be found in the text, set it to null.\n"
        "- Do not infer, guess, or hallucinate values that are not explicitly stated.\n"
        "- If the template is completely unrelated to the text, return all fields as null.\n"
        "- For list fields with no values found, return [] not [null].\n"
        "- For dict/object fields with no values found, return {} not null.\n\n"
        f"Template:\n{template}\n\nText:\n{text}\n\n"
        "Return only the extracted JSON, nothing else."
    )
    # Wrap the instruction in the chat template with thinking mode disabled.
    return tokenizer.apply_chat_template(
        [{"role": "user", "content": content}],
        tokenize=False,
        add_generation_prompt=True,
        enable_thinking=False,
    )

def extract(text, template):
    prompt = build_prompt(text, template)
    output = llm.generate([prompt], sampling_params)[0]
    raw = output.outputs[0].text.strip()
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # Return the unparsed text so failed generations can be inspected.
        return {"__raw__": raw}
```
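`extract` falls back to `{"__raw__": raw}` whenever the output does not parse, which the benchmarks below show happens for a few percent of inputs. One possible recovery step, sketched here as an assumption about typical failure modes rather than documented model behaviour, is to strip an optional Markdown code fence and decode the first JSON value found in the text with the standard library's `json.JSONDecoder.raw_decode`. The helper name `recover_json` is hypothetical.

```python
import json
import re

# Matches an optional ```json ... ``` wrapper around the whole output (assumed failure mode).
_FENCE = re.compile(r"^```(?:json)?\s*(.*?)\s*```$", re.DOTALL)

def recover_json(raw: str):
    """Best-effort parse of model output; returns None if no JSON value can be decoded."""
    text = raw.strip()
    match = _FENCE.match(text)
    if match:
        text = match.group(1)
    decoder = json.JSONDecoder()
    # Try each position where a JSON object or array could start.
    for i, ch in enumerate(text):
        if ch in "{[":
            try:
                value, _end = decoder.raw_decode(text, i)
                return value
            except json.JSONDecodeError:
                continue
    return None
```

Truncated outputs (for example when `max_tokens` is reached mid-object) still return `None`; this helper only handles extra text around an otherwise complete JSON value.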

Examples

Plain Text Extraction

Structured fact extraction from prose. The model handles deeply nested schemas, typed numeric fields, arrays of objects, and null-safe output for fields absent in the source.

James Webb Space Telescope — deeply nested schema with typed fields, array of instrument objects, and multi-agency list

NVIDIA FY2024 Financials — financial report with null-safe segment data: growth percentages absent in the source correctly return as null

Moderna mRNA-1273 Clinical Trial — dense numerical extraction: efficacy stats, confidence intervals, demographic breakdowns, and adverse event arrays from a single paragraph

Markdown Extraction

ML Reading List — repeated hierarchical entries: ### heading + bullet list blocks parsed into a uniform array of structured paper objects

HTML Extraction

NeurIPS 2024 — structured data from HTML markup: speaker objects from div.speaker elements, numeric attendance from span tags, nested objects

XML Extraction

Apollo Program — XML attributes mapped to JSON fields: <period start="1961" end="1972"/> and <member role="commander"> resolved into typed schema fields

Relation Extraction

Open-domain NER and relation extraction. No predefined label sets — entity types and relation types are inferred directly from the text.

ARM Holdings — multi-hop corporate chain: acquisition, licensing, blocked deal, IPO, and CEO succession all extracted from one paragraph

Higgs boson / CERN — diverse entity types (particle, facility, experiment) across institutions, experiments, and individuals

OpenAI — dense organizational history from a multi-paragraph document (founding, funding, role transitions, board conflict, and resignations), extracted as 20 entities and 19 relations
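Multi-hop chains such as the ARM Holdings example become easy to query once the relations are loaded into a graph. The sketch below assumes the open-domain output uses the same `relations` list of `{"head", "relation", "tail"}` objects as the constrained template in the next section; the helpers `to_adjacency` and `paths_from` are hypothetical. It builds an adjacency map and enumerates relation paths depth-first without revisiting entities.

```python
from collections import defaultdict
from typing import Any

Edge = tuple[str, str, str]  # (head, relation, tail)

def to_adjacency(result: dict[str, Any]) -> dict[str, list[tuple[str, str]]]:
    """Map each head entity to its outgoing (relation, tail) pairs, skipping incomplete triples."""
    graph: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for rel in result.get("relations") or []:
        if not isinstance(rel, dict):
            continue
        head, label, tail = rel.get("head"), rel.get("relation"), rel.get("tail")
        # Null or non-string fields mean no usable value was extracted.
        if all(isinstance(x, str) and x for x in (head, label, tail)):
            graph[head].append((label, tail))
    return dict(graph)

def paths_from(graph: dict[str, list[tuple[str, str]]], start: str, max_hops: int = 3) -> list[list[Edge]]:
    """Depth-first enumeration of relation paths from `start`, up to max_hops edges."""
    paths: list[list[Edge]] = []

    def walk(node: str, path: list[Edge], seen: set[str]) -> None:
        for label, tail in graph.get(node, []):
            if tail in seen:
                continue  # avoid cycles
            new_path = path + [(node, label, tail)]
            paths.append(new_path)
            if len(new_path) < max_hops:
                walk(tail, new_path, seen | {tail})

    walk(start, [], {start})
    return paths
```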

Constrained Relation Extraction

The open-domain examples above infer entity and relation types freely from the text. For benchmarking and production pipelines where label sets are fixed, you can constrain the model by injecting the allowed types directly into the prompt:

```python
import json

TEMPLATE = json.dumps({
    "entities": [{"entity": "string", "type": "string"}],
    "relations": [{"head": "string", "relation": "string", "tail": "string"}]
}, indent=1)

def build_prompt(text: str, entity_types: list[str], relation_types: list[str]) -> str:
    # Inline the allowed label sets as comma-separated lists in the rules section.
    et_str = ", ".join(entity_types)
    rt_str = ", ".join(relation_types)
    # Returns the user-message text only; it is not wrapped in the chat template.
    return (
        "/no_think\n"
        "Extract entities and relations from the following text according to the JSON template.\n\n"
        "Important rules:\n"
        "- If a field's value is not mentioned or cannot be found in the text, set it to null.\n"
        "- Do not infer, guess, or hallucinate values that are not explicitly stated.\n"
        "- For list fields with no values found, return [] not [null].\n"
        "- Entity text must be exact substrings from the input text.\n"
        f"- Entity types must be one of: {et_str}\n"
        f"- Relation types must be one of: {rt_str}\n\n"
        f"Template:\n{TEMPLATE}\n\n"
        f"Text:\n{text}\n\n"
        "Return only the extracted JSON, nothing else."
    )
```
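The constrained prompt asks the model to use only the listed types and exact substrings, but nothing in the decoding setup enforces those rules. The hypothetical `filter_constrained` below applies them after generation: it drops entities whose text is not an exact substring of the input or whose type is not allowed, and relations whose type is not allowed or whose head or tail is not one of the kept entities. That last rule is an assumption; the card does not say whether the benchmark pipeline applies any filtering.

```python
from typing import Any

def filter_constrained(
    result: dict[str, Any],
    text: str,
    entity_types: list[str],
    relation_types: list[str],
) -> dict[str, list[dict[str, Any]]]:
    """Drop entities and relations that violate the constrained-prompt rules (hypothetical helper)."""
    allowed_et, allowed_rt = set(entity_types), set(relation_types)

    def nonempty_str(x: Any) -> bool:
        return isinstance(x, str) and x != ""

    entities = [
        e for e in (result.get("entities") or [])
        if isinstance(e, dict)
        and nonempty_str(e.get("entity")) and e["entity"] in text   # exact substring of the input
        and nonempty_str(e.get("type")) and e["type"] in allowed_et  # type from the allowed list
    ]
    kept = {e["entity"] for e in entities}
    relations = [
        r for r in (result.get("relations") or [])
        if isinstance(r, dict)
        and nonempty_str(r.get("relation")) and r["relation"] in allowed_rt
        and nonempty_str(r.get("head")) and r["head"] in kept
        and nonempty_str(r.get("tail")) and r["tail"] in kept
    ]
    return {"entities": entities, "relations": relations}
```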

This is the setup used to produce the relation extraction benchmark results (CrossRE and DocRED) below.

Benchmarks

All benchmarks are zero-shot — the model was not trained on any of these datasets.

Benchmark Charts

[Bar charts: WL Graph F1, ROUGE-L, and valid JSON rate (overall and by input length) for retrico-lm-4b, gpt-oss-120b, Llama-3.3-70B, DeepSeek-V3.1, Qwen3-32B, and NuExtract3; RE Micro-F1 for retrico-lm-4b and gliner2-large on CrossRE (test · ai/news/science) and DocRED (validation). The same values appear in the tables below.]

Relation Extraction

Evaluated on two standard RE benchmarks with constrained entity and relation type sets (see prompt format above).

CrossRE — cross-domain relation extraction. Evaluated on the test split, domains: ai, news, science.

DocRED — document-level relation extraction from Wikipedia and Wikidata. Evaluated on the validation split.

| Dataset | Model | Micro-F1 (%) | Macro-F1 (%) | Precision (%) | Recall (%) |
|---|---|---|---|---|---|
| CrossRE | retrico-lm-4b | 8.5 | 7.3 | 7.7 | 9.6 |
| CrossRE | fastino/gliner2-large-v1 | 2.0 | 2.0 | 2.3 | 1.7 |
| DocRED | retrico-lm-4b | 14.5 | 6.7 | 13.3 | 15.9 |
| DocRED | fastino/gliner2-large-v1 | 13.8 | 6.9 | 13.0 | 14.6 |
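The table reports both micro- and macro-averaged F1. The minimal sketch below shows how the two differ, assuming exact matching of (head, relation, tail) triples and macro-averaging over relation types; the card does not specify its matching criteria or averaging unit. Micro-F1 pools counts across all relation types, while macro-F1 averages per-type scores, so rare relation types weigh as much as frequent ones. The reported precision and recall are consistent with the micro-F1 column: for CrossRE, 2 × 7.7 × 9.6 / (7.7 + 9.6) ≈ 8.5.

```python
from collections import Counter

Triple = tuple[str, str, str]  # (head, relation, tail)

def f1(tp: int, n_pred: int, n_gold: int) -> float:
    """F1 from true-positive, predicted, and gold counts (0.0 when undefined)."""
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def micro_macro_f1(pred: list[Triple], gold: list[Triple]) -> tuple[float, float]:
    """Exact-match micro-F1 and per-relation-type macro-F1 over triples."""
    pred_c, gold_c = Counter(pred), Counter(gold)
    tp_c = pred_c & gold_c  # multiset intersection: matched triples
    micro = f1(sum(tp_c.values()), sum(pred_c.values()), sum(gold_c.values()))
    per_type = []
    # Average over every relation type that appears in predictions or gold.
    for rel in {t[1] for t in pred_c} | {t[1] for t in gold_c}:
        tp = sum(n for t, n in tp_c.items() if t[1] == rel)
        n_pred = sum(n for t, n in pred_c.items() if t[1] == rel)
        n_gold = sum(n for t, n in gold_c.items() if t[1] == rel)
        per_type.append(f1(tp, n_pred, n_gold))
    macro = sum(per_type) / len(per_type) if per_type else 0.0
    return micro, macro
```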

Comparison with Large Language Models — Human-Annotated Eval Split

Evaluated on an internal held-out set with human-annotated ground truth. Metrics:

WL Graph F1 — graph-based metric that converts predicted and reference JSON into trees, computes semantic node embeddings, and propagates via Weisfeiler-Leman message passing. Captures both structural correctness and semantic similarity of extracted values.
ROUGE-L — longest common subsequence overlap between predicted and reference JSON strings.
Valid JSON Rate — fraction of outputs that parse as valid JSON.
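The ROUGE-L and valid JSON rate definitions above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the evaluation code behind this card: it tokenizes serialized JSON on whitespace, reports the ROUGE-L F-measure with equal weight on precision and recall, and counts an output as valid if Python's `json.loads` accepts it. The function names are hypothetical.

```python
import json

def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence via O(len(a) * len(b)) dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def rouge_l(pred: str, ref: str) -> float:
    """ROUGE-L F1 between two strings, tokenized on whitespace."""
    p, r = pred.split(), ref.split()
    lcs = lcs_length(p, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(p), lcs / len(r)
    return 2 * precision * recall / (precision + recall)

def valid_json_rate(outputs: list[str]) -> float:
    """Fraction of raw outputs that json.loads accepts."""
    def ok(s: str) -> bool:
        try:
            json.loads(s)
            return True
        except json.JSONDecodeError:
            return False
    return sum(ok(s) for s in outputs) / len(outputs) if outputs else 0.0
```

To score structured outputs, serialize both sides the same way first, for example `rouge_l(json.dumps(pred, indent=1), json.dumps(ref, indent=1))`; the serialization choice affects the score.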

| Model | WL Graph F1 | ROUGE-L | Valid JSON Rate |
|---|---|---|---|
| retrico-lm-4b | 0.7606 | 0.5323 | 96.0% |
| openai/gpt-oss-120b | 0.7868 | 0.5204 | 98.6% |
| Meta-Llama-3.3-70B-Instruct | 0.7837 | 0.5503 | 98.9% |
| DeepSeek-V3.1 | 0.7821 | 0.5253 | 96.8% |
| Qwen3-32B | 0.7329 | 0.4852 | 93.9% |
| numind/NuExtract3 | 0.7302 | 0.3747 | 92.5% |
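The WL Graph F1 column above relies on semantic node embeddings that this card does not specify. The sketch below is a deliberately simplified stand-in, not the authors' metric: it turns each JSON value into a tree (object keys and array items become labelled child nodes, scalars become leaves), runs a few rounds of Weisfeiler-Leman relabeling with exact string labels instead of embeddings, and computes F1 over the multisets of labels from all rounds. All names are hypothetical.

```python
import json
from collections import Counter
from typing import Any

def wl_labels(value: Any, rounds: int = 2) -> Counter:
    """Multiset of Weisfeiler-Leman labels over all nodes of a JSON tree, for rounds 0..rounds."""
    nodes: list[tuple[str, list[int]]] = []  # (initial label, child indices)

    def build(v: Any, edge: str) -> int:
        idx = len(nodes)
        nodes.append(("", []))  # placeholder, filled in after the children are built
        if isinstance(v, dict):
            kids = [build(child, str(key)) for key, child in v.items()]
            label = edge + ":object"
        elif isinstance(v, list):
            kids = [build(child, "item") for child in v]
            label = edge + ":array"
        else:
            kids = []
            label = edge + "=" + json.dumps(v)
        nodes[idx] = (label, kids)
        return idx

    build(value, "root")
    labels = [label for label, _ in nodes]
    counts = Counter(labels)
    for _ in range(rounds):
        # Refine each label with the sorted multiset of its children's previous labels.
        labels = [
            labels[i] + "(" + ",".join(sorted(labels[c] for c in kids)) + ")"
            for i, (_, kids) in enumerate(nodes)
        ]
        counts.update(labels)
    return counts

def wl_graph_f1(pred: Any, ref: Any, rounds: int = 2) -> float:
    """F1 over the multisets of WL labels of two JSON values (exact label matching)."""
    p, r = wl_labels(pred, rounds), wl_labels(ref, rounds)
    overlap = sum((p & r).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(p.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)
```

Because child labels are sorted before relabeling, this version ignores key order and array order; the real metric's treatment of order is not stated in the card. Production WL implementations usually compress labels with a hash each round instead of growing strings.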

Valid JSON Rate by input length:

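| Token bucket (input tokens) | gpt-oss-120b | Llama-3.3-70B | DeepSeek-V3.1 | retrico-lm-4b | Qwen3-32B | NuExtract3 |
|---|---|---|---|---|---|---|
| 256–1023 | 100% | 100% | 100% | 100% | 100% | 98.8% |
| 1024–3999 | 100% | 98.7% | 97.3% | 98.0% | 97.1% | 92.1% |
| ≥4000 | 93.0% | 66.7% | 20.0% | 33.3% | 66.7% | 33.3% |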

WL Graph F1 by input length:

| Token bucket (input tokens) | gpt-oss-120b | Llama-3.3-70B | DeepSeek-V3.1 | retrico-lm-4b | Qwen3-32B | NuExtract3 |
|---|---|---|---|---|---|---|
| 256–1023 | 34.3% | 74.5% | 76.2% | 74.8% | 61.2% | 78.3% |
| 1024–3999 | 82.2% | 83.6% | 83.5% | 82.5% | 76.7% | 75.9% |
| ≥4000 | 76.4% | 25.1% | 7.9% | 34.1% | 53.8% | 7.5% |
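Both length tables show a sharp drop for inputs of 4,000 tokens or more (retrico-lm-4b: 33.3% valid JSON and 34.1% WL Graph F1, versus 98.0% and 82.5% at 1024–3999 tokens). One possible mitigation, which is an assumption and was not evaluated in this card, is to split long inputs into smaller chunks, extract from each, and merge the results. The sketch below splits Markdown before each heading line and greedily packs sections under a token budget. `count_tokens` is any callable you supply (for example `lambda s: len(tokenizer.encode(s))` with the tokenizer from the Usage section), and `chunk_markdown` is a hypothetical helper.

```python
import re
from typing import Callable

def chunk_markdown(text: str, max_tokens: int, count_tokens: Callable[[str], int]) -> list[str]:
    """Split Markdown before heading lines and greedily pack sections into chunks under max_tokens."""
    # Zero-width split before every line that starts with 1-6 '#' characters and a space.
    sections = [s for s in re.split(r"(?m)^(?=#{1,6} )", text) if s.strip()]
    chunks: list[str] = []
    current = ""
    for section in sections:
        candidate = current + section
        if current and count_tokens(candidate) > max_tokens:
            chunks.append(current)  # close the current chunk before it overflows
            current = section
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

A single section longer than `max_tokens` is kept as its own oversized chunk, and merging per-chunk results (for example concatenating list fields) is schema-specific and left to the caller.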

Citation

@misc{knowledgator2025retrico,
 title={retrico-lm: Schema-Guided Structured Information Extraction},
 author={Knowledgator Engineering},
 year={2025},
 url={https://huggingface.co/knowledgator}
}



URL: https://huggingface.co/knowledgator/retrico-lm-4b
