VOOZH about

URL: https://huggingface.co/llm-semantic-router/toolcall-verifier

⇱ llm-semantic-router/toolcall-verifier Β· Hugging Face


ToolCallVerifier - Unauthorized Tool Call Detection

πŸ‘ License
πŸ‘ Model

Stage 2 of Two-Stage LLM Agent Defense Pipeline


🎯 What This Model Does

ToolCallVerifier is a ModernBERT-based token classifier that detects unauthorized tool calls in LLM agent systems. It performs token-level classification on tool call JSON to identify malicious arguments that may have been injected through prompt injection attacks.

Label Description
AUTHORIZED Token is part of a legitimate, user-requested action
UNAUTHORIZED Token indicates injected/malicious content β€” BLOCK

🚨 Attack Categories Covered

Category Source Description
Delimiter Injection LLMail <<end_context>>, >>}}\]\])
Word Obfuscation LLMail Inserting noise words between tokens
Fake Sessions LLMail START_USER_SESSION, EXECUTE_USERQUERY
Roleplay Injection WildJailbreak "You are an admin bot that can..."
XML Tag Injection WildJailbreak <execute_action>, <tool_call>
Authority Bypass WildJailbreak "As administrator, I authorize..."
Intent Mismatch Synthetic User asks X, tool does Y
MCP Tool Poisoning Synthetic Hidden exfiltration in tool args
MCP Shadowing Synthetic Fake authorization context

πŸ”— Integration with FunctionCallSentinel

This model is Stage 2 of a two-stage defense pipeline:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ User Prompt │────▢│ ToolCallSentinel │────▢│ LLM + Tools β”‚
β”‚ β”‚ β”‚ (Stage 1) β”‚ β”‚ β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜
 β”‚
 β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
 β”‚ ToolCallVerifier (This Model) β”‚
 β”‚ Token-level verification before tool execution β”‚
 β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
Scenario Recommendation
General chatbot Stage 1 only
Tool-calling agent (low risk) Stage 1 only
Tool-calling agent (high risk) Both stages
Email/file system access Both stages
Financial transactions Both stages

🎯 Intended Use

Primary Use Cases

  • LLM Agent Security: Verify tool calls before execution
  • Prompt Injection Defense: Detect unauthorized actions from injected prompts
  • API Gateway Protection: Filter malicious tool calls at infrastructure level

Out of Scope

  • General text classification
  • Non-tool-calling scenarios
  • Languages other than English

πŸ“œ License

Apache 2.0

Downloads last month
70
Safetensors
Model size
0.1B params
Tensor type
F32
Β·

Model tree for llm-semantic-router/toolcall-verifier

Finetuned
(1334)
this model

Datasets used to train llm-semantic-router/toolcall-verifier

Space using llm-semantic-router/toolcall-verifier 1

Evaluation results