NeMo Guardrails is an open-source toolkit for easily adding programmable guardrails to LLM-based conversational systems.
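A minimal usage sketch of NeMo Guardrails, assuming a `./config` directory holding a rails configuration (the directory path and the user message are illustrative):

```python
from nemoguardrails import LLMRails, RailsConfig

# Load the rails configuration (flows, prompts, model settings) from disk.
config = RailsConfig.from_path("./config")
rails = LLMRails(config)

# Generate a guarded response; the configured rails run around the LLM call.
response = rails.generate(messages=[{"role": "user", "content": "Hello!"}])
print(response["content"])
```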
DeepTeam is a framework to red team LLMs and LLM systems.
UQLM (Uncertainty Quantification for Language Models) is a Python package for UQ-based LLM hallucination detection.
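UQLM's own API is not reproduced here; the sketch below only illustrates the general black-box idea such tools build on: resample answers to the same prompt and score their agreement, treating low agreement as a hallucination signal. The function name and example answers are assumptions of this example.

```python
from collections import Counter

def consistency_score(answers: list[str]) -> float:
    """Fraction of sampled answers that agree with the modal answer.

    Low agreement across resampled generations is a common
    hallucination signal in black-box UQ methods.
    """
    normalized = [a.strip().lower() for a in answers]
    modal_count = Counter(normalized).most_common(1)[0][1]
    return modal_count / len(normalized)

# e.g. five resampled answers to the same prompt:
print(consistency_score(["Paris", "Paris", "paris", "Lyon", "Paris"]))  # 0.8
```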
Internal Safety Collapse: turning an LLM or AI agent into a sensitive-data generator.
Protect every action your agent takes.
Decrypted generative-model safety files for Apple Intelligence, containing its content filters.
[NDSS'25 Best Technical Poster] A collection of automated evaluators for assessing jailbreak attempts.
An attack that induces hallucinations in LLMs.
Papers about red teaming LLMs and multimodal models.
Static security scanner for LLM agents — prompt injection, MCP config auditing, taint analysis. 49 rules mapped to OWASP Agentic Top 10 (2026). Works with LangChain, CrewAI, AutoGen.
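To make the rule-scanning idea concrete, here is a toy sketch of pattern-based static analysis over agent source files. The rule names, patterns, and the `user_input` variable are all illustrative assumptions, not the project's actual 49-rule engine:

```python
import re
from pathlib import Path

# Illustrative rules: flag untrusted input flowing into a prompt string,
# and LLM-adjacent code spawning shells.
RULES = {
    "untrusted-input-in-prompt": re.compile(r"f['\"].*\{user_input\}"),
    "llm-output-to-shell": re.compile(r"subprocess\.(run|Popen)\("),
}

def scan(path: str) -> list[tuple[str, int, str]]:
    """Return (file, line number, rule id) for every line matching a rule."""
    findings = []
    for lineno, line in enumerate(Path(path).read_text().splitlines(), start=1):
        for rule_id, pattern in RULES.items():
            if pattern.search(line):
                findings.append((path, lineno, rule_id))
    return findings
```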
Reading list for adversarial perspective and robustness in deep reinforcement learning.
[NeurIPS 2025] SECA: Semantically Equivalent and Coherent Attacks for Eliciting LLM Hallucinations
Official repository for the paper "ALERT: A Comprehensive Benchmark for Assessing Large Language Models’ Safety through Red Teaming"
AISecOps (AI Security Operations) framework for deterministic verification of AI systems. QWED verifies LLM outputs using math, logic, and symbolic execution — creating an auditable trust boundary for agentic AI systems. Not generation. Verification.
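QWED's actual interface is not shown here; the sketch below only illustrates the underlying idea of deterministic verification: checking an LLM's claimed algebraic identity symbolically rather than trusting the generation. The claim format ("lhs == rhs") is an assumption of this example.

```python
import sympy as sp

def verify_claimed_identity(claim: str) -> bool:
    """Symbolically check an LLM's claimed identity of the form "lhs == rhs"."""
    lhs_text, rhs_text = claim.split("==")
    lhs = sp.sympify(lhs_text)
    rhs = sp.sympify(rhs_text)
    # simplify(lhs - rhs) == 0 is a deterministic, auditable check,
    # independent of how the claim was generated.
    return sp.simplify(lhs - rhs) == 0

print(verify_claimed_identity("(x + 1)**2 == x**2 + 2*x + 1"))  # True
print(verify_claimed_identity("(x + 1)**2 == x**2 + 1"))        # False
```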
Papers from our SoK on Red-Teaming (Accepted at TMLR)
NeurIPS'24 - LLM Safety Landscape
Restore safety in fine-tuned language models through task arithmetic
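A minimal sketch of the task-arithmetic idea this title refers to: compute a safety direction in weight space as the difference between aligned and base weights, then add it back to the fine-tuned model. The function name, state-dict interface, and scaling factor `alpha` are assumptions; the paper's exact recipe may differ.

```python
import torch

def restore_safety(finetuned, base, aligned, alpha=1.0):
    """Add the safety task vector (aligned - base) back into fine-tuned weights.

    Each argument is a state dict with identical keys; `alpha` scales
    the correction.
    """
    restored = {}
    for name, weight in finetuned.items():
        # Weight-space direction learned by safety training.
        safety_vector = aligned[name] - base[name]
        restored[name] = weight + alpha * safety_vector
    return restored

# Toy usage with a single-parameter "model":
base = {"w": torch.zeros(2)}
aligned = {"w": torch.tensor([0.5, -0.5])}   # base + safety training
finetuned = {"w": torch.tensor([1.0, 1.0])}  # fine-tuning eroded safety
print(restore_safety(finetuned, base, aligned)["w"])  # tensor([1.5, 0.5])
```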
DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails
A comprehensive LLM testing suite covering safety, performance, bias, and compliance, with methodologies and tools to improve the reliability and ethical integrity of models such as OpenAI's GPT series in real-world applications.
Code and dataset for the paper: "Can Editing LLMs Inject Harm?"