NeMo Guardrails is an open-source toolkit for easily adding programmable guardrails to LLM-based conversational systems.
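A minimal usage sketch of NeMo Guardrails, assuming a `./config` directory holding a rails configuration (the directory path and the user message are illustrative):

```python
from nemoguardrails import LLMRails, RailsConfig

# Load the rails configuration (flows, prompts, model settings) from disk.
config = RailsConfig.from_path("./config")
rails = LLMRails(config)

# Generate a guarded response; the configured rails run around the LLM call.
response = rails.generate(messages=[{"role": "user", "content": "Hello!"}])
print(response["content"])
```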
DeepTeam is a framework to red team LLMs and LLM systems.
UQLM (Uncertainty Quantification for Language Models) is a Python package for UQ-based LLM hallucination detection.
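UQLM's own API is not reproduced here; the sketch below only illustrates the general black-box idea such tools build on: resample answers to the same prompt and score their agreement, treating low agreement as a hallucination signal. The function name and example answers are assumptions of this example.

```python
from collections import Counter

def consistency_score(answers: list[str]) -> float:
    """Fraction of sampled answers that agree with the modal answer.

    Low agreement across resampled generations is a common
    hallucination signal in black-box UQ methods.
    """
    normalized = [a.strip().lower() for a in answers]
    modal_count = Counter(normalized).most_common(1)[0][1]
    return modal_count / len(normalized)

# e.g. five resampled answers to the same prompt:
print(consistency_score(["Paris", "Paris", "paris", "Lyon", "Paris"]))  # 0.8
```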
Internal Safety Collapse: turning an LLM or AI agent into a sensitive-data generator.
Protect every action your agent takes.
Decrypted generative-model safety files for Apple Intelligence, containing its content filters.
[NDSS'25 Best Technical Poster] A collection of automated evaluators for assessing jailbreak attempts.
An attack that induces hallucinations in LLMs.
Papers about red teaming LLMs and multimodal models.
Static security scanner for LLM agents — prompt injection, MCP config auditing, taint analysis. 49 rules mapped to OWASP Agentic Top 10 (2026). Works with LangChain, CrewAI, AutoGen.
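To make the rule-scanning idea concrete, here is a toy sketch of pattern-based static analysis over agent source files. The rule names, patterns, and the `user_input` variable are all illustrative assumptions, not the project's actual 49-rule engine:

```python
import re
from pathlib import Path

# Illustrative rules: flag untrusted input flowing into a prompt string,
# and LLM-adjacent code spawning shells.
RULES = {
    "untrusted-input-in-prompt": re.compile(r"f['\"].*\{user_input\}"),
    "llm-output-to-shell": re.compile(r"subprocess\.(run|Popen)\("),
}

def scan(path: str) -> list[tuple[str, int, str]]:
    """Return (file, line number, rule id) for every line matching a rule."""
    findings = []
    for lineno, line in enumerate(Path(path).read_text().splitlines(), start=1):
        for rule_id, pattern in RULES.items():
            if pattern.search(line):
                findings.append((path, lineno, rule_id))
    return findings
```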
Reading list for adversarial perspective and robustness in deep reinforcement learning.
[NeurIPS 2025] SECA: Semantically Equivalent and Coherent Attacks for Eliciting LLM Hallucinations
Official repository for the paper "ALERT: A Comprehensive Benchmark for Assessing Large Language Models’ Safety through Red Teaming"
AISecOps (AI Security Operations) framework for deterministic verification of AI systems. QWED verifies LLM outputs using math, logic, and symbolic execution — creating an auditable trust boundary for agentic AI systems. Not generation. Verification.
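QWED's actual interface is not shown here; the sketch below only illustrates the underlying idea of deterministic verification: checking an LLM's claimed algebraic identity symbolically rather than trusting the generation. The claim format ("lhs == rhs") is an assumption of this example.

```python
import sympy as sp

def verify_claimed_identity(claim: str) -> bool:
    """Symbolically check an LLM's claimed identity of the form "lhs == rhs"."""
    lhs_text, rhs_text = claim.split("==")
    lhs = sp.sympify(lhs_text)
    rhs = sp.sympify(rhs_text)
    # simplify(lhs - rhs) == 0 is a deterministic, auditable check,
    # independent of how the claim was generated.
    return sp.simplify(lhs - rhs) == 0

print(verify_claimed_identity("(x + 1)**2 == x**2 + 2*x + 1"))  # True
print(verify_claimed_identity("(x + 1)**2 == x**2 + 1"))        # False
```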
Papers from our SoK on Red-Teaming (Accepted at TMLR)
NeurIPS'24 - LLM Safety Landscape
Restore safety in fine-tuned language models through task arithmetic
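A minimal sketch of the task-arithmetic idea this title refers to: compute a safety direction in weight space as the difference between aligned and base weights, then add it back to the fine-tuned model. The function name, state-dict interface, and scaling factor `alpha` are assumptions; the paper's exact recipe may differ.

```python
import torch

def restore_safety(finetuned, base, aligned, alpha=1.0):
    """Add the safety task vector (aligned - base) back into fine-tuned weights.

    Each argument is a state dict with identical keys; `alpha` scales
    the correction.
    """
    restored = {}
    for name, weight in finetuned.items():
        # Weight-space direction learned by safety training.
        safety_vector = aligned[name] - base[name]
        restored[name] = weight + alpha * safety_vector
    return restored

# Toy usage with a single-parameter "model":
base = {"w": torch.zeros(2)}
aligned = {"w": torch.tensor([0.5, -0.5])}   # base + safety training
finetuned = {"w": torch.tensor([1.0, 1.0])}  # fine-tuning eroded safety
print(restore_safety(finetuned, base, aligned)["w"])  # tensor([1.5, 0.5])
```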
DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails
A comprehensive LLM testing suite covering safety, performance, bias, and compliance, with methodologies and tools to improve the reliability and ethical integrity of models such as OpenAI's GPT series in real-world applications.
Code and dataset for the paper: "Can Editing LLMs Inject Harm?"