Stanford NLP Python library for benchmarking the utility of LLM interpretability methods
[ICLR 2025] General-purpose activation steering library
Pivotal Token Search
Early steps toward a long-term vision for interpreting Mamba-2's internal state.
A closed-loop control system for Large Language Models that steers internal activation states in real time to prevent mode collapse and toxicity.
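Several of the repositories above center on activation steering: adding a direction vector to a model's hidden states during the forward pass to push generation toward or away from a behavior. The sketch below is not taken from any of these libraries; it is a minimal illustration using a PyTorch forward hook on a Hugging Face GPT-2 model, where the layer index, steering vector, and scale are placeholder assumptions.

```python
# Minimal activation-steering sketch (illustrative; not any listed library's API).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumption: any causal LM whose blocks live in model.transformer.h
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

layer_idx = 6   # which block's residual stream to steer (illustrative choice)
scale = 4.0     # steering strength (illustrative choice)
hidden_size = model.config.hidden_size
# In practice this direction is learned or derived from contrastive prompts;
# a random vector is used here only to keep the sketch self-contained.
steering_vector = torch.randn(hidden_size)

def add_steering(module, inputs, output):
    # GPT-2 blocks return a tuple whose first element is the hidden states;
    # returning a modified tuple replaces the block's output.
    hidden_states = output[0] + scale * steering_vector.to(output[0].dtype)
    return (hidden_states,) + output[1:]

handle = model.transformer.h[layer_idx].register_forward_hook(add_steering)
try:
    ids = tokenizer("The weather today is", return_tensors="pt")
    out = model.generate(**ids, max_new_tokens=20, do_sample=False)
    print(tokenizer.decode(out[0], skip_special_tokens=True))
finally:
    handle.remove()  # detach the hook so later generations are unsteered
```

In real use the steering vector is typically computed from contrastive prompt pairs or trained probes rather than sampled at random, and the hook is removed (or scaled to zero) whenever unsteered behavior is needed.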