This repo contains pre-generated control vectors in GGUF format for use with llama.cpp:
- IMPORTANT: These new control vectors must use their respective de-bias control vector(s).
- The code used to generate these can now be found at github.com/jukofyork/control-vectors.
- All were generated with '--num_prompt_samples' set to the model's hidden state dimension.
Control vectors allow fine-tuned control over LLMs, enabling more precise/targeted text generation.
Applying Control Vectors
To "de-bias" the model only:
Use the '--control-vector' option as follows:
llama-cli --model <model>.gguf [other CLI arguments] \
--control-vector mistral-large:123b-language__debias.gguf
Alternatively for server mode:
llama-server --model <model>.gguf [other CLI arguments] \
--control-vector mistral-large:123b-language__debias.gguf
This will apply the "language" de-bias control vector to the Mistral-Large-Instruct-2407 model.
You can apply multiple de-bias control vectors simultaneously like so:
llama-cli --model <model>.gguf [other CLI arguments] \
--control-vector mistral-large:123b-language__debias.gguf \
--control-vector mistral-large:123b-storytelling__debias.gguf \
--control-vector mistral-large:123b-character_focus__debias.gguf
This will apply all 3 of the "writing style" de-bias control vectors.
To fully apply a positive or negative axis control vector with the default scale-factor:
Use the '--control-vector' option as follows:
llama-cli --model <model>.gguf [other CLI arguments] \
--control-vector mistral-large:123b-language__debias.gguf \
--control-vector mistral-large:123b-language__ornate.gguf
This will fully apply (ie: with a scale-factor of 1.0) the (positive-axis) "ornate language" control vector.
IMPORTANT: The positive and negative axis control vectors must be used along with the relevant de-bias control vector - they cannot be used on their own!
You can fully apply multiple positive or negative axis control vectors like so:
llama-cli --model <model>.gguf [other CLI arguments] \
--control-vector mistral-large:123b-language__debias.gguf \
--control-vector mistral-large:123b-language__ornate.gguf \
--control-vector mistral-large:123b-storytelling__debias.gguf \
--control-vector mistral-large:123b-storytelling__descriptive.gguf \
--control-vector mistral-large:123b-character_focus__debias.gguf \
--control-vector mistral-large:123b-character_focus__dialogue.gguf
This will fully apply (ie: with a scale-factor of 1.0) all 3 of the (positive-axis) "writing style" control vectors.
NOTE: Fully applying too many positive or negative axis control vectors simultaneously may damage the model's output.
To partially apply a positive or negative axis control vector using a custom scale-factor:
llama-cli --model <model>.gguf [other CLI arguments] \
--control-vector mistral-large:123b-language__debias.gguf \
--control-vector-scaled mistral-large:123b-language__ornate.gguf 0.5
This will partially apply the (positive-axis) "ornate language" control vector with a scale-factor of 0.5 (ie: half the full effect).
IMPORTANT: The positive and negative axis control vectors must be used along with the relevant de-bias control vector - they cannot be used on their own!
You can partially apply multiple positive or negative axis control vectors like so:
llama-cli --model <model>.gguf [other CLI arguments] \
--control-vector mistral-large:123b-language__debias.gguf \
--control-vector-scaled mistral-large:123b-language__ornate.gguf 0.5 \
--control-vector mistral-large:123b-storytelling__debias.gguf \
--control-vector-scaled mistral-large:123b-storytelling__descriptive.gguf 0.3 \
--control-vector mistral-large:123b-character_focus__debias.gguf \
--control-vector-scaled mistral-large:123b-character_focus__dialogue.gguf 0.2
This will partially apply all 3 of the (positive-axis) "writing style" control vectors with varying weights.
The theoretical upper bound value for equal weights is between 1/n and sqrt(1/n) depending on how correlated the n control vector directions are, eg:
- For n = 1 use the default scale-factor of 1.0 for comparison with the values below.
- For n = 2 it is between 1/2 = 0.5 and sqrt(1/2) ≈ 0.707.
- For n = 3 it is between 1/3 ≈ 0.333 and sqrt(1/3) ≈ 0.577.
- For n = 4 it is between 1/4 = 0.25 and sqrt(1/4) = 0.5.
- For n = 5 it is between 1/5 = 0.2 and sqrt(1/5) ≈ 0.447.
and so on.
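The range quoted above can be tabulated for any n with a few lines of Python. This is only a convenience sketch: the function name equal_weight_bounds is made up for illustration, and the values are the theoretical range, not tuned recommendations.

```python
import math


def equal_weight_bounds(n: int) -> tuple[float, float]:
    """Return the (1/n, sqrt(1/n)) range quoted above for n control vectors."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 / n, math.sqrt(1.0 / n)


for n in range(1, 6):
    low, high = equal_weight_bounds(n)
    print(f"n = {n}: between {low:.3f} and {high:.3f}")
```

Where in that range a good value sits depends on how correlated the chosen directions are, so treat the output as a starting point for experimentation.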
The way the positive and negative axis control vectors are calibrated means you can negate the scale-factors too, eg:
llama-cli --model <model>.gguf [other CLI arguments] \
--control-vector mistral-large:123b-language__debias.gguf \
--control-vector-scaled mistral-large:123b-language__ornate.gguf -0.5
is equivalent to:
llama-cli --model <model>.gguf [other CLI arguments] \
--control-vector mistral-large:123b-language__debias.gguf \
--control-vector-scaled mistral-large:123b-language__simple.gguf 0.5
NOTE: It is possible to use scale-factors greater than 1.0, but if too large it will eventually damage the model's output.
Important Notes
- Always include the relevant "de-bias" control vector as well as the positive-axis/negative-axis control vector - they cannot be used on their own!
- Do not mix both sides of a positive/negative axis at the same time (eg: '--control-vector language__simple.gguf' and '--control-vector language__ornate.gguf' will just cancel out and have no effect...).
- Ensure your llama.cpp version is up to date (multi-vector support added 27/06/24 in #8137).
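The rules above can be encoded in a small helper that assembles the argument list. This is a minimal sketch and not part of this repo or of llama.cpp: the function control_vector_args and its input layout are hypothetical, and it only assumes the '<prefix>-<axis>__<side>.gguf' file naming and the '--control-vector' / '--control-vector-scaled' options used in the examples above.

```python
def control_vector_args(prefix: str, axes: dict[str, dict[str, float]]) -> list[str]:
    """Build llama.cpp control-vector arguments (hypothetical helper).

    prefix: file-name prefix, e.g. "mistral-large:123b"
    axes:   {axis_name: {side_name: scale_factor}}, e.g. {"language": {"ornate": 0.5}}
            An empty inner dict means "de-bias only" for that axis.
    """
    args: list[str] = []
    for axis, sides in axes.items():
        if len(sides) > 1:
            raise ValueError(f"axis '{axis}': do not mix both sides of one axis")
        # The de-bias vector is always included for every axis that is used.
        args += ["--control-vector", f"{prefix}-{axis}__debias.gguf"]
        for side, scale in sides.items():
            path = f"{prefix}-{axis}__{side}.gguf"
            if scale == 1.0:
                args += ["--control-vector", path]
            else:
                args += ["--control-vector-scaled", path, str(scale)]
    return args


print(" ".join(control_vector_args(
    "mistral-large:123b",
    {"language": {"ornate": 0.5}, "storytelling": {}},
)))
```

The printed options can be appended to a llama-cli or llama-server command line.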
Command Line Generator
Courtesy of gghfez, a utility to easily generate command line options for llama.cpp:
You can run this tool directly on GitHub Pages.
Direct Links
Very Large Models
- c4ai-command-r-plus
- c4ai-command-r-plus-08-2024
- Eurux-8x22b-nca
- Lumimaid-v0.2-123B
- magnum-v2-123b
- Mistral-Large-Instruct-2407
- Mixtral-8x22B-Instruct-v0.1
- Qwen1.5-110B-Chat
- WizardLM-2-8x22B
Large Models
- Athene-70B
- aurelian-alpha0.1-70b-rope8-32K-fp16
- aurelian-v0.5-70b-rope8-32K-fp16
- daybreak-miqu-1-70b-v1.0-hf
- deepseek-llm-67b-chat
- dolphin-2.9.2-qwen2-72b
- Hermes-3-Llama-3.1-70B
- L3-70B-Euryale-v2.1
- L3.1-70B-Euryale-v2.2
- Llama-3-70B-Instruct-Storywriter
- Llama-3-Lumimaid-70B-v0.1
- Llama-3.1-70B-ArliAI-RPMax-v1.1
- Lumimaid-v0.2-70B
- magnum-72b-v1
- magnum-v2-72b
- Meta-Llama-3-70B-Instruct
- Meta-Llama-3.1-70B-Instruct
- miqu-1-70b
- Qwen1.5-72B-Chat
- Qwen2-72B-Instruct
- Qwen2.5-72B-Instruct
- turbcat-instruct-72b
Medium Models
- 35b-beta-long
- aya-23-35B
- c4ai-command-r-v01
- c4ai-command-r-08-2024 (***READ THIS FIRST***)
- Divergence-33B
- gemma-2-27b-it
- gemma-2-27b-it-SimPO-37K
- gemma2-gutenberg-27B
- internlm2_5-20b-chat
- magnum-v1-32b
- magnum-v2-32b
- magnum-v3-27b-kto
- magnum-v3-34b
- Mistral-Small-Instruct-2409
- Mixtral-8x7B-Instruct-v0.1
- Nous-Capybara-34B
- Qwen1.5-32B-Chat
- Qwen2.5-32B-Instruct
- Yi-34B-Chat
- Yi-1.5-34B-Chat
- Yi-1.5-34B-Chat-16K
Small Models
- aya-23-8B
- gemma-2-9b-it
- gemma-2-9b-it-SimPO
- Gemma-2-9B-It-SPPO-Iter3
- gemma-2-Ifable-9B
- Llama-3-Instruct-8B-SPPO-Iter3
- Llama-3.1-8B-ArliAI-RPMax-v1.1
- Meta-Llama-3-8B-Instruct
- Meta-Llama-3.1-8B-Instruct
- Mistral-7B-Instruct-v0.2
- Mistral-7B-Instruct-v0.3
- Mistral7B-PairRM-SPPO-Iter3
- Mistral-Nemo-12B-ArliAI-RPMax-v1.1
- mistral-nemo-gutenberg-12B
- mistral-nemo-gutenberg-12B-v2
- Mistral-Nemo-Instruct-2407
- romulus-mistral-nemo-12b-simpo
- Qwen1.5-14B-Chat
- Qwen2-7B-Instruct
- Qwen2.5-7B-Instruct
- Qwen2.5-14B-Instruct
- WizardLM-2-7B
Algorithm Details
1. First we create a set of pre/post "prompt stems":
The Cartesian product of these gives us 2500 (ie: 50 x 50) different "You are an author" type sentences.
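As a sketch of how such a product could be formed, the snippet below uses placeholder strings. The real pre/post stems are not reproduced here, so pre_stems and post_stems are purely illustrative stand-ins.

```python
from itertools import product

# Hypothetical placeholder stems; the real lists are not reproduced here.
pre_stems = [f"pre-stem {i}" for i in range(50)]
post_stems = [f"post-stem {j}" for j in range(50)]

# Every pre-stem is paired with every post-stem.
prompt_stems = [f"{pre} {post}" for pre, post in product(pre_stems, post_stems)]
print(len(prompt_stems))  # 2500
```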
2. Then we create several different creative-writing axis "continuations":
A set of 3 different "writing style" axes:
The 4 elements of the Dark Tetrad:
An "Optimism vs Nihilism" axis to complement the Dark Tetrad axes:
3. Then we collect a large number of creative-writing prompts:
- I used Sao10K/Short-Storygen-v2 and a couple of other sources to get 11835 creative-writing prompts in total (see the 'writing_prompts.txt' file).
- The jq command is very useful for extracting the prompts only from these datasets.
4. Run the model on a random sample of (prompt-stem, continuation, creative-writing prompts) combinations:
The Cartesian product of: 2500 prompt-stem sentences x 10 continuation sentences x 11835 story prompts ≈ 300M possible combinations.
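The size of this space, and one way to draw a random sample from it, can be sketched as follows. The sampling scheme is an assumption for illustration: each sample draws one set of indices and reuses it for all three classes, with the continuation index selecting the k-th continuation in each class's own list. The function name sample_triplets is hypothetical, and the real sampling code may differ.

```python
import random

NUM_STEMS = 2500          # 50 x 50 pre/post prompt-stem combinations
NUM_CONTINUATIONS = 10    # continuation sentences
NUM_PROMPTS = 11835       # creative-writing prompts

print(NUM_STEMS * NUM_CONTINUATIONS * NUM_PROMPTS)  # 295875000, i.e. roughly 300M


def sample_triplets(num_samples: int, seed: int = 0) -> list[dict[str, tuple[int, int, int]]]:
    """Draw index triplets; each triplet reuses the same indices for all 3 classes."""
    rng = random.Random(seed)
    triplets = []
    for _ in range(num_samples):
        shared = (
            rng.randrange(NUM_STEMS),
            rng.randrange(NUM_CONTINUATIONS),
            rng.randrange(NUM_PROMPTS),
        )
        triplets.append({"baseline": shared, "negative": shared, "positive": shared})
    return triplets
```

Sharing the indices means the three hidden states in a triplet differ only through the class of the continuation, which is what makes the later subtraction of the "baseline" sample meaningful.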
- It is important that the same prompt-stem sample sentence be used with each ("baseline", "negative", "positive") triplet.
- It is also important that the same (prompt-stem, continuation) sample sentence be used with the "negative" and "positive" members of the same triplet.
- The suggested value of "hidden_size" for the --num_prompt_samples option is because the theory regarding estimation of covariance matrices shows we need at least one sample per feature (this may be overkill due to us only retaining the top eigenvectors though...).
5. Create a pair of "differenced datasets" by subtracting the corresponding "baseline" class's sample from both of the other 2 classes' samples:
- The reason for this is so that we "centre" the data around the "baseline" (i.e., set the "baseline" as the origin and look for vector directions that point away from it).
- This is in contrast to assuming the difference of the means is the "centre" for a 2-class version of this using PCA on the covariance matrix of the differences (i.e., the "standard" method of creating control vectors).
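A toy NumPy sketch of the differencing step is shown below. The hidden states are random stand-ins, and the assignment of the "negative" class to A and the "positive" class to B is an assumption made only so the names line up with the next step.

```python
import numpy as np

rng = np.random.default_rng(0)
num_samples, hidden_size = 8, 4          # toy sizes, for illustration only

# Hypothetical hidden states for one layer: one row per sampled triplet.
baseline = rng.normal(size=(num_samples, hidden_size))
negative = baseline + rng.normal(loc=-1.0, size=(num_samples, hidden_size))
positive = baseline + rng.normal(loc=+1.0, size=(num_samples, hidden_size))

# Subtract the matching baseline row so that "baseline" becomes the origin.
A = negative - baseline
B = positive - baseline
print(A.shape, B.shape)
```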
6. Now we take our two "differenced datasets" held in data matrices A and B (with rows as samples and columns as features):
- Create the cross-covariance matrix, C = A^T * B.
- Next we symmetrise, C' = (C^T + C) / 2.
- Perform an eigendecomposition, C' = Q * Λ * Q^(-1).
- Since we symmetrised the matrix, the eigenvectors (Q) and eigenvalues (Λ) will all be real-valued.
- Arrange the eigenvectors in descending order based on their corresponding eigenvalues.
- Once the eigenvectors are sorted, discard the eigenvalues as they won't be needed again.
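The steps above can be sketched in NumPy as follows. This is an illustration under assumptions, not the repo's actual code: the function name sorted_directions is hypothetical, the inputs are random toy matrices, and np.linalg.eigh is used because it is designed for symmetric matrices and returns real eigenvalues in ascending order.

```python
import numpy as np


def sorted_directions(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Return eigenvectors of the symmetrised cross-covariance as columns,
    ordered by descending eigenvalue."""
    C = A.T @ B                              # cross-covariance (unnormalised)
    C_sym = (C.T + C) / 2.0                  # symmetrise
    eigenvalues, Q = np.linalg.eigh(C_sym)   # real-valued output, ascending order
    order = np.argsort(eigenvalues)[::-1]    # indices for descending eigenvalues
    return Q[:, order]                       # the eigenvalues are discarded here


rng = np.random.default_rng(0)
A = rng.normal(size=(16, 4))
B = rng.normal(size=(16, 4))
Q = sorted_directions(A, B)
print(Q.shape)  # (4, 4)
```

For a symmetric matrix the eigenvectors are orthonormal, so Q^(-1) is simply Q^T.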
The reason for using the cross-covariance matrix instead of the covariance matrix:
- The covariance matrix of a differenced dataset exemplifies directions in A or B (ie: think about the expansion of (a-b)² = a² + b² - 2×a×b).
- The cross-covariance matrix of a differenced dataset exemplifies directions in A and B (ie: akin to a×b, with no a² or b² terms).
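One way to read the scalar analogy in matrix form is given below. It assumes the "covariance matrix of a differenced dataset" refers to the (unnormalised) covariance of A - B, which is an interpretation rather than something stated above.

```latex
\begin{aligned}
(A - B)^{\mathsf T}(A - B) &= A^{\mathsf T}A + B^{\mathsf T}B - A^{\mathsf T}B - B^{\mathsf T}A, \\
C' = \tfrac{1}{2}\left(C + C^{\mathsf T}\right) &= \tfrac{1}{2}\left(A^{\mathsf T}B + B^{\mathsf T}A\right).
\end{aligned}
```

The first line contains the within-class terms A^T A and B^T B (the a² and b² of the analogy), whereas the symmetrised cross-covariance keeps only the cross terms.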
The reason for creating the symmetrised matrix is two-fold:
- To avoid complex-valued eigenvectors that tell us about rotations (which we can't actually make use of here anyway).
- To specifically try to find opposing/balanced "axes" for our different traits (i.e., we don't want to find positively correlated directions nor unbalanced directions).
7. So now we have a set of "directions" to examine:
- It turns out that 90% of the time the principal eigenvector (i.e., the eigenvector with the largest corresponding eigenvalue) is the one you want.
- In the ~10% of cases where it is not the principal eigenvector or split between a couple of different eigenvectors, we (greedily) create a "compound direction" by examining the discriminant ratio of each direction.
8. Finally, we project the "direction" to reorient and scale as necessary:
- There is no guarantee the eigenvectors point in the direction we want, so about 50% of the time we have to flip all the signs. To decide, we project our (differenced) "desired" dataset onto the (unit norm) direction and test the sign of the mean.
- Due to the way the LLMs work via the "residual stream", the hidden states tend to get larger and larger as the layers progress, so to normalize this we also scale by the magnitude of the mean of the same projection as above.
- To better separate the "bias" effect from the positive/negative axis (and to make the positive/negative ends equidistant from the model's "baseline" behaviour) we store the midpoint of these means in the de-bias control vector and then subtract the midpoint from both the positive and negative axis' control vectors.
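One possible reading of these three steps is sketched below in NumPy. The function name calibrate, the choice of the "positive" dataset as the "desired" one, and the toy data are all assumptions, and the actual implementation may differ in detail.

```python
import numpy as np


def calibrate(direction: np.ndarray, A: np.ndarray, B: np.ndarray):
    """Return (debias, negative, positive) vectors for one layer.

    direction: candidate eigenvector
    A, B:      differenced "negative" and "positive" datasets (rows are samples)
    """
    d = direction / np.linalg.norm(direction)   # unit norm
    if np.mean(B @ d) < 0:                      # pointing the wrong way: flip all signs
        d = -d
    mean_neg = np.mean(A @ d)                   # mean projection of each class
    mean_pos = np.mean(B @ d)
    midpoint = (mean_neg + mean_pos) / 2.0
    debias = midpoint * d                       # stored in the de-bias vector
    negative = mean_neg * d - debias            # both ends are now equidistant
    positive = mean_pos * d - debias            # from the de-biased origin
    return debias, negative, positive


rng = np.random.default_rng(0)
A = rng.normal(loc=-1.0, size=(32, 4))
B = rng.normal(loc=+3.0, size=(32, 4))
debias, negative, positive = calibrate(np.ones(4), A, B)
print(np.allclose(negative, -positive))  # True
```

Under this reading the negative vector is exactly the positive vector with its sign flipped, which is consistent with negated scale-factors being interchangeable with the opposite side of an axis.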
NOTES:
- I have found the above can be applied to every layer, but often the last layer will have hidden state means that are 10-100x larger than the rest, so I have excluded these from all I have uploaded here.
- I have tried many other different eigendecompositions: PCA on the 2-class differenced datasets, PCA on the joined 2-class/3-class datasets, solving generalized eigensystems similar to CCA, and so on.
- The "balanced" directions / "axes" this method finds are the exact opposite of those needed for the "Refusal in LLMs is mediated by a single direction" paper.
Changelog
- 28/08/24 - Added Qwen2-72B-Instruct.
- 29/08/24 - Added Qwen1.5-72B-Chat, Mistral-7B-Instruct-v0.2, Mistral-7B-Instruct-v0.3, miqu-1-70b, Mixtral-8x7B-Instruct-v0.1 and Yi-1.5-34B-Chat-16K.
- 30/08/24 - Added Meta-Llama-3-8B-Instruct, Meta-Llama-3-70B-Instruct, Meta-Llama-3.1-8B-Instruct and Meta-Llama-3.1-70B-Instruct.
- 31/08/24 - Added aya-23-35B, Gemma-2-9B-It-SPPO-Iter3 and Qwen1.5-14B-Chat.
- 01/09/24 - Added Mixtral-8x22B-Instruct-v0.1 and Qwen1.5-110B-Chat.
- 02/09/24 - Added c4ai-command-r-plus-08-2024.
- 03/09/24 - Added c4ai-command-r-08-2024 (***READ THIS FIRST***), Yi-1.5-34B-Chat, gemma-2-27b-it-SimPO-37K, aya-23-8B, gemma-2-9b-it-SimPO, Qwen2-7B-Instruct and Yi-34B-Chat.
- 04/09/24 - Added deepseek-llm-67b-chat, internlm2_5-20b-chat, Athene-70B, Llama-3-Instruct-8B-SPPO-Iter3, magnum-v2-32b, Mistral7B-PairRM-SPPO-Iter3 and Nous-Capybara-34B.
- 05/09/24 - Added Llama-3-70B-Instruct-Storywriter, 35b-beta-long and magnum-v3-34b.
- 06/09/24 - Added Hermes-3-Llama-3.1-70B, magnum-v2-72b, magnum-v1-32b and L3.1-70B-Euryale-v2.2.
- 08/09/24 - Added aurelian-v0.5-70b-rope8-32K-fp16, aurelian-alpha0.1-70b-rope8-32K-fp16, L3-70B-Euryale-v2.1, Llama-3-Lumimaid-70B-v0.1, magnum-72b-v1 and turbcat-instruct-72b.
- 09/09/24 - Added daybreak-miqu-1-70b-v1.0-hf, dolphin-2.9.2-qwen2-72b and Lumimaid-v0.2-70B.
- 11/09/24 - Added Lumimaid-v0.2-123B.
- 12/09/24 - Added magnum-v2-123b.
- 13/09/24 - Added Eurux-8x22b-nca.
- 14/09/24 - Added Divergence-33B, gemma2-gutenberg-27B, gemma-2-Ifable-9B, mistral-nemo-gutenberg-12B, mistral-nemo-gutenberg-12B-v2, romulus-mistral-nemo-12b-simpo, Llama-3.1-8B-ArliAI-RPMax-v1.1, Mistral-Nemo-12B-ArliAI-RPMax-v1.1 and Llama-3.1-70B-ArliAI-RPMax-v1.1.
- 20/09/24 - Added Qwen2.5-7B-Instruct, Qwen2.5-14B-Instruct, Qwen2.5-32B-Instruct, Qwen2.5-72B-Instruct, magnum-v3-27b-kto and Mistral-Small-Instruct-2409.
