Gemma 4 E4B — TIES merge of three instruct-subspace-masked adapters
This is a TIES merge of three independently-trained LoRA adapters for google/gemma-4-E4B-it, all of which were trained with an instruct-subspace gradient mask designed to prevent the fine-tune from overwriting the directions in weight space that carry instruction-following behavior.
The point of the merge is to combine multiple style/domain signals (conversational marvin SFT + marvin CPT + spring-dragon adventure CPT) into a single adapter-on-base that retains instruct-following performance.
Components
All three adapters were trained on google/gemma-4-E4B-it at LoRA r=64, α=16 with the instruct-subspace mask at rank 64 (mask rank = LoRA rank).
| Slot | Adapter | Source corpus | Samples | Tokens (1 ep) | Recipe |
|---|---|---|---|---|---|
| 1 | T2 INSTRUCT r=64 | marvin (instruct form) — back-translated instruct pairs over the marvin corpus: a synthetic "write a passage about X" instruction generated for each real marvin passage, then paired as instruction → passage | 4439 | 22.27M | 1k ctx, 2 ep, lr 5e-5 |
| 2 | T2 CPT-marvin r=64 | marvin (raw) — a curated collection of genre-fiction novel prose, used as continued-pretraining (no chat template) | 5701 | 33.83M | 1k ctx, 2 ep, lr 5e-5 |
| 3 | SD CPT r=64 | Spring Dragon — old AI-Dungeon-era text-adventure transcripts (> you do x second-person interactive fiction), as CPT |
2325 | 5.68M | 6k ctx, 2 ep, lr 5e-5 |
Total tokens seen across all three components (counting 2 epochs each): ~123M.
Merge method
Standard TIES merge via PEFT's add_weighted_adapter with:
combination_type = "ties"
density = 0.2 # keep top 20% magnitude per parameter, zero the rest
weights = [1.0, 1.0, 1.0] # equal weighting
TIES (Yadav et al., 2023) reduces destructive interference between adapters by:
- Trimming — keep only the top-
density% magnitude weights per parameter, zero the rest - Sign resolution — for each parameter, pick the dominant sign across adapters and zero contributions of the opposite sign
- Disjoint-mean merge — average only the surviving weights
The merged adapter was applied to base with merge_and_unload() and the result is the full safetensors model published here.
Instruct-subspace mask (what the components share)
Each component adapter was trained with a per-step gradient projection that strips out the directions of weight movement that overlap with the instruct-vs-base subspace. The subspace is computed once per linear weight matrix W via SVD:
U_k, V_k = top-k SVD of (W_instruct − W_pretrained)
g_safe = g − U_k (U_k.ᵀ g) − (g V_k.ᵀ) V_k # for each parameter gradient g
This is applied as a post_accumulate_grad_hook on every LoRA target weight before the optimizer step. The intuition: directions that distinguish E4B-it from E4B-pt carry instruction-following behavior; gradient components along those directions are dropped, so the fine-tune is constrained to learn style in the orthogonal complement of the instruct subspace.
Subspace-mask IFEval recovery (component-level results)
Across all three components, the mask preserves a large fraction of base-it IFEval performance that the unmasked recipe would have lost:
| Recipe | Unmasked drop (pp) | Masked drop (pp) | Recovery |
|---|---|---|---|
| T2 INSTRUCT r=64 | -14.97 | -7.76 | 48% |
| T2 CPT-marvin r=64 | -14.42 | -8.50 | 41% |
| SD CPT r=64 | -12.20 | -6.84 | 44% |
Base google/gemma-4-E4B-it strict-prompt IFEval = 83.36.
Merge-level evaluation
| Variant | IFEval Δpp ↓ | V3 humanness h_delta ↑* | Δh_delta vs base ↑ | q_prob ↑ | dialogue_frac ↑ | slop/1k ↓ | rep3g/1k ↓ |
|---|---|---|---|---|---|---|---|
| base-it | 0.00 (anchor) | -16.74 | 0.00 (anchor) | 0.098 | 0.42 | 19.94 | 7.39 |
| SD masked r=64 | -6.84 | -14.69 | +2.05 | 0.121 | 0.42 | 17.77 | 12.17 |
| SD masked r=256 | -2.03 | -15.46 | +1.28 | 0.098 | 0.31 | 16.38 | 11.18 |
| T2 INSTRUCT masked r=64 | -7.76 | -12.83 | +3.91 | 0.112 | 0.41 | 13.96 | 21.05 |
| TIES (this model) | -0.92 | -13.10 | +3.64 | 0.343 | 0.51 | 16.12 | 11.58 |
Metric definitions
These come from the GRPO-prototype eval pipeline (grpo_prototype_v12d.py), run over 10 short-story writing prompts at temp=1, then averaged per variant:
- h_delta — output of the V3 humanness classifier (logit of marvin-real minus logit of gemma-emulation, per sample, then mean). Higher = output looks more like a real book passage from marvin than like Gemma 4 31B-it's same-prompt attempt. Always negative for any gemma-family model because the classifier is gemma-aware; the direction of movement vs base is the signal.
- Δh_delta vs base — the movement in h_delta vs base
E4B-it. Positive = the fine-tune is shifting outputs further from gemma's default and toward marvin-like prose. - q_prob — P(chosen) from a separate binary quality classifier (a ModernBERT-base fine-tuned on ~1500 DPO pairs from the marvin training pipeline, where "chosen" was the preferred response and "rejected" the worse one). Higher = the output is more like the kind of response a marvin-pipeline reward model would prefer over an alternative. Note this is independent of h_delta — h_delta asks "is this a real marvin passage or gemma's emulation?" while q_prob asks "would the marvin reward model prefer this over a typical bad output?" The two can move independently.
- dialogue_frac — fraction of generated lines that contain dialogue (heuristic: presence of paired quote characters with a speaker-attribution shape nearby). Marvin/genre-fiction prose has a stable rate; gemma's default is to skip dialogue, so a rising fraction means the model is producing more novel-like scene structure rather than essay-summary structure.
- slop/1k — count per 1000 tokens of n-grams from a curated AI-slop n-gram list (the "barely a whisper", "in a voice that was barely above a whisper", "the air was thick with", etc., compiled from observed LLM tics). Lower = less AI-ese.
- rep3g/1k — count per 1000 tokens of repeated 3-grams within the sample (each 3-gram that occurs ≥2× contributes its repeat count). Lower = less local repetition. Gemma-family models can drift into repetitive loops on long generations; this catches that.
* The V3 humanness classifier is trained to discriminate real marvin passages (the curated genre-fiction corpus, see Components) from gemma's best attempt to emulate those passages given the same writing prompt. The "gemma side" was specifically generated with Gemma 4 31B-it (Q4_K_M GGUF, temp=1, no top-p/k/rep-pen). This model (E4B) is a different size than the classifier's reference gemma, but E4B is gemma-family and behaves similarly enough that V3 is still measuring the right axis — a prior version of the classifier built against a non-gemma AI side gave weirdly uninformative scores on gemma fine-tunes; V3 fixes that. So h_delta reads as: "is this output more like an actual book passage from the marvin corpus, or more like gemma's slop version of one?" Higher = more book-like. The single-adapter components score around -12 to -15. The TIES merge scoring -13.10 is in the same range as the singles, which means it is not just averaging back toward base; it is preserving the style signal of the components.
Most striking single number: TIES q_prob = 0.343 vs ~0.10 for any single adapter and base — the V3 classifier's "marvin-style" head fires ~3× more strongly on TIES outputs than on any constituent. Combined with dialogue_frac jumping to 0.51 (vs base 0.42, singles 0.41-0.44) this suggests genuine emergent style mixing rather than cancellation.
Caveats
- V3 humanness classifier is task-specific, NOT a general AI-vs-human detector. It is binary: real marvin-corpus book prose vs. gemma's same-prompt output. So a high
h_deltaonly tells you the model's output looks more like a real book passage than gemma's default style — it does not generalize to "more human." Direction-of-movement from base is the load-bearing signal. - IFEval was run on the merged adapter applied to base-it; the -0.92pp drop is the merged-model drop, not the sum of component drops.
- No SFT data leak control: each component adapter had access to its own training corpus. There is no held-out evaluation set for component-level style metrics beyond the standard IFEval prompts.
- TIES density=0.2 was a single shot, not tuned. Lower/higher density values were not swept.
- The three components were chosen because they were what existed at the time (the same r=64 mask rank); this is not a designed-for-merge selection.
Files
| Path | Size | Description |
|---|---|---|
model.safetensors |
15.9 GB | Full bf16 merged weights (base + applied TIES adapter) |
config.json |
— | HF config |
tokenizer.json, tokenizer_config.json, chat_template.jinja |
— | Gemma 4 tokenizer + chat template |
A bf16.gguf is also published at ToastyPigeon/Gemma4-Test-GGUFs/Gemma4-E4B-TIES-3mask-r64.bf16.gguf for llama.cpp use.
Use
Standard AutoModelForCausalLM.from_pretrained(...) works; this is a normal merged Gemma 4 E4B-it model. Use the same chat template as base google/gemma-4-E4B-it. Sampling: temperature=1.0, no top-p / top-k / rep-penalty (gemma-4 defaults).
Provenance
- Trained and merged on a Threadripper 3960X + 2× RTX 3090 box, May 2026
- Mask SVD computed once from
gemma-4-E4B-it − gemma-4-E4B-ptat rank 64/256 - Author: ToastyPigeon
Citation
If you use this technique:
@misc{toastypigeon2026instructmask,
author = {ToastyPigeon},
title = {Instruct-subspace gradient mask for style fine-tunes of instruction-tuned models},
year = {2026},
note = {Method: per-parameter gradient projection orthogonal to the top-k SVD of (W_instruct - W_pretrained)}
}
TIES reference:
@article{yadav2023ties,
title = {TIES-Merging: Resolving Interference When Merging Models},
author = {Yadav, Prateek and Tam, Derek and Choshen, Leshem and Raffel, Colin and Bansal, Mohit},
journal = {NeurIPS},
year = {2023}
}
- Downloads last month
- 8
