URL: https://huggingface.co/nbeerbower/Gemma4-Gutenberg-31B-Heretic

Gemma4-Gutenberg-31B-Heretic

coder3101/gemma-4-31B-it-heretic finetuned for literary, novelistic prose — a Gemma 4 entry in the Gutenberg series.

The fine-tune pushes the already strong base toward a literary-fiction register: story and interiority over static description, controlled pacing over relentless adjective-stacking, and an active bias against "AI slop" phrasing.

Heretic variant

This is the Heretic (abliterated / decensored) base with the same Gutenberg ORPO adapter as the standard Gemma4-Gutenberg-31B merged in. The literary voice is identical to the standard model, since the adapter dominates the prose style regardless of base, but the abliterated base is far less prone to refusing dark or mature creative prompts. It is intended for fiction that the standard instruct model balks at. Use responsibly.

Method

ORPO (Odds Ratio Preference Optimization) on the full schneewolflabs/Athanorlite-DPO dataset (14,816 preference pairs). Athanorlite-DPO is a superset of the Gutenberg "Encore" recipe and bundles jondurbin/gutenberg-dpo-v0.1, nbeerbower/gutenberg2-dpo, gutenberg-moderne-dpo, human-writing-dpo, synthetic-fiction-dpo, Arkhaios-DPO, Purpura-DPO, Schule-DPO, and sam-paech/gutenberg3, plus truthy, physical-reasoning, and theory-of-mind balance sets.
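For readers unfamiliar with ORPO, here is a minimal sketch of the per-pair objective as described in the ORPO paper: the negative log-likelihood of the chosen text plus a weighted odds-ratio penalty. It is not taken from the Merlina trainer. The function names and toy numbers are hypothetical, and the default weight of 0.1 assumes that the β listed in the settings plays the role of the paper's λ.

```python
import math

def log_odds(avg_logp: float) -> float:
    """log(p / (1 - p)) for p = exp(avg_logp).

    avg_logp is the length-normalised (per-token average) log-probability
    of a completion, so it must be strictly negative.
    """
    return avg_logp - math.log1p(-math.exp(avg_logp))

def orpo_pair_loss(avg_logp_chosen: float,
                   avg_logp_rejected: float,
                   beta: float = 0.1) -> float:
    """ORPO loss for one preference pair (illustrative, not Merlina's code)."""
    # Supervised term: negative log-likelihood of the chosen completion.
    nll_chosen = -avg_logp_chosen
    # Odds-ratio term: -log(sigmoid(log-odds ratio)), written as log(1 + exp(-x)).
    log_odds_ratio = log_odds(avg_logp_chosen) - log_odds(avg_logp_rejected)
    odds_ratio_penalty = math.log1p(math.exp(-log_odds_ratio))
    return nll_chosen + beta * odds_ratio_penalty

# Toy values (hypothetical): the chosen text is more likely than the rejected one.
preferred = orpo_pair_loss(avg_logp_chosen=-1.0, avg_logp_rejected=-2.0)
# Same chosen likelihood, but the model cannot tell the two completions apart.
tied = orpo_pair_loss(avg_logp_chosen=-1.0, avg_logp_rejected=-1.0)
print(preferred < tied)
```

The penalty shrinks as the model assigns higher odds to the chosen completion than to the rejected one, which is why a single training pass can both fit the chosen text and push the rejected text down without a separate reference model.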

Method: ORPO, β = 0.1
Adapter: LoRA r=64 (text decoder only), merged to full
LR: 5e-5, cosine, 0.05 warmup
Epochs: 1
Effective batch: 32
Max length: 2048
Optimizer: paged_adamw_8bit, bf16, grad-checkpointing
Hardware: 1× NVIDIA GB10 (DGX Spark, 128 GB unified)
Trainer: Merlina (grimoire ORPO)
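The learning-rate and batch settings can be made concrete with a little arithmetic. The sketch below assumes a linear warmup followed by a cosine decay to zero, which is a common reading of "cosine, 0.05 warmup"; the card does not say whether Merlina uses exactly this shape. The helper names are hypothetical. The dataset size (14,816), effective batch (32), and total step count (454) are the figures reported in this card.

```python
import math

def steps_per_epoch(num_examples: int, effective_batch: int) -> int:
    """Optimizer steps needed to see every example once."""
    return math.ceil(num_examples / effective_batch)

def lr_at(step: int, total_steps: int,
          peak_lr: float = 5e-5, warmup_ratio: float = 0.05) -> float:
    """Linear warmup then cosine decay to zero (assumed schedule shape)."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

full_pass = steps_per_epoch(14_816, 32)   # steps if all pairs were trained on
reported_steps = 454                      # step count reported in this card
warmup_steps = math.ceil(reported_steps * 0.05)

print(full_pass, reported_steps, warmup_steps)
for step in (0, warmup_steps, reported_steps // 2, reported_steps):
    print(step, lr_at(step, reported_steps))
```

A full pass over all 14,816 pairs at batch 32 would take 463 steps, slightly more than the 454 reported. The card does not explain the difference; an evaluation holdout or length filtering would be consistent with it, but that is a guess.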

Training trajectory (clean convergence over ~3.5 days, 454 steps):

metric             start      end
eval/loss          2.4445     2.2052
reward_accuracy    0.175      0.9125
reward_margin      −0.076     +0.455

The reward_accuracy arc (0.18 → 0.50 → 0.91) reflects the model learning to prefer the literary chosen text while actively suppressing the rejected slop — the intended Gutenberg dynamic.
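As a rough illustration of what these two metrics measure, the sketch below assumes the convention used by common ORPO implementations: each completion's implicit reward is β times its average log-probability, reward_accuracy is the fraction of pairs where the chosen reward exceeds the rejected one, and reward_margin is the mean difference. The card does not state Merlina's exact definitions, and the log-probabilities here are made-up values.

```python
def reward_stats(chosen_logps, rejected_logps, beta=0.1):
    """Return (reward_accuracy, reward_margin) over a batch of preference pairs.

    chosen_logps / rejected_logps: average per-token log-probabilities of the
    chosen and rejected completions (hypothetical inputs for illustration).
    """
    chosen_rewards = [beta * lp for lp in chosen_logps]
    rejected_rewards = [beta * lp for lp in rejected_logps]
    pairs = list(zip(chosen_rewards, rejected_rewards))
    accuracy = sum(c > r for c, r in pairs) / len(pairs)
    margin = sum(c - r for c, r in pairs) / len(pairs)
    return accuracy, margin

# Four made-up pairs: the model prefers the chosen text in three of them.
accuracy, margin = reward_stats(
    chosen_logps=[-1.0, -3.0, -2.0, -5.0],
    rejected_logps=[-2.0, -2.0, -4.0, -9.0],
)
print(accuracy, margin)
```

Under these definitions, an accuracy below 0.5 with a negative margin, as at the start of training, means the model initially assigned higher likelihood to the rejected completions on most pairs.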

Notes

  • Gemma 4 31B is a unified multimodal model; the vision/audio towers are left frozen and intact, so this remains a drop-in replacement for the base. Only the text decoder was tuned.
  • This is a refinement of an already-capable writer, not a rescue — expect a consistent literary lean rather than a night-and-day transformation.

License

Apache-2.0 (matching the Gemma 4 base). Constituent training datasets carry their own licenses (see the Athanorlite-DPO card).

Safetensors · Model size: 31B params · Tensor type: BF16
