gemma-4-12B-it-heretic_decensored

gemma-4-12B-it-heretic_decensored is a reasoning-capable language model built on google/gemma-4-12B-it and modified with the Heretic abliteration toolkit. Heretic applies refusal-direction analysis and targeted weight-space interventions to reduce the model's internal refusal behavior while aiming to preserve instruction-following, reasoning, and general conversational performance.

This model is intended strictly for research and learning purposes. Due to reduced internal refusal mechanisms, it may generate sensitive or unrestricted content. Users assume full responsibility for how the model is used. The authors and hosting platform disclaim any liability for generated outputs.

This model is experimental and may exhibit unexpected behavior or produce artifacts in certain scenarios.

Key Highlights

Heretic-Based Abliteration: Modified using the Heretic toolkit to identify and alter refusal-related representations within the model.
Reduced Refusal Behavior: Optimized to minimize refusals (34/100 versus 99/100 for the base model; see Performance) while maintaining instruction-following capabilities.
Gemma 4 Backbone: Built directly on top of google/gemma-4-12B-it.
Reasoning-Oriented Performance: Designed to preserve multi-step reasoning and analytical capabilities after abliteration (KL divergence from the base model: 0.0366).
Research-Focused Release: Designed for alignment research, model behavior analysis, and evaluation of refusal-direction modifications.
12B Scale Deployment: Suitable for local inference, research environments, and optimized deployment setups.
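
The refusal-direction modification named in the highlights above is commonly implemented as directional ablation: the component of a layer's output along a unit "refusal direction" r is subtracted from the weight matrix, W' = W - w * r r^T W. The following is a minimal pure-Python sketch of that general technique, not Heretic's actual code; the function names, the toy matrix, and the direction vector are illustrative assumptions.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def ablate_direction(W, direction, weight=1.0):
    """Return W - weight * r r^T W, where r is the unit-length direction.

    W is a list of rows with shape (d_out, d_in); direction has length d_out,
    i.e. it lives in the space the layer writes to.
    """
    r = normalize(direction)
    d_out, d_in = len(W), len(W[0])
    # r^T W: how strongly each input dimension writes along r.
    proj = [sum(r[i] * W[i][j] for i in range(d_out)) for j in range(d_in)]
    return [
        [W[i][j] - weight * r[i] * proj[j] for j in range(d_in)]
        for i in range(d_out)
    ]


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Toy data (hypothetical): a 3x2 projection and a made-up refusal direction.
W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
direction = [1.0, 2.0, 2.0]

W_ablated = ablate_direction(W, direction, weight=1.0)
y = matvec(W_ablated, [1.0, -1.0])

# After full ablation the output has (numerically) no component along r.
print(abs(dot(y, normalize(direction))) < 1e-9)
```

With weight=1.0 the output component along r vanishes. If the max_weight values in the parameter table act as this scale factor, values above 1 would push that component past zero rather than merely removing it.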

Abliteration Parameters

Parameter	Value
direction_index	29.56
attn.o_proj.max_weight	1.18
attn.o_proj.max_weight_position	39.94
attn.o_proj.min_weight	0.81
attn.o_proj.min_weight_distance	25.73
mlp.down_proj.max_weight	1.37
mlp.down_proj.max_weight_position	46.27
mlp.down_proj.min_weight	0.97
mlp.down_proj.min_weight_distance	21.63
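
One way to read the table above is as a per-layer weighting curve for each component. The sketch below assumes that the ablation weight peaks at max_weight_position, falls off linearly to min_weight at a distance of min_weight_distance layers, and is zero beyond that, and that a fractional direction_index means linear interpolation between the refusal directions of the two nearest layers. These interpretations, the function names, and the sample layer indices are assumptions for illustration; this card does not specify them.

```python
import math


def layer_weight(layer, max_weight, max_weight_position,
                 min_weight, min_weight_distance):
    """Assumed ablation weight for one layer: linear falloff around a peak."""
    distance = abs(layer - max_weight_position)
    if distance > min_weight_distance:
        return 0.0  # assumed: layers outside the window are left untouched
    return max_weight + (distance / min_weight_distance) * (min_weight - max_weight)


def interpolate_direction(directions, index):
    """Assumed handling of a fractional direction index: blend two neighbours."""
    lo = math.floor(index)
    hi = min(lo + 1, len(directions) - 1)
    t = index - lo
    return [(1 - t) * a + t * b for a, b in zip(directions[lo], directions[hi])]


# Values copied from the parameter table.
O_PROJ = dict(max_weight=1.18, max_weight_position=39.94,
              min_weight=0.81, min_weight_distance=25.73)
DOWN_PROJ = dict(max_weight=1.37, max_weight_position=46.27,
                 min_weight=0.97, min_weight_distance=21.63)

# Hypothetical layer indices, chosen only to show the shape of the curve.
for layer in (10, 20, 30, 40):
    print(layer,
          round(layer_weight(layer, **O_PROJ), 3),
          round(layer_weight(layer, **DOWN_PROJ), 3))

# Toy per-layer directions (hypothetical) blended at a fractional index.
toy_directions = [[0.0, 0.0], [1.0, 2.0], [3.0, 4.0]]
print(interpolate_direction(toy_directions, 1.5))
```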

Performance

Metric	This model	Original model (google/gemma-4-12B-it)
KL divergence	0.0366	0 (by definition)
Refusals	34/100	99/100
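
To make the two metrics concrete, the sketch below computes a KL divergence between two next-token distributions and turns the refusal counts from the table into rates. The logits are made-up toy values, and the direction KL(base || modified) is an assumption; this card does not state which prompts or which direction were used for the reported 0.0366.

```python
import math


def softmax(logits):
    """Convert logits to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions over the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Toy next-token logits over a 4-token vocabulary (hypothetical values).
base_logits = [2.0, 1.0, 0.5, -1.0]
modified_logits = [1.8, 1.1, 0.6, -0.9]

p_base = softmax(base_logits)
p_modified = softmax(modified_logits)

print("KL(base || base)     =", kl_divergence(p_base, p_base))
print("KL(base || modified) =", kl_divergence(p_base, p_modified))

# Refusal counts from the Performance table.
refusals_modified, refusals_base, total_prompts = 34, 99, 100
rate_modified = refusals_modified / total_prompts
rate_base = refusals_base / total_prompts
print("Refusal rate, this model:", rate_modified)
print("Refusal rate, base model:", rate_base)
print("Refusals removed:", refusals_base - refusals_modified, "of", total_prompts)
```

A model compared with itself gives a KL divergence of exactly 0, which is why the base model's entry reads "0 (by definition)".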

Quick Start with Transformers

pip install transformers accelerate torch

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

MODEL_ID = "prithivMLmods/gemma-4-12B-it-heretic_decensored"

# Load weights in their stored dtype; device_map="auto" relies on accelerate.
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype="auto",
    device_map="auto",
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

messages = [
    {
        "role": "user",
        "content": "Explain how a transformer model processes text.",
    }
]

# return_dict=True yields input_ids and attention_mask explicitly,
# so the call behaves the same regardless of the library's default.
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512)

# Decode only the newly generated tokens, skipping the prompt.
prompt_length = inputs["input_ids"].shape[-1]
print(
    tokenizer.decode(
        outputs[0][prompt_length:],
        skip_special_tokens=True,
    )
)

GGUF Model Files

Resource	Link
`prithivMLmods/gemma-4-12B-it-heretic_decensored-GGUF`	https://huggingface.co/prithivMLmods/gemma-4-12B-it-heretic_decensored-GGUF
Quick Start with llama.cpp (Docker)	https://huggingface.co/prithivMLmods/gemma-4-12B-it-heretic_decensored-GGUF#quick-start-with-llamacpp-docker

Intended Use

Alignment Research: Studying refusal-direction analysis and behavior modification techniques.
Model Evaluation: Benchmarking reasoning, instruction-following, and safety-related behaviors.
Red Teaming: Analyzing model responses under reduced-refusal conditions.
Local Deployment: Running high-capacity Gemma 4 models in research and experimentation environments.
Abliteration Studies: Exploring the effects of targeted weight-space modifications on model behavior.

Limitations & Risks

Important Note: The built-in refusal mechanisms of this model have been intentionally reduced.

Sensitive Content Risk: May generate unrestricted, controversial, or unsafe outputs.
User Responsibility: Requires careful and ethical use.
Experimental Modifications: Behavior may differ significantly from the original model.
Alignment Trade-offs: Reduced refusal behavior may impact safety filtering and response constraints.
Potential Artifacts: Certain prompts may expose unexpected outputs resulting from the abliteration process.

Acknowledgements

Heretic: Fully automatic censorship removal framework for language models. This project was used to perform the refusal-direction analysis and ablation procedures that form the foundation of this model.
Model Trials & Evaluation: Experimental evaluations, refusal measurements, and optimization trials were conducted and documented at: https://huggingface.co/strangeropshf/demo-TERM-hf-job-01

Model size: 12B params (Safetensors, tensor type BF16)


URL: https://huggingface.co/prithivMLmods/gemma-4-12B-it-heretic_decensored
