URL: https://huggingface.co/pszemraj/franken-gemma-4-dense-1b-untrained

franken-gemma-4-dense-1b: untrained

A frankenstein-init Gemma 4 (dense) image/text model with ~1B params:

  • Assembled by weight transplant from Gemma 3 1B (text backbone) and Gemma 4 E2B-IT (vision tower + tokenizer + processor).

  • Architecturally mirrors google/gemma-4-31B-it (hybrid attention head_dim, no MoE, no PLE, no shared KV), but much smaller.

This is NOT a trained model:

  • It will not produce coherent text out of the box.

  • It is intended for testing fine-tuning frameworks/configurations (Axolotl, TRL, DeepSpeed, FSDP) at a 'pilot' scale.

  • It should train more easily than random weights, though.
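
For a quick smoke test before wiring the checkpoint into a training framework, a loading helper along these lines may be enough. It is a minimal sketch: it assumes an installed transformers release that supports Gemma 4 and that the `AutoProcessor` / `AutoModelForImageTextToText` classes resolve this checkpoint. The helper name `load_for_smoke_test` is hypothetical.

```python
MODEL_ID = "pszemraj/franken-gemma-4-dense-1b-untrained"


def load_for_smoke_test(model_id: str = MODEL_ID):
    """Load the processor and model, and return them with the parameter count."""
    # Imported lazily so that defining this helper does not require the library.
    # Assumption: the installed transformers version knows the Gemma 4 classes.
    from transformers import AutoModelForImageTextToText, AutoProcessor

    processor = AutoProcessor.from_pretrained(model_id)
    model = AutoModelForImageTextToText.from_pretrained(model_id)
    n_params = sum(p.numel() for p in model.parameters())
    return processor, model, n_params


# Example call (downloads the weights):
# processor, model, n_params = load_for_smoke_test()
# print(f"{n_params:,} parameters")
```

Since the weights are untrained, the useful signal from such a run is that loading, batching, and the first optimizer steps work, not the quality of any generated text.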

Architecture

component                 value
hidden_size               1152
intermediate_size         6912
num_hidden_layers         18 (15 sliding + 3 full, pattern 5:1)
num_attention_heads       4
num_key_value_heads       1
head_dim (sliding)        256
head_dim (global)         512
sliding_window            1024
max_position_embeddings   32768
attention_k_eq_v          True (global layers)
final_logit_softcapping   30.0
vocab_size                262148 (Gemma 4 tokenizer)

Vision tower: hidden=768, 16 layers, head_dim=64 (copied from Gemma 4 E2B-IT)
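
The 5:1 layer pattern can be sketched as follows. This is an illustration only: it assumes the full-attention layer is the last one in each group of six (the Gemma 3 convention), and the type strings and helper name `layer_types` are assumptions rather than values read from this model's config.

```python
NUM_LAYERS = 18
PERIOD = 6  # 5 sliding layers followed by 1 full-attention layer (assumed ordering)
HEAD_DIM = {"sliding_attention": 256, "full_attention": 512}


def layer_types(num_layers: int = NUM_LAYERS, period: int = PERIOD) -> list[str]:
    """Return the attention type of each decoder layer under a 5:1 pattern."""
    return [
        "full_attention" if (i + 1) % period == 0 else "sliding_attention"
        for i in range(num_layers)
    ]


types = layer_types()
head_dims = [HEAD_DIM[t] for t in types]
print(types.count("sliding_attention"), types.count("full_attention"))
```

With 18 layers this gives 15 sliding layers (head_dim 256) and 3 full-attention layers (head_dim 512), matching the table.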

Parameter counts by module:

=============================================================
Layer (type)                      Param #          Trainable
=============================================================
 Gemma4TextScaledWordEmbedding    301,989,888      True
 ModuleList                       490,237,440      True
 Gemma4RMSNorm                    1,152            True
 Gemma4TextRotaryEmbedding        --               False
 Gemma4TextModel                  792,228,480      True
 Gemma4VisionPatchEmbedder        16,318,464       True
 Gemma4VisionEncoder              151,046,144      True
 Gemma4VisionPooler               --               False
 Gemma4VisionModel                167,364,608      True
 Linear                           884,736          True
 Gemma4RMSNorm                    --               False
 Gemma4MultimodalEmbedder         884,736          True
 Gemma4Model                      960,477,824      True
 Linear                           301,989,888      True
Gemma4ForConditionalGeneration    960,477,824      True
=============================================================
Total params: 960,477,824
Trainable params: 960,477,824
Non-trainable params: --
=============================================================
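
The decoder-stack and total figures above can be reproduced from the architecture table with a short calculation. The per-layer breakdown below is an assumption (bias-free projections, a gated MLP with three matrices, four hidden-size RMSNorm weights per layer, plus q_norm and k_norm of size head_dim, and no V projection in global layers); it is a sketch that happens to match the reported numbers, not a dump of the actual module tree.

```python
HIDDEN = 1152
INTERMEDIATE = 6912
N_HEADS = 4
N_KV_HEADS = 1


def decoder_layer_params(head_dim: int, has_v_proj: bool) -> int:
    """Parameter count of one decoder layer under the assumptions stated above."""
    q = HIDDEN * N_HEADS * head_dim
    k = HIDDEN * N_KV_HEADS * head_dim
    v = k if has_v_proj else 0          # global layers reuse K as V
    o = N_HEADS * head_dim * HIDDEN
    mlp = 3 * HIDDEN * INTERMEDIATE     # gate, up, and down projections
    norms = 4 * HIDDEN + 2 * head_dim   # assumed: 4 layer norms + q_norm + k_norm
    return q + k + v + o + mlp + norms


sliding = decoder_layer_params(head_dim=256, has_v_proj=True)
global_ = decoder_layer_params(head_dim=512, has_v_proj=False)
decoder_stack = 15 * sliding + 3 * global_

# Embedding and vision figures are taken from the summary as reported.
text_model = 301_989_888 + decoder_stack + HIDDEN   # + final RMSNorm
projector = 768 * HIDDEN                            # embed_vision Linear
total = text_model + 167_364_608 + projector

print(f"{decoder_stack:,} {text_model:,} {total:,}")
```

Two observations follow from the same arithmetic. The second 301,989,888-parameter Linear is not added to the total, which is consistent with an output head tied to the input embeddings. The embedding figure also equals 262,144 × 1,152, which is four rows fewer than a vocab_size of 262148 would give.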

Frankenstein component inventory

Component                   Source           Method
Text embeddings             gemma-3-1b-it    Direct copy + 4 rows mean-resized for Gemma 4 special tokens
Text MLP weights            gemma-3-1b-it    Direct copy
Sliding-attention Q/K/V/O   gemma-3-1b-it    Direct copy
Global-attention Q/K        gemma-3-1b-it    Per-head tile (256 → 512)
Global-attention O          gemma-3-1b-it    Per-head split-halves (preserves O @ V = O_old @ V_old at init)
Global-attention V          ---              Dropped (attention_k_eq_v=True; V reuses K)
RMSNorm weights             gemma-3-1b-it    Convention-converted (1.0 + w)
q_norm / k_norm             gemma-3-1b-it    Rescaled by 1/√head_dim to compensate for Gemma 4's scaling=1.0
Vision tower                gemma-4-e2b-it   Direct copy
embed_vision projection     ---              Fresh init (shape mismatch 768→1536 vs 768→1152)
Tokenizer + processor       gemma-4-e2b-it   Wholesale
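
Three of the methods in the table can be illustrated on toy data. The sketch below is one possible reading of "mean-resized", "convention-converted (1.0 + w)", and "per-head tile / split-halves"; the actual transplant script is not shown here, and every function name is hypothetical. Plain Python lists stand in for tensors.

```python
import math


def mean_resize(rows, n_new):
    """Append n_new rows, each the column-wise mean of the existing rows."""
    dim = len(rows[0])
    mean = [sum(r[j] for r in rows) / len(rows) for j in range(dim)]
    return rows + [list(mean) for _ in range(n_new)]


def rms_norm(x, scale, eps=1e-6):
    """RMSNorm with an explicit multiplicative scale vector."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms * s for v, s in zip(x, scale)]


def convert_norm_weight(w_old):
    """Old convention scales by (1 + w); new convention scales by w directly."""
    return [1.0 + w for w in w_old]


def tile_head(vec):
    """Per-head tile: a head_dim vector becomes [vec, vec] (e.g. 256 -> 512)."""
    return vec + vec


def split_halves(o_rows):
    """Each O row over one head's inputs becomes [row / 2, row / 2]."""
    return [[v / 2 for v in row] * 2 for row in o_rows]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


# Mean-resize: 3 existing embedding rows, 4 new ones for special tokens.
emb = mean_resize([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], n_new=4)

# Norm conversion: both conventions give the same output after conversion.
x, w_old = [0.5, -1.0, 2.0], [0.1, -0.2, 0.0]
old_out = rms_norm(x, [1.0 + w for w in w_old])  # old: scale is (1 + w)
new_out = rms_norm(x, convert_norm_weight(w_old))  # new: scale is stored as-is

# Tile + split-halves: the output projection of a tiled head is unchanged.
v_old = [1.0, -2.0, 0.5]
o_old = [[0.2, 0.1, -0.4], [1.0, 0.0, 3.0]]
out_old = matvec(o_old, v_old)
out_new = matvec(split_halves(o_old), tile_head(v_old))
```

Halving each copy of the O weights cancels the doubling introduced by tiling, which is why the projected output is the same for a tiled head. In the assembled model the global layers take V from K, so this identity describes the tiled-value case rather than a guarantee about the full layer output.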

License

Gemma Terms of Use apply. This is a derivative of Gemma 3 1B and Gemma 4 E2B-IT weights. See https://ai.google.dev/gemma/terms
