Gemma-4-E2B-IT-SFT-RLVR-Medical

Gemma-4-E2B-it fine-tuned on PubMedQA using supervised fine-tuning (SFT) followed by reinforcement learning with verifiable rewards (RLVR).
The training code is available on GitHub.

Setup

# !pip install llama-cpp-python huggingface-hub
from llama_cpp import Llama

# Download the Q4_K_M GGUF file from the Hugging Face Hub and load it.
llm = Llama.from_pretrained(
    repo_id="lukasdrews/Gemma-4-E2B-IT-SFT-RLVR-Medical-GGUF",
    filename="gemma-4-E2B-it-sft-rlvr-medical-Q4_K_M.gguf",
    verbose=False,
)

# Single-turn chat with one biomedical question.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Do GEC produce and bear factor H under complement attack?"}
        ],
    },
]

# Generate the answer and print only the assistant's text.
outputs = llm.create_chat_completion(messages, max_tokens=1024)
print(outputs["choices"][0]["message"]["content"])
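
PubMedQA answers are one of yes, no, or maybe, so a free-form completion like the one printed above usually has to be mapped to a label before it can be scored. The following is a minimal sketch of one way to do that. `extract_label` is a hypothetical helper, not part of llama-cpp-python or of the training code, and taking the last match is an assumption that the model states its decision at the end of its answer.

```python
import re
from typing import Optional

def extract_label(completion: str) -> Optional[str]:
    """Return the last yes/no/maybe mentioned in the text, or None.

    Hypothetical helper: using the last match assumes the model reasons
    first and states its decision at the end.
    """
    matches = re.findall(r"\b(yes|no|maybe)\b", completion.lower())
    return matches[-1] if matches else None

# Example with a made-up completion string.
print(extract_label("The data show GEC express factor H. Final answer: Yes"))
```

If the model is prompted to end with a fixed phrase, a stricter pattern anchored on that phrase would be less error-prone than this catch-all search.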

Benchmarks

Model	Quantization	PubMedQA (In-Domain)	MedQA-USMLE (Zero-Shot Transfer)
Gemma-4-E2B-it (base model)	-	58.10 %	29.54 %
Gemma-4-E2B-it + SFT + RLVR	-	73.10 %	43.05 %
Gemma-4-E2B-it + SFT + RLVR	Q8_0	72.40 %	43.00 %
Gemma-4-E2B-it + SFT + RLVR	Q6_K	72.10 %	42.18 %
Gemma-4-E2B-it + SFT + RLVR	Q5_K_M	72.00 %	38.88 %
Gemma-4-E2B-it + SFT + RLVR	Q4_K_M	71.80 %	38.88 %
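
To make the cost of each quantization level easier to read, the table can be turned into differences against the unquantized fine-tuned model. The sketch below only redoes arithmetic on the numbers listed above; the dictionary keys and the `delta_vs` helper are names chosen for illustration.

```python
from typing import Tuple

# Scores copied from the table above: (PubMedQA %, MedQA-USMLE %).
scores = {
    "base": (58.10, 29.54),
    "unquantized": (73.10, 43.05),
    "Q8_0": (72.40, 43.00),
    "Q6_K": (72.10, 42.18),
    "Q5_K_M": (72.00, 38.88),
    "Q4_K_M": (71.80, 38.88),
}

def delta_vs(reference: str, name: str) -> Tuple[float, float]:
    """Percentage-point difference of `name` relative to `reference`."""
    ref_pub, ref_med = scores[reference]
    pub, med = scores[name]
    return (round(pub - ref_pub, 2), round(med - ref_med, 2))

# Accuracy change of each GGUF quantization vs. the unquantized fine-tune.
for name in ("Q8_0", "Q6_K", "Q5_K_M", "Q4_K_M"):
    pub_d, med_d = delta_vs("unquantized", name)
    print(f"{name:8s} PubMedQA {pub_d:+.2f}  MedQA-USMLE {med_d:+.2f}")
```

By these numbers, Q4_K_M gives up 1.30 points on PubMedQA and 4.17 points on MedQA-USMLE relative to the unquantized fine-tune, while Q8_0 stays within 0.70 and 0.05 points respectively.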


GGUF

Model size: 5B params
Architecture: gemma4


Model tree for lukasdrews/Gemma-4-E2B-IT-SFT-RLVR-Medical-GGUF

Base model: google/gemma-4-E2B
Finetuned: google/gemma-4-E2B-it
Finetuned: lukasdrews/Gemma-4-E2B-IT-SFT-RLVR-Medical
Quantized: this model
