ALLaM-7B-Instruct — Curated 500K

LoRA adapter for ALLaM-7B-Instruct-preview, fine-tuned on the human-curated Arabic instruction dataset CIDAR under a fixed budget of 500K training tokens. One of six adapters from a controlled study comparing human-curated versus synthetic Arabic instruction data under matched token budgets.

Model Details

Base model: humain-ai/ALLaM-7B-Instruct-preview
Adapter type: LoRA (QLoRA, 4-bit NF4)
Training data: CIDAR (human-curated)
Token budget: 500K tokens
Language: Arabic

Training Configuration

Setting	Value
Quantization	4-bit NF4 (QLoRA)
LoRA rank / alpha	16 / 32
LoRA dropout	0.05
Target modules	q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Optimizer	Paged AdamW (8-bit)
Learning rate	2e-4
LR scheduler	cosine, 100 warmup steps
Epochs	3
Effective batch size	16 (2 × 8 grad. accum.)
Max sequence length	512
Precision	fp16
Seed	42
Hardware	NVIDIA A100 (40GB)

Evaluation

Evaluated with lm-evaluation-harness on seven Arabic benchmarks (ACVA 5-shot; others zero-shot). Accuracy:

Benchmark	Score
Arab Culture	0.361
AlGhafa	0.594
AraDiCE	0.590
ACVA	0.770
Arabic Exams	0.508
ArabicMMLU	0.645
OpenAI MMLU (Ar)	0.412

Usage

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "humain-ai/ALLaM-7B-Instruct-preview"
tok = AutoTokenizer.from_pretrained(base, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(base, trust_remote_code=True, device_map="auto")
model = PeftModel.from_pretrained(model, "ManarAlrabie/arabic-llm-curated-500k")

Intended Use & Limitations

Research artifact for studying instruction-data quality vs. quantity in Arabic LLM fine-tuning. Not intended for production. As a single-seed fine-tune of a 7B model, outputs may contain inaccuracies.

Citation

Associated paper is under review; citation will be added upon publication. Until then, please link to this repository.

Downloads last month: 30

Model tree for ManarAlrabie/arabic-llm-curated-500k

Base model

humain-ai/ALLaM-7B-Instruct-preview

Adapter

(18)

this model

URL: https://huggingface.co/ManarAlrabie/arabic-llm-curated-500k

⇱ ManarAlrabie/arabic-llm-curated-500k · Hugging Face