ALLaM-7B-Instruct — Curated 2M

LoRA adapter for ALLaM-7B-Instruct-preview, fine-tuned on the human-curated Arabic instruction dataset CIDAR under a fixed budget of 2M training tokens. One of six adapters from a controlled study comparing human-curated versus synthetic Arabic instruction data under matched token budgets.

Model Details

Base model: humain-ai/ALLaM-7B-Instruct-preview
Adapter type: LoRA (QLoRA, 4-bit NF4)
Training data: CIDAR (human-curated)
Token budget: 2M tokens
Language: Arabic

Training Configuration

Setting	Value
Quantization	4-bit NF4 (QLoRA)
LoRA rank / alpha	16 / 32
LoRA dropout	0.05
Target modules	q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Optimizer	Paged AdamW (8-bit)
Learning rate	2e-4
LR scheduler	cosine, 100 warmup steps
Epochs	3
Effective batch size	16 (2 × 8 grad. accum.)
Max sequence length	512
Precision	fp16
Seed	42
Hardware	NVIDIA A100 (40GB)

Evaluation

Evaluated with lm-evaluation-harness on seven Arabic benchmarks (ACVA 5-shot; others zero-shot). Accuracy:

Benchmark	Score
Arab Culture	0.366
AlGhafa	0.593
AraDiCE	0.587
ACVA	0.776
Arabic Exams	0.501
ArabicMMLU	0.640
OpenAI MMLU (Ar)	0.435

Usage

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "humain-ai/ALLaM-7B-Instruct-preview"
tok = AutoTokenizer.from_pretrained(base, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(base, trust_remote_code=True, device_map="auto")
model = PeftModel.from_pretrained(model, "ManarAlrabie/arabic-llm-curated-2m")

Intended Use & Limitations

Research artifact for studying instruction-data quality vs. quantity in Arabic LLM fine-tuning. Not intended for production. As a single-seed fine-tune of a 7B model, outputs may contain inaccuracies.

Citation

Associated paper is under review; citation will be added upon publication. Until then, please link to this repository.

Downloads last month: 11

Model tree for ManarAlrabie/arabic-llm-curated-2m

Base model

humain-ai/ALLaM-7B-Instruct-preview

Adapter

(18)

this model

URL: https://huggingface.co/ManarAlrabie/arabic-llm-curated-2m

⇱ ManarAlrabie/arabic-llm-curated-2m · Hugging Face